Sunday, November 17, 2019

012 - CBOE Data Frame Processing


Summary

What follows is a long script. In it I methodically step through a 'data reduction' process. The goals of this process are (i) to cut out superfluous and/or noisy data, (ii) to append information that will be useful in subsequent analyses, and (iii) to reformat the data so that call & put information sits side-by-side on a single row (instead of on separate rows).

This is a very linear process. I basically whittle away at the data frame until I reach a point where I think I've distilled the raw data into a usable dataset. I will need to conduct additional analyses (e.g., a sanity check of the greeks), but that is a more involved series of tests. This script is really about reducing the data to its purest form.

Note: Below I make reference to a set of interest rate data. I append those data to the raw CBOE file. I will discuss the steps I took to 'construct' that file in a later post.

In [41]:
# Import modules
import os
import pandas as pd
from datetime import timedelta

# =============================================================================
# Set-up
# =============================================================================

# Alter pandas display settings
pd.set_option('display.max_rows', 200)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 500)

# Location of the processed CBOE csv files
proc_dir = r'/Users/alexstephens/data/cboe/proc/1911_SPX'

# Define the cleaned, concatenated pickle file
clean_pkl = r'cboe_spx_1911_clean.pkl'
clean_pkl_file = os.path.join(proc_dir, clean_pkl)

# Define the raw dataframe
raw_pkl = r'cboe_spx_1911_clean_df_raw.pkl'
raw_pkl_file = os.path.join(proc_dir, raw_pkl)

# Define the reduced dataframe
red_pkl = r'cboe_spx_1911_clean_df_red.pkl'
red_pkl_file = os.path.join(proc_dir, red_pkl)

# Define the merged dataframe
mrg_pkl = r'cboe_spx_1911_clean_df_mrg.pkl'
mrg_pkl_file = os.path.join(proc_dir, mrg_pkl)

# Location of the interest rate data
fred_dir = r'/Users/alexstephens/data/fred/IRTCM_csv_2/data'

# Define the interest rate dataframe
ir_pkl = r'fred_ir_01M_03M_01Y.pkl'
ir_pkl_file = os.path.join(fred_dir, ir_pkl)

Pull in the raw CBOE data and the interest rate data we'll merge onto the option data.

In [42]:
# =============================================================================
# Main
# =============================================================================

# Read the .pkl file
df = pd.read_pickle(clean_pkl_file)

# (12762972, 34)
df.shape

# Read the interest rate .pkl file
df_ir = pd.read_pickle(ir_pkl_file)

# (14878, 5)
df_ir.shape
Out[42]:
(14878, 5)
In [43]:
# =============================================================================
# Remove the end-of-day and delivery code columns
# =============================================================================

# Define columns to drop
drop_columns = ['bid_size_eod', 'bid_eod',
                'ask_size_eod', 'ask_eod',
                'underlying_bid_eod', 'underlying_ask_eod',
                'delivery_code',
                ]

df.drop(drop_columns, axis=1, inplace=True)

# Echo the first four rows (transposed)
df.head(4).transpose()
Out[43]:
92 93 94 95
underlying_symbol ^SPX ^SPX ^SPX ^SPX
quote_date 2004-01-02 00:00:00 2004-01-02 00:00:00 2004-01-02 00:00:00 2004-01-02 00:00:00
root SYG SYG SYG SYG
expiration 2004-01-17 00:00:00 2004-01-17 00:00:00 2004-01-17 00:00:00 2004-01-17 00:00:00
strike 600 600 625 625
option_type C P C P
open 0 0 0 0
high 0 0 0 0
low 0 0 0 0
close 0 0 0 0
trade_volume 0 0 0 0
bid_size_1545 1 0 1 0
bid_1545 506.5 0 481.5 0
ask_size_1545 1 10 1 10
ask_1545 508.5 0.05 483.5 0.05
underlying_bid_1545 1107.61 1107.61 1107.61 1107.61
underlying_ask_1545 1108.06 1108.06 1108.06 1108.06
implied_underlying_price_1545 1107.25 1107.25 1107.25 1107.25
active_underlying_price_1545 1107.8 1107.8 1107.8 1107.8
implied_volatility_1545 1.0222 0.9665 0.9432 0.9056
delta_1545 0.9992 -0.0004 0.9993 -0.0005
gamma_1545 0 0 0 0
theta_1545 -0.0359 -0.0117 -0.0332 -0.0116
vega_1545 0.0059 0.0034 0.0054 0.0036
rho_1545 23.0216 -0.0195 23.9857 -0.0207
vwap 0 0 0 0
open_interest 1 131 0 100
In [44]:
# =============================================================================
# Add / Modify Columns
# =============================================================================

# Apply upper-case to the option_type field
df['option_type'] = df['option_type'].str.upper()

# Add the day name to ease visualization of the expiration day
df['ex_dayname'] = df['expiration'].dt.day_name()

# Compute the number of days-to-expiration (dte); use the .dt accessor
# on the series to extract the number of days to expiry
df['dte'] = (df['expiration'] - df['quote_date']).dt.days

# Compute the underlying mid-price (mup)
df['mup_1545'] = 0.5*(df['underlying_bid_1545'] + df['underlying_ask_1545'])

# Echo the first four rows (transposed)
df.head(4).transpose()
Out[44]:
92 93 94 95
underlying_symbol ^SPX ^SPX ^SPX ^SPX
quote_date 2004-01-02 00:00:00 2004-01-02 00:00:00 2004-01-02 00:00:00 2004-01-02 00:00:00
root SYG SYG SYG SYG
expiration 2004-01-17 00:00:00 2004-01-17 00:00:00 2004-01-17 00:00:00 2004-01-17 00:00:00
strike 600 600 625 625
option_type C P C P
open 0 0 0 0
high 0 0 0 0
low 0 0 0 0
close 0 0 0 0
trade_volume 0 0 0 0
bid_size_1545 1 0 1 0
bid_1545 506.5 0 481.5 0
ask_size_1545 1 10 1 10
ask_1545 508.5 0.05 483.5 0.05
underlying_bid_1545 1107.61 1107.61 1107.61 1107.61
underlying_ask_1545 1108.06 1108.06 1108.06 1108.06
implied_underlying_price_1545 1107.25 1107.25 1107.25 1107.25
active_underlying_price_1545 1107.8 1107.8 1107.8 1107.8
implied_volatility_1545 1.0222 0.9665 0.9432 0.9056
delta_1545 0.9992 -0.0004 0.9993 -0.0005
gamma_1545 0 0 0 0
theta_1545 -0.0359 -0.0117 -0.0332 -0.0116
vega_1545 0.0059 0.0034 0.0054 0.0036
rho_1545 23.0216 -0.0195 23.9857 -0.0207
vwap 0 0 0 0
open_interest 1 131 0 100
ex_dayname Saturday Saturday Saturday Saturday
dte 15 15 15 15
mup_1545 1107.84 1107.84 1107.84 1107.84
In [45]:
# =============================================================================
# Generate date vectors / cutoffs
# =============================================================================

# Collect min/max quote dates
min_date = df['quote_date'].min()
max_date = df['quote_date'].max()

# Since we may want to compute the VIX up to the last quote day in the
# history, keep expirations that extend out 38 days past the max_date
vix_date = max_date + timedelta(days=38)
In [46]:
# =============================================================================
# Sort the option chain
# =============================================================================

# Sort by: quote_date > expiration > strike
df.sort_values(['quote_date', 'expiration', 'strike'],
               ascending=[True, True, True],
               inplace=True)

This step is important because it isolates options that expire on Fridays. This should be simple, but the early history contains options with a Saturday expiration. In addition, certain holidays land on a Friday; in those cases, the expiration date is moved to the preceding Thursday. So the step below uses the timedelta function to identify the Thursday and Saturday surrounding each expiration Friday.

In [47]:
# =============================================================================
# Isolate friday expirations (+/- 1 day)
# =============================================================================

# Daily date vector spanning the quote_date range
quote_range = pd.date_range(start=min_date, end=vix_date, freq='d')

# Monthly date vector spanning the quote_date range (but end-of-month)
eom_range = list(pd.date_range(start=min_date, end=vix_date, freq='M'))

# Create a pandas dataframe using the vector of dates
qr_df = pd.DataFrame(quote_range, columns=['date'])

# Append the day name for readability
qr_df['day_name'] = qr_df['date'].dt.day_name()

# Append the week number
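#   (days 1-7 -> week 1, days 8-14 -> week 2, days 15-21 -> week 3, ...)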
qr_df['week_num'] = qr_df['date'].apply(lambda d: (d.day-1) // 7 + 1)

# Isolate all Fridays of every month in the series (+/- 1 day)
all_fri = qr_df[(qr_df.day_name == 'Friday')].date
all_thu = all_fri + timedelta(-1)
all_sat = all_fri + timedelta(+1)

# Create a list combining all three date lists as a global 'end-of-week' filter
all_eow = sorted(list(all_thu) + list(all_fri) + list(all_sat))

# Identify the 3rd Friday of every month
third_fri = qr_df[(qr_df.day_name == 'Friday') & (qr_df.week_num == 3)].date
third_thu = third_fri + timedelta(-1)
third_sat = third_fri + timedelta(+1)

# Create a list combining all 3rd-week eow dates
third_eow = sorted(list(third_thu) + list(third_fri) + list(third_sat))
In [48]:
# =============================================================================
# Isolate unique expiration dates
# =============================================================================

# Create a pandas dataframe using the vector of expiration dates
ex_date = sorted(df.expiration.unique())
ex_df = pd.DataFrame(pd.to_datetime(ex_date), columns=['ex_date'])

# Create a day number field
ex_df['ex_day_num'] = ex_df['ex_date'].dt.day

# Append the day name for readability
ex_df['ex_day_name'] = ex_df['ex_date'].dt.day_name()

# Append the week number
ex_df['ex_week_num'] = ex_df['ex_date'].apply(lambda d: (d.day-1) // 7 + 1)

Make a copy of the data frame and rename some columns to save on screen space.

In [49]:
# =============================================================================
# Define filters below - use a copy of the original data frame
# =============================================================================

# Make a copy of the data frame; shape = (12762972, 30)
df.shape
df_red = df.copy()
df_red.shape

# Rename columns
df_red.rename(columns={'underlying_symbol': 'u',
                       'trade_volume': 'tvol',
                       'underlying_bid_1545': 'u_bid_1545',
                       'underlying_ask_1545': 'u_ask_1545',
                       'implied_underlying_price_1545': 'iup_1545',
                       'active_underlying_price_1545': 'aup_1545',
                       'implied_volatility_1545': 'iv_1545',
                       'open_interest': 'oi',
                       }, inplace=True)

Here is where I start to chip away at the raw data frame. In each step I split the data frame into (i) the slimmed-down data set and (ii) the data I exclude, so that I can review the excluded rows if something looks peculiar. In each step I add the resulting shape as an inline comment to be able to track the repeatability of these steps.
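
As a side note, the split-and-keep pattern repeated below could be wrapped in a small helper. The sketch that follows is purely illustrative (the script does not define it); split_by_mask is a name I'm introducing here, not something from the original code.

# Hypothetical helper (not part of the original script): split a frame into
# (kept, excluded) pieces given a boolean "keep" mask.
def split_by_mask(frame, keep_mask):
    return frame[keep_mask], frame[~keep_mask]

# e.g., df_red, excl_01 = split_by_mask(df_red, df_red.expiration <= vix_date)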

In [50]:
# =============================================================================
# Apply date-based filters
# =============================================================================

# Retain only expiration cycles that happen before the vix_date
excl_01 = df_red[(df_red.expiration > vix_date)]
df_red = df_red[(df_red.expiration <= vix_date)]  # (12181994, 30)
In [51]:
# =============================================================================
# Apply non-standard-root-based filters
# =============================================================================

# Remove low-volume roots: those with fewer than 250 observations
# across the historical window.  Adding these manually because I want to
# see what is in the data.  (Note: These did not appear in the 1905 data)
low_volume_roots = ['AD',  'BJH', 'BNK', 'BQA', 'DFD', 'DFN', 'EOP', 'FSL',
                    'GEU', 'HBQ', 'HGC', 'IIQ', 'LZX', 'MLS', 'MYB', 'MZS',
                    'NXL', 'OBI', 'OSA', 'OSZ', 'QGR', 'QJV', 'QLJ', 'QNE',
                    'RDA', 'RXA', 'RXS', 'SBO', 'SFU', 'SOY', 'SQR', 'SXX',
                    'TMQ', 'TXF', 'UMO', 'USL', 'UWB', 'VAS', 'VOP', 'WCO',
                    'XAV', 'XGI', 'YDS', 'YFS', 'YKH', 'YRN', 'YXD', 'YXI',
                    'YZP', 'ZPO', ]

excl_02 = df_red[(df_red.root.isin(low_volume_roots))]
df_red = df_red[~(df_red.root.isin(low_volume_roots))]  # (12180688, 30)

# Remove quarterly expirations
quarterly_roots = ['SPXQ', 'SPXPM', 'SAQ', 'SKQ', 'SLQ', 'QSE', 'QSZ', 'QZQ',
                   'SQG', 'SQP', 'SZQ', ]

excl_03 = df_red[(df_red.root.isin(quarterly_roots))]
df_red = df_red[~(df_red.root.isin(quarterly_roots))]  # (11621334, 30)

# Remove binary options (B*), Variance Strip (VSTRP), Range Option (SRO)
# (Note: These no longer appear in the 1911 data)
binary_options = ['BSZ', 'BSF', 'BSK', 'SRO', 'VSTRP']

excl_04 = df_red[(df_red.root.isin(binary_options))]
df_red = df_red[~(df_red.root.isin(binary_options))]  # (11621334, 30)
In [52]:
# =============================================================================
# Isolate all Friday expirations (+/- 1 day)
# =============================================================================

# Isolate all Friday expirations (+/- 1 day)
excl_05 = df_red[~df_red['expiration'].isin(all_eow)]
df_red = df_red[df_red['expiration'].isin(all_eow)]  # (10019993, 30)
In [53]:
# =============================================================================
# Exclude weeklies with a PM-settlement on the third Friday of the month
# =============================================================================

# Remove the PM expirations that occur on the third Friday of the month
weekly_roots = ['JXA', 'JXB', 'JXD', 'JXE', 'SPXW']

excl_06 = df_red[(df_red['expiration'].isin(third_eow) &
                  df_red['root'].isin(weekly_roots))]

df_red = df_red[~(df_red['expiration'].isin(third_eow) &
                  df_red['root'].isin(weekly_roots))]  # (9560129, 30)
In [54]:
# =============================================================================
# Exclude weeklies with an end-of-month settlement date on a Thursday
# =============================================================================

# Remove the weeklies that land on EOM & Thursday
excl_07 = df_red[((df_red['expiration'].isin(eom_range)) &
                  (df_red['ex_dayname'] == 'Thursday'))]

df_red = df_red[~((df_red['expiration'].isin(eom_range)) &
                  (df_red['ex_dayname'] == 'Thursday'))]  # (9318255, 30)
In [55]:
# =============================================================================
# Remove options where the number of days-to-expiry are > 360
# =============================================================================

# Remove options with an expiry > 1 year (~ 360 days)
excl_08 = df_red[(df_red['dte'] > 360)]

df_red = df_red[(df_red['dte'] <= 360)]  # (8291264, 30)

This step is a little different. I use the groupby() method to tally the number of rows for each combination of 'quote_date', 'expiration', 'option_type', 'strike', and 'days-to-expiration' (dte). In theory, we should have one row for each combination. However, there are a handful of collisions in the data: multiple roots appear for the same combination. This will require some additional cleansing to de-duplicate these rows.

In [56]:
# =============================================================================
# Review of duplicates (quote_date, expiration, option_type, strike)
# =============================================================================

# First isolate the columns to group by
dd = df_red[['quote_date', 'expiration', 'option_type',
             'strike', 'dte', 'root']]

# Count the rows for each unique (non-root) combination
uniq_dt = dd.groupby(['quote_date', 'expiration', 'option_type',
                      'strike', 'dte']).size()

# Reset the index and keep only the combinations that appear more than once
uniq_dt = uniq_dt.reset_index().rename(columns={0: 'cnt'})
uniq_dt = uniq_dt[uniq_dt.cnt > 1]

# Save results to an offline file
uniq_csv = r'xtab_duplicate_rows.csv'
uniq_csv_file = os.path.join(proc_dir, uniq_csv)
uniq_dt.to_csv(uniq_csv_file)  # 1694 rows
In [57]:
# Print the head() of the options exhibiting multiple rows
uniq_dt.head(10)
Out[57]:
quote_date expiration option_type strike dte cnt
56426 2004-06-23 2005-06-18 C 1500.0 360 2
56455 2004-06-23 2005-06-18 P 1500.0 360 2
56918 2004-06-24 2005-06-18 C 1500.0 359 2
56947 2004-06-24 2005-06-18 P 1500.0 359 2
57412 2004-06-25 2005-06-18 C 1500.0 358 2
57441 2004-06-25 2005-06-18 P 1500.0 358 2
57906 2004-06-28 2005-06-18 C 1500.0 355 2
57935 2004-06-28 2005-06-18 P 1500.0 355 2
58400 2004-06-29 2005-06-18 C 1500.0 354 2
58429 2004-06-29 2005-06-18 P 1500.0 354 2
In [58]:
# Print the tail() of the options exhibiting multiple rows
uniq_dt.tail(10)
Out[58]:
quote_date expiration option_type strike dte cnt
2040698 2012-07-23 2012-08-10 C 1365.0 18 2
2040762 2012-07-23 2012-08-10 P 1365.0 18 2
2044262 2012-07-25 2012-08-10 C 1365.0 16 2
2044326 2012-07-25 2012-08-10 P 1365.0 16 2
2046062 2012-07-26 2012-08-10 C 1365.0 15 2
2046128 2012-07-26 2012-08-10 P 1365.0 15 2
2047870 2012-07-27 2012-08-10 C 1365.0 14 2
2047940 2012-07-27 2012-08-10 P 1365.0 14 2
2049682 2012-07-30 2012-08-10 C 1365.0 11 2
2049752 2012-07-30 2012-08-10 P 1365.0 11 2
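
For what it's worth, one possible way to de-duplicate these rows later would be to keep, for each duplicated combination, the row with the larger open interest. The snippet below is only a sketch of that idea; the tie-breaker is an assumption on my part, and this script does not apply it.

# Sketch only (not applied here): keep the row with the larger open interest
# for each duplicated (quote_date, expiration, option_type, strike) combination
df_dedup = (df_red.sort_values('oi', ascending=False)
                  .drop_duplicates(subset=['quote_date', 'expiration',
                                           'option_type', 'strike'],
                                   keep='first'))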

In this step I exploit the fact that (apart from the small number of examples shown above) each combination of 'quote_date', 'expiration', 'strike', and 'root' should have two entries in the data: one is the call option associated with that combination of attributes, and the other is the put option.

To be able to identify each call and put, I construct a join/merge key from those fields. This may not be the most efficient method, but I'm basically creating a string that concatenates each row's 'quote_date', 'expiration', 'strike', and 'root'. I will then use this key to merge the call and put data onto a single row in the history (along with a set of common columns that are not option-specific).

In [59]:
# =============================================================================
# Reshape the reduced data frame into one where the call and put data for a
# given (quote_date, expiration, root, strike) are on the same row
# =============================================================================

# =============================================================================
# Build join key(s)
# =============================================================================

# quote_date string
qd_dd = [str(day) if (day > 9) else ('0' + str(day))
         for day in df_red['quote_date'].dt.day]

qd_mm = [str(month) if (month > 9) else ('0' + str(month))
         for month in df_red['quote_date'].dt.month]

qd_yy = df_red['quote_date'].dt.year.map(str)

# Assemble the quote_date string (yyyymmdd)
qd_str = (qd_yy + qd_mm + qd_dd)

# expiration string
ex_dd = [str(day) if (day > 9) else ('0' + str(day))
         for day in df_red['expiration'].dt.day]

ex_mm = [str(month) if (month > 9) else ('0' + str(month))
         for month in df_red['expiration'].dt.month]

ex_yy = df_red['expiration'].dt.year.map(str)

# strike * 100 string
strike_str = (df_red.strike*100)
strike_str = strike_str.astype(int).astype(str)

# root string
root_str = df_red.root.astype(str)

# key string
key_str = (qd_yy + qd_mm + qd_dd + "_" + ex_yy + ex_mm + ex_dd + "_" +
           strike_str + "_" + root_str)

# Append the key(s) to the data frame via an assign()
df_red = df_red.assign(key_str=list(key_str))
df_red = df_red.assign(qd_str=list(qd_str))
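
As an aside, I believe the same keys could be built more concisely with the .dt.strftime() accessor. The lines below are only a sketch of that equivalent formulation; they are not what the script above actually runs.

# Sketch: an equivalent key construction using strftime (yyyymmdd)
qd_str_alt = df_red['quote_date'].dt.strftime('%Y%m%d')
ex_str_alt = df_red['expiration'].dt.strftime('%Y%m%d')
key_str_alt = (qd_str_alt + '_' + ex_str_alt + '_' +
               (df_red.strike * 100).astype(int).astype(str) + '_' +
               df_red.root.astype(str))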

In theory each combo of 'quote_date', 'expiration', 'strike', and 'root' should have 2 entries in the data. However, I need to check for cases where a key appears only once. I'll prune those 'stragglers' from the data.

In [60]:
# =============================================================================
# Check for straggler rows
# =============================================================================

# Check that each key has both a call row and a put row
key_check = df_red.groupby(['key_str']).size()
key_check.reset_index().rename(columns={0: 'Freq'})

# Isolate the keys with single entries
single_keys = list(key_check[key_check < 2].index)  # 156

# =============================================================================
# Exclude straggler rows
# =============================================================================

excl_09 = df_red[(df_red.key_str.isin(single_keys))]
df_red = df_red[~(df_red.key_str.isin(single_keys))]  # (8291108, 32)

df_red.reset_index(inplace=True)
In [61]:
# Print some examples of keys that appear only once.  Note that the strike
# was multiplied by 100 when creating the key
single_keys[:10]
Out[61]:
['20041227_20051217_160000_SYZ',
 '20050927_20060916_70000_SPZ',
 '20060130_20060422_142500_SXZ',
 '20060324_20070317_150000_SXM',
 '20060601_20060916_108000_SPQ',
 '20060612_20060722_118000_SPT',
 '20060817_20061021_142500_SXZ',
 '20061023_20070120_127500_SZP',
 '20070329_20080322_160000_SPB',
 '20080123_20080216_60000_SZJ']

Here is where I join historical interest rate data onto the CBOE file, using the simple bucketing scheme described in the comments below. I've appended these data to enable a later VIX-style calculation for each option chain in the history. I will describe the sourcing of these data in a later post.

In [62]:
# =============================================================================
# Merge interest rates onto file
# =============================================================================

# Assign an interest rate to each row using a simple bucketing scheme.  If the
# days-to-expiration (dte) are [0, 60] then use the 1-month rate.  If the dte
# are (60, 120] then use the 3-month rate.  If the dte are (120, 360] then use
# the 1-year rate.

# Merge the interest rate data
df_red = df_red.merge(df_ir[['qd_str', 'R01M_cln', 'R03M_cln', 'R01Y_cln']],
                      how='left', suffixes=('', ''),
                      left_on='qd_str', right_on='qd_str')

# initialize the column
df_red = df_red.assign(ir=0.00)

# assign the interest rate
df_red.loc[(df_red.dte <= 60), 'ir'] = df_red['R01M_cln']
df_red.loc[(df_red.dte > 60) & (df_red.dte <= 120), 'ir'] = df_red['R03M_cln']
df_red.loc[(df_red.dte > 120) & (df_red.dte <= 360), 'ir'] = df_red['R01Y_cln']
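
An equivalent, vectorized way to express the same bucketing is np.select. The snippet below is a sketch of the same logic, shown only for comparison; it is not part of the original script.

# Sketch: the dte bucketing above expressed with np.select
import numpy as np
conditions = [df_red.dte <= 60,
              (df_red.dte > 60) & (df_red.dte <= 120),
              (df_red.dte > 120) & (df_red.dte <= 360)]
choices = [df_red['R01M_cln'], df_red['R03M_cln'], df_red['R01Y_cln']]
df_red['ir'] = np.select(conditions, choices, default=0.00)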

Here I split the data into calls and puts.

In [63]:
# =============================================================================
# Isolate call-only and put-only data frames
# =============================================================================

# Note: After the filter, the call and put frames have the same number of rows
df_c = df_red[(df_red.option_type == 'C')].copy()  # (4145554, 32)
df_p = df_red[(df_red.option_type == 'P')].copy()  # (4145554, 32)

# Sort and reset the index of each call & put data frame
# Sort by: quote_date > expiration > strike > root
df_c.sort_values(['quote_date', 'expiration', 'strike', 'root'],
                 ascending=[True, True, True, True],
                 inplace=True)

df_c.reset_index(inplace=True)

df_p.sort_values(['quote_date', 'expiration', 'strike', 'root'],
                 ascending=[True, True, True, True],
                 inplace=True)

df_p.reset_index(inplace=True)

I now define common columns (those that are not option-specific) and option-specific columns. I'll join the common columns, the call data, and the put data using the key described above. This should result in a data frame where each row contains both call and put prices for a given combination of 'quote_date', 'expiration', 'strike', and 'root'.

In [64]:
# =============================================================================
# Merge the call and put data onto single rows
#   - Define common columns & option-specific columns (+ the key)
#   - Isolate call- and put-only common columns and option-specific columns
#   - Merge the option-specific data first (with '_c' and '_p' suffixes)
#   - Merge the common data and the option-specific data
# =============================================================================

# Define common columns (those not specific to a given call/put option)
common_cols = ['u', 'quote_date', 'root', 'expiration', 'strike',
               'u_bid_1545', 'u_ask_1545', 'mup_1545',
               'iup_1545', 'aup_1545',
               'ex_dayname', 'dte',  'ir', 'key_str']

# Define columns specific to a given call/put option
option_cols = ['option_type', 'open', 'high', 'low', 'close', 'tvol',
               'bid_size_1545', 'bid_1545',
               'ask_size_1545', 'ask_1545', 'iv_1545', 'delta_1545',
               'gamma_1545', 'theta_1545', 'vega_1545', 'rho_1545', 'vwap',
               'oi', 'key_str']

# Pull the call- and put-specific common and option data
df_c_common = df_c[common_cols]
df_p_common = df_p[common_cols]

df_c_option = df_c[option_cols]
df_p_option = df_p[option_cols]

# Merge the option-specific data first
df_mrg_option = df_c_option.merge(df_p_option, how='outer',
                                  suffixes=('_c', '_p'),
                                  left_on='key_str',
                                  right_on='key_str')

# Then merge the option-specific data onto the common columns
df_mrg = df_c_common.merge(df_mrg_option, how='outer',
                           left_on='key_str', right_on='key_str')
# reindex
df_mrg.reset_index(inplace=True)
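
Given that the call and put frames each contain 4,145,554 rows (per the inline comments above), a quick assertion documents the expectation that the merge produces exactly one row per call/put pair. This check is a sketch I'm adding for illustration; it is not in the original script.

# Sketch: the merged frame should have one row per call (and per put)
assert len(df_mrg) == len(df_c) == len(df_p)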

Here is what the resulting data frame looks like.

In [65]:
# Echo the first four rows (transposed)
df_mrg.head(4).transpose()
Out[65]:
0 1 2 3
index 0 1 2 3
u ^SPX ^SPX ^SPX ^SPX
quote_date 2004-01-02 00:00:00 2004-01-02 00:00:00 2004-01-02 00:00:00 2004-01-02 00:00:00
root SYG SYG SYG SYG
expiration 2004-01-17 00:00:00 2004-01-17 00:00:00 2004-01-17 00:00:00 2004-01-17 00:00:00
strike 600 625 650 675
u_bid_1545 1107.61 1107.61 1107.61 1107.61
u_ask_1545 1108.06 1108.06 1108.06 1108.06
mup_1545 1107.84 1107.84 1107.84 1107.84
iup_1545 1107.25 1107.25 1107.25 1107.25
aup_1545 1107.8 1107.8 1107.8 1107.8
ex_dayname Saturday Saturday Saturday Saturday
dte 15 15 15 15
ir 0.0088 0.0088 0.0088 0.0088
key_str 20040102_20040117_60000_SYG 20040102_20040117_62500_SYG 20040102_20040117_65000_SYG 20040102_20040117_67500_SYG
option_type_c C C C C
open_c 0 0 0 0
high_c 0 0 0 0
low_c 0 0 0 0
close_c 0 0 0 0
tvol_c 0 0 0 0
bid_size_1545_c 1 1 1 1
bid_1545_c 506.5 481.5 456.5 431.5
ask_size_1545_c 1 1 1 1
ask_1545_c 508.5 483.5 458.5 433.5
iv_1545_c 1.0222 0.9432 0.8663 0.7901
delta_1545_c 0.9992 0.9993 0.9994 0.9995
gamma_1545_c 0 0 0 0
theta_1545_c -0.0359 -0.0332 -0.0305 -0.0277
vega_1545_c 0.0059 0.0054 0.0048 0.0041
rho_1545_c 23.0216 23.9857 24.9504 25.9159
vwap_c 0 0 0 0
oi_c 1 0 0 0
option_type_p P P P P
open_p 0 0 0 0
high_p 0 0 0 0
low_p 0 0 0 0
close_p 0 0 0 0
tvol_p 0 0 0 0
bid_size_1545_p 0 0 0 0
bid_1545_p 0 0 0 0
ask_size_1545_p 10 10 10 10
ask_1545_p 0.05 0.05 0.05 0.05
iv_1545_p 0.9665 0.9056 0.8465 0.7901
delta_1545_p -0.0004 -0.0005 -0.0005 -0.0005
gamma_1545_p 0 0 0 0
theta_1545_p -0.0117 -0.0116 -0.0115 -0.0114
vega_1545_p 0.0034 0.0036 0.0038 0.0041
rho_1545_p -0.0195 -0.0207 -0.0219 -0.0235
vwap_p 0 0 0 0
oi_p 131 100 675 0
In [67]:
# Echo the last four rows (transposed)
df_mrg.tail(4).transpose()
Out[67]:
4145550 4145551 4145552 4145553
index 4145550 4145551 4145552 4145553
u ^SPX ^SPX ^SPX ^SPX
quote_date 2018-05-29 00:00:00 2018-05-29 00:00:00 2018-05-29 00:00:00 2018-05-29 00:00:00
root SPXW SPXW SPXW SPXW
expiration 2018-07-06 00:00:00 2018-07-06 00:00:00 2018-07-06 00:00:00 2018-07-06 00:00:00
strike 3000 3050 3100 3200
u_bid_1545 2688.58 2688.58 2688.58 2688.58
u_ask_1545 2689.25 2689.25 2689.25 2689.25
mup_1545 2688.91 2688.91 2688.91 2688.91
iup_1545 2684.03 2684.03 2684.03 2684.03
aup_1545 2688.89 2688.89 2688.89 2688.89
ex_dayname Friday Friday Friday Friday
dte 38 38 38 38
ir 0.0177 0.0177 0.0177 0.0177
key_str 20180529_20180706_300000_SPXW 20180529_20180706_305000_SPXW 20180529_20180706_310000_SPXW 20180529_20180706_320000_SPXW
option_type_c C C C C
open_c 0 0 0 0
high_c 0 0 0 0
low_c 0 0 0 0
close_c 0 0 0 0
tvol_c 0 0 0 0
bid_size_1545_c 457 0 0 0
bid_1545_c 0.05 0 0 0
ask_size_1545_c 542 461 844 790
ask_1545_c 0.2 0.15 0.15 0.1
iv_1545_c 0.1258 0.1358 0.1516 0.1757
delta_1545_c 0.0038 0.0022 0.002 0.0012
gamma_1545_c 0.0001 0.0001 0 0
theta_1545_c -0.0166 -0.011 -0.0111 -0.0081
vega_1545_c 0.0973 0.0597 0.0545 0.0342
rho_1545_c 1.0406 0.6055 0.547 0.3274
vwap_c 0 0 0 0
oi_c 49 38 13 0
option_type_p P P P P
open_p 0 0 0 0
high_p 0 0 0 0
low_p 0 0 0 0
close_p 0 0 0 0
tvol_p 0 0 0 0
bid_size_1545_p 70 70 70 70
bid_1545_p 307.4 357.3 407.1 506.9
ask_size_1545_p 70 70 70 70
ask_1545_p 314.6 367.4 416.6 516.4
iv_1545_p 0.1586 0.2085 0.2232 0.2638
delta_1545_p -0.9826 -0.9668 -0.9736 -0.9772
gamma_1545_p 0.0003 0.0004 0.0003 0.0002
theta_1545_p 0 -0.0246 -0.0011 -0.0025
vega_1545_p 0.3734 0.6404 0.5298 0.4683
rho_1545_p -307.19 -308.145 -315.196 -326.604
vwap_p 0 0 0 0
oi_p 0 0 0 0

Save the results.

In [66]:
# =============================================================================
# Save the various output files
# =============================================================================
df_raw = df.copy()
df_raw.to_pickle(raw_pkl_file)
df_red.to_pickle(red_pkl_file)
df_mrg.to_pickle(mrg_pkl_file)
