Augmenting the CBOE file¶
Summary¶
There are a couple of additions I would like to make to the scrubbed CBOE data frame. These additions should (in theory) make it easier to isolate relevant data.
The additions I'd like to make include (this list may grow over time):
- Capitalize the option type
- Create a key using the root, a YYYYMMDD string, the put/call key, and the strike
- Create a similar key using only th YYYYMMDD string, the put/call key, and the strike
- Compute the number of days that will elapse from the quote_date to expiration
- Add an expiration day of week field (i.e., Friday, Saturday, etc.)
- Add an expiration week of month (i.e., the 3rd week in the month)
At this stage, I would like to retain as much data in its raw form as is practical. In the future I will start to prune the data frame.
To start, load the raw data frame.
import os
import pandas as pd
# Alter display settings
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 50)
pd.set_option('display.width', 500)
# Directory & file that houses the concatenated CBOE csv data
proc_dir = r"/Users/alexstephens/data/cboe/proc"
pkl_file = os.path.join(proc_dir, r"cboe_mmyy_all_clean_df.pkl")
# Read the .pkl file
df = pd.read_pickle(pkl_file)
# Echo the columns
list(df.columns.values)
Start to work through the punch-list. Here I upper-case the put/call field, and I use list comprehensions to create a two character representation of the day number and month number. That is, I want a day number like 9 to read as '09'. Finally I concatenate the fields to create the YYYYMMDD string.
# Apply upper-case to the option_type field
df['option_type'] = df['option_type'].str.upper()
# Use a list comprehension to create a 2 character expiry day & month field
df['ex_dd'] = [str(day) if (day > 9) else ('0' + str(day))
for day in df['expiration'].dt.day]
df['ex_mm'] = [str(month) if (month > 9) else ('0' + str(month))
for month in df['expiration'].dt.month]
# Use a map method to create the four character expiry YYYY field
df['ex_yy'] = df['expiration'].dt.year.map(str)
# Create an expiration YYYYMMDD field
df['ex_yymmdd'] = df['ex_yy'] + df['ex_mm'] + df['ex_dd']
# Echo progress
df.head(2).transpose()
Now I add the day name and the number of weeks into the month the expiration date represents.
I need to create a string version of an integer version of the floating point strike. This is what the 'strike_str' represents.
Then I create a couple of string-based keys. Lastly, I compute the number of days between expiration and the quote_date.
# Add the day name to ease visualization of the expiration day
df['ex_dayname'] = df['expiration'].dt.day_name()
# Compute the number of weeks elapsed in the month by expiration
df['ex_weeknum'] = df['expiration'].apply(lambda d: (d.day-1) // 7 + 1)
# Create a strike string field; I have to convert the float to
# integer before converting to a string. There is a single strike
# with a non-integer value (8.75)
df['strike_str'] = df.strike.astype(int).astype(str)
# Create an option series key
df['series_key'] = (df['root'] + df['ex_yymmdd'] + df['option_type'] +
df['strike_str'])
# Create strike and expiration date keys
df['dt_typ_stk_key'] = df['ex_yymmdd'] + df['option_type'] + df['strike_str']
df['dt_typ_key'] = df['ex_yymmdd'] + df['option_type']
# Compute the number of days-to-expiration (dte); Use the so-called
# dt() accessor for series to extract the numerical days to expiry
df['dte'] = (df['expiration'] - df['quote_date']).dt.days
# Keep only the date component of the 'quote_date' and 'expiration'
df['quote_date'] = df['quote_date'].dt.date
df['expiration'] = df['expiration'].dt.date
# Compute the underlying mip-price
df['underlying_mid_1545'] = 0.5*(df['underlying_bid_1545'] +
df['underlying_ask_1545'])
# Turns out the underlying bid/ask spread is 0.0 in all cases
# df['underlying_dif_1545'] = abs(df['underlying_bid_1545'] -
# df['underlying_ask_1545'])
# Echo progress
df.head(2).transpose()
I want to print out some diagnostics to better characterize the underlying data. In particular, I still want to review what the 'root' field shows, as I am not totally comforatable with the definitions of those fields.
# A sanity check of the week number calculation
pd.crosstab(df.ex_dd, df.ex_weeknum)
# When does a Saturday expiration occur
pd.crosstab(df.ex_dd, df.ex_dayname)
# A cross table of the day name and week number
pd.crosstab(df.ex_dayname, df.ex_weeknum)
# A cross table of the root and the week number
pd.crosstab(df.root, df.ex_weeknum)
# A cross table of the root and the day name
pd.crosstab(df.root, df.ex_dayname)
I will absorb the above information and use it to help with the filtering step in the next post. For now, save this data frame to a file. (NB. It is up to about 5 Gb in size).
# Save the output
out_file = os.path.join(proc_dir, r"cboe_mmyy_aug_clean_df.pkl")
df.to_pickle(out_file)
No comments:
Post a Comment