Reviewing the 'root' field in the CBOE data
Overview
In the previous analysis post, I created a data frame that houses the entire historical SPX option chain dataset.
For the prototype backtest calculator, I want to use the standard (or regular) SPX option series. That is, the series that expires in the morning (i.e., AM-settled) on the 3rd Friday of each month. In addition, I will want to extract the other weekly series with a (PM) Friday expiration. The reason for choosing these series is that I can use them to compute a daily estimate of the VIX volatility index.
We can distinguish between series using the 'root' field. Unfortunately, it appears that several roots can represent a given type of option series. That is, several series in the historical data may expire on the same day.
In addition, the root fields that map to a series may change over time. Specifically, the CBOE made an effort to simplify (or clean up) the root fields at certain points in the past. So, we need to understand what each 'root' code means if we want to conduct backtests using self-consistent data.
I'll start by reloading the saved data frame and examining the most- and least-frequently occurring root values.
import os
import pandas as pd
# Directory & file that houses the concatenated CBOE csv data
proc_dir = r"/Users/alexstephens/data/cboe/proc"
pkl_file = os.path.join(proc_dir, r"cboe_mmyy_all_clean_df.pkl")
# Read the .pkl file
df = pd.read_pickle(pkl_file)
# Head of the root field counts
df['root'].value_counts().head(10)
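The least-frequently occurring roots sit at the other end of the same counts. Here is a minimal sketch on a toy data frame (the name toy_df and the counts are made up for illustration; they are not the CBOE data):

```python
import pandas as pd

# Toy stand-in for the real data frame; the counts are invented for illustration
toy_df = pd.DataFrame({'root': ['SPXW'] * 5 + ['SPX'] * 3 + ['SPB'] * 1})

# value_counts() sorts in descending order of frequency by default
root_counts = toy_df['root'].value_counts()

# Most frequent roots are at the top of the Series ...
print(root_counts.head(2))

# ... and the least frequent roots are at the bottom
print(root_counts.tail(2))
```

On the real data, the same idea would be df['root'].value_counts().tail(10).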
In the above, the roots 'SPXW', 'SPX', and 'SPXQ' appear the most often. It is tempting to assume that these roots represent the entire set of weekly, regular (i.e., 3rd Friday), and quarterly series. Unfortunately, the naming convention is not that simple. At the very least, those roots may not cover the entire history of the weekly, regular, and quarterly series.
It is not clear to me where to find definitive information about these option chain categories. The CBOE has good information on the currently traded options; however, information about the earliest data in the history is not readily available.
For example, if you review the first rows in the data, you will see that the SPXW and SPXQ did not exist at all ... and the SPX root was just one of many available roots.
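One way to see when each root enters and leaves the history is to compute the first and last date on which it appears. The sketch below uses a toy data frame, and the column name 'quote_date' is an assumption on my part (substitute whatever the date column is called in the real data):

```python
import pandas as pd

# Toy data; 'quote_date' is an assumed column name, and the rows are invented
toy_df = pd.DataFrame({
    'quote_date': pd.to_datetime([
        '2004-01-02', '2004-01-02', '2012-06-01', '2012-06-01', '2018-03-01',
    ]),
    'root': ['SPX', 'SPB', 'SPX', 'SPXW', 'SPXW'],
})

# First ('min') and last ('max') date on which each root appears,
# ordered by first appearance
root_span = (
    toy_df.groupby('root')['quote_date']
    .agg(['min', 'max'])
    .sort_values('min')
)
print(root_span)
```

Roots with a late 'min' did not exist in the early history, and roots with an early 'max' were retired along the way.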
A relatively recent paper on implied volatility provides a plain-English explanation of some of the roots.
- Andersen, Torben G. and Bondarenko, Oleg and Gonzalez-Perez, Maria T., Exploring Return Dynamics via Corridor Implied Volatility (February 1, 2015). Review of Financial Studies, Vol. 28 (10), pp. 2902-2945, 2015.
The authors state:
We only consider options that the CBOE actually used in their computation of the VIX over our sample period, namely those in the SPB, SPQ, SPT, SPV, SPX, SPZ, SVP, SXB, SXM, SXY, SXZ, SYG, SYU, SYV and SZP categories. The latter are generally known as SPX equity options and they mature on the Saturday immediately following the third Friday of the expiration month. The series are based on MDR (Market Data Retrieval) quotes captured by CBOE’s internal system. The underlying tick-by-tick data cover the period June 2, 2008 – June 30, 2010
The only problem is that their data only cover a small slice of the history.
I compared the above with a table from the Options Clearing Corporation (OCC) that lists Production Symbols with AM/PM Settlement Differences As Of 12/4/2009. The naming convention used by the OCC is consistent with the Andersen et al. summary.
Lastly, some of the more esoteric roots were discussed in a 2005 CBOE Information Circular on weekly options. So, this reference is useful in identifying some of the earliest weekly options.
# Compute a cross-tab of expiration and root
root_crosstab_ex = pd.crosstab(df.expiration, df.root)
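For reference, here is what pd.crosstab produces on a tiny toy example (the rows are invented): one row per expiration, one column per root, and each cell holds the number of records with that (expiration, root) pair.

```python
import pandas as pd

# Toy data with two expirations and two roots; the rows are invented
toy_df = pd.DataFrame({
    'expiration': ['2004-01-17', '2004-01-17', '2004-01-17', '2004-02-21'],
    'root': ['SPX', 'SPX', 'SPB', 'SPX'],
})

# Rows are expirations, columns are roots, cells are record counts
toy_crosstab = pd.crosstab(toy_df.expiration, toy_df.root)
print(toy_crosstab)
```

A zero in a cell means that root has no records for that expiration, which is what makes the table useful for spotting when a root is in use.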
Below I will echo the head and tail of the cross tab, filtering only those roots that are likely associated with AM-settled SPX options. We see that the earliest data contain a diverse array of roots. At the end of the historical data, the root 'SPX' is the only one that remains in use.
# Define a list of candidate AM-settled series roots
am_roots = ['SPB','SYG','SPQ','SPT','SPX','SPZ','SXB','SXM','SXY','SZP','SYU','SYV','SZU','SVP','SPV','SXZ']
# Echo the head of the (root x expiration) cross-tab for likely AM-settled roots
root_crosstab_ex[am_roots].head(10)
# Echo the tail of the (root x expiration) cross-tab for likely AM-settled roots
root_crosstab_ex[am_roots].tail(10)
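Selecting columns with a list of candidate roots assumes every root in the list actually appears in the data; recent pandas versions raise a KeyError otherwise. A defensive sketch on a toy cross-tab (the names toy_crosstab and candidate_roots, and the counts, are hypothetical):

```python
import pandas as pd

# Toy cross-tab; the counts are invented for illustration
toy_crosstab = pd.DataFrame(
    {'SPB': [3, 0, 0], 'SPX': [5, 0, 7], 'SPXW': [0, 4, 2]},
    index=pd.Index(['2004-01-17', '2012-06-08', '2012-06-16'], name='expiration'),
)

# 'SZU' is deliberately absent from the toy table
candidate_roots = ['SPB', 'SPX', 'SZU']

# Split the candidates into those present in the table and those missing
present_roots = [r for r in candidate_roots if r in toy_crosstab.columns]
missing_roots = [r for r in candidate_roots if r not in toy_crosstab.columns]
print('missing roots:', missing_roots)

# Keep only the candidate columns, then drop expirations where none of them trade
am_only = toy_crosstab[present_roots]
am_only = am_only[am_only.sum(axis=1) > 0]
print(am_only)
```

Dropping the all-zero rows also makes the head and tail easier to read, since expirations that belong only to other roots no longer pad the output.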
It is worth verifying that the expiration dates actually align with the third Friday of every month. It is possible that some of these series do not, so we will need to filter by expiration date in subsequent steps.
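As an independent cross-check, the third Friday of any month can be computed directly with the standard library. This is a minimal sketch; the helper names third_friday and is_third_friday are my own and are not part of the analysis code:

```python
import datetime as dt

def third_friday(year, month):
    """Return the date of the third Friday of the given month."""
    first = dt.date(year, month, 1)
    # date.weekday(): Monday = 0 ... Friday = 4
    days_to_first_friday = (4 - first.weekday()) % 7
    # The third Friday is two weeks after the first Friday
    return first + dt.timedelta(days=days_to_first_friday + 14)

def is_third_friday(d):
    """True if the date d is the third Friday of its month."""
    return d == third_friday(d.year, d.month)

print(third_friday(2020, 1))                    # 2020-01-17
print(is_third_friday(dt.date(2010, 6, 18)))    # True
print(is_third_friday(dt.date(2010, 6, 19)))    # False
```

Because the third Friday always falls between the 15th and the 21st, it always lands in "week 3" under the (day - 1) // 7 + 1 definition of week.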
To do this, I used the cross-tab index to define several date-based columns that I will add to the data.
I'm not really sure if this is a Pythonic approach. I basically need to find a book on Pandas that I can go through systematically.
# Continue to use the (root x expiration) cross-tab
# Isolate the likely AM-settled SPX roots
root_crosstab_ex_idx = root_crosstab_ex.index
# Take an explicit copy so the columns added below modify an independent frame
spx_crosstab_ex = root_crosstab_ex.loc[root_crosstab_ex_idx, am_roots].copy()
# Use the index to create various date-based DatetimeIndex objects
date_idx = pd.to_datetime(spx_crosstab_ex.index)
date_bom = date_idx - pd.to_timedelta(date_idx.day - 1, unit='d')
date_dif = (date_idx - date_bom)
# Append date data onto the crosstab data frame
spx_crosstab_ex['day_name'] = date_idx.day_name() # Day of the week
spx_crosstab_ex['date'] = date_idx # Date
spx_crosstab_ex['bom'] = date_bom # Beginning of the month (bom)
spx_crosstab_ex['mday'] = date_dif # Days elapsed from bom to expiration
# Compute the day of the month (1-based) and the week of the month for each expiration date;
# Using integer division (//) to get the week
spx_crosstab_ex['days'] = spx_crosstab_ex['mday'].dt.days + 1
spx_crosstab_ex['week'] = ((spx_crosstab_ex['days'] - 1) // 7) + 1
# Echo the tail of the (root x expiration) augmented cross-tab for SPX
spx_crosstab_ex[['SPX','day_name','date','bom','mday','days','week']].tail(10)
What we see in the above tail of the data is that the SPX root only has expirations that land on the 3rd Friday of the month (i.e., the count in the SPX column is > 0, the day_name == 'Friday', and the week == 3).
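The same check can be done programmatically rather than by eye. The sketch below is a more compact variant that reads the day of the month straight from the DatetimeIndex; the dates are a toy sample chosen for illustration, and the names exp_idx and check are hypothetical:

```python
import pandas as pd

# Toy expirations: three 3rd Fridays and one end-of-month Friday
exp_idx = pd.to_datetime(['2019-01-18', '2019-02-15', '2019-03-15', '2019-03-29'])

check = pd.DataFrame(index=exp_idx)
check['day_name'] = exp_idx.day_name()          # Day of the week
check['week'] = (exp_idx.day - 1) // 7 + 1      # Week of the month (1-based)

# Flag the expirations that are a Friday in week 3
check['third_friday'] = (check['day_name'] == 'Friday') & (check['week'] == 3)
print(check)
```

On the real cross-tab, one would first restrict to the rows where the SPX count is > 0 and then confirm that the flag is True for every remaining row.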
# Echo the head of the (root x expiration) augmented cross-tab for SPX
spx_crosstab_ex[['SPX','day_name','date','bom','mday','days','week']].head(10)
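Now we see some of the complications that the early history presents. Here the SPX series all appear to expire on a Saturday, which matches the Andersen et al. description of options that mature on the Saturday immediately following the third Friday. In addition, at least one of the expirations is associated with week 4 of the month. That happens when the third Friday is the 21st, because the following Saturday (the 22nd) spills into week 4.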
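One possible way to handle the Saturday expirations is to shift them back one day before computing the week of the month. This is only a sketch of the idea on three toy Saturday dates, not the filtering logic used later; the variable names are hypothetical:

```python
import pandas as pd

# Toy Saturday expirations; 2004-05-22 follows a 3rd Friday that falls on the 21st
sat_idx = pd.to_datetime(['2004-01-17', '2004-02-21', '2004-05-22'])

# Boolean flag for Saturday expirations
is_sat = sat_idx.day_name() == 'Saturday'

# Shift Saturdays back one day; any other weekday is left unchanged
adj_idx = sat_idx - pd.to_timedelta(is_sat.astype(int), unit='D')

# Week of the month before and after the adjustment
week_raw = (sat_idx.day - 1) // 7 + 1
week_adj = (adj_idx.day - 1) // 7 + 1

print(list(week_raw))   # the raw Saturday dates can land in week 4
print(list(week_adj))   # the adjusted Friday dates all land in week 3
```

After the shift, the "Friday in week 3" test applies uniformly to both the early Saturday-dated expirations and the later Friday-dated ones.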
# Echo the head of the augmented (root x expiration) cross-tab for all likely AM-settled roots
spx_crosstab_ex[am_roots + ['day_name','date','bom','mday','days','week']].head(20)
Now that we have a better feel for the types of roots that we want to include in the prototype calculator, we will move on to developing some filtering logic that isolates the relevant data.