Friday, November 1, 2019

005 - Reviewing the CBOE 'root' Field

05_cboe_root_review

Reviewing the 'root' field in the CBOE data

Overview

In the previous analysis post, I created a data frame that houses the entire historical SPX option chain dataset.

For the prototype backtest calculator, I want to use the standard or regular SPX option series. That is the series that expires in the morning (i.e., AM-settled) on the 3rd Friday of each month. In addition, I will want to extract the other weekly series with a (PM) Friday expiration. The reason for choosing these series is that I can use them to compute a daily estimate of the VIX volatility index.

We can distinguish between series using the 'root' field. Unfortunately, it appears that several roots can represent a given type of option series. That is several series in the historical data may expire on the same day.

In addition, the root fields that map to a series may change over time. Specifically, the CBOE made an effort to simplify (or clean-up) the root fields at certain points in the past. So, we need to understand what each 'root' code means if we want to conduct backtests using self-consistent data.

I'll start be re-loading the saved data frame and examining the most- and least-frequently occurring root values.

In [10]:
import os
import pandas as pd

# Directory & file that houses the concatenated CBOE csv data
proc_dir = r"/Users/alexstephens/data/cboe/proc"
pkl_file = os.path.join(proc_dir, r"cboe_mmyy_all_clean_df.pkl")

# Read the .pkl file
df = pd.read_pickle(pkl_file)

# Head of the root field counts
df['root'].value_counts().head(10)
Out[10]:
SPXW    6006050
SPX     4791910
SPXQ     372106
BSZ      190474
SZP      161795
SPT      158242
SXY      130466
SPQ      125913
SXB      108431
SXZ       98516
Name: root, dtype: int64

In the above the root 'SPXW', 'SPX', and 'SPXQ' appear the most often. It is tempting to assume that these roots represent the enire set of weekly, regular (i.e., 3rd Friday), and quarterly series. Unfortunately, we will find that the naming convention is not that simple. Or, at least, those roots may not represent the entire set of weekly, regular, and quarterly series.

It is not clear to me where to find definitive information about these option chain categories. The CBOE has good information on the currently traded options; however, information about the earliest data in the history is not readily available.

For example, if you review the first rows in the data, you will see that the SPXW and SPXQ did not exist at all ... and the SPX root was just one of many available roots.

A relatively recent paper on implied volatility provides a plain english explanation of some of the roots.

The authors state:

We only consider options that the CBOE actually used in their computation of the VIX over our sample period, namely those in the SPB, SPQ, SPT, SPV, SPX, SPZ, SVP, SXB, SXM, SXY, SXZ, SYG, SYU, SYV and SZP categories. The latter are generally known as SPX equity options and they mature on the Saturday immediately following the third Friday of the expiration month. The series are based on MDR (Market Data Retrieval) quotes captured by CBOE’s internal system. The underlying tick-by-tick data cover the period June 2, 2008 – June 30, 2010

The only problem is that their data only cover a small slice of the history.

I compared the above with a table from the Options Clearing Corporation (OCC) that lists Production Symbols with AM/PM Settlement Differences As Of 12/4/2009. The naming convention used by the OCC is consistent with the Andersen et al. summary,

Lastly, some of the more esoteric roots were discussed in a 2005 CBOE Information Circular on weekly options. So, this reference is useful in identifying some of the earliest weekly options.

In [11]:
# Compute a cross-tab of expiration and root
root_crosstab_ex = pd.crosstab(df.expiration, df.root)

Below I will echo the head and tail of the cross tab, filtering only those roots that are likely associated with AM-settled SPX options. We see that the earliest data contain a diverse array of roots. At the end of the historical data, the root 'SPX' is the only one that remains in use.

In [12]:
# Define a list of candidate AM-settled series roots
am_roots = ['SPB','SYG','SPQ','SPT','SPX','SPZ','SXB','SXM','SXY','SZP','SYU','SYV','SZU','SVP','SPV','SXZ']

# Echo the head of the (root x expiration) cross-tab for likely AM-settled roots
root_crosstab_ex[am_roots].head(10)
Out[12]:
root SPB SYG SPQ SPT SPX SPZ SXB SXM SXY SZP SYU SYV SZU SVP SPV SXZ
expiration
2004-01-17 0 88 352 298 88 88 198 0 0 44 0 0 0 0 0 0
2004-02-21 0 272 690 1012 272 272 408 0 0 416 0 0 0 0 0 0
2004-03-20 0 432 1106 1344 432 432 648 108 148 396 0 0 0 0 0 40
2004-04-17 0 472 924 1274 496 496 620 0 0 456 0 0 0 0 0 0
2004-05-22 0 496 862 1494 512 504 640 0 0 488 0 0 0 0 0 0
2004-06-19 0 464 1626 2098 928 696 1248 232 696 894 0 0 0 0 0 464
2004-07-17 0 496 974 1372 496 496 702 32 0 572 0 0 0 0 0 0
2004-08-21 0 248 1260 1630 496 248 726 52 112 364 0 0 0 0 0 0
2004-09-18 0 1404 2614 2672 1342 1430 1872 358 716 1290 0 0 0 0 0 716
2004-10-16 0 256 1274 1412 512 256 652 42 0 384 0 0 0 0 0 0
In [13]:
# Echo the tail of the (root x expiration) cross-tab for likely AM-settled roots
root_crosstab_ex[am_roots].tail(10)
Out[13]:
root SPB SYG SPQ SPT SPX SPZ SXB SXM SXY SZP SYU SYV SZU SVP SPV SXZ
expiration
2018-09-28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2018-10-31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2018-12-21 0 0 0 0 119746 0 0 0 0 0 0 0 0 0 0 0
2018-12-31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2019-01-18 0 0 0 0 29112 0 0 0 0 0 0 0 0 0 0 0
2019-03-15 0 0 0 0 18916 0 0 0 0 0 0 0 0 0 0 0
2019-03-29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2019-06-21 0 0 0 0 39196 0 0 0 0 0 0 0 0 0 0 0
2019-12-20 0 0 0 0 67632 0 0 0 0 0 0 0 0 0 0 0
2020-12-18 0 0 0 0 22280 0 0 0 0 0 0 0 0 0 0 0

It is worth verifying that the expiration dates actually align with the third Friday of every month. It is possible that some of these series do not, so we will need to filter by expiration date in subsequent steps.

To do this, I used the cross-tab index to define several date-based columns that I will add to the data.

I'm not really sure if this is a Pythonic approach. I basically need to find a book on Pandas that I can go through systematically.

In [14]:
# Continue to use the (root x expiration) cross-tab

# Isolate the likely AM-settled SPX roots
root_crosstab_ex_idx = root_crosstab_ex.index
spx_crosstab_ex = root_crosstab_ex.loc[root_crosstab_ex_idx, am_roots]

# Use the index to create various date-based DatetimeIndex objects
date_idx = pd.to_datetime(spx_crosstab_ex.index)
date_bom = date_idx - pd.to_timedelta(date_idx.day - 1, unit='d')
date_dif = (date_idx - date_bom)

# Append date data onto the crosstab data frame
spx_crosstab_ex['day_name'] = date_idx.day_name()  # Day of the week
spx_crosstab_ex['date'] = date_idx  # Date
spx_crosstab_ex['bom'] = date_bom  # Beginning of the month (bom)
spx_crosstab_ex['mday'] = date_dif # Days elapsed from bom to expiration

# Compute the number of days in the month and the week of the expiration date;
# Using integer division (//) to get the week
spx_crosstab_ex['days'] = spx_crosstab_ex['mday'].dt.days + 1
spx_crosstab_ex['week'] = ((spx_crosstab_ex['days'] - 1) // 7) + 1
In [15]:
# Echo the tail of the (root x expiration) augmented cross-tab for SPX
spx_crosstab_ex[['SPX','day_name','date','bom','mday','days','week']].tail(10)
Out[15]:
root SPX day_name date bom mday days week
expiration
2018-09-28 0 Friday 2018-09-28 2018-09-01 27 days 28 4
2018-10-31 0 Wednesday 2018-10-31 2018-10-01 30 days 31 5
2018-12-21 119746 Friday 2018-12-21 2018-12-01 20 days 21 3
2018-12-31 0 Monday 2018-12-31 2018-12-01 30 days 31 5
2019-01-18 29112 Friday 2019-01-18 2019-01-01 17 days 18 3
2019-03-15 18916 Friday 2019-03-15 2019-03-01 14 days 15 3
2019-03-29 0 Friday 2019-03-29 2019-03-01 28 days 29 5
2019-06-21 39196 Friday 2019-06-21 2019-06-01 20 days 21 3
2019-12-20 67632 Friday 2019-12-20 2019-12-01 19 days 20 3
2020-12-18 22280 Friday 2020-12-18 2020-12-01 17 days 18 3

What we see in the above tail of the data is that the SPX root only has expirations that land on the 3rd Friday of the month (i.e., the count in the SPX column is > 0, the day_name == 'Friday', and the week == 3)

In [16]:
# Echo the head of the (root x expiration) augmented cross-tab for SPX
spx_crosstab_ex[['SPX','day_name','date','bom','mday','days','week']].head(10)
Out[16]:
root SPX day_name date bom mday days week
expiration
2004-01-17 88 Saturday 2004-01-17 2004-01-01 16 days 17 3
2004-02-21 272 Saturday 2004-02-21 2004-02-01 20 days 21 3
2004-03-20 432 Saturday 2004-03-20 2004-03-01 19 days 20 3
2004-04-17 496 Saturday 2004-04-17 2004-04-01 16 days 17 3
2004-05-22 512 Saturday 2004-05-22 2004-05-01 21 days 22 4
2004-06-19 928 Saturday 2004-06-19 2004-06-01 18 days 19 3
2004-07-17 496 Saturday 2004-07-17 2004-07-01 16 days 17 3
2004-08-21 496 Saturday 2004-08-21 2004-08-01 20 days 21 3
2004-09-18 1342 Saturday 2004-09-18 2004-09-01 17 days 18 3
2004-10-16 512 Saturday 2004-10-16 2004-10-01 15 days 16 3

Now we see some of the complications that the early history presents. Here the SPX series all appear to expire on a Saturday. In addition, at least one of the expiration is associated with week 4 in the month.

In [17]:
# Echo the head of the (root x expiration) cross-tab for likely AM-settled roots
spx_crosstab_ex[am_roots + ['day_name','date','bom','mday','days','week']].head(20)
Out[17]:
root SPB SYG SPQ SPT SPX SPZ SXB SXM SXY SZP ... SZU SVP SPV SXZ day_name date bom mday days week
expiration
2004-01-17 0 88 352 298 88 88 198 0 0 44 ... 0 0 0 0 Saturday 2004-01-17 2004-01-01 16 days 17 3
2004-02-21 0 272 690 1012 272 272 408 0 0 416 ... 0 0 0 0 Saturday 2004-02-21 2004-02-01 20 days 21 3
2004-03-20 0 432 1106 1344 432 432 648 108 148 396 ... 0 0 0 40 Saturday 2004-03-20 2004-03-01 19 days 20 3
2004-04-17 0 472 924 1274 496 496 620 0 0 456 ... 0 0 0 0 Saturday 2004-04-17 2004-04-01 16 days 17 3
2004-05-22 0 496 862 1494 512 504 640 0 0 488 ... 0 0 0 0 Saturday 2004-05-22 2004-05-01 21 days 22 4
2004-06-19 0 464 1626 2098 928 696 1248 232 696 894 ... 0 0 0 464 Saturday 2004-06-19 2004-06-01 18 days 19 3
2004-07-17 0 496 974 1372 496 496 702 32 0 572 ... 0 0 0 0 Saturday 2004-07-17 2004-07-01 16 days 17 3
2004-08-21 0 248 1260 1630 496 248 726 52 112 364 ... 0 0 0 0 Saturday 2004-08-21 2004-08-01 20 days 21 3
2004-09-18 0 1404 2614 2672 1342 1430 1872 358 716 1290 ... 0 0 0 716 Saturday 2004-09-18 2004-09-01 17 days 18 3
2004-10-16 0 256 1274 1412 512 256 652 42 0 384 ... 0 0 0 0 Saturday 2004-10-16 2004-10-01 15 days 16 3
2004-11-20 0 0 1382 1708 496 346 850 0 0 514 ... 0 0 0 0 Saturday 2004-11-20 2004-11-01 19 days 20 3
2004-12-18 0 972 3944 6064 1944 1458 3230 972 1404 3016 ... 0 0 0 972 Saturday 2004-12-18 2004-12-01 17 days 18 3
2005-01-22 0 268 770 1620 536 268 670 70 98 1230 ... 0 0 0 0 Saturday 2005-01-22 2005-01-01 21 days 22 4
2005-02-19 0 0 578 1496 496 496 620 0 100 1130 ... 0 0 0 0 Saturday 2005-02-19 2005-02-01 18 days 19 3
2005-03-19 0 1410 2222 3008 1992 1990 2506 322 1156 2982 ... 0 0 0 492 Saturday 2005-03-19 2005-03-01 18 days 19 3
2005-04-16 84 0 586 1376 232 232 580 66 296 1196 ... 0 0 0 0 Saturday 2005-04-16 2005-04-01 15 days 16 3
2005-05-21 0 0 1060 1424 352 252 504 44 230 1322 ... 0 0 0 0 Saturday 2005-05-21 2005-05-01 20 days 21 3
2005-06-18 0 500 2552 3404 1500 1000 2500 500 1748 3092 ... 0 0 0 1244 Saturday 2005-06-18 2005-06-01 17 days 18 3
2005-07-16 0 0 678 1578 504 252 504 0 252 1578 ... 0 0 0 46 Saturday 2005-07-16 2005-07-01 15 days 16 3
2005-08-20 0 0 710 1504 305 252 504 0 454 1954 ... 0 0 0 0 Saturday 2005-08-20 2005-08-01 19 days 20 3

20 rows × 22 columns

Now that we have a better feel for the types of roots that we want to include in the prototype calculator, we will move on to developing some filtering logic that isolates the relevant data.

No comments:

Post a Comment