Saturday, November 2, 2019

007 - Augmenting the Raw CBOE File

07_cboe_augment_df

Augmenting the CBOE file

Summary

There are a couple of additions I would like to make to the scrubbed CBOE data frame. These additions should (in theory) make it easier to isolate relevant data.

The additions I'd like to make include (this list may grow over time):

  • Capitalize the option type
  • Create a key using the root, a YYYYMMDD string, the put/call key, and the strike
  • Create a similar key using only the YYYYMMDD string, the put/call key, and the strike
  • Compute the number of days that will elapse from the quote_date to expiration
  • Add an expiration day of week field (i.e., Friday, Saturday, etc.)
  • Add an expiration week of month (i.e., the 3rd week in the month)

At this stage, I would like to retain as much data in its raw form as is practical. In the future I will start to prune the data frame.

To start, load the raw data frame.

In [1]:
import os
import pandas as pd

# Alter display settings
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 50)
pd.set_option('display.width', 500)

# Directory & file that houses the concatenated CBOE csv data
proc_dir = r"/Users/alexstephens/data/cboe/proc"
pkl_file = os.path.join(proc_dir, r"cboe_mmyy_all_clean_df.pkl")

# Read the .pkl file
df = pd.read_pickle(pkl_file)

# Echo the columns
list(df.columns.values)
Out[1]:
['underlying_symbol',
 'quote_date',
 'root',
 'expiration',
 'strike',
 'option_type',
 'open',
 'high',
 'low',
 'close',
 'trade_volume',
 'bid_size_1545',
 'bid_1545',
 'ask_size_1545',
 'ask_1545',
 'underlying_bid_1545',
 'underlying_ask_1545',
 'implied_underlying_price_1545',
 'active_underlying_price_1545',
 'implied_volatility_1545',
 'delta_1545',
 'gamma_1545',
 'theta_1545',
 'vega_1545',
 'rho_1545',
 'bid_size_eod',
 'bid_eod',
 'ask_size_eod',
 'ask_eod',
 'underlying_bid_eod',
 'underlying_ask_eod',
 'vwap',
 'open_interest',
 'delivery_code']

Start to work through the punch-list. Here I upper-case the put/call field and use list comprehensions to create two-character day-number and month-number strings; that is, I want a day number like 9 to read as '09'. Finally, I concatenate the fields to create the YYYYMMDD string.

In [2]:
# Apply upper-case to the option_type field
df['option_type'] = df['option_type'].str.upper()

# Use a list comprehension to create a 2 character expiry day & month field
df['ex_dd'] = [str(day) if (day > 9) else ('0' + str(day))
               for day in df['expiration'].dt.day]

df['ex_mm'] = [str(month) if (month > 9) else ('0' + str(month))
               for month in df['expiration'].dt.month]

# Use a map method to create the four character expiry YYYY field
df['ex_yy'] = df['expiration'].dt.year.map(str)

# Create an expiration YYYYMMDD field
df['ex_yymmdd'] = df['ex_yy'] + df['ex_mm'] + df['ex_dd']

# Echo progress
df.head(2).transpose()
Out[2]:
0 1
underlying_symbol ^SPX ^SPX
quote_date 2004-01-30 00:00:00 2004-01-30 00:00:00
root SYG SYG
expiration 2004-02-21 00:00:00 2004-02-21 00:00:00
strike 600 600
option_type C P
open 0 0
high 0 0
low 0 0
close 0 0
trade_volume 0 0
bid_size_1545 50 0
bid_1545 529.6 0
ask_size_1545 50 10
ask_1545 531.6 0.05
underlying_bid_1545 1132.04 1132.04
underlying_ask_1545 1132.04 1132.04
implied_underlying_price_1545 1129.63 1129.63
active_underlying_price_1545 1129.63 1129.63
implied_volatility_1545 0 0.8085
delta_1545 1 -0.0004
gamma_1545 0 6e-06
theta_1545 -0.039398 -0.007394
vega_1545 0 0.003898
rho_1545 34.4217 -0.027385
bid_size_eod 50 0
bid_eod 529.9 0
ask_size_eod 50 10
ask_eod 531.9 0.05
underlying_bid_eod 1131.13 1131.13
underlying_ask_eod 1131.13 1131.13
vwap 0 0
open_interest 0 40
delivery_code 0 0
ex_dd 21 21
ex_mm 02 02
ex_yy 2004 2004
ex_yymmdd 20040221 20040221
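As an aside, the same YYYYMMDD string could likely be built in a single step with the `dt.strftime` accessor, which zero-pads the month and day automatically (a sketch on a toy frame, not run against the real data):

```python
import pandas as pd

# Toy frame with a datetime expiration column
toy = pd.DataFrame({'expiration': pd.to_datetime(['2004-02-21', '2019-11-09'])})

# strftime zero-pads month and day, so no list comprehensions are needed
toy['ex_yymmdd'] = toy['expiration'].dt.strftime('%Y%m%d')

print(toy['ex_yymmdd'].tolist())  # ['20040221', '20191109']
```

I keep the explicit list-comprehension version above since it makes the padding logic visible, but `strftime` is the more idiomatic route.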

Now I add the day name and the number of weeks into the month the expiration date represents.

I also need a string version of the strike, which I build by first converting the float to an integer; this is what 'strike_str' represents.

Then I create a couple of string-based keys. Lastly, I compute the number of days between expiration and the quote_date.
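The week-of-month formula used in the next cell can be sanity-checked in isolation: days 1-7 map to week 1, days 8-14 to week 2, and so on, up to week 5 for days 29-31. A quick standalone check:

```python
# Week-of-month: days 1-7 -> 1, 8-14 -> 2, 15-21 -> 3, 22-28 -> 4, 29-31 -> 5
week_of_month = lambda day: (day - 1) // 7 + 1

print([week_of_month(d) for d in (1, 7, 8, 14, 15, 21, 22, 28, 29, 31)])
# [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```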

In [3]:
# Add the day name to ease visualization of the expiration day
df['ex_dayname'] = df['expiration'].dt.day_name()

# Compute the number of weeks elapsed in the month by expiration
df['ex_weeknum'] = df['expiration'].apply(lambda d: (d.day-1) // 7 + 1)

# Create a strike string field; I have to convert the float to
# integer before converting to a string.  There is a single strike
# with a non-integer value (8.75)
df['strike_str'] = df.strike.astype(int).astype(str)

# Create an option series key
df['series_key'] = (df['root'] + df['ex_yymmdd'] + df['option_type'] +
                    df['strike_str'])

# Create strike and expiration date keys
df['dt_typ_stk_key'] = df['ex_yymmdd'] + df['option_type'] + df['strike_str']
df['dt_typ_key'] = df['ex_yymmdd'] + df['option_type']

# Compute the number of days-to-expiration (dte); use the .dt
# accessor on the timedelta series to extract the integer days to expiry
df['dte'] = (df['expiration'] - df['quote_date']).dt.days

# Keep only the date component of the 'quote_date' and 'expiration'
df['quote_date'] = df['quote_date'].dt.date
df['expiration'] = df['expiration'].dt.date

# Compute the underlying mid-price
df['underlying_mid_1545'] = 0.5*(df['underlying_bid_1545'] +
                                 df['underlying_ask_1545'])

# Turns out the underlying bid/ask spread is 0.0 in all cases
# df['underlying_dif_1545'] = abs(df['underlying_bid_1545'] -
#                                df['underlying_ask_1545'])

# Echo progress
df.head(2).transpose()
Out[3]:
0 1
underlying_symbol ^SPX ^SPX
quote_date 2004-01-30 2004-01-30
root SYG SYG
expiration 2004-02-21 2004-02-21
strike 600 600
option_type C P
open 0 0
high 0 0
low 0 0
close 0 0
trade_volume 0 0
bid_size_1545 50 0
bid_1545 529.6 0
ask_size_1545 50 10
ask_1545 531.6 0.05
underlying_bid_1545 1132.04 1132.04
underlying_ask_1545 1132.04 1132.04
implied_underlying_price_1545 1129.63 1129.63
active_underlying_price_1545 1129.63 1129.63
implied_volatility_1545 0 0.8085
delta_1545 1 -0.0004
gamma_1545 0 6e-06
theta_1545 -0.039398 -0.007394
vega_1545 0 0.003898
rho_1545 34.4217 -0.027385
bid_size_eod 50 0
bid_eod 529.9 0
ask_size_eod 50 10
ask_eod 531.9 0.05
underlying_bid_eod 1131.13 1131.13
underlying_ask_eod 1131.13 1131.13
vwap 0 0
open_interest 0 40
delivery_code 0 0
ex_dd 21 21
ex_mm 02 02
ex_yy 2004 2004
ex_yymmdd 20040221 20040221
ex_dayname Saturday Saturday
ex_weeknum 3 3
strike_str 600 600
series_key SYG20040221C600 SYG20040221P600
dt_typ_stk_key 20040221C600 20040221P600
dt_typ_key 20040221C 20040221P
dte 22 22
underlying_mid_1545 1132.04 1132.04
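With the keys in place, isolating a single option series should reduce to a one-line boolean filter. A sketch on a toy frame (the key values are illustrative, not drawn from the real data):

```python
import pandas as pd

# Toy frame with illustrative series_key values
toy = pd.DataFrame({
    'series_key': ['SYG20040221C600', 'SYG20040221P600', 'SPX20040221C1100'],
    'dte': [22, 22, 22],
})

# One equality test pulls every row belonging to a single option series
calls_600 = toy[toy['series_key'] == 'SYG20040221C600']
print(len(calls_600))  # 1
```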

I want to print out some diagnostics to better characterize the underlying data. In particular, I still want to review what the 'root' field contains, as I am not totally comfortable with the definitions of those root symbols.

In [4]:
# A sanity check of the week number calculation
pd.crosstab(df.ex_dd, df.ex_weeknum)
Out[4]:
ex_weeknum 1 2 3 4 5
ex_dd
01 139728 0 0 0 0
02 209944 0 0 0 0
03 128896 0 0 0 0
04 171556 0 0 0 0
05 173302 0 0 0 0
06 200170 0 0 0 0
07 150846 0 0 0 0
08 0 166732 0 0 0
09 0 168755 0 0 0
10 0 151210 0 0 0
11 0 182048 0 0 0
12 0 168318 0 0 0
13 0 204096 0 0 0
14 0 137114 0 0 0
15 0 0 698462 0 0
16 0 0 986936 0 0
17 0 0 967102 0 0
18 0 0 917650 0 0
19 0 0 1027930 0 0
20 0 0 1109663 0 0
21 0 0 1011259 0 0
22 0 0 0 735190 0
23 0 0 0 173718 0
24 0 0 0 208007 0
25 0 0 0 128994 0
26 0 0 0 170828 0
27 0 0 0 207723 0
28 0 0 0 315181 0
29 0 0 0 0 425000
30 0 0 0 0 799494
31 0 0 0 0 821806
In [5]:
# When do Saturday expirations occur?
pd.crosstab(df.ex_dd, df.ex_dayname)
Out[5]:
ex_dayname Friday Monday Saturday Thursday Tuesday Wednesday
ex_dd
01 117128 4706 0 130 0 17764
02 132108 11830 0 42862 5908 17236
03 87072 12204 0 9858 4806 14956
04 141466 10782 0 0 0 19308
05 129408 22584 0 684 5668 14958
06 160574 15566 0 0 2280 21750
07 106504 16234 0 0 0 28108
08 138190 10398 0 0 0 18144
09 133684 16070 0 0 0 19001
10 123662 12402 0 0 0 15146
11 147934 14878 0 0 8 19228
12 128526 24502 0 0 0 15290
13 154320 15614 0 13532 0 20630
14 93520 15920 0 0 0 27674
15 674738 4924 0 0 0 18800
16 517456 12492 432970 0 5752 18266
17 347266 12236 588744 0 4012 14844
18 341240 13642 542890 0 0 19878
19 319200 18050 675340 0 0 15340
20 492332 11030 578559 0 7116 20626
21 433064 16232 528461 480 4936 28086
22 131932 11258 572996 0 0 19004
23 138226 16842 0 356 0 18294
24 128409 12400 0 52948 0 14250
25 102322 6834 0 0 0 19838
26 130066 20508 0 0 5932 14322
27 167727 15542 0 0 4610 19844
28 207023 9378 0 28158 26892 43730
29 316516 40086 0 47606 6642 14150
30 295693 152336 0 123142 111099 117224
31 144524 170344 0 180454 178069 148415
In [6]:
# A cross table of the day name and week number
pd.crosstab(df.ex_dayname, df.ex_weeknum)
Out[6]:
ex_weeknum 1 2 3 4 5
ex_dayname
Friday 874260 919836 3125296 1005705 756733
Monday 93906 109784 88606 92762 362766
Saturday 0 0 3346964 572996 0
Thursday 53534 13532 480 81462 351202
Tuesday 18662 8 21816 37434 295810
Wednesday 134080 135113 135840 149282 279789
In [7]:
# A cross table of the root and the week number
pd.crosstab(df.root, df.ex_weeknum)
Out[7]:
ex_weeknum 1 2 3 4 5
root
BSF 0 0 11124 954 0
BSK 0 0 44763 2880 0
BSZ 0 0 174054 16420 0
JXA 5418 0 50 0 0
JXB 0 5790 38 0 0
JXD 0 0 0 5452 0
JXE 40 0 0 0 1953
QSE 0 0 1 0 19058
QSZ 0 0 0 0 13313
QZQ 0 0 0 0 15581
SAQ 0 0 16 1232 16821
SKQ 0 0 0 0 6163
SLQ 0 0 0 1270 16144
SPB 0 0 24554 9332 0
SPL 0 0 35968 2744 0
SPQ 0 0 111605 14308 0
SPT 0 0 137612 20630 0
SPV 0 0 15898 772 0
SPX 1680 1322 4422746 365596 566
SPXPM 0 0 0 0 59798
SPXQ 20 20 0 81638 290428
SPXW 1167284 1171141 830946 1274129 1562550
SPZ 0 0 70581 8134 0
SQG 0 0 0 0 2760
SQP 0 0 0 1248 14343
SRO 0 0 28186 6889 0
SVP 0 0 19706 4386 0
SXB 0 0 96817 11614 0
SXG 0 0 60828 2752 0
SXJ 0 0 2204 1354 0
SXM 0 0 54088 12494 0
SXY 0 0 109220 21246 0
SXZ 0 0 81345 17171 0
SYF 0 0 6442 126 0
SYG 0 0 56182 6502 0
SYU 0 0 28673 2872 0
SYV 0 0 4910 0 0
SYZ 0 0 24140 1784 0
SZD 0 0 12216 0 0
SZJ 0 0 30853 4360 0
SZP 0 0 138221 23574 0
SZQ 0 0 33 184 26822
SZT 0 0 44792 10880 0
SZU 0 0 11366 704 0
SZV 0 0 27122 2930 0
VSTRP 0 0 1702 1080 0
In [8]:
# A cross table of the root and the day name
pd.crosstab(df.root, df.ex_dayname)
Out[8]:
ex_dayname Friday Monday Saturday Thursday Tuesday Wednesday
root
BSF 0 0 12078 0 0 0
BSK 0 0 47643 0 0 0
BSZ 0 0 190474 0 0 0
JXA 5288 0 50 130 0 0
JXB 5790 0 38 0 0 0
JXD 5452 0 0 0 0 0
JXE 1993 0 0 0 0 0
QSE 789 264 1 5360 3869 8776
QSZ 733 0 0 3520 2464 6596
QZQ 745 0 0 4536 2792 7508
SAQ 2551 5028 16 1756 4252 4466
SKQ 260 1306 0 568 2021 2008
SLQ 2588 4894 0 1752 4282 3898
SPB 0 0 33886 0 0 0
SPL 0 0 38712 0 0 0
SPQ 0 0 125913 0 0 0
SPT 0 0 158242 0 0 0
SPV 0 0 16670 0 0 0
SPX 2546116 0 2245758 36 0 0
SPXPM 23378 0 0 33540 0 2880
SPXQ 123676 138462 0 33290 62398 14280
SPXW 3958771 588836 0 410200 280820 767423
SPZ 0 0 78715 0 0 0
SQG 0 0 0 0 864 1896
SQP 2639 4760 0 1246 3062 3884
SRO 0 0 35075 0 0 0
SVP 0 0 24092 0 0 0
SXB 0 0 108431 0 0 0
SXG 0 0 63580 0 0 0
SXJ 0 0 3558 0 0 0
SXM 0 0 66582 0 0 0
SXY 0 0 130466 0 0 0
SXZ 0 0 98516 0 0 0
SYF 0 0 6568 0 0 0
SYG 0 0 62684 0 0 0
SYU 0 0 31545 0 0 0
SYV 0 0 4910 0 0 0
SYZ 0 0 25924 0 0 0
SZD 0 0 12216 0 0 0
SZJ 0 0 35213 0 0 0
SZP 0 0 161795 0 0 0
SZQ 1061 4274 33 4276 6906 10489
SZT 0 0 55672 0 0 0
SZU 0 0 12070 0 0 0
SZV 0 0 30052 0 0 0
VSTRP 0 0 2782 0 0 0

I will absorb the above information and use it to help with the filtering step in the next post. For now, save this data frame to a file. (N.B. it has grown to roughly 5 GB.)

In [9]:
# Save the output 
out_file = os.path.join(proc_dir, r"cboe_mmyy_aug_clean_df.pkl")
df.to_pickle(out_file)
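One way to tame a ~5 GB pickle would be pandas' built-in compression support on `to_pickle`/`read_pickle` (an option I have not applied here; it trades disk space for CPU time). A minimal round-trip sketch on a toy frame:

```python
import os
import tempfile
import pandas as pd

# Toy frame standing in for the real (much larger) data frame
df = pd.DataFrame({'a': range(10)})
out_file = os.path.join(tempfile.gettempdir(), 'cboe_demo_df.pkl.gz')

# compression='infer' would pick gzip from the .gz suffix; being explicit here
df.to_pickle(out_file, compression='gzip')

# Round-trip check
df2 = pd.read_pickle(out_file, compression='gzip')
print(df.equals(df2))  # True
```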
