Friday, November 1, 2019

006 - Reviewing the CBOE 'vwap' field

06_cboe_vwap_review

Reviewing the VWAP field

Overview

This is a quick post summarizing a brief review of the VWAP field. Once again, load the full CBOE data frame.

In [11]:
import os
import pandas as pd

# Alter display settings
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# Directory & file that houses the concatenated CBOE csv data
proc_dir = r"/Users/alexstephens/data/cboe/proc"
pkl_file = os.path.join(proc_dir, r"cboe_mmyy_all_clean_df.pkl")

# Read the .pkl file
df = pd.read_pickle(pkl_file)

First, let's just count the number of unique entries.

In [12]:
# There are 55460 unique entries for VWAP
print(len(df['vwap'].value_counts()))
55460
In [13]:
# Most are 0.00
df['vwap'].value_counts().head(10)
Out[13]:
0.00    9498430
0.05     126018
0.10      58145
0.15      36589
0.20      31364
0.25      24232
0.30      22380
0.50      17983
0.40      17272
0.35      16833
Name: vwap, dtype: int64
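
Given how dominant the zero bucket is, a one-liner puts it in proportion (nothing here beyond the frame already loaded):

# Fraction of rows where vwap is exactly zero
zero_share = (df['vwap'] == 0.0).mean()
print(f"{zero_share:.2%} of rows have vwap == 0.00")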

The problem is that a handful of entries contain absurdly large positive and negative values.

In [14]:
# A handful have extremely large positive values
df[['vwap']].loc[(df['vwap'] >= 1e200)]
Out[14]:
vwap
380601 4.251098e+228
423116 8.158370e+276
466914 1.216283e+200
470052 4.916715e+257
470053 1.063048e+224
529463 4.916715e+257
614508 4.916715e+257
690292 5.868195e+250
736041 1.746767e+243
768919 6.574880e+244
910151 1.437005e+294
919991 5.878036e+250
1165312 4.916715e+257
1177963 1.063048e+224
1493913 1.371174e+241
1520032 1.063048e+224
1520042 4.916715e+257
1522506 4.916715e+257
1523002 1.063048e+224
1543700 1.063048e+224
1543703 4.916715e+257
1720727 3.974446e+234
1789850 8.019584e+283
In [15]:
# There are also a handful of extremely large negative values
df[['vwap']].loc[(df['vwap'] <= -1e200)]
Out[15]:
vwap
447371 -1.662828e+305
447383 -4.447256e+304
499532 -4.447256e+304
790925 -4.447256e+304
1103344 -1.662828e+305
1155828 -1.662828e+305
1157903 -4.447256e+304
1329069 -1.662828e+305
1337114 -1.662828e+305
1520456 -1.662828e+305
1522819 -4.447256e+304
1544100 -4.447256e+304
1789835 -1.825828e+259
1881938 -1.825828e+259
1888249 -4.436878e+261
1890027 -2.460067e+260
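
For a rough tally of both tails at once (reusing the arbitrary 1e200 cutoff from the two queries above):

# Count rows whose vwap is absurdly large in either direction
bad_mask = df['vwap'].abs() >= 1e200
print(f"Rows with |vwap| >= 1e200: {int(bad_mask.sum())}")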

Given the above, I assumed that I'd read the csv incorrectly. But when I went back to the original (raw) csv files, I found rows that genuinely do contain gargantuan values in the VWAP field:

^SPX,2010-05-07,JXB,2010-05-14,1225.000,p,0.00,0.00,0.00,0.00,0,11,113.30,11,120.70,1109.46,1109.46,0.00,1109.46,0.4758,-0.9289,0.001858,-0.672528,0.208984,-22.052792,11,114.30,11,118.10,1110.87,1110.87,801958377085365210000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.00,0,0
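
To rule out a parsing problem on my end, a scan like the sketch below can confirm the raw files themselves hold these values. The raw directory path is hypothetical (mirroring the proc directory above), and rather than assuming which column is VWAP, it simply tests every comma-separated field:

import glob
import os

# Hypothetical raw-file directory; adjust to the actual CBOE csv location
raw_dir = r"/Users/alexstephens/data/cboe/raw"

for path in sorted(glob.glob(os.path.join(raw_dir, "*.csv"))):
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            # Flag any field that parses to an absurdly large float
            for field in line.rstrip("\n").split(","):
                try:
                    if abs(float(field)) >= 1e200:
                        print(f"{path}:{lineno}: {field[:30]}...")
                        break
                except ValueError:
                    continue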

This field is not relevant to the strategy backtesting exercise. It also doesn't appear to be relevant to the VIX calculation, so I will likely drop this column from the data when we start the data reduction process.
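
When that step comes, the drop itself is a one-liner; a minimal sketch (the output filename is hypothetical):

# Drop the unusable vwap column ahead of the data reduction step
df = df.drop(columns=['vwap'])

# Optionally persist the reduced frame (hypothetical filename)
df.to_pickle(os.path.join(proc_dir, "cboe_mmyy_all_clean_novwap_df.pkl"))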
