import os
import pandas as pd
# Alter display settings
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# Directory & file that houses the concatenated CBOE csv data
proc_dir = r"/Users/alexstephens/data/cboe/proc"
pkl_file = os.path.join(proc_dir, r"cboe_mmyy_all_clean_df.pkl")
# Read the .pkl file
df = pd.read_pickle(pkl_file)
First, let's just count the number of unique entries.
# There are 55460 unique entries for VWAP
print(len(df['vwap'].value_counts()))
# Most are 0.00
df['vwap'].value_counts().head(10)
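Before digging into individual rows, the summary statistics already hint at the problem (a quick check against the df loaded above):
# Summary statistics for VWAP; the min and max are wildly outside any plausible value
print(df['vwap'].describe())
# Fraction of rows where VWAP is exactly 0.00
print((df['vwap'] == 0.0).mean())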
The problem is that some of the entries are absurdly large positive or negative values.
# A handful have extremely large positive values
df[['vwap']].loc[(df['vwap'] >= 1e200)]
# There are also a handful of extremely large negative values
df[['vwap']].loc[(df['vwap'] <= -1e200)]
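To get a sense of how many rows are affected, a rough count against an arbitrary cutoff works (the 1e6 threshold below is my own assumption; no legitimate VWAP for these contracts should be anywhere near it):
# Flag rows whose absolute VWAP is implausibly large (1e6 is an arbitrary cutoff)
bad_vwap = df['vwap'].abs() >= 1e6
print(bad_vwap.sum(), "bad rows out of", len(df))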
Given the above, I assumed that I'd read the CSV incorrectly. But when I went back to the original (raw) CSV files, I found that there really are rows containing gargantuan values in the VWAP field:
^SPX,2010-05-07,JXB,2010-05-14,1225.000,p,0.00,0.00,0.00,0.00,0,11,113.30,11,120.70,1109.46,1109.46,0.00,1109.46,0.4758,-0.9289,0.001858,-0.672528,0.208984,-22.052792,11,114.30,11,118.10,1110.87,1110.87,80195837708536521000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.00,0,0
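To double-check that my concatenation step wasn't introducing these, a simple text scan of the raw files turns up the offending lines directly (a sketch; raw_dir and the *.csv glob pattern are assumptions about where and how the original CBOE files are stored):
import glob
import re
# os is already imported at the top of this post
raw_dir = r"/Users/alexstephens/data/cboe/raw"  # assumed location of the raw CBOE csv files
long_number = re.compile(r"\d{30,}")  # a field with 30+ consecutive digits is clearly bogus
for path in sorted(glob.glob(os.path.join(raw_dir, "*.csv"))):
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            if long_number.search(line):
                print(f"{os.path.basename(path)}:{lineno}: {line[:120].rstrip()}...")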
This field is not relevant to the strategy backtesting exercise. It also doesn't appear to be relevant to the VIX calculation, so I will likely drop this column from the data when we start the data reduction process.
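When the reduction step comes around, dropping the column is straightforward (a sketch; the output filename below is just a placeholder):
# Drop VWAP: not needed for backtesting and not an input to the VIX calculation
df = df.drop(columns=['vwap'])
# Persist the reduced frame under a new (placeholder) name
df.to_pickle(os.path.join(proc_dir, r"cboe_mmyy_all_reduced_df.pkl"))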