import os
import pandas as pd
# Alter display settings
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# Directory & file that houses the concatenated CBOE csv data
proc_dir = r"/Users/alexstephens/data/cboe/proc"
pkl_file = os.path.join(proc_dir, r"cboe_mmyy_all_clean_df.pkl")
# Read the .pkl file
df = pd.read_pickle(pkl_file)
First, let's just count the number of unique entries.
# There are 55460 unique entries for VWAP
print(len(df['vwap'].value_counts()))
# Most are 0.00
df['vwap'].value_counts().head(10)
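Before digging into individual rows, the summary statistics already hint at the problem (a quick check against the df loaded above):
# Summary statistics for VWAP; the min and max are wildly outside any plausible value
print(df['vwap'].describe())
# Fraction of rows where VWAP is exactly 0.00
print((df['vwap'] == 0.0).mean())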
The problem is that some of the entries are absurdly large positive or negative values.
# A handful have extremely large positive values
df[['vwap']].loc[(df['vwap'] >= 1e200)]
# There are also a handful of extremely large negative values
df[['vwap']].loc[(df['vwap'] <= -1e200)]
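To get a sense of how many rows are affected, a rough count against an arbitrary cutoff works (the 1e6 threshold below is my own assumption; no legitimate VWAP for these contracts should be anywhere near it):
# Flag rows whose absolute VWAP is implausibly large (1e6 is an arbitrary cutoff)
bad_vwap = df['vwap'].abs() >= 1e6
print(bad_vwap.sum(), "bad rows out of", len(df))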
Given the above, I assumed that I'd read the CSV incorrectly. But when I went back to the original (raw) CSV files, I found that there really are rows containing gargantuan values in the VWAP field:
^SPX,2010-05-07,JXB,2010-05-14,1225.000,p,0.00,0.00,0.00,0.00,0,11,113.30,11,120.70,1109.46,1109.46,0.00,1109.46,0.4758,-0.9289,0.001858,-0.672528,0.208984,-22.052792,11,114.30,11,118.10,1110.87,1110.87,80195837708536521000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.00,0,0
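To double-check that my concatenation step wasn't introducing these, a simple text scan of the raw files turns up the offending lines directly (a sketch; raw_dir and the *.csv glob pattern are assumptions about where and how the original CBOE files are stored):
import glob
import re
# os is already imported at the top of this post
raw_dir = r"/Users/alexstephens/data/cboe/raw"  # assumed location of the raw CBOE csv files
long_number = re.compile(r"\d{30,}")  # a field with 30+ consecutive digits is clearly bogus
for path in sorted(glob.glob(os.path.join(raw_dir, "*.csv"))):
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            if long_number.search(line):
                print(f"{os.path.basename(path)}:{lineno}: {line[:120].rstrip()}...")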
This field is not relevant to the strategy backtesting exercise. It also doesn't appear to be relevant to the VIX calculation, so I will likely drop this column from the data when we start the data reduction process.
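When the reduction step comes around, dropping the column is straightforward (a sketch; the output filename below is just a placeholder):
# Drop VWAP: not needed for backtesting and not an input to the VIX calculation
df = df.drop(columns=['vwap'])
# Persist the reduced frame under a new (placeholder) name
df.to_pickle(os.path.join(proc_dir, r"cboe_mmyy_all_reduced_df.pkl"))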