Sourcing and Cleaning Interest Rate Data¶
Summary¶
In the previous script, I joined daily interest rate data onto the cleaned / processed / reduced CBOE data file. However, I did not describe the source of those data.
I downloaded historical constant maturity rates for US Treasury bills / notes / bonds from FRED, the St. Louis Fed's data repository.
The file includes daily interest rate data for many different tenors, but I am only interested in those that span the expiration period of the options in the CBOE data. In the previous post, I excluded any options with greater than 360 days-to-expiration (dte). So, I am only concerned with 1-month (1M), 3-month (3M), and 1-year (1Y) rates. The relevant files in the data I downloaded include the 'DGS1MO.csv', 'DGS3MO.csv', and 'DGS1.csv' files.
I did not spend a lot of time optimizing this script. I wanted to join the interest rate data onto the CBOE data with minimal effort, and the script reflects some of this laziness. For example, I make three nearly identical calls to the pandas read_csv command, when those could have been wrapped in a simple function.
I did need to clean some of the files, so I wrote a function for that task. All it really does is replace each missing value with a placeholder, which is later swapped for the average of the rates on the surrounding dates.
In other words, I'm including this script for transparency but it is not my best work.
# Import modules
import os
import pandas as pd
# =============================================================================
# Functions
# =============================================================================
def clean_ir_file(input_file, output_file):
    """Clean a raw FRED rate file: lower-case the header row and
    replace missing rates ('.' or '') with a '0.00' placeholder."""
    with open(output_file, 'wt') as outfile:
        with open(input_file, 'r') as infile:
            for i, line in enumerate(infile):
                if (i == 0):
                    outfile.write(line.lower())
                else:
                    date, rate = line.rstrip().split(sep=',')
                    if (rate in ['.', '']):
                        rate = '0.00'
                    outfile.write(date + ',' + rate + '\n')
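To make the cleaning rule concrete, here is a self-contained sketch of the per-line transformation applied to a few made-up sample lines (the dates and rates are hypothetical):

```python
# Hypothetical sample lines in the raw FRED format; missing rates show up as '.'
samples = ["2008-12-24,0.01", "2008-12-25,.", "2008-12-26,"]

cleaned = []
for line in samples:
    date, rate = line.rstrip().split(sep=',')
    if rate in ['.', '']:
        rate = '0.00'   # placeholder; zeros are filled in downstream
    cleaned.append(date + ',' + rate)

print(cleaned)
# → ['2008-12-24,0.01', '2008-12-25,0.00', '2008-12-26,0.00']
```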
# =============================================================================
# Set-up
# =============================================================================
# Alter pandas display settings
pd.set_option('display.max_rows', 200)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 500)
# Location of the Fred interest rate data
fred_dir = r'/Users/alexstephens/data/fred/IRTCM_csv_2/data'
# Define the output interest rate pickle file
ir_pkl = r'fred_ir_01M_03M_01Y.pkl'
ir_pkl_file = os.path.join(fred_dir, ir_pkl)
A minor convenience here: using zip() to pair the input and output file names so each raw file can be cleaned into its counterpart in a single loop.
# =============================================================================
# Read / clean the data
# =============================================================================
# Define input / output interest rate files
input_file_list = ['DGS1MO.csv', 'DGS3MO.csv', 'DGS1.csv']
output_file_list = ['DGS1MO_clean.csv', 'DGS3MO_clean.csv', 'DGS1_clean.csv']
# Clean the files
for in_item, out_item in zip(input_file_list, output_file_list):
    in_file = os.path.join(fred_dir, in_item)
    out_file = os.path.join(fred_dir, out_item)
    clean_ir_file(in_file, out_file)
In this step, I read the cleaned interest rate data into distinct dataframes, one for each tenor of interest (1M, 3M, and 1Y).
# =============================================================================
# Read data frames
# =============================================================================
# Read 1M rates
dgs1mo = pd.read_csv(os.path.join(fred_dir, 'DGS1MO_clean.csv'),
                     skiprows=0,
                     delimiter=",",
                     header=0,
                     names=['date', 'R01M'],
                     dtype={'date': str, 'R01M': float},
                     parse_dates=['date'],
                     infer_datetime_format=True)
dgs1mo = dgs1mo.assign(R01M=(dgs1mo.R01M/100))
# Read 3M rates
dgs3mo = pd.read_csv(os.path.join(fred_dir, 'DGS3MO_clean.csv'),
                     skiprows=0,
                     delimiter=",",
                     header=0,
                     names=['date', 'R03M'],
                     dtype={'date': str, 'R03M': float},
                     parse_dates=['date'],
                     infer_datetime_format=True)
dgs3mo = dgs3mo.assign(R03M=(dgs3mo.R03M/100))
# Read 1Y rates
dgs1yr = pd.read_csv(os.path.join(fred_dir, 'DGS1_clean.csv'),
                     skiprows=0,
                     delimiter=",",
                     header=0,
                     names=['date', 'R01Y'],
                     dtype={'date': str, 'R01Y': float},
                     parse_dates=['date'],
                     infer_datetime_format=True)
dgs1yr = dgs1yr.assign(R01Y=(dgs1yr.R01Y/100))
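As admitted in the summary, the three nearly identical read_csv calls could be wrapped in a small helper. A sketch of what that might look like (the `read_rate_file` name is mine, not part of the script):

```python
import os
import pandas as pd

def read_rate_file(fred_dir, filename, rate_col):
    """Read one cleaned FRED rate file and convert the percent rate
    to a decimal. (Sketch only; not used in the original script.)"""
    df = pd.read_csv(os.path.join(fred_dir, filename),
                     header=0,
                     names=['date', rate_col],
                     dtype={rate_col: float},
                     parse_dates=['date'])
    df[rate_col] = df[rate_col] / 100
    return df

# Usage, mirroring the names in the script:
# dgs1mo = read_rate_file(fred_dir, 'DGS1MO_clean.csv', 'R01M')
```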
Doing a left join of the 1M and 3M data onto the 1Y data.
# =============================================================================
# Merge the interest rate (ir) data into a single data frame (df)
# =============================================================================
df_ir = dgs1yr
df_ir = df_ir.merge(dgs3mo, how='left', on='date')
df_ir = df_ir.merge(dgs1mo, how='left', on='date')
df_ir = df_ir[['date', 'R01M', 'R03M', 'R01Y']]
# =============================================================================
# Do simple interpolation to clean missing rates
# =============================================================================
# Identify dates where all rates are zero
zero_idx = ((df_ir.R01M == 0.00) & (df_ir.R03M == 0.00) & (df_ir.R01Y == 0.00))
# Compute a simple average of (i-1) and (i+1) observations to
# create a way to fill-in zeros
df_ir['R01M_avg'] = 0.5*(df_ir.R01M.shift(-1) + df_ir.R01M.shift(+1))
df_ir['R03M_avg'] = 0.5*(df_ir.R03M.shift(-1) + df_ir.R03M.shift(+1))
df_ir['R01Y_avg'] = 0.5*(df_ir.R01Y.shift(-1) + df_ir.R01Y.shift(+1))
# Create the cleaned rate vectors
df_ir.loc[zero_idx, 'R01M_cln'] = df_ir['R01M_avg']
df_ir.loc[~zero_idx, 'R01M_cln'] = df_ir['R01M']
df_ir.loc[zero_idx, 'R03M_cln'] = df_ir['R03M_avg']
df_ir.loc[~zero_idx, 'R03M_cln'] = df_ir['R03M']
df_ir.loc[zero_idx, 'R01Y_cln'] = df_ir['R01Y_avg']
df_ir.loc[~zero_idx, 'R01Y_cln'] = df_ir['R01Y']
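An alternative to the hand-rolled neighbor averaging would be pandas' built-in interpolation, treating the 0.00 placeholder as missing. A toy sketch of the idea (not what the script does; the numbers are made up):

```python
import numpy as np
import pandas as pd

# Toy frame with a zero placeholder between two valid rates
toy = pd.DataFrame({'R01M': [0.0150, 0.0, 0.0160]})

# Treat 0.0 as missing, then linearly interpolate between neighbors
toy['R01M_cln'] = toy['R01M'].replace(0.0, np.nan).interpolate()
print(toy['R01M_cln'].tolist())   # middle value becomes 0.0155
```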
# =============================================================================
# Define a date string
# =============================================================================
# quote_date string
qd_dd = [str(day) if (day > 9) else ('0' + str(day))
         for day in df_ir['date'].dt.day]
qd_mm = [str(month) if (month > 9) else ('0' + str(month))
         for month in df_ir['date'].dt.month]
qd_yy = df_ir['date'].dt.year.map(str)
# append a date key string to the data frame
qd_str = (qd_yy + qd_mm + qd_dd)
df_ir = df_ir.assign(qd_str=list(qd_str))
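The same zero-padded date key could be built more directly with dt.strftime; a minimal sketch on a couple of sample dates:

```python
import pandas as pd

dates = pd.to_datetime(['2018-07-02', '2018-07-09'])
qd_str = dates.strftime('%Y%m%d')
print(list(qd_str))   # ['20180702', '20180709']
```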
For some reason the FRED interest rate file has not been updated since 2019-01-13 10:54 AM CST. That is ok, since the last observation in our data occurs in July 2018.
# Echo the last four rows
df_ir.tail(4)
# Describe the data
df_ir[['R01M_cln', 'R03M_cln', 'R01Y_cln']].describe()
# Any instances where all three rates are simultaneously zero? (Ans. No)
df_ir[(df_ir.R01M_cln == 0.0) & (df_ir.R03M_cln == 0.0) & (df_ir.R01Y_cln == 0.0)]
Are there cases where the individual rates are zero? The answer is yes. Those zero rates occur in the aftermath of the credit crisis, most frequently in the 1M tenor, which is consistent with the actions of the Federal Reserve following the market meltdown.
df_ir.loc[(df_ir.R01M_cln == 0.0) & (df_ir.date.dt.year > 2003), ['date']].describe()
df_ir.loc[(df_ir.R03M_cln == 0.0) & (df_ir.date.dt.year > 2003), ['date']].describe()
df_ir.loc[(df_ir.R01Y_cln == 0.0) & (df_ir.date.dt.year > 2003), ['date']].describe()
# =============================================================================
# Save the results as a pickle
# =============================================================================
df_ir = df_ir[['date', 'qd_str', 'R01M_cln', 'R03M_cln', 'R01Y_cln']]
df_ir.to_pickle(ir_pkl_file)