Sunday, November 17, 2019

013 - Sourcing Interest Rate Data

13_interest_rates

Sourcing and Cleaning Interest Rate Data

Summary

In the previous script, I joined daily interest rate data onto the cleaned / processed / reduced CBOE data file. However, I did not describe the source of those data.

I downloaded historical constant maturity rates for US Treasury bills, notes, and bonds from the St. Louis Fed's FRED data repository.

The download includes daily interest rate data for many different tenors, but I am only interested in those that span the expiration window of the options in the CBOE data. In the previous post, I excluded any options with more than 360 days-to-expiration (dte), so I am only concerned with the 1-month (1M), 3-month (3M), and 1-year (1Y) rates. The relevant files in the data I downloaded are 'DGS1MO.csv', 'DGS3MO.csv', and 'DGS1.csv'.

I did not spend a lot of time optimizing this script. I wanted to join the interest rate data onto the CBOE data with minimal effort, and the script reflects some of this laziness. For example, I make three nearly identical calls to the pandas read_csv function, when those could have been wrapped in a single helper function.
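For completeness, here is a sketch of what that helper might look like. The function name read_rate_file is hypothetical (it does not appear in the script below, which repeats the read_csv call three times instead):

```python
import os
import pandas as pd


def read_rate_file(fred_dir, filename, rate_col):
    """Read one cleaned FRED rate file into a two-column frame.

    Hypothetical helper: renames the rate column to rate_col and
    converts the rate from percent to a decimal.
    """
    df = pd.read_csv(os.path.join(fred_dir, filename),
                     header=0,
                     names=['date', rate_col],
                     dtype={rate_col: float},
                     parse_dates=['date'])
    # Convert percentage points (e.g. 2.45) to decimals (0.0245)
    return df.assign(**{rate_col: df[rate_col] / 100})
```

With this in place, the three read blocks below would each collapse to a single call, e.g. `dgs1mo = read_rate_file(fred_dir, 'DGS1MO_clean.csv', 'R01M')`.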

I needed to clean some of the files, so I did write a function for that task. The cleaning only requires replacing missing values with a substitute rate, which I ultimately compute by averaging the rates from the surrounding dates.

In other words, I'm including this script for transparency but it is not my best work.

In [29]:
# Import modules
import os
import pandas as pd


# =============================================================================
# Functions
# =============================================================================
def clean_ir_file(input_file, output_file):
    """Clean a raw FRED rate file.

    Lower-cases the header row and replaces missing rates (recorded
    as '.' or an empty string) with a '0.00' placeholder, which is
    interpolated over in a later step.
    """
    with open(input_file, 'r') as infile, open(output_file, 'wt') as outfile:
        for i, line in enumerate(infile):
            if i == 0:
                outfile.write(line.lower())
            else:
                date, rate = line.rstrip().split(sep=',')
                if rate in ['.', '']:
                    rate = '0.00'
                outfile.write(date + ',' + rate + '\n')


# =============================================================================
# Set-up
# =============================================================================

# Alter pandas display settings
pd.set_option('display.max_rows', 200)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 500)

# Location of the Fred interest rate data
fred_dir = r'/Users/alexstephens/data/fred/IRTCM_csv_2/data'

# Define the output interest rate pickle file
ir_pkl = r'fred_ir_01M_03M_01Y.pkl'
ir_pkl_file = os.path.join(fred_dir, ir_pkl)

A minor innovation here: using the built-in zip() to pair the input and output file names.

In [30]:
# =============================================================================
# Read / clean the data
# =============================================================================

# Define input / output interest rate files
input_file_list = ['DGS1MO.csv', 'DGS3MO.csv', 'DGS1.csv']
output_file_list = ['DGS1MO_clean.csv', 'DGS3MO_clean.csv', 'DGS1_clean.csv']

# Clean the files
for in_item, out_item in zip(input_file_list, output_file_list):
    in_file = os.path.join(fred_dir, in_item)
    out_file = os.path.join(fred_dir, out_item)
    clean_ir_file(in_file, out_file)

In this step, I read the cleaned interest rate data into distinct dataframes -- one for each tenor of interest (1M, 3M, and 1Y).

In [31]:
# =============================================================================
# Read data frames
# =============================================================================

# Read 1M rates
dgs1mo = pd.read_csv(os.path.join(fred_dir, 'DGS1MO_clean.csv'),
                     skiprows=0,
                     delimiter=",",
                     header=0,
                     names=['date', 'R01M'],
                     dtype={'date': str, 'R01M': float},
                     parse_dates=['date'],
                     infer_datetime_format=True,)
dgs1mo = dgs1mo.assign(R01M=(dgs1mo.R01M/100))

# Read 3M rates
dgs3mo = pd.read_csv(os.path.join(fred_dir, 'DGS3MO_clean.csv'),
                     skiprows=0,
                     delimiter=",",
                     header=0,
                     names=['date', 'R03M'],
                     dtype={'date': str, 'R03M': float},
                     parse_dates=['date'],
                     infer_datetime_format=True,)
dgs3mo = dgs3mo.assign(R03M=(dgs3mo.R03M/100))

# Read 1Y rates
dgs1yr = pd.read_csv(os.path.join(fred_dir, 'DGS1_clean.csv'),
                     skiprows=0,
                     delimiter=",",
                     header=0,
                     names=['date', 'R01Y'],
                     dtype={'date': str, 'R01Y': float},
                     parse_dates=['date'],
                     infer_datetime_format=True,)
dgs1yr = dgs1yr.assign(R01Y=(dgs1yr.R01Y/100))

Doing a left join of the 1M and 3M data onto the 1Y data.

In [32]:
# =============================================================================
# Merge the interest rate (ir) data into a single data frame (df)
# =============================================================================

df_ir = (dgs1yr
         .merge(dgs3mo, how='left', on='date')
         .merge(dgs1mo, how='left', on='date'))
df_ir = df_ir[['date', 'R01M', 'R03M', 'R01Y']]
In [33]:
# =============================================================================
# Do simple interpolation to clean missing rates
# =============================================================================

# Identify dates where all rates are zero
zero_idx = ((df_ir.R01M == 0.00) & (df_ir.R03M == 0.00) & (df_ir.R01Y == 0.00))

# Compute a simple average of (i-1) and (i+1) observations to
# create a way to fill-in zeros
df_ir['R01M_avg'] = 0.5*(df_ir.R01M.shift(-1) + df_ir.R01M.shift(+1))
df_ir['R03M_avg'] = 0.5*(df_ir.R03M.shift(-1) + df_ir.R03M.shift(+1))
df_ir['R01Y_avg'] = 0.5*(df_ir.R01Y.shift(-1) + df_ir.R01Y.shift(+1))

# Create the cleaned rate vectors
df_ir.loc[zero_idx, 'R01M_cln'] = df_ir['R01M_avg']
df_ir.loc[~zero_idx, 'R01M_cln'] = df_ir['R01M']

df_ir.loc[zero_idx, 'R03M_cln'] = df_ir['R03M_avg']
df_ir.loc[~zero_idx, 'R03M_cln'] = df_ir['R03M']

df_ir.loc[zero_idx, 'R01Y_cln'] = df_ir['R01Y_avg']
df_ir.loc[~zero_idx, 'R01Y_cln'] = df_ir['R01Y']
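The six .loc assignments above could equivalently be written as one loop with Series.where, which keeps the observed value except where the condition fails. A sketch on a toy frame (on the real data, zero_idx and the averages are computed exactly as above):

```python
import pandas as pd

# Toy frame mimicking the rate columns; the middle row is all zeros
df = pd.DataFrame({'R01M': [0.02, 0.00, 0.02],
                   'R03M': [0.03, 0.00, 0.03],
                   'R01Y': [0.04, 0.00, 0.04]})
zero_idx = (df.R01M == 0) & (df.R03M == 0) & (df.R01Y == 0)

for col in ['R01M', 'R03M', 'R01Y']:
    avg = 0.5 * (df[col].shift(-1) + df[col].shift(+1))
    # Keep the observed rate except where all three rates are zero,
    # in which case substitute the average of the neighboring dates
    df[col + '_cln'] = df[col].where(~zero_idx, avg)
```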
In [34]:
# =============================================================================
# Define a date string
# =============================================================================

# quote_date string
qd_dd = [str(day) if (day > 9) else ('0' + str(day))
         for day in df_ir['date'].dt.day]

qd_mm = [str(month) if (month > 9) else ('0' + str(month))
         for month in df_ir['date'].dt.month]

qd_yy = df_ir['date'].dt.year.map(str)

# append a date key string to the data frame
qd_str = (qd_yy + qd_mm + qd_dd)
df_ir = df_ir.assign(qd_str=list(qd_str))
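The zero-padding comprehensions above can be collapsed into a single call with the dt.strftime accessor, which handles the padding itself. A minimal sketch on a stand-in date series:

```python
import pandas as pd

# Stand-in for df_ir['date']: already-parsed datetimes
dates = pd.Series(pd.to_datetime(['2019-01-07', '2019-01-08']))

# Zero-padded YYYYMMDD strings in one call
qd_str = dates.dt.strftime('%Y%m%d')
```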

For some reason the FRED interest rate file has not been updated since 2019-01-13 10:54 AM CST. That is ok, since the last observation in our data occurs in July 2018.

In [35]:
# Echo the last four rows
df_ir.tail(4)
Out[35]:
date R01M R03M R01Y R01M_avg R03M_avg R01Y_avg R01M_cln R03M_cln R01Y_cln qd_str
14874 2019-01-07 0.0242 0.0245 0.0258 0.0240 0.02440 0.02585 0.0242 0.0245 0.0258 20190107
14875 2019-01-08 0.0240 0.0246 0.0260 0.0241 0.02450 0.02585 0.0240 0.0246 0.0260 20190108
14876 2019-01-09 0.0240 0.0245 0.0259 0.0241 0.02445 0.02595 0.0240 0.0245 0.0259 20190109
14877 2019-01-10 0.0242 0.0243 0.0259 NaN NaN NaN 0.0242 0.0243 0.0259 20190110
In [36]:
# Describe the data
df_ir[['R01M_cln', 'R03M_cln', 'R01Y_cln']].describe()
Out[36]:
R01M_cln R03M_cln R01Y_cln
count 4553.000000 9659.000000 14878.000000
mean 0.012513 0.038119 0.049490
std 0.015096 0.031578 0.034688
min 0.000000 0.000000 0.000000
25% 0.000600 0.007600 0.022100
50% 0.007700 0.038500 0.050200
75% 0.018100 0.058100 0.069100
max 0.052700 0.154900 0.173100
In [37]:
# Any instances where all three rates are simultaneously zero? (Ans. No)
df_ir[(df_ir.R01M_cln == 0.0) & (df_ir.R03M_cln == 0.0) & (df_ir.R01Y_cln == 0.0)]
Out[37]:
date R01M R03M R01Y R01M_avg R03M_avg R01Y_avg R01M_cln R03M_cln R01Y_cln qd_str

Are there cases where the individual rates are zero? The answer is 'yes'. Those zero rates occur in the aftermath of the credit crisis, and they occur most frequently in the 1M tenor. This is consistent with the actions of the FRB following the market meltdown.

In [38]:
df_ir.loc[(df_ir.R01M_cln == 0.0) & (df_ir.date.dt.year > 2003),['date']].describe()
Out[38]:
date
count 90
unique 90
top 2015-10-06 00:00:00
freq 1
first 2008-12-10 00:00:00
last 2015-10-15 00:00:00
In [39]:
df_ir.loc[(df_ir.R03M_cln == 0.0) & (df_ir.date.dt.year > 2003),['date']].describe()
Out[39]:
date
count 19
unique 19
top 2015-10-14 00:00:00
freq 1
first 2008-12-10 00:00:00
last 2015-10-22 00:00:00
In [40]:
df_ir.loc[(df_ir.R01Y_cln == 0.0) & (df_ir.date.dt.year > 2003),['date']].describe()
Out[40]:
date
count 0
unique 0
In [41]:
# =============================================================================
# Save the results as a pickle
# =============================================================================
df_ir = df_ir[['date', 'qd_str', 'R01M_cln', 'R03M_cln', 'R01Y_cln']]
df_ir.to_pickle(ir_pkl_file)
