Getting St. Louis FRED Data in Google Colab for Python Analysis

At times when creating trading strategies using big data you need access to historical economic data. One of the best sources of data is the Economic Research branch of the St. Louis Federal Reserve or FRED. Today I’m going to show you how to pull that data into a dataframe so that you can analyze it using machine learning or AI.

The first step is to import Pandas datareader. What this piece of code does is it downloads all the data for Corporate AAA bond yields. Every dataset in FRED has a symobl. In this case it’s DAAA.

import pandas_datareader.data as web
import datetime

today = pd.to_datetime("today")
start = datetime.datetime(1900, 1, 1)
end = today

df_Corp_AAA_yield = web.DataReader(['DAAA'], 'fred', start, end)

# not working - Corp_AAA_yield = web.DataReader(['DAAA'], 'fred', start, end)
df_Corp_AAA_yield.head()
df_Corp_AAA_yield.tail()

We can now visualize our dataframe by plotting it.

df_Corp_AAA_yield.plot(grid=True)

Normalizing Stock data for Machine Learning

When analyzing historical time frame data in machine learning it needs to be normalized. In this code example I will show how to get S&P data then convert it to a percent of daily increase/decrease as well as a logarithmic daily increase/decrease.

Colab can be found here – https://colab.research.google.com/drive/1Wvh7mRX0PUthBhXw1HD5tJAC8mQr6DYH?usp=sharing

The first part of this code will use yfinance as our datasource.

#we're going to use yfinance as our data source
!pip install yfinance --upgrade --no-cache-dir

import pandas as pd
import numpy as np
import yfinance as yf

Next we’re going to create a dataframe called df and download SPY data from 2000 to it. Finally we’ll print the result of df so you can get an idea of what is inside of it.

#Here we're creating a dataframe for spy data from 2000-current
df = yf.download('spy',
  start='2000-01-01',
  end='2020-08-21',
  progress=True)
  #dauto_adjust=True,
  #actions='inline',) #adjust for stock splits and dividends
#print the dataframe to see what lives in it
print(df)

We’re going to drop all the columns except Adj Close. Then we’ll rename it adj_close. Next we’ll create a column labeled simple_rtn. This is the daily simple return or percent increase/decrease. The next line of code creates a logarithmic increase/decrease. Logarithmic gives equal bearing to the Y axis and can be defined as follows, “A logarithmic price scale uses the percentage of change to plot data points, so, the scale prices are not positioned equidistantly. A linear price scale uses an equal value between price scales providing an equal distance between values.”

#only keep adj close
df = df.loc[:, ['Adj Close']]
df.rename(columns={'Adj Close':'adj_close'}, inplace=True)
#create column simple return
df['simple_rtn'] = df.adj_close.pct_change()
#create column logrithmic returns
df['log_rtn'] = np.log(df.adj_close/df.adj_close.shift(1))
print(df)

This next command just analyzes the data so you can spot check what you’ve created.

#here we can analyze our data
df.info()

This next section describes what the daily increase/decrease of the SPY looks like. You can see didn’t statistically relevant information about S&P here.

#get statistical data on the data frame
df.describe()

Next we can see a distribution of adjustable close, logarithmic return, and simple return.

#view chart of data to get an overview of what lives in the data
import matplotlib.pyplot as plt
df.hist(bins=50, figsize=(20,15))
plt.show()

This is all for data normalization. You can now apply different algorithmic analysis to the data.