When analyzing historical time frame data in machine learning it needs to be normalized. In this code example, I will show how to get S&P data then convert it to a percent of daily increase/decrease as well as a logarithmic daily increase/decrease.
The first part of this code will use yfinance as our datasource.
#we're going to use yfinance as our data source
!pip install yfinance --upgrade --no-cache-dir
import pandas as pd
import numpy as np
import yfinance as yf
Next, we’re going to create a dataframe called df and download SPY data from 2000 to it.
#Here we're creating a dataframe for spy data from 2000-current
df = yf.download('spy',
start='2000-01-01',
end='2020-08-21',
progress=True)
#dauto_adjust=True,
#actions='inline',) #adjust for stock splits and dividends
#print the dataframe to see what lives in it
print(df)
Finally, we’ll print the result of df so you can get an idea of what is inside of it.
![](https://jeremywhittaker.com/wp-content/uploads/2020/08/image-7.png)
We’re going to drop all the columns except Adj Close. Then we’ll rename it adj_close. Next, we’ll create a column labeled simple_rtn. This is the daily simple return or percent increase/decrease. The next line of code creates a logarithmic increase/decrease. Logarithmic gives equal bearing to the Y-axis and can be defined as follows, “A logarithmic price scale uses the percentage of change to plot data points, so, the scale prices are not positioned equidistantly. A linear price scale uses an equal value between price scales providing an equal distance between values.”
#only keep adj close
df = df.loc[:, ['Adj Close']]
df.rename(columns={'Adj Close':'adj_close'}, inplace=True)
#create column simple return
df['simple_rtn'] = df.adj_close.pct_change()
#create column logrithmic returns
df['log_rtn'] = np.log(df.adj_close/df.adj_close.shift(1))
print(df)
![](https://jeremywhittaker.com/wp-content/uploads/2020/08/image-8.png)
This next command just analyzes the data so you can spot-check what you’ve created.
#here we can analyze our data
df.info()
![](https://jeremywhittaker.com/wp-content/uploads/2020/08/image-9.png)
This next section describes what the daily increase/decrease of the SPY looks like. You can see statistically relevant information about S&P here.
#get statistical data on the data frame
df.describe()
![](https://jeremywhittaker.com/wp-content/uploads/2020/08/image-10.png)
Next, we can see a distribution of adjustable close, logarithmic return, and simple return.
#view chart of data to get an overview of what lives in the data
import matplotlib.pyplot as plt
df.hist(bins=50, figsize=(20,15))
plt.show()
![](https://jeremywhittaker.com/wp-content/uploads/2020/08/image-11-1024x724.png)
This is all for data normalization. You can now apply different algorithmic analyses to the data.