Retrieving historical data from FRED or Yahoo Finance using Python

If you spend a lot of time analyzing big data then data retrieval is paramount. I’ve written some Python code that I use daily to retrieve data from Federal Reserve Bank of St. Louis and finance.

The first part of this code simply mounts Google drive so you can interface files on your local computer.

working_folder = 'M2 Correlation'

# <----- Make sure to mount drive in Colab first 

import os
import sys
if 'google.colab' in str(get_ipython()):
  print('Running on CoLab')
  root_dir='/content/drive/My Drive/Colab Notebooks/' + working_folder
  if os.path.isdir(root_dir):
    %cd $root_dir
    print('Check your working_folder or if Google drive is mounted')

  %cd $root_dir

  from IPython.display import clear_output 
  print('Colab code has been executed')
  print('Not running on Colab')

Next, we’re going to install some dependencies.

!pip install yfinance
!pip install pandas-datareader

Next, we’ll import some functions

import yfinance as yf
import pandas_datareader as pdr
import pandas as pd
from datetime import datetime, timedelta
from time import time
from tqdm import tqdm
from pathlib import Path
import plotly.graph_objects as go
import as px
import logging
import sys
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import os

In this next part of our code, we will specify which assets we want data on from FRED and the beginning and end date. Keep in mind some data will not go back as far as you specify. So you’ll have to deal with these blank rows at a later point in the data analysis.

beginning_date = '1900-01-01'
date_today =
end_date = date_today

fred_assets = {
    # 'global_fish':'PSALMUSDA', causing issues

#potential regressors of m2

#potential regressors


yfinance_assets = {
      'SP 500':'^GSPC',
      'Dow 30':'^DJI',
      'Cboe UK 100':'^BUK100P',
      'Russell 2000':'^RUT',
      'CBOE Volatility Index':'^VIX',
      'FTSE 100':'^FTSE',
      'CAC 40':'^FCHI',
      'ESTX 50 PR.EUR':'^STOXX50E',
      'Euronext 100 Index':'^N100',
      'BEL 20':'^BFX',
      'MOEX Russia Index':'IMOEX.ME',
      'Nikkei 225':'^N225',
      'SSE Composite Index':'000001.SS',
      'Shenzhen Component':'399001.SZ',
      'STI Index':'^STI',
      'SP ASX 200':'^AXJO',
      'Jakarta Composite Index':'^JKSE',
      'FTSE Bursa Malaysia KLCI':'^KLSE',
      'SP NZX 50 INDEX GROSS':'^NZ50',
      'KOSPI Composite Index':'^KS11',
      'TSEC weighted index':'^TWII',
      'SP TSX Composite index':'^GSPTSE',
      'IPC MEXICO':'^MXX',
      'SP CLX IPSA':'^IPSA',
      'EGX 30 Price Return Index':'^CASE30',
      'Top 40 USD Net TRI Index':'^JN0U.JO',


      '10-Year T-Note Futures,Mar-2022':'ZN=F',
      'E-Mini SP 500 Mar 22':'ES=F',
      'Five-Year US Treasury Note Futu':'ZF=F',
      'Nasdaq 100 Mar 22':'NQ=F',
      'Crude Oil':'CL=F',
      '2-Year T-Note Futures,Mar-2022':'ZT=F',
      'U.S. Treasury Bond Futures,Mar-':'ZB=F',
      'Natural Gas Mar 22':'NG=F',
      'E-mini Russell 2000 Index Futur':'RTY=F',
      'Mini Dow Jones Indus.-$5 Mar 22':'YM=F',
      'Corn Futures,May-2022':'ZC=F',
      'Sugar #11 May 22':'SB=F',
      'Soybean Futures,May-2022':'ZS=F',
      'Copper Mar 22':'HG=F',
      'RBOB Gasoline Mar 22':'RB=F',
      'Heating Oil Mar 22':'HO=F',
      'Soybean Oil Futures,May-2022':'ZL=F',
      'Brent Crude Oil Last Day Financ':'BZ=F',
      'Soybean Meal Futures,May-2022':'ZM=F',
      'Micro Gold Futures,Apr-2022':'MGC=F',
      'Cocoa May 22':'CC=F',
      'KC HRW Wheat Futures,Mar-2022':'KE=F',
      'Coffee May 22':'KC=F',
      'Lean Hogs Futures,Apr-2022':'HE=F',
      'Cotton May 22':'CT=F',
      'Platinum Apr 22':'PL=F',
      'Live Cattle Futures,Apr-2022':'LE=F',
      'Feeder Cattle Futures,Mar-2022':'GF=F',
      'Micro Silver Futures,Mar-2022':'SIL=F',
      'Palladium Mar 22':'PA=F',
      'Orange Juice May 22':'OJ=F',
      'Rough Rice Futures,May-2022':'ZR=F',
      'Oat Futures,May-2022':'ZO=F',
      'Random Length Lumber Futures':'LBS=F',
      'Mont Belvieu LDH Propane (OPIS)':'B0=F',

# #etf
      'SPDR SP 500 ETF Trust':'SPY',
      'iShares Core SP 500 ETF':'IVV',
      'Vanguard Total Stock Market ETF':'VTI',
      'Vanguard SP 500 ETF':'VOO',
      'Invesco QQQ Trust':'QQQ',
      'Vanguard FTSE Developed Markets ETF':'VEA',
      'iShares Core MSCI EAFE ETF':'IEFA',
      'Vanguard Value ETF':'VTV',
      'iShares Core U.S. Aggregate Bond ETF':'AGG',
      'Vanguard Total Bond Market ETF':'BND',
      'Vanguard FTSE Emerging Markets ETF':'VWO',
      'Vanguard Growth ETF':'VUG',
      'iShares Core MSCI Emerging Markets ETF':'IEMG',
      'iShares Core SP Small-Cap ETF':'IJR',
      'iShares Russell 1000 Growth ETF':'IWF',
      'Vanguard Dividend Appreciation ETF':'VIG',
      'iShares Core SP Mid-Cap ETF':'IJH',
      'iShares Russell 2000 ETF':'IWM',
      'SPDR Gold Shares':'GLD',
      'iShares Russell 1000 Value ETF':'IWD',
      'iShares MSCI EAFE ETF':'EFA',
      'Vanguard Total International Stock ETF':'VXUS',
      'Vanguard Mid-Cap ETF':'VO',
      'Vanguard Information Technology ETF':'VGT',
      'Financial Select Sector SPDR Fund':'XLF',
      'Vanguard Total International Bond ETF':'BNDX',
      'Vanguard Intermediate-Term Corporate Bond ETF':'VCIT',
      'Vanguard Real Estate ETF':'VNQ',
      'Technology Select Sector SPDR Fund':'XLK',
      'Vanguard Small Cap ETF':'VB',
      'iShares Core SP Total U.S. Stock Market ETF':'ITOT',
      'Vanguard High Dividend Yield Index ETF':'VYM',
      'Vanguard Short-Term Bond ETF':'BSV',
      'Vanguard Short-Term Corporate Bond ETF':'VCSH',
      'iShares TIPS Bond ETF':'TIP',
      'iShares iBoxx $ Investment Grade Corporate Bond ETF':'LQD',
      'Vanguard FTSE All-World ex-US Index Fund':'VEU',
      'iShares SP 500 Growth ETF':'IVW',
      'Health Care Select Sector SPDR Fund':'XLV',
      'Energy Select Sector SPDR Fund':'XLE',
      'Schwab U.S. Large-Cap ETF':'SCHX',
      'Schwab US Dividend Equity ETF':'SCHD',
      'Invesco SP 500® Equal Weight ETF':'RSP',
      'iShares Core MSCI Total International Stock ETF':'IXUS',
      'SPDR Dow Jones Industrial Average ETF Trust':'DIA',
      'iShares Russell 1000 ETF':'IWB',
      'iShares Gold Trust':'IAU',
      'iShares MSCI Emerging Markets ETF':'EEM',
      'iShares Russell Midcap ETF':'IWR',
      'Schwab International Equity ETF':'SCHF',
      'iShares MSCI USA Min Vol Factor ETF':'USMV',
      'Vanguard Large Cap ETF':'VV',
      'Vanguard Small Cap Value ETF':'VBR',
      'iShares National Muni Bond ETF':'MUB',
      'iShares MBS ETF':'MBB',
      'Vanguard Total World Stock ETF':'VT',
      'iShares SP 500 Value ETF':'IVE',
      'iShares ESG Aware MSCI USA ETF':'ESGU',
      'iShares 1-5 Year Investment Grade Corporate Bond ETF':'IGSB',
      'iShares Core Dividend Growth ETF':'DGRO',
      'iShares MSCI USA Quality Factor ETF':'QUAL',
      'Schwab U.S. Broad Market ETF':'SCHB',
      'Vanguard FTSE Europe ETF':'VGK',
      'iShares 1-3 Year Treasury Bond ETF':'SHY',
      'Schwab U.S. TIPS ETF':'SCHP',
      'SPDR SP Dividend ETF':'SDY',
      'iShares Select Dividend ETF':'DVY',
      'Consumer Discretionary Select Sector SPDR Fund':'XLY',
      'Vanguard Short-Term Inflation-Protected Securities ETF':'VTIP',
      'SPDR SP Midcap 400 ETF Trust':'MDY',
      'iShares Preferred Income Securities ETF':'PFF',
      'ProShares UltraPro QQQ':'TQQQ',
      'iShares J.P. Morgan USD Emerging Markets Bond ETF':'EMB',
      'JPMorgan Ultra-Short Income ETF':'JPST',
      'iShares iBoxx $ High Yield Corporate Bond ETF':'HYG',
      'iShares MSCI ACWI ETF':'ACWI',
      'Industrial Select Sector SPDR Fund':'XLI',
      'iShares 7-10 Year Treasury Bond ETF':'IEF',
      'iShares Core Total USD Bond Market ETF':'IUSB',
      'iShares MSCI EAFE Value ETF':'EFV',
      'iShares U.S. Treasury Bond ETF':'GOVT',
      'Vanguard Health Care ETF':'VHT',
      'Vanguard Mid-Cap Value ETF':'VOE',
      'iShares 20+ Year Treasury Bond ETF':'TLT',
      'Schwab U.S. Small-Cap ETF':'SCHA',
      'Schwab U.S. Large-Cap Growth ETF':'SCHG',
      'iShares MSCI USA Value Factor ETF':'VLUE',
      'Vanguard Mortgage-Backed Securities ETF':'VMBS',
      'Vanguard Extended Market ETF':'VXF',
      'iShares Russell 2000 Value ETF':'IWN',
      'Consumer Staples Select Sector SPDR Fund':'XLP',
      'Vanguard Tax-Exempt Bond ETF':'VTEB',
      'iShares Russell Mid-Cap Value ETF':'IWS',
      'SPDR Bloomberg 1-3 Month T-Bill ETF':'BIL',
      'Dimensional U.S. Core Equity 2 ETF':'DFAC',
      'iShares MSCI EAFE Small-Cap ETF':'SCZ',
      'Vanguard Intermediate-Term Bond ETF':'BIV',
      'Vanguard Short-Term Treasury ETF':'VGSH',
      'Vanguard Small Cap Growth ETF':'VBK',
      'iShares Short Treasury Bond ETF':'SHV',

# stocks
      'Berkshire Hathaway':'brk-a',
      'Johnson and Johnson':'jnj',

Retrieving code from yfinance is slightly different so we’ll specify those assets next. You can download almost any stock, index, or ETF you want from Yahoo Finance.

yfinance_assets = {

# #etf
      'etf_djia':'DIA', #mega
      'etf_large_cap':'SPY', #large cap
      'etf_mid_cap':'IJH', #mid cap
      'etf_small_cap':'IWO', #small cap
      'etf_mall_cap_value':'VBR', #small cap value
      'etf_small_cap_growth':'VBK', #small cap growth
      'etf_micro_cap':'IWM', #micro cap
      'etf_emerging_markets':'VEIEX', #emerging markets
      'etf_reit':'IYR', #reit 
      'etf_high_yield_bonds':'HYG', #high yield bonds
      'etf_investment_grade_bonds':'LQD', #investment grade bonds
      'etf_gld':'GLD', #gold etf
      'etf_nasdaq':'QQQ', #nasdaq etf
      'etf_total_market':'VTI', #etf total market


I don’t like downloading all my data every day. So this function I wrote backs up your dataframe.

def save_df(temp_df, filename):

This section of code will not only plot the asset so you can visually diagnose it but it will save it to your Google drive in a subfolder.

def plot_and_save(df_temp,folder_name):
  for col in df_temp.columns:
    df_temp2 = pd.DataFrame()
    df_temp2 = df_temp[f'{col}']
    df_temp2.dropna(axis='rows', inplace=True)
    fig = px.line(df_temp2, x=df_temp2.index, y=f'{col}', color_discrete_sequence=['red'])
    #comment out to save memory

Here’s the code to retrieve all your FRED data.

def get_fred_prices(assets):
  df_temp = pd.DataFrame(index=pd.date_range(start=beginning_date,end=end_date,freq='D'))
  # df_yearly = pd.DataFrame(index=pd.date_range(start=beginning_date,end=end_date,freq='A'))
  for key, value in assets.items():
    print('Getting {} with symbol {}'.format(key, value))
      df_temp[f'{key}'] = pdr.DataReader(f'{value}','fred',beginning_date,date_today)
      print('Error getting {}'.format(value))

Here is the code to retrieve all your yfinance data.

def get_yfinance_prices(assets):
  df_temp = pd.DataFrame(index=pd.date_range(start=beginning_date,end=end_date,freq='D'))
  for key, value in assets.items():
    print('Getting {} with symbol {}'.format(key, value))
      df_temp[f'{key}'] ='{value}',beginning_date,date_today,progress=True).drop(columns=['Open','High','Low','Close','Volume'])
      print('Error getting {}'.format(value))

And finally, a function to bring all your data together.

def get_asset_prices(filename):
  df_fred = get_fred_prices(fred_assets)
  df_yfinance = get_yfinance_prices(yfinance_assets)
  df = pd.concat([df_fred, df_yfinance], axis=1)

This is the code that will actually retrieve your data. It will also drop all the lines that are completely blank. Then it saves the dataframe and proceeds with plotting the data

df = get_asset_prices('asset_prices')
df.dropna(how='all', inplace=True)
save_df(df, 'asset_prices')
df = pd.read_pickle("asset_prices.pkl")

Here is a sample of one of the downloads:

Now that we’ve downloaded the data we want a visualization of the data so we can see chunks of missing data. missingno is a great tool for that. Keep in mind I forward filled the assets in a previous function. What this means is that if the stock market closes on Friday it will show the price of the asset to be the same on Saturday and Sunday.

So think of this chart below as the left being today and the right being 1900. You can see some data from these sources is only available for short time frames, shorter the bar. In this example, you can see there is complete data for producer price index for commodities.

import missingno as msno

Let’s take a quick look at the first few rows of the data below. You can see most of the data from 1913 is NaN. This means it doesn’t exist in your dataframe. I will talk about cutting out this data later so you don’t have a bunch of empty datasets.


And finally, we’ll look at the information of the dataframe

That’s it you’ve downloaded your asset prices and they are all contained in one dataframe and ready for further modification and analysis.

Do the poor own stocks?

I’m sitting here doing some completely unrelated research and I stumbled across this figure which I’ve seen multiple times in the past. It has always surprised me. As the markets become extremely volatile this month I wonder, what is the impact on the bottom 50% of Americans? If you look at this chart it barely even registers when stacked with other assets. Real estate is their primary asset.

What’s interesting is if you look at this breakdown by percentile historically. The top 10% own 89% of all equities. While the entire bottom 50% own just .6%.

Now let’s take a look at the top 50% vs. the bottom 50%. We already know the bottom 50% of Americans own just .6% of equities. The top 50% own 99.4% of the balance of equities.

If you’re looking for the source of political turmoil in our country I don’t think we need to look much further than the wealth gap. Half of our population has no vested interest in corporations in America. I can’t imagine turmoil subsides any time soon for our country until we address the wealth gap.

What’s a better investment stocks or real estate? Let’s write some Python code.

I was recently chatting with a buddy about investing in real estate or stocks and which was better. I decided to use Python to analyze what that investment would look like. I used data from FRED and yfinance which is one of my favorite sources for historical data. Although they do not offer smaller timeframes.

The first thing I’m going to do is set the dates I’m interested in.

beginning_date = '1900-01-01'
date_today =
end_date = date_today

I’m not going to use FRED and YFinance which are two Python plugins to download historical data.

df ='^gspc',beginning_date,date_today,progress=True).drop(columns=['Open','High','Low','Close','Volume']).rename(columns={'Adj Close':'SP500'})
df = df.rename(columns={'Adj Close':'SP500'})

df['MSPUS'] = pdr.DataReader('MSPUS','fred',beginning_date,date_today)
df = df.rename(columns={'MSPUS':'Median_House_price'})

The stock market index symbol from Yahoo is ^gspc as you can see in the code above. The median house price is MSPUS from FRED.

df = df.ffill(axis=0)

df['Median_House_price_pct_change'] = df['Median_House_price'].pct_change()*100
df['SP500_pct_change'] = df['SP500'].pct_change()*100
df = df.dropna()

df['Median_House_price_pct_change_cumsum'] = df['Median_House_price_pct_change'].cumsum()
df['SP500_pct_change_cumsum'] = df['SP500_pct_change'].cumsum()
df = df.dropna()

df['spread'] = df['SP500_pct_change_cumsum'].sub(df['Median_House_price_pct_change_cumsum'])


The Median house price data is quarterly so I’m using ffill to forward fill the data.

What I’m essentially doing next is creating columns in my DataFrame for the following:

  • Percent change of asset * 100 to get the return as a percentage.
  • Cumultively summing all of the returns to get a percentage of return from a specific date.
  • Calculating the rolling spread between the two assets.

My DataFrame now looks like this.

If I plot the cumulative sum I get this.

fig1 = px.line(df, x=df.index, y='Median_House_price_pct_change_cumsum', color_discrete_sequence=['red'])
fig2 = px.line(df, x=df.index, y='SP500_pct_change_cumsum', color_discrete_sequence=['blue'])

fig3 = go.Figure( +

This shows stocks overall since 1963 have outperformed housing.

  fig1 = px.line(df, x=df.index, y='spread', color_discrete_sequence=['red'])

This code actually plots the spread between the two assets.

Now I’m going to create some more columns in my DataFrame

df_annual['SP500_pct_change'] = df['SP500_pct_change'].resample('1A').sum()
df_annual['Median_House_price_pct_change'] = df['Median_House_price_pct_change'].resample('1A').sum()
df_annual['SP500_pct_change_cumsum'] = df_annual['SP500_pct_change'].cumsum()
df_annual['Median_House_price_pct_change_cumsum'] =  df_annual['Median_House_price_pct_change'].cumsum()
df_annual['YoY_spread'] = df_annual['SP500_pct_change'].sub(df_annual['Median_House_price_pct_change'])
df_annual['cumulative_spread'] = df_annual['SP500_pct_change_cumsum'].sub(df_annual['Median_House_price_pct_change_cumsum'])


What I’m doing here is resampling my data to annual as it is broken down currently by quarter. The main difference in this code is I’m computing an annual year-over-year return of S&P 500 vs. median house price in the United States. Here is the new DataFrame header.

I’m going to replot this data using the annualized data. Should be almost identical to my first chart.

fig1 = px.line(df_annual, x=df_annual.index, y='Median_House_price_pct_change_cumsum', color_discrete_sequence=['red'])
fig2 = px.line(df_annual, x=df_annual.index, y='SP500_pct_change_cumsum', color_discrete_sequence=['blue'])

fig3 = go.Figure( +

And finally, I’m computing the annualized year-over-year spread between S&P 500 and median housing prices. This gives a better idea of the annual returns as opposed to the cumulative returns. Where the bar graph is above 0 indicates the stock market outperformed real estate that year. If the bar is below 0 that indicates the real estate market outperformed the stock market.

And finally the cumulative spread.