I was recently chatting with a buddy about investing in real estate or stocks and which was better. I decided to use Python to analyze what that investment would look like. I used data from FRED and yfinance which is one of my favorite sources for historical data. Although they do not offer smaller timeframes.
![](https://jeremywhittaker.com/wp-content/uploads/2022/01/fred-python-1.png)
The first thing I’m going to do is set the dates I’m interested in.
beginning_date = '1900-01-01'
date_today = datetime.now()
end_date = date_today
I’m not going to use FRED and YFinance which are two Python plugins to download historical data.
df = yf.download('^gspc',beginning_date,date_today,progress=True).drop(columns=['Open','High','Low','Close','Volume']).rename(columns={'Adj Close':'SP500'})
df = df.rename(columns={'Adj Close':'SP500'})
df['MSPUS'] = pdr.DataReader('MSPUS','fred',beginning_date,date_today)
df = df.rename(columns={'MSPUS':'Median_House_price'})
The stock market index symbol from Yahoo is ^gspc as you can see in the code above. The median house price is MSPUS from FRED.
df = df.ffill(axis=0)
df['Median_House_price_pct_change'] = df['Median_House_price'].pct_change()*100
df['SP500_pct_change'] = df['SP500'].pct_change()*100
df = df.dropna()
df['Median_House_price_pct_change_cumsum'] = df['Median_House_price_pct_change'].cumsum()
df['SP500_pct_change_cumsum'] = df['SP500_pct_change'].cumsum()
df = df.dropna()
df['spread'] = df['SP500_pct_change_cumsum'].sub(df['Median_House_price_pct_change_cumsum'])
df.tail(20)
The Median house price data is quarterly so I’m using ffill to forward fill the data.
What I’m essentially doing next is creating columns in my DataFrame for the following:
- Percent change of asset * 100 to get the return as a percentage.
- Cumultively summing all of the returns to get a percentage of return from a specific date.
- Calculating the rolling spread between the two assets.
My DataFrame now looks like this.
![](https://jeremywhittaker.com/wp-content/uploads/2022/01/image-1-1024x358.png)
If I plot the cumulative sum I get this.
fig1 = px.line(df, x=df.index, y='Median_House_price_pct_change_cumsum', color_discrete_sequence=['red'])
fig2 = px.line(df, x=df.index, y='SP500_pct_change_cumsum', color_discrete_sequence=['blue'])
fig3 = go.Figure(data=fig1.data + fig2.data)
fig3.show()
![](https://jeremywhittaker.com/wp-content/uploads/2022/01/image-2-1024x304.png)
This shows stocks overall since 1963 have outperformed housing.
fig1 = px.line(df, x=df.index, y='spread', color_discrete_sequence=['red'])
fig1.show()
This code actually plots the spread between the two assets.
![](https://jeremywhittaker.com/wp-content/uploads/2022/01/image-3-1024x297.png)
Now I’m going to create some more columns in my DataFrame
df_annual=pd.DataFrame()
df_annual['SP500_pct_change'] = df['SP500_pct_change'].resample('1A').sum()
df_annual['Median_House_price_pct_change'] = df['Median_House_price_pct_change'].resample('1A').sum()
df_annual['SP500_pct_change_cumsum'] = df_annual['SP500_pct_change'].cumsum()
df_annual['Median_House_price_pct_change_cumsum'] = df_annual['Median_House_price_pct_change'].cumsum()
df_annual.dropna(inplace=True)
df_annual['YoY_spread'] = df_annual['SP500_pct_change'].sub(df_annual['Median_House_price_pct_change'])
df_annual['cumulative_spread'] = df_annual['SP500_pct_change_cumsum'].sub(df_annual['Median_House_price_pct_change_cumsum'])
df_annual.drop(df_annual.index[-1],inplace=True)
What I’m doing here is resampling my data to annual as it is broken down currently by quarter. The main difference in this code is I’m computing an annual year-over-year return of S&P 500 vs. median house price in the United States. Here is the new DataFrame header.
![](https://jeremywhittaker.com/wp-content/uploads/2022/01/image-4-1024x272.png)
I’m going to replot this data using the annualized data. Should be almost identical to my first chart.
fig1 = px.line(df_annual, x=df_annual.index, y='Median_House_price_pct_change_cumsum', color_discrete_sequence=['red'])
fig2 = px.line(df_annual, x=df_annual.index, y='SP500_pct_change_cumsum', color_discrete_sequence=['blue'])
fig3 = go.Figure(data=fig1.data + fig2.data)
fig3.show()
![](https://jeremywhittaker.com/wp-content/uploads/2022/01/image-5-1024x292.png)
And finally, I’m computing the annualized year-over-year spread between S&P 500 and median housing prices. This gives a better idea of the annual returns as opposed to the cumulative returns. Where the bar graph is above 0 indicates the stock market outperformed real estate that year. If the bar is below 0 that indicates the real estate market outperformed the stock market.
![](https://jeremywhittaker.com/wp-content/uploads/2022/01/image-6-1024x406.png)
And finally the cumulative spread.
![](https://jeremywhittaker.com/wp-content/uploads/2022/01/image-7-1024x307.png)