In a previous post, I showed how to store symbol data from EODHD. The purpose of this code is to now iterate through all those symbols and grab the Open, High, Low, Close, Adjustable Close, and Volume data from EODHD using their API.
Output
This code will contain data for each stock symbol similar to this example for TSLA below for all 48,292 stocks.
INFO:root:Processing 48930 symbols
INFO:root: Open High Low Close Adjusted_close Volume
Date
2010-06-29 18.9999 24.9999 17.5401 23.8899 1.5927 281749173
2010-06-30 25.7901 30.4191 23.3001 23.8299 1.5887 257915884
2010-07-01 24.9999 25.9200 20.2701 21.9600 1.4640 123447940
2010-07-02 23.0001 23.1000 18.7101 19.2000 1.2800 77127102
2010-07-06 20.0001 20.0001 15.8301 16.1100 1.0740 103189435
... ... ... ... ... ... ...
2023-10-17 250.1000 257.1830 247.0800 254.8500 254.8500 93562914
2023-10-18 252.7000 254.6300 242.0800 242.6800 242.6800 125147797
2023-10-19 225.9500 230.6100 216.7800 220.1100 220.1100 170772703
2023-10-20 217.0100 218.8606 210.4200 211.9900 211.9900 138010094
2023-10-23 210.0000 216.9800 202.5100 212.0800 212.0800 150215161
Overview of the Script
The script performs the following tasks:
- Reads a list of stock symbols from an HDF5 file that we created previously which contains all the US stock symbols.
- Fetches the EOD stock price data for each symbol via an API call.
- Stores the fetched data in an HDF5 file.
- Keeps track of the last downloaded time for each symbol in a JSON log file.
- Allows for skipping previously downloaded data.
- Plots a simple candlestick chart of the data.
Deep Dive into the Code
Let’s dissect the code step by step, explaining each function’s purpose and the underlying logic.
- Initial Setup
Before diving into the functions, you’ve set up some important libraries and configurations:
- Logging: Helps in tracking the flow and potential issues in the code.
- Data Directory: A designated path where you store or read data.
- API Key: Securely imported from a separate keys.py file to maintain security and modularity.
- Function: get_symbols
This function is designed to fetch stock symbols from an existing HDF5 file.
It first checks if the file exists. If not, it logs an informative message.
Then, it attempts to read the HDF5 file into a DataFrame.
The function ensures that the ‘Code’ column exists before extracting the list of symbols.
- Function: fetch_eod_price_data
This is the core function of your script. Its main responsibilities include:
- Fetching the EOD stock price data for each symbol.
- Skipping data that has recently been downloaded, using logs.
- Storing the fetched data in an HDF5 file.
- The function maintains a log of the last downloaded time for each symbol. This helps in efficiently updating your data without redundant API calls.
- Main Execution
After defining the utility functions, the main execution of the script begins:
- A list of stock symbols is fetched using the get_symbols function.
- The fetch_eod_price_data function is called to update the EOD price data.
- Function: plot_candlestick_with_volume
This function is designed to visually represent the stock data:
- It fetches the data for a particular stock from the HDF5 file.
- Filters the data based on the provided start and end dates.
- Plots a candlestick chart representing the stock’s price movements.
- On the same chart, it overlays a volume bar chart to indicate trading volumes on respective dates.
- Visualization
Finally, a candlestick chart with volume bars for the stock ‘TSLA’ is plotted for a specific date range.
Conclusion
This script provides a streamlined approach to fetch, store, and visualize EOD stock price data. The modular design makes it easy to adapt and extend the functionalities as needed. Whether you’re a seasoned trader or just someone intrigued by stock market data, this script offers a solid foundation to build upon.
In future posts, we’ll use Facebook’s Prophet to forward forecast this data into the future.
Code
import pandas as pd
import requests
from io import StringIO
from pathlib import Path
import logging
import json
from datetime import datetime, timedelta
from tqdm import tqdm
import matplotlib.pyplot as plt
from mplfinance.original_flavor import candlestick_ohlc
import matplotlib.dates as mdates
logging.basicConfig(level=logging.DEBUG)
data_dir = '/home/shared/algos/data/'
log_file_path = "symbol_price_data_time_log.json"
from keys import api_token
def get_symbols(h5_file_path, key='US'):
"""
Open an HDF5 file and return a list of symbols from the 'Code' column.
Parameters:
h5_file_path (str): The path to the HDF5 file.
key (str): The key to use when reading the HDF5 file. Default is 'exchanges'.
Returns:
list: A list of symbols if successful; otherwise, an empty list.
"""
h5_file_path = Path(h5_file_path)
# Check if the file exists
if not h5_file_path.exists():
logging.info(f"The file {h5_file_path} does not exist.")
return []
try:
# Read the DataFrame from the HDF5 file
df = pd.read_hdf(h5_file_path, key=key)
# Check if 'Code' column exists
if 'Code' not in df.columns:
logging.info(f"The 'Code' column does not exist in the DataFrame.")
return []
# Get the list of symbols from the 'Code' column
symbols = df['Code'].tolist()
return symbols
except Exception as e:
logging.error(f"An error occurred: {e}")
return []
def fetch_eod_price_data(stock_symbols, data_dir, log_file, skip_existing=True, hours_to_skip=24):
# Load existing logs if available
try:
with open(log_file, 'r') as f:
symbol_time_dict = json.load(f)
except FileNotFoundError:
symbol_time_dict = {}
all_data = {}
count = 0
# Create the directory if it doesn't exist
Path(data_dir).mkdir(parents=True, exist_ok=True)
hdf5_file_path = f"{data_dir}/eod_price_data.h5"
existing_symbols = set()
if skip_existing:
with pd.HDFStore(hdf5_file_path, 'r') as store:
existing_symbols = {key[1:] for key in store.keys()}
for symbol in tqdm(stock_symbols, desc='Fetching and storing EOD price data'):
last_downloaded_time = symbol_time_dict.get(symbol, None)
if skip_existing and symbol in existing_symbols:
continue
if last_downloaded_time:
last_downloaded_time = datetime.fromisoformat(last_downloaded_time)
time_since_last_download = datetime.now() - last_downloaded_time
if time_since_last_download < timedelta(hours=hours_to_skip):
logging.info(f"Data for symbol {symbol} was downloaded recently. Skipping...")
continue
logging.info(f"{symbol}: Downloading from EODHD...")
url = f'https://eodhd.com/api/eod/{symbol}.US?api_token={api_token}&fmt=csv'
response = requests.get(url)
if response.status_code != 200:
logging.error(f"Failed to fetch data for symbol {symbol}. HTTP Status Code: {response.status_code}")
continue
csv_data = StringIO(response.text)
df = pd.read_csv(csv_data)
if 'Date' not in df.columns:
logging.error(f"No 'Date' column in the data for symbol {symbol}. Skipping...")
logging.error(f"Data received: {response.text}")
continue
# Set 'Date' as DateTime index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
# Add the DataFrame to the all_data dictionary
all_data[symbol] = df
# Update log
symbol_time_dict[symbol] = datetime.now().isoformat()
with open(log_file, 'w') as f:
json.dump(symbol_time_dict, f)
count += 1
# Update the HDF5 file every 100 symbols
if count % 100 == 0:
with pd.HDFStore(hdf5_file_path, mode='a') as store:
for symbol, data in all_data.items():
store.put(symbol, data)
logging.info(f"Saved {list(all_data.keys())} to HDF5 file.")
all_data = {}
# Save remaining data to HDF5
if all_data:
with pd.HDFStore(hdf5_file_path, mode='a') as store:
for symbol, data in all_data.items():
store.put(symbol, data)
logging.info(f"Saved remaining {list(all_data.keys())} to HDF5 file.")
logging.info("Finished updating HDF5 file.")
symbols = get_symbols(data_dir + 'symbols.h5', key='US')
logging.info(f'Processing {len(symbols)} symbols')
fetch_eod_price_data(symbols, data_dir, log_file_path, skip_existing=True, hours_to_skip=72)
symbol = 'TSLA'
hdf5_file_path = f"{data_dir}/eod_price_data.h5"
def plot_candlestick_with_volume(symbol, hdf5_file_path, start_date, end_date):
with pd.HDFStore(hdf5_file_path, 'r') as store:
if f'/{symbol}' in store.keys():
df = store.get(symbol)
logging.info("Data fetched successfully.")
# Filter the DataFrame based on the start and end dates
df = df[start_date:end_date]
# Ensure the DataFrame is sorted by date
df = df.sort_index()
# Convert date to a format that can be used by candlestick_ohlc
df['Date'] = df.index.map(mdates.date2num)
# Create subplots: one for candlestick, one for volume
fig, ax1 = plt.subplots(figsize=(10, 6))
ax2 = ax1.twinx()
# Plot candlestick chart
ohlc = df[['Date', 'Open', 'High', 'Low', 'Adjusted_close']].copy()
candlestick_ohlc(ax1, ohlc.values, width=0.6, colorup='green', colordown='red')
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax1.set_xlabel('Date')
ax1.set_ylabel('Price')
ax1.set_title(f'Candlestick chart for {symbol} ({start_date} to {end_date})')
ax1.grid(True)
# Plot volume
ax2.bar(df.index, df['Volume'], color='gray', alpha=0.3)
ax2.set_ylabel('Volume')
plt.show()
else:
logging.info(f"No data found for symbol: {symbol}")
symbol = 'TSLA'
hdf5_file_path = f"{data_dir}/eod_price_data.h5"
start_date = '2023-01-01'
end_date = '2023-10-23'
plot_candlestick_with_volume(symbol, hdf5_file_path, start_date, end_date)