How to Store Stock Market Data Locally with Python: A Step-by-Step Guide to Saving OHLCV Data from EODHD

In a previous post, I showed how to store symbol data from EODHD. The purpose of this code is to now iterate through all those symbols and grab the Open, High, Low, Close, Adjustable Close, and Volume data from EODHD using their API.

Output

This code will contain data for each stock symbol similar to this example for TSLA below for all 48,292 stocks.

INFO:root:Processing 48930 symbols
INFO:root:                Open      High       Low     Close  Adjusted_close     Volume
Date                                                                         
2010-06-29   18.9999   24.9999   17.5401   23.8899          1.5927  281749173
2010-06-30   25.7901   30.4191   23.3001   23.8299          1.5887  257915884
2010-07-01   24.9999   25.9200   20.2701   21.9600          1.4640  123447940
2010-07-02   23.0001   23.1000   18.7101   19.2000          1.2800   77127102
2010-07-06   20.0001   20.0001   15.8301   16.1100          1.0740  103189435
...              ...       ...       ...       ...             ...        ...
2023-10-17  250.1000  257.1830  247.0800  254.8500        254.8500   93562914
2023-10-18  252.7000  254.6300  242.0800  242.6800        242.6800  125147797
2023-10-19  225.9500  230.6100  216.7800  220.1100        220.1100  170772703
2023-10-20  217.0100  218.8606  210.4200  211.9900        211.9900  138010094
2023-10-23  210.0000  216.9800  202.5100  212.0800        212.0800  150215161

Overview of the Script


The script performs the following tasks:

  • Reads a list of stock symbols from an HDF5 file that we created previously which contains all the US stock symbols.
  • Fetches the EOD stock price data for each symbol via an API call.
  • Stores the fetched data in an HDF5 file.
  • Keeps track of the last downloaded time for each symbol in a JSON log file.
  • Allows for skipping previously downloaded data.
  • Plots a simple candlestick chart of the data.

Deep Dive into the Code

Let’s dissect the code step by step, explaining each function’s purpose and the underlying logic.

  1. Initial Setup

Before diving into the functions, you’ve set up some important libraries and configurations:

  • Logging: Helps in tracking the flow and potential issues in the code.
  • Data Directory: A designated path where you store or read data.
  • API Key: Securely imported from a separate keys.py file to maintain security and modularity.
  1. Function: get_symbols

This function is designed to fetch stock symbols from an existing HDF5 file.

It first checks if the file exists. If not, it logs an informative message.
Then, it attempts to read the HDF5 file into a DataFrame.
The function ensures that the ‘Code’ column exists before extracting the list of symbols.

  1. Function: fetch_eod_price_data

This is the core function of your script. Its main responsibilities include:

  • Fetching the EOD stock price data for each symbol.
  • Skipping data that has recently been downloaded, using logs.
  • Storing the fetched data in an HDF5 file.
  • The function maintains a log of the last downloaded time for each symbol. This helps in efficiently updating your data without redundant API calls.
  1. Main Execution

After defining the utility functions, the main execution of the script begins:

  • A list of stock symbols is fetched using the get_symbols function.
  • The fetch_eod_price_data function is called to update the EOD price data.
  1. Function: plot_candlestick_with_volume

This function is designed to visually represent the stock data:

  • It fetches the data for a particular stock from the HDF5 file.
  • Filters the data based on the provided start and end dates.
  • Plots a candlestick chart representing the stock’s price movements.
  • On the same chart, it overlays a volume bar chart to indicate trading volumes on respective dates.
  1. Visualization

Finally, a candlestick chart with volume bars for the stock ‘TSLA’ is plotted for a specific date range.

Conclusion

This script provides a streamlined approach to fetch, store, and visualize EOD stock price data. The modular design makes it easy to adapt and extend the functionalities as needed. Whether you’re a seasoned trader or just someone intrigued by stock market data, this script offers a solid foundation to build upon.

In future posts, we’ll use Facebook’s Prophet to forward forecast this data into the future.

Code

import pandas as pd
import requests
from io import StringIO
from pathlib import Path
import logging
import json
from datetime import datetime, timedelta
from tqdm import tqdm
import matplotlib.pyplot as plt
from mplfinance.original_flavor import candlestick_ohlc
import matplotlib.dates as mdates




logging.basicConfig(level=logging.DEBUG)

data_dir = '/home/shared/algos/data/'
log_file_path = "symbol_price_data_time_log.json"

from keys import api_token

def get_symbols(h5_file_path, key='US'):
    """
    Open an HDF5 file and return a list of symbols from the 'Code' column.

    Parameters:
        h5_file_path (str): The path to the HDF5 file.
        key (str): The key to use when reading the HDF5 file. Default is 'exchanges'.

    Returns:
        list: A list of symbols if successful; otherwise, an empty list.
    """

    h5_file_path = Path(h5_file_path)

    # Check if the file exists
    if not h5_file_path.exists():
        logging.info(f"The file {h5_file_path} does not exist.")
        return []

    try:
        # Read the DataFrame from the HDF5 file
        df = pd.read_hdf(h5_file_path, key=key)

        # Check if 'Code' column exists
        if 'Code' not in df.columns:
            logging.info(f"The 'Code' column does not exist in the DataFrame.")
            return []

        # Get the list of symbols from the 'Code' column
        symbols = df['Code'].tolist()

        return symbols

    except Exception as e:
        logging.error(f"An error occurred: {e}")
        return []


def fetch_eod_price_data(stock_symbols, data_dir, log_file, skip_existing=True, hours_to_skip=24):
    # Load existing logs if available
    try:
        with open(log_file, 'r') as f:
            symbol_time_dict = json.load(f)
    except FileNotFoundError:
        symbol_time_dict = {}

    all_data = {}
    count = 0

    # Create the directory if it doesn't exist
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    hdf5_file_path = f"{data_dir}/eod_price_data.h5"

    existing_symbols = set()

    if skip_existing:
        with pd.HDFStore(hdf5_file_path, 'r') as store:
            existing_symbols = {key[1:] for key in store.keys()}


    for symbol in tqdm(stock_symbols, desc='Fetching and storing EOD price data'):
        last_downloaded_time = symbol_time_dict.get(symbol, None)

        if skip_existing and symbol in existing_symbols:
            continue

        if last_downloaded_time:
            last_downloaded_time = datetime.fromisoformat(last_downloaded_time)
            time_since_last_download = datetime.now() - last_downloaded_time
            if time_since_last_download < timedelta(hours=hours_to_skip):
                logging.info(f"Data for symbol {symbol} was downloaded recently. Skipping...")
                continue

        logging.info(f"{symbol}: Downloading from EODHD...")
        url = f'https://eodhd.com/api/eod/{symbol}.US?api_token={api_token}&fmt=csv'
        response = requests.get(url)
        if response.status_code != 200:
            logging.error(f"Failed to fetch data for symbol {symbol}. HTTP Status Code: {response.status_code}")
            continue

        csv_data = StringIO(response.text)
        df = pd.read_csv(csv_data)


        if 'Date' not in df.columns:
            logging.error(f"No 'Date' column in the data for symbol {symbol}. Skipping...")
            logging.error(f"Data received: {response.text}")
            continue

        # Set 'Date' as DateTime index
        df['Date'] = pd.to_datetime(df['Date'])
        df.set_index('Date', inplace=True)

        # Add the DataFrame to the all_data dictionary
        all_data[symbol] = df

        # Update log
        symbol_time_dict[symbol] = datetime.now().isoformat()
        with open(log_file, 'w') as f:
            json.dump(symbol_time_dict, f)

        count += 1

        # Update the HDF5 file every 100 symbols
        if count % 100 == 0:
            with pd.HDFStore(hdf5_file_path, mode='a') as store:
                for symbol, data in all_data.items():
                    store.put(symbol, data)
            logging.info(f"Saved {list(all_data.keys())} to HDF5 file.")
            all_data = {}

    # Save remaining data to HDF5
    if all_data:
        with pd.HDFStore(hdf5_file_path, mode='a') as store:
            for symbol, data in all_data.items():
                store.put(symbol, data)
        logging.info(f"Saved remaining {list(all_data.keys())} to HDF5 file.")

    logging.info("Finished updating HDF5 file.")



symbols = get_symbols(data_dir + 'symbols.h5', key='US')
logging.info(f'Processing {len(symbols)} symbols')


fetch_eod_price_data(symbols, data_dir, log_file_path, skip_existing=True, hours_to_skip=72)


symbol = 'TSLA'
hdf5_file_path = f"{data_dir}/eod_price_data.h5"


def plot_candlestick_with_volume(symbol, hdf5_file_path, start_date, end_date):
    with pd.HDFStore(hdf5_file_path, 'r') as store:
        if f'/{symbol}' in store.keys():
            df = store.get(symbol)
            logging.info("Data fetched successfully.")

            # Filter the DataFrame based on the start and end dates
            df = df[start_date:end_date]

            # Ensure the DataFrame is sorted by date
            df = df.sort_index()

            # Convert date to a format that can be used by candlestick_ohlc
            df['Date'] = df.index.map(mdates.date2num)

            # Create subplots: one for candlestick, one for volume
            fig, ax1 = plt.subplots(figsize=(10, 6))
            ax2 = ax1.twinx()

            # Plot candlestick chart
            ohlc = df[['Date', 'Open', 'High', 'Low', 'Adjusted_close']].copy()
            candlestick_ohlc(ax1, ohlc.values, width=0.6, colorup='green', colordown='red')
            ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
            ax1.set_xlabel('Date')
            ax1.set_ylabel('Price')
            ax1.set_title(f'Candlestick chart for {symbol} ({start_date} to {end_date})')
            ax1.grid(True)

            # Plot volume
            ax2.bar(df.index, df['Volume'], color='gray', alpha=0.3)
            ax2.set_ylabel('Volume')

            plt.show()

        else:
            logging.info(f"No data found for symbol: {symbol}")


symbol = 'TSLA'
hdf5_file_path = f"{data_dir}/eod_price_data.h5"
start_date = '2023-01-01'
end_date = '2023-10-23'
plot_candlestick_with_volume(symbol, hdf5_file_path, start_date, end_date)

Leave a Reply