If you’ve ever downloaded a scanned PDF and tried to search it, you’ll quickly realize it isn’t possible. Here is how you can use Python to extract the text from a PDF file and make it searchable.
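A minimal sketch of one common approach (this is illustrative, not necessarily the exact method from the post: it assumes the third-party pdf2image and pytesseract packages plus the Poppler and Tesseract system tools are installed, and the function names are my own):

```python
from pathlib import Path

def ocr_pdf_to_text(pdf_path):
    """OCR every page of a scanned PDF and return the recognized text."""
    # Both imports are third-party: pip install pdf2image pytesseract
    from pdf2image import convert_from_path
    import pytesseract
    pages = convert_from_path(pdf_path)  # rasterize each page to an image
    return "\n".join(pytesseract.image_to_string(page) for page in pages)

def text_output_path(pdf_path):
    """Derive a .txt filename to save the extracted text next to the PDF."""
    return str(Path(pdf_path).with_suffix(".txt"))

# Example usage:
# text = ocr_pdf_to_text("scanned.pdf")
# Path(text_output_path("scanned.pdf")).write_text(text)
```

Once the text is saved (or written back into the PDF with a tool like OCRmyPDF), the document becomes searchable.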
Machine Learning for Algorithmic Trading: Predictive models to extract signals from market and alternative data for systematic trading strategies with Python GPT
One of my favorite books for quantitative investing is Machine Learning for Algorithmic Trading. However, a lot of its concepts can be difficult for someone who is just getting started programming strategies. So I created this Machine Learning for Algorithmic Trading GPT, which can answer questions related to the book. While its responses aren’t perfect, they will typically put you on the right path with a little input.
Sample Excerpt:
You:
Based on the references in this book where can I download stock data?
Machine Learning for Trading Tutor:
Based on the references in the book, here are the sources where you can download stock data:
- Quandl: Quandl provides stock price datasets covering the US equity markets for the years 2000-2018. The book mentions loading these datasets using pandas for analysis.
- yfinance: yfinance offers a way to download historical market data from Yahoo! Finance. This library, initially named fix-yahoo-finance, allows for the downloading of various data points including end-of-day and intraday prices. The book describes the process of using yfinance to download historical prices for various periods, at different frequencies, and adjusted OHLCV (Open, High, Low, Close, Volume) data at a one-minute frequency along with corporate actions.
- AlgoSeek: For more detailed intraday data, AlgoSeek provides adjustment factors to correct pricing and volumes for stock splits, dividends, and other corporate actions. The book suggests AlgoSeek as a source for such data, emphasizing its use for accurate market data analysis.
- SEC (U.S. Securities and Exchange Commission): While not directly a source of stock prices, the SEC’s EDGAR database can be used to download financial statements and related information, which can be valuable for fundamental analysis. The book provides guidance on using Python to extract data from EDGAR.
- Pandas DataReader: For additional fundamental data sources, the book mentions that many are accessible using the pandas_datareader module, which facilitates the download of financial data from a variety of sources including Yahoo Finance, Google Finance, and others.
- International equity data and macro fundamental data: Besides U.S. stock data, the book also references sources for international equity data like Stooq and various macro fundamental and benchmark data from the Federal Reserve, indicating the broad scope of data sources available for trading and analysis.
These sources provide a comprehensive toolkit for accessing both historical and real-time market data, fundamental data, and other financial information crucial for trading and investment analysis.
You:
Can you write me a Python function for each of these with sample code to download data from this source.
Machine Learning for Trading Tutor:
Sure, I’ll provide you with Python functions for downloading stock data from each of the mentioned sources. Please note that these examples require you to have the respective libraries installed and, in some cases, to register for API keys where necessary.
1. Quandl
For Quandl, you’ll need to install the quandl Python package and obtain an API key from the Quandl website.
import quandl

def download_quandl_data(symbol, start_date, end_date, api_key):
    quandl.ApiConfig.api_key = api_key
    data = quandl.get(symbol, start_date=start_date, end_date=end_date)
    return data

# Example usage
# Replace 'YOUR_API_KEY' with your actual Quandl API key
# data = download_quandl_data('WIKI/AAPL', '2020-01-01', '2020-12-31', 'YOUR_API_KEY')
2. Yahoo! Finance
For Yahoo! Finance, use the yfinance package. This doesn’t require an API key.
import yfinance as yf

def download_yfinance_data(symbol, start_date, end_date):
    data = yf.download(symbol, start=start_date, end=end_date)
    return data

# Example usage
# data = download_yfinance_data('AAPL', '2020-01-01', '2020-12-31')
3. AlgoSeek
Downloading data from AlgoSeek would typically require accessing their proprietary database or API, for which you might need an account or subscription. Here’s a generic approach assuming you have access:
# Placeholder for AlgoSeek data downloading
# Please refer to AlgoSeek's documentation and your subscription details for specific access methods
4. pandas_datareader
To use pandas_datareader for downloading stock data, ensure you have this package installed.
import pandas_datareader.data as web

def download_pandas_datareader_data(symbol, start_date, end_date):
    # Note: the 'yahoo' reader has been unreliable since Yahoo changed its API;
    # 'stooq' is a freely accessible alternative source supported by pandas_datareader
    data = web.DataReader(symbol, 'stooq', start_date, end_date)
    return data

# Example usage
# data = download_pandas_datareader_data('AAPL', '2020-01-01', '2020-12-31')
5. SEC Filings (EDGAR)
Downloading data from the SEC’s EDGAR system for fundamental analysis typically involves parsing XBRL filings. This is more complex and requires specific libraries (e.g., python-edgar or similar). Here’s a basic structure:
# Placeholder for EDGAR data downloading
# This operation is complex and requires parsing XBRL files; refer to specialized libraries and SEC's documentation
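As a concrete starting point, here is a hedged sketch of my own (not from the book) that fetches a company's filing index from the SEC's public submissions endpoint with requests; the SEC asks for a descriptive User-Agent, and the address below is a placeholder.

```python
def edgar_submissions_url(cik):
    """EDGAR's JSON API expects the CIK zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"

def download_edgar_submissions(cik, user_agent="your-name your@email.example"):
    import requests  # third-party: pip install requests
    resp = requests.get(edgar_submissions_url(cik),
                        headers={"User-Agent": user_agent})
    resp.raise_for_status()
    return resp.json()  # filing history metadata, incl. accession numbers

# Example usage (320193 is Apple's CIK):
# filings = download_edgar_submissions(320193)
```

From the accession numbers in the returned metadata you can then pull the individual filings or XBRL facts.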
These examples are meant to get you started with downloading stock data from various sources. Depending on your specific needs and the details of your project, you might need to adjust parameters, handle API limits, or process the data further after downloading it.
Automating 1099 statements from TreasuryDirect.gov
Ok, so I bonds had a huge rate of return last year. Perhaps you’re like me and you created… a few hundred accounts. Well, now you have a major problem: you need to get your 1099 statement from each account. With the TreasuryDirect.gov OTP (one-time password) process this can be very time-consuming. Below is my TreasuryDirect Python script, now modified to automatically download all of your 1099 documents.
You can see below this script will generate PDFs of all of your 1099 files.
This script requires an input file that contains a column labeled ‘Treasury Direct’. Every value under this column should be one of your TreasuryDirect account numbers. You can read more about all the intricacies of this code in my previous post, automating multiple treasurydirect.gov accounts.
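For reference, here is a minimal sketch of what the script expects to read (the ‘Treasury Direct’ column name comes from the post; the account numbers below are fake placeholders):

```python
import io
import pandas as pd

# Stand-in for the accounts.csv file on disk
csv_text = """Treasury Direct
A-111-111-111
B-222-222-222
"""

df = pd.read_csv(io.StringIO(csv_text))
accounts = df['Treasury Direct'].tolist()
```

Any extra columns (LLC Name, Current Value, etc.) are filled in by the script as it processes each account.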
#main.py
from treasury_direct import process_account, close_chrome
import pandas as pd

def get_account_data(force_update_all=True):
    for index, row in df.iterrows():
        account_number = row['Treasury Direct']
        print(f'Working on account number {account_number}')
        # Skip empty or NaN account numbers
        if pd.isna(account_number) or account_number == '':
            print('Skipping empty account number.')
            continue
        if not force_update_all:
            # Skip rows that are already complete
            if not pd.isna(row['LLC Name']) and not pd.isna(row['Original Purchase Amount']) \
                    and not pd.isna(row['Current Value']) and not pd.isna(row['Issue Date']) \
                    and not pd.isna(row['Interest Rate']):
                print(f'Skipping account number {row["Treasury Direct"]} as it is already complete.')
                continue
        process_account(account_number, df, index, url, force_update_all=False,
                        get_bonds=False, get_bank_info=False, redeem=False, save_1099=True)

if __name__ == '__main__':
    df = pd.read_csv('accounts.csv')
    url = "https://www.treasurydirect.gov/RS/UN-Display.do"
    close_chrome()
    get_account_data(force_update_all=True)
#treasury_direct.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
import pyautogui
import time
import pandas as pd
import base64
import os
from gmail import get_otp, move_otp_emails_to_trash
import psutil
from config import password
def close_chrome():
    # Close any running Chrome instances
    for process in psutil.process_iter(["name", "exe"]):
        try:
            if process.info["name"] == "chrome.exe" or (process.info["exe"] and "chrome" in process.info["exe"]):
                process.terminate()
        except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
            pass
def process_account(account_number, df, index, url, force_update_all=False, get_bonds=True,
                    get_bank_info=False, redeem=False, save_1099=False):
    chrome_options = Options()
    # chrome_options.add_argument("user-data-dir=/home/jeremy/.config/google-chrome")
    # chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--start-maximized")
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)
    driver.get(url)
    wait = WebDriverWait(driver, 10)

    username_input = driver.find_element(By.NAME, "username")
    username_input.send_keys(account_number)
    submit_button = driver.find_element(By.NAME, "submit")
    submit_button.click()

    # Get the page source to check whether an OTP is being requested
    page_source = driver.page_source
    otp_skipped = 'OTP' not in page_source
    if not otp_skipped:
        start_time = time.time()
        otp_received = False
        while not otp_received:
            elapsed_time = time.time() - start_time
            # Give up and move to the next account if we wait more than 5 minutes
            if elapsed_time > 5 * 60:
                print("Timed out waiting for OTP after 5 minutes. Moving to the next account.")
                driver.close()
                move_otp_emails_to_trash()
                return
            otp = get_otp()
            if otp is not None:
                otp_received = True
            else:
                # Request a new OTP if we have been waiting more than 3 minutes
                if elapsed_time > 3 * 60:
                    try:
                        resend_link = driver.find_element(By.XPATH, '//a[contains(@href, "/RS/OTP-New.do")]')
                        resend_link.click()
                    except Exception:
                        print('Unable to click resend OTP link')
                        move_otp_emails_to_trash()
                        return
                # Sleep for 10 seconds before trying again
                time.sleep(10)
        otp_input = driver.find_element(By.NAME, "otp")
        otp_input.send_keys(otp)
        # old checkbox no longer exists
        # try:
        #     checkbox = driver.find_element(By.NAME, "registerM2M")
        #     checkbox.click()
        # except Exception:
        #     print('Unable to click the checkbox')
        otp_submit_button = driver.find_element(
            By.XPATH, '//input[@class="action" and @type="submit" and @name="enter.x"]')
        otp_submit_button.click()
        # Pause the script for 3 seconds
        time.sleep(3)
    try:
        password_field = driver.find_element(By.NAME, "password")
        password_field.send_keys(password)
    except Exception:
        print('Unable to enter password')
        return
    try:
        submit_button = driver.find_element(By.NAME, "enter.x")
        submit_button.click()
    except Exception:
        return

    # Get the page source to check whether contact info verification is required
    page_source = driver.page_source
    civ_skipped = 'Contact Info Verification' not in page_source
    if not civ_skipped:
        verify_button = driver.find_element(By.XPATH, '//input[@type="submit" and @value="Verify"]')
        verify_button.click()
        time.sleep(3)
    if get_bonds:
        try:
            link = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "SAVINGS BONDS")))
            link.click()
        except (NoSuchElementException, TimeoutException):
            print("Unable to locate the 'SAVINGS BONDS' link within the specified timeout. "
                  "This account probably has no savings bonds.")
        time.sleep(3)
        # Find all radio buttons
        radio_buttons = WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.XPATH, '//input[@type="radio"]'))
        )
        print(f"Found {len(radio_buttons)} radio buttons")  # Debugging statement
        # Find the first radio button that is not disabled
        radio_button_to_select = None
        for radio_button in radio_buttons:
            is_disabled = radio_button.get_attribute("disabled")
            print(f"Radio button disabled attribute: {is_disabled}")  # Debugging statement
            if not is_disabled:
                radio_button_to_select = radio_button
                break
        if radio_button_to_select:
            print("Found an enabled radio button.")
            try:
                radio_button_to_select.click()
                print("Click successful.")
            except Exception as e:
                print(f"Error clicking the radio button: {e}")
        else:
            print("No enabled radio button found.")
        # Click the submit button
        submit_button = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, '//input[@type="submit" and @value="Submit"]'))
        )
        try:
            submit_button.click()
        except Exception as e:
            print(f"Error clicking the submit button: {e}")
        try:
            # Locate the elements containing the desired information
            llc_and_account_number = driver.find_element(By.XPATH, '//div[@id="accountnumber"]').text
            original_purchase_amount = driver.find_element(
                By.XPATH, '//p[contains(text(), "Series I current holdings total amount")]/span').text
            current_value = driver.find_element(
                By.XPATH, '//p[contains(text(), "Series I current holdings current value")]/span').text
            issue_date = driver.find_element(By.XPATH, '(//tr[contains(@class, "altrow")]/td)[3]').text
            interest_rate = driver.find_element(By.XPATH, '//td[contains(text(), "%")]').text
            try:
                status = driver.find_element(By.XPATH, '//td[contains(text(), "Pending Redemption")]').text
                if status:
                    print(f"Status: {status}")
                    df.loc[index, 'Status'] = status
                else:
                    print("Status is blank. Moving to the next field.")
                    df.loc[index, 'Status'] = "N/A"  # Or whatever value you wish to use for blank fields
            except NoSuchElementException:
                print("Element not found. Moving to the next field.")
                df.loc[index, 'Status'] = "N/A"  # Or whatever value you wish to use for missing fields
            # Separate the LLC name and account number
            llc_name, account_number = llc_and_account_number.split(':', 1)
            llc_name = llc_name.strip().replace("LLC Name: ", "")
            account_number = account_number.strip()
            # Print the extracted information
            print(f"LLC Name: {llc_name}")
            print(f"Account Number: {account_number}")
            print(f"Original Purchase Amount: {original_purchase_amount}")
            print(f"Current Value: {current_value}")
            print(f"Issue Date: {issue_date}")
            print(f"Interest Rate: {interest_rate}")
            # Save the extracted information as new columns for the current row
            df.loc[index, 'LLC Name'] = llc_name
            df.loc[index, 'Original Purchase Amount'] = original_purchase_amount
            df.loc[index, 'Current Value'] = current_value
            df.loc[index, 'Issue Date'] = issue_date
            df.loc[index, 'Interest Rate'] = interest_rate
        except NoSuchElementException:
            print(f"Failed to extract ibond information for account {account_number}. Moving to the next account.")
    if redeem:
        print('Redeem is turned on')
        radio_buttons = WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.XPATH, '//input[@type="radio"]'))
        )
        # Find the first radio button that is not disabled
        radio_button_to_select = None
        for radio_button in radio_buttons:
            if not radio_button.get_attribute("disabled"):
                radio_button_to_select = radio_button
                break
        # Select the radio button
        if radio_button_to_select:
            radio_button_to_select.click()
        else:
            print("No enabled radio button found")
        # Click through Select -> Redeem -> Review -> Submit
        select_button = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, '//input[@type="submit" and @value="Select"]'))
        )
        select_button.click()
        time.sleep(1)
        redeem_button = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, '//input[@type="submit" and @value="Redeem"]'))
        )
        redeem_button.click()
        time.sleep(1)
        review_button = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, '//input[@type="submit" and @value="Review"]'))
        )
        review_button.click()
        time.sleep(1)
        submit_button = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, '//input[@type="submit" and @value="Submit"]'))
        )
        submit_button.click()
        time.sleep(1)
    if get_bank_info:
        # Get bank account information
        # Click the "ManageDirect" link
        wait = WebDriverWait(driver, 10)
        manage_direct_link = wait.until(
            EC.presence_of_element_located((By.XPATH, '//a[contains(@href, "md.DisplayManageDirect")]')))
        manage_direct_link.click()
        time.sleep(3)
        # # Click the "Update my account information" link
        # update_account_info_link = driver.find_element_by_xpath('//a[contains(@href, "ai.DisplayEditAccountInfo")]')
        # update_account_info_link.click()
        # Click the "Update my Bank Information" link
        update_bank_info_link = driver.find_element(By.XPATH, '//a[contains(@href, "bank.DisplayBankInfo")]')
        update_bank_info_link.click()
        # # Check if the words "Security Question" exist on the page
        # security_question_elements = driver.find_elements_by_xpath('//h1/strong[contains(text(), "Security Question")]')
        # if len(security_question_elements) > 0:
        #     # Extract the question
        #     question = driver.find_element_by_xpath('//h3').text
        #     # If the question contains the word "pet's", fill the input element with 'mona'
        #     if "pet's" in question.lower():
        #         answer_input = driver.find_element_by_xpath('//input[@type="password" and @name="securityQuestionAnswer"]')
        #         answer_input.send_keys(pets_name)
        time.sleep(3)
        try:
            bank_name = driver.find_element(By.XPATH, '//tr[@class="altrow1"][1]/td[3]/strong').text
            routing_number = driver.find_element(By.XPATH, '//tr[@class="altrow1"][2]/td[3]/strong').text
            account_number = driver.find_element(By.XPATH, '//tr[@class="altrow1"][3]/td[3]/strong').text
            names_on_account = driver.find_element(By.XPATH, '//tr[@class="altrow1"][4]/td[3]/strong').text
            account_type = driver.find_element(By.XPATH, '//tr[@class="altrow1"][5]/td[3]/strong').text
            return_code = driver.find_element(By.XPATH, '//tr[@class="altrow1"][6]/td[3]/strong').text
            # Print the extracted information
            print("Bank Name:", bank_name)
            print("Routing Number:", routing_number)
            print("Account Number:", account_number)
            print("Name(s) on Account:", names_on_account)
            print("Account Type:", account_type)
            print("Return Code:", return_code)
            # Save the extracted information as new columns for the current row
            df.loc[index, 'Bank Name'] = bank_name
            df.loc[index, 'Routing Number'] = routing_number
            df.loc[index, 'Account Number'] = account_number
            df.loc[index, 'Name(s) on Account'] = names_on_account
            df.loc[index, 'Account Type'] = account_type
            df.loc[index, 'Return Code'] = return_code
            # print('Press enter to continue')
            # input()
        except NoSuchElementException:
            print(f"Failed to extract information for account {account_number}. Moving to the next account.")

    df.to_csv('accounts.csv', index=False)
#
# #code to redeem
# # Click on Current Holdings
# try:
# body_element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'currentholdings')))
# body_element.click()
# except TimeoutException:
# print("Couldn't find Current Holdings element.")
#
# # Click on the radio button for Series I Savings Bond
# try:
# series_i_radio = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//input[@type="radio" and @name="seriesCode" and @value="9122270681520925360"]')))
# series_i_radio.click()
# except TimeoutException:
# print("Couldn't find Series I Savings Bond radio button.")
#
# # Click on the Submit button
# try:
# submit_button = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//input[@class="action" and @type="submit" and @name="572180930158889311"]')))
# submit_button.click()
# except TimeoutException:
# print("Couldn't find first Submit button.")
#
# # Click on the next radio button
# try:
# next_radio_button = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//input[@type="radio" and @name="security_parms" and @value="7956006363914591110"]')))
# next_radio_button.click()
# except TimeoutException:
# print("Couldn't find next radio button.")
#
# # Click on the Select button
# try:
# select_button = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//input[@class="action" and @type="submit" and @name="8745557767672189629"]')))
# select_button.click()
# except TimeoutException:
# print("Couldn't find Select button.")
#
# # Click on the Redeem radio button
# try:
# redeem_button = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//input[@class="action" and @type="submit" and @name="2263125525558940209"]')))
# redeem_button.click()
# except TimeoutException:
# print("Couldn't find Redeem button.")
#
# # Click on the Review radio button
# try:
# review_button = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//input[@class="action" and @type="submit" and @name="5163266466587291065"]')))
# review_button.click()
# except TimeoutException:
# print("Couldn't find Review button.")
    if save_1099:
        directory = './1099'
        if not os.path.exists(directory):
            os.makedirs(directory)
        # Loop through the tax years to download (2023 and 2024 here)
        for year in range(2023, 2025):
            try:
                manage_direct_link = wait.until(
                    EC.presence_of_element_located((By.XPATH, '//a[contains(@href, "md.DisplayManageDirect")]')))
                manage_direct_link.click()
                time.sleep(2)
                # year_link_text = f"Year {year}"
                # year_link = WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.LINK_TEXT, year_link_text)))
                # year_link.click()
                # time.sleep(2)
                year_link = WebDriverWait(driver, 5).until(
                    EC.element_to_be_clickable((By.XPATH, f"//a[contains(text(), '{year}')]")))
                year_link.click()
                time.sleep(2)
                # You can click on the actual 1099 link, but it won't exist in some instances
                # (that tax year has to be in a reportable period or later)
                # view_1099_link_text = f"View your 1099 for tax year {year}"
                # view_1099_link = WebDriverWait(driver, 10).until(
                #     EC.presence_of_element_located((By.LINK_TEXT, view_1099_link_text)))
                # view_1099_link.click()
                # time.sleep(3)
                # Print the page to PDF via the Chrome DevTools Protocol
                pdf_filename = f'{account_number}_{year}_1099.pdf'
                pdf_path = os.path.join(directory, pdf_filename)  # Save PDF in the '1099' subdirectory
                pdf_options = {
                    'printBackground': True,
                    'pageRanges': '1',
                    'paperWidth': 8.27,    # A4 paper size
                    'paperHeight': 11.69,  # A4 paper size
                }
                result = driver.execute_cdp_cmd("Page.printToPDF", pdf_options)
                with open(pdf_path, "wb") as file:
                    file.write(base64.b64decode(result['data']))
            except (NoSuchElementException, TimeoutException):
                print(f"Unable to locate or process a link for the year {year}.")
                continue

    driver.close()
    move_otp_emails_to_trash()
#Slow the program down as I believe the server is rate limiting.
# time.sleep(60)
#gmail.py
import os
import pickle
import base64
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google.auth.transport.requests import Request
# If modifying these SCOPES, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.modify']
def get_credentials():
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first time.
    if os.path.exists('token.pickle'):
        print("Loading credentials from pickle file.")
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    else:
        print("No pickle file found.")
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            print("Credentials expired. Refreshing...")
            try:
                creds.refresh(Request())
                print("Credentials refreshed successfully.")
            except Exception as e:
                print(f"Could not refresh the token: {e}")
                # Remove the existing token.pickle file to avoid reusing it
                if os.path.exists('token.pickle'):
                    os.remove('token.pickle')
                    print("Removed expired token.pickle file.")
                # Trigger the OAuth2 flow
                flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
                creds = flow.run_local_server(port=0)
        else:
            print("No valid credentials. Running authorization flow.")
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            print("Saving credentials to pickle file.")
            pickle.dump(creds, token)
    else:
        print("Credentials are valid.")
    return creds
def get_otp():
    try:
        creds = get_credentials()
        service = build('gmail', 'v1', credentials=creds)
        results = service.users().messages().list(
            userId='me',
            q='from:Treasury.Direct@fiscal.treasury.gov subject:"One Time Passcode" is:unread newer_than:1m').execute()
        messages = results.get('messages', [])
        if not messages:
            print('No messages found.')
            return None
        # Get the first unread email
        message = messages[0]
        msg = service.users().messages().get(userId='me', id=message['id'], format='full').execute()
        msg_str = base64.urlsafe_b64decode(msg['payload']['body']['data']).decode()
        otp = msg_str.splitlines()[6].split()[0]
        if otp:
            print(f"{otp}")
            return otp
        print("No One Time Passcode found in the email.")
        return None
    except HttpError as error:
        print(f'An error occurred: {error}')
        return None
def move_otp_emails_to_trash():
    try:
        creds = get_credentials()
        service = build('gmail', 'v1', credentials=creds)
        results = service.users().messages().list(
            userId='me',
            q='from:Treasury.Direct@fiscal.treasury.gov subject:"One Time Passcode"').execute()
        messages = results.get('messages', [])
        if not messages:
            print('No messages found.')
        else:
            for message in messages:
                service.users().messages().trash(userId='me', id=message['id']).execute()
                print(f"Moved message with ID {message['id']} to trash.")
    except HttpError as error:
        print(f'An error occurred: {error}')

if __name__ == '__main__':
    # Clean up any leftover OTP emails when running this module directly
    move_otp_emails_to_trash()
#config.py
password = 'password'
Mortgage Rates on Hold? Latest FED Futures Flip Forecast for March
Fed futures have indicated a change in the wind regarding the Federal Reserve’s upcoming policy decisions. While everyone was anticipating rate cuts, a cut at the March meeting now seems less likely, with the implied odds of a cut slipping from 52.9% to 46.2% and the pendulum now swinging in favor of maintaining current rates.
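Those percentages come from futures-implied probabilities. As a rough sketch of the standard calculation (the prices and rates below are made-up examples, not actual quotes): a fed funds future implies a rate of 100 minus its price, and the cut probability is how far that implied rate has moved from the no-change rate toward the post-cut rate.

```python
def implied_rate(futures_price):
    """Fed funds futures are quoted as 100 minus the expected average rate."""
    return 100.0 - futures_price

def cut_probability(implied, no_change_rate, cut_rate):
    """Fraction of the way the implied rate has moved toward a cut."""
    return (no_change_rate - implied) / (no_change_rate - cut_rate)

# Hypothetical numbers: 5.33% if the Fed holds, 5.08% after a 25 bp cut
rate = implied_rate(94.72)                # ~5.28
prob = cut_probability(rate, 5.33, 5.08)  # ~0.2, i.e. a 20% implied cut probability
```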
Homebuyers and sellers who have been on the sidelines, hoping for a more favorable mortgage landscape, may find themselves changing their timelines once again. The real estate market looks set to slow down heading into the peak 2024 season, and if these futures predictions hold, transaction volume will continue heading south.
Modifying PyFolio to output to HTML
Recently I was following a paper whose examples used Pyfolio, an awesome performance and risk analysis library in Python developed by Quantopian Inc. Since Quantopian is no longer around, nobody is maintaining this library. I ran into a few errors and figured I would outline the solutions below in case anyone hits the same issues. Before you dive too deep into modifying this library, you may be better off uninstalling Pyfolio and installing pyfolio-reloaded instead, but that is not the purpose of this article.
Today I want to discuss the output of Pyfolio. It was written to output in a Jupyter Notebook, which no real programmer uses. If you send its output to the console instead, the formatting is horrible and all over the place. So I rewrote some of the Pyfolio files so that running create_full_tear_sheet() generates an HTML file for later analysis. Here is a sample of the output.
Sample Output
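The mechanism behind the HTML output can be sketched in a few lines (this is an illustrative distillation, not the exact code from my modified tears.py): each plot is saved as a PNG, then every PNG is inlined into a single self-contained HTML file as a base64 img tag.

```python
import base64
import glob
import os

def build_html_report(plot_dir, out_file="tearsheet.html"):
    """Inline every PNG in plot_dir into one self-contained HTML report."""
    parts = ["<html><body><h1>Tear Sheet</h1>"]
    for png in sorted(glob.glob(os.path.join(plot_dir, "*.png"))):
        with open(png, "rb") as f:
            data = base64.b64encode(f.read()).decode()
        parts.append(f'<img src="data:image/png;base64,{data}"/>')
    parts.append("</body></html>")
    html = "\n".join(parts)
    with open(out_file, "w") as f:
        f.write(html)
    return html
```

Because the images are embedded, the report is a single file you can archive or email; the full version below just plugs this idea into Pyfolio's tear sheet functions.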
tears.py
#
# Copyright 2019 Quantopian, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import warnings
from time import time
import empyrical as ep
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from IPython.display import display, Markdown
import os
import glob
import datetime
import base64
from . import capacity
from . import perf_attrib
from . import plotting
from . import pos
from . import round_trips
from . import timeseries
from . import txn
from . import utils
import logging
# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
FACTOR_PARTITIONS = {
"style": [
"momentum",
"size",
"value",
"reversal_short_term",
"volatility",
],
"sector": [
"basic_materials",
"consumer_cyclical",
"financial_services",
"real_estate",
"consumer_defensive",
"health_care",
"utilities",
"communication_services",
"energy",
"industrials",
"technology",
],
}
def save_plot(fig_or_ax, plot_name, directory="./plots/temp"):
    """Save a matplotlib figure or axes object."""
    logging.info(f'Saving {plot_name} to {directory}')
    if not os.path.exists(directory):
        os.makedirs(directory)
    filepath = os.path.join(directory, f"{plot_name}.png")
    if isinstance(fig_or_ax, plt.Figure):
        fig_or_ax.savefig(filepath)
    else:
        fig_or_ax.get_figure().savefig(filepath)
    print(f"Plot saved: {filepath}")
def timer(msg_body, previous_time):
    current_time = time()
    run_time = current_time - previous_time
    message = "\nFinished " + msg_body + " (required {:.2f} seconds)."
    print(message.format(run_time))
    return current_time
def create_full_tear_sheet(
returns,
positions=None,
transactions=None,
market_data=None,
benchmark_rets=None,
slippage=None,
live_start_date=None,
sector_mappings=None,
round_trips=False,
estimate_intraday="infer",
hide_positions=False,
cone_std=(1.0, 1.5, 2.0),
bootstrap=False,
unadjusted_returns=None,
turnover_denom="AGB",
set_context=True,
factor_returns=None,
factor_loadings=None,
pos_in_dollars=True,
header_rows=None,
factor_partitions=FACTOR_PARTITIONS,
):
"""
Generate a number of tear sheets that are useful
for analyzing a strategy's performance.
- Fetches benchmarks if needed.
- Creates tear sheets for returns, and significant events.
If possible, also creates tear sheets for position analysis
and transaction analysis.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- Time series with decimal returns.
- Example:
2015-07-16 -0.012143
2015-07-17 0.045350
2015-07-20 0.030957
2015-07-21 0.004902
positions : pd.DataFrame, optional
Daily net position values.
- Time series of dollar amount invested in each position and cash.
- Days where stocks are not held can be represented by 0 or NaN.
- Non-working capital is labelled 'cash'
- Example:
index 'AAPL' 'MSFT' cash
2004-01-09 13939.3800 -14012.9930 711.5585
2004-01-12 14492.6300 -14624.8700 27.1821
2004-01-13 -13853.2800 13653.6400 -43.6375
transactions : pd.DataFrame, optional
Executed trade volumes and fill prices.
- One row per trade.
- Trades on different names that occur at the
same time will have identical indicies.
- Example:
index amount price symbol
2004-01-09 12:18:01 483 324.12 'AAPL'
2004-01-09 12:18:01 122 83.10 'MSFT'
2004-01-13 14:12:23 -75 340.43 'AAPL'
market_data : pd.DataFrame, optional
Daily market_data
- DataFrame has a multi-index index, one level is dates and another is
market_data contains volume & price, equities as columns
slippage : int/float, optional
Basis points of slippage to apply to returns before generating
tearsheet stats and plots.
If a value is provided, slippage parameter sweep
plots will be generated from the unadjusted returns.
Transactions and positions must also be passed.
- See txn.adjust_returns_for_slippage for more details.
live_start_date : datetime, optional
The point in time when the strategy began live trading,
after its backtest period. This datetime should be normalized.
hide_positions : bool, optional
If True, will not output any symbol names.
round_trips: boolean, optional
If True, causes the generation of a round trip tear sheet.
sector_mappings : dict or pd.Series, optional
Security identifier to sector mapping.
Security ids as keys, sectors as values.
estimate_intraday: boolean or str, optional
Instead of using the end-of-day positions, use the point in the day
where we have the most $ invested. This will adjust positions to
better approximate and represent how an intraday strategy behaves.
By default, this is 'infer', and an attempt will be made to detect
an intraday strategy. Specifying this value will prevent detection.
cone_std : float, or tuple, optional
If float, the standard deviation to use for the cone plots.
If tuple, the standard deviation values to use for the cone plots.
- The cone is a normal distribution with this standard deviation
centered around a linear regression.
bootstrap : boolean (optional)
Whether to perform bootstrap analysis for the performance
metrics. Takes a few minutes longer.
turnover_denom : str
Either AGB or portfolio_value, default AGB.
- See full explanation in txn.get_turnover.
factor_returns : pd.Dataframe, optional
Returns by factor, with date as index and factors as columns
factor_loadings : pd.Dataframe, optional
Factor loadings for all days in the date range, with date and
ticker as index, and factors as columns.
pos_in_dollars : boolean, optional
Indicates whether `positions` is in dollars.
header_rows : dict or OrderedDict, optional
Extra rows to display at the top of the perf stats table.
set_context : boolean, optional
If True, set default plotting style context.
- See plotting.context().
factor_partitions : dict, optional
dict specifying how factors should be separated in perf attrib
factor returns and risk exposures plots
- See create_perf_attrib_tear_sheet().
"""
if (
(unadjusted_returns is None)
and (slippage is not None)
and (transactions is not None)
):
unadjusted_returns = returns.copy()
returns = txn.adjust_returns_for_slippage(
returns, positions, transactions, slippage
)
positions = utils.check_intraday(
estimate_intraday, returns, positions, transactions
)
create_returns_tear_sheet(
returns,
positions=positions,
transactions=transactions,
live_start_date=live_start_date,
cone_std=cone_std,
benchmark_rets=benchmark_rets,
bootstrap=bootstrap,
turnover_denom=turnover_denom,
header_rows=header_rows,
set_context=set_context,
)
create_interesting_times_tear_sheet(
returns, benchmark_rets=benchmark_rets, set_context=set_context
)
if positions is not None:
create_position_tear_sheet(
returns,
positions,
hide_positions=hide_positions,
set_context=set_context,
sector_mappings=sector_mappings,
estimate_intraday=False,
)
if transactions is not None:
create_txn_tear_sheet(
returns,
positions,
transactions,
unadjusted_returns=unadjusted_returns,
estimate_intraday=False,
set_context=set_context,
)
if round_trips:
create_round_trip_tear_sheet(
returns=returns,
positions=positions,
transactions=transactions,
sector_mappings=sector_mappings,
estimate_intraday=False,
)
if market_data is not None:
create_capacity_tear_sheet(
returns,
positions,
transactions,
market_data,
liquidation_daily_vol_limit=0.2,
last_n_days=125,
estimate_intraday=False,
)
if factor_returns is not None and factor_loadings is not None:
create_perf_attrib_tear_sheet(
returns,
positions,
factor_returns,
factor_loadings,
transactions,
pos_in_dollars=pos_in_dollars,
factor_partitions=factor_partitions,
)
# Location where the individual HTML files are saved
html_files_dir = './plots/temp'
plots_dir = './plots/temp' # Directory where PNG files are saved
# Aggregate HTML content from tables
aggregated_html_content = ''
for html_file in glob.glob(os.path.join(html_files_dir, "*.html")):
with open(html_file, 'r') as file:
aggregated_html_content += file.read() + '<br><hr><br>'
# Embed PNG files into HTML content
for png_file in glob.glob(os.path.join(plots_dir, "*.png")):
with open(png_file, "rb") as image_file:
encoded_string = base64.b64encode(image_file.read()).decode()
img_tag = f'<img src="data:image/png;base64,{encoded_string}" style="width:100%"><br><hr><br>'
aggregated_html_content += img_tag
# Save the aggregated content to a new HTML file
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S%f")
aggregated_filename = f"full_tearsheet_{timestamp}.html"
aggregated_file_path = os.path.join('./plots', aggregated_filename)
with open(aggregated_file_path, 'w') as file:
file.write(aggregated_html_content)
logging.info(f"Aggregated tearsheet saved to {aggregated_file_path}")
# Delete individual HTML files to avoid duplication in the future
for html_file in glob.glob(os.path.join(html_files_dir, "*.html")):
os.remove(html_file)
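The PNG-embedding step above can be exercised in isolation: each chart is base64-encoded into a data URI so the aggregated tearsheet HTML is fully self-contained. A minimal sketch, where `embed_png` is a hypothetical helper name and stand-in bytes replace a real chart file:

```python
import base64

def embed_png(png_bytes: bytes) -> str:
    # Inline the image as a base64 data URI so the HTML report needs
    # no external image files.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}" style="width:100%">'

payload = b"\x89PNG\r\n\x1a\n" + b"not-a-real-chart"
tag = embed_png(payload)
# The data URI round-trips back to the original bytes.
recovered = base64.b64decode(tag.split("base64,", 1)[1].split('"', 1)[0])
assert recovered == payload
```

In a real run the payload would come from `open(png_file, "rb").read()`, exactly as in the aggregation loop above.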
@plotting.customize
def create_simple_tear_sheet(
returns,
positions=None,
transactions=None,
benchmark_rets=None,
slippage=None,
estimate_intraday="infer",
live_start_date=None,
turnover_denom="AGB",
header_rows=None,
):
"""
Simpler version of create_full_tear_sheet; generates summary performance
statistics and important plots as a single image.
- Plots: cumulative returns, rolling beta, rolling Sharpe, underwater,
exposure, top 10 holdings, total holdings, long/short holdings,
daily turnover, transaction time distribution.
- Never accept market_data input (market_data = None)
- Never accept sector_mappings input (sector_mappings = None)
- Never perform bootstrap analysis (bootstrap = False)
- Never hide positions on top 10 holdings plot (hide_positions = False)
- Always use default cone_std (cone_std = (1.0, 1.5, 2.0))
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- Time series with decimal returns.
- Example:
2015-07-16 -0.012143
2015-07-17 0.045350
2015-07-20 0.030957
2015-07-21 0.004902
positions : pd.DataFrame, optional
Daily net position values.
- Time series of dollar amount invested in each position and cash.
- Days where stocks are not held can be represented by 0 or NaN.
- Non-working capital is labelled 'cash'
- Example:
index 'AAPL' 'MSFT' cash
2004-01-09 13939.3800 -14012.9930 711.5585
2004-01-12 14492.6300 -14624.8700 27.1821
2004-01-13 -13853.2800 13653.6400 -43.6375
transactions : pd.DataFrame, optional
Executed trade volumes and fill prices.
- One row per trade.
- Trades on different names that occur at the
same time will have identical indices.
- Example:
index amount price symbol
2004-01-09 12:18:01 483 324.12 'AAPL'
2004-01-09 12:18:01 122 83.10 'MSFT'
2004-01-13 14:12:23 -75 340.43 'AAPL'
benchmark_rets : pd.Series, optional
Daily returns of the benchmark, noncumulative.
slippage : int/float, optional
Basis points of slippage to apply to returns before generating
tearsheet stats and plots.
If a value is provided, slippage parameter sweep
plots will be generated from the unadjusted returns.
Transactions and positions must also be passed.
- See txn.adjust_returns_for_slippage for more details.
live_start_date : datetime, optional
The point in time when the strategy began live trading,
after its backtest period. This datetime should be normalized.
turnover_denom : str, optional
Either AGB or portfolio_value, default AGB.
- See full explanation in txn.get_turnover.
header_rows : dict or OrderedDict, optional
Extra rows to display at the top of the perf stats table.
set_context : boolean, optional
If True, set default plotting style context.
"""
positions = utils.check_intraday(
estimate_intraday, returns, positions, transactions
)
if (slippage is not None) and (transactions is not None):
returns = txn.adjust_returns_for_slippage(
returns, positions, transactions, slippage
)
always_sections = 4
positions_sections = 4 if positions is not None else 0
transactions_sections = 2 if transactions is not None else 0
live_sections = 1 if live_start_date is not None else 0
benchmark_sections = 1 if benchmark_rets is not None else 0
vertical_sections = sum(
[
always_sections,
positions_sections,
transactions_sections,
live_sections,
benchmark_sections,
]
)
if live_start_date is not None:
live_start_date = ep.utils.get_utc_timestamp(live_start_date)
plotting.show_perf_stats(
returns,
benchmark_rets,
positions=positions,
transactions=transactions,
turnover_denom=turnover_denom,
live_start_date=live_start_date,
header_rows=header_rows,
)
fig = plt.figure(figsize=(14, vertical_sections * 6))
gs = gridspec.GridSpec(vertical_sections, 3, wspace=0.5, hspace=0.5)
ax_rolling_returns = plt.subplot(gs[:2, :])
i = 2
if benchmark_rets is not None:
ax_rolling_beta = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
ax_rolling_sharpe = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
ax_underwater = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
plotting.plot_rolling_returns(
returns,
factor_returns=benchmark_rets,
live_start_date=live_start_date,
cone_std=(1.0, 1.5, 2.0),
ax=ax_rolling_returns,
)
ax_rolling_returns.set_title("Cumulative returns")
if benchmark_rets is not None:
plotting.plot_rolling_beta(returns, benchmark_rets, ax=ax_rolling_beta)
plotting.plot_rolling_sharpe(returns, ax=ax_rolling_sharpe)
plotting.plot_drawdown_underwater(returns, ax=ax_underwater)
if positions is not None:
# Plot simple positions tear sheet
ax_exposures = plt.subplot(gs[i, :])
i += 1
ax_top_positions = plt.subplot(gs[i, :], sharex=ax_exposures)
i += 1
ax_holdings = plt.subplot(gs[i, :], sharex=ax_exposures)
i += 1
ax_long_short_holdings = plt.subplot(gs[i, :])
i += 1
positions_alloc = pos.get_percent_alloc(positions)
plotting.plot_exposures(returns, positions, ax=ax_exposures)
plotting.show_and_plot_top_positions(
returns,
positions_alloc,
show_and_plot=0,
hide_positions=False,
ax=ax_top_positions,
)
plotting.plot_holdings(returns, positions_alloc, ax=ax_holdings)
plotting.plot_long_short_holdings(
returns, positions_alloc, ax=ax_long_short_holdings
)
if transactions is not None:
# Plot simple transactions tear sheet
ax_turnover = plt.subplot(gs[i, :])
i += 1
ax_txn_timings = plt.subplot(gs[i, :])
i += 1
plotting.plot_turnover(
returns,
transactions,
positions,
turnover_denom=turnover_denom,
ax=ax_turnover,
)
plotting.plot_txn_time_hist(transactions, ax=ax_txn_timings)
for ax in fig.axes:
ax.tick_params(
axis="x",
which="major",
bottom=True,
top=False,
labelbottom=True,
)
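The layout logic above follows one pattern throughout these tear sheets: count the sections up front, size the figure accordingly, then hand out one `GridSpec` row per plot with the x-axis shared against the first. A minimal standalone sketch of that pattern (the section names and count here are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt
from matplotlib import gridspec

vertical_sections = 4  # e.g. cumulative returns (2 rows), sharpe, underwater
fig = plt.figure(figsize=(14, vertical_sections * 6))
gs = gridspec.GridSpec(vertical_sections, 3, wspace=0.5, hspace=0.5)

ax_returns = plt.subplot(gs[:2, :])                    # spans two rows
ax_sharpe = plt.subplot(gs[2, :], sharex=ax_returns)   # shared date axis
ax_underwater = plt.subplot(gs[3, :], sharex=ax_returns)

n_axes = len(fig.axes)
plt.close(fig)
```

Sharing `sharex` keeps all time-series panels aligned when panning or zooming one of them.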
@plotting.customize
def create_returns_tear_sheet(
returns,
positions=None,
transactions=None,
live_start_date=None,
cone_std=(1.0, 1.5, 2.0),
benchmark_rets=None,
bootstrap=False,
turnover_denom="AGB",
header_rows=None,
return_fig=False,
):
"""
Generate a number of plots for analyzing a strategy's returns.
- Fetches benchmarks, then creates the plots on a single figure.
- Plots: rolling returns (with cone), rolling beta, rolling sharpe,
rolling Fama-French risk factors, drawdowns, underwater plot, monthly
and annual return plots, daily similarity plots,
and return quantile box plot.
- Will also print the start and end dates of the strategy,
performance statistics, drawdown periods, and the return range.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in create_full_tear_sheet.
positions : pd.DataFrame, optional
Daily net position values.
- See full explanation in create_full_tear_sheet.
transactions : pd.DataFrame, optional
Executed trade volumes and fill prices.
- See full explanation in create_full_tear_sheet.
live_start_date : datetime, optional
The point in time when the strategy began live trading,
after its backtest period.
cone_std : float, or tuple, optional
If float, the standard deviation to use for the cone plots.
If tuple, the standard deviation values to use for the cone plots.
- The cone is a normal distribution with this standard deviation
centered around a linear regression.
benchmark_rets : pd.Series, optional
Daily noncumulative returns of the benchmark.
- This is in the same style as returns.
bootstrap : boolean, optional
Whether to perform bootstrap analysis for the performance
metrics. Takes a few minutes longer.
turnover_denom : str, optional
Either AGB or portfolio_value, default AGB.
- See full explanation in txn.get_turnover.
header_rows : dict or OrderedDict, optional
Extra rows to display at the top of the perf stats table.
return_fig : boolean, optional
If True, returns the figure that was plotted on.
"""
if benchmark_rets is not None:
returns = utils.clip_returns_to_benchmark(returns, benchmark_rets)
plotting.show_perf_stats(
returns,
benchmark_rets,
positions=positions,
transactions=transactions,
turnover_denom=turnover_denom,
bootstrap=bootstrap,
live_start_date=live_start_date,
header_rows=header_rows,
)
plotting.show_worst_drawdown_periods(returns)
vertical_sections = 11
if live_start_date is not None:
vertical_sections += 1
live_start_date = ep.utils.get_utc_timestamp(live_start_date)
if benchmark_rets is not None:
vertical_sections += 1
if bootstrap:
vertical_sections += 1
fig = plt.figure(figsize=(14, vertical_sections * 6))
gs = gridspec.GridSpec(vertical_sections, 3, wspace=0.5, hspace=0.5)
ax_rolling_returns = plt.subplot(gs[:2, :])
i = 2
ax_rolling_returns_vol_match = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
ax_rolling_returns_log = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
ax_returns = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
if benchmark_rets is not None:
ax_rolling_beta = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
ax_rolling_volatility = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
ax_rolling_sharpe = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
ax_drawdown = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
ax_underwater = plt.subplot(gs[i, :], sharex=ax_rolling_returns)
i += 1
ax_monthly_heatmap = plt.subplot(gs[i, 0])
ax_annual_returns = plt.subplot(gs[i, 1])
ax_monthly_dist = plt.subplot(gs[i, 2])
i += 1
ax_return_quantiles = plt.subplot(gs[i, :])
i += 1
plotting.plot_rolling_returns(
returns,
factor_returns=benchmark_rets,
live_start_date=live_start_date,
cone_std=cone_std,
ax=ax_rolling_returns,
)
ax_rolling_returns.set_title("Cumulative returns")
plotting.plot_rolling_returns(
returns,
factor_returns=benchmark_rets,
live_start_date=live_start_date,
cone_std=None,
volatility_match=(benchmark_rets is not None),
legend_loc=None,
ax=ax_rolling_returns_vol_match,
)
ax_rolling_returns_vol_match.set_title(
"Cumulative returns volatility matched to benchmark"
)
plotting.plot_rolling_returns(
returns,
factor_returns=benchmark_rets,
logy=True,
live_start_date=live_start_date,
cone_std=cone_std,
ax=ax_rolling_returns_log,
)
ax_rolling_returns_log.set_title("Cumulative returns on logarithmic scale")
plotting.plot_returns(
returns,
live_start_date=live_start_date,
ax=ax_returns,
)
ax_returns.set_title("Returns")
if benchmark_rets is not None:
plotting.plot_rolling_beta(returns, benchmark_rets, ax=ax_rolling_beta)
plotting.plot_rolling_volatility(
returns, factor_returns=benchmark_rets, ax=ax_rolling_volatility
)
plotting.plot_rolling_sharpe(returns, ax=ax_rolling_sharpe)
# Drawdowns
plotting.plot_drawdown_periods(returns, top=5, ax=ax_drawdown)
plotting.plot_drawdown_underwater(returns=returns, ax=ax_underwater)
plotting.plot_monthly_returns_heatmap(returns, ax=ax_monthly_heatmap)
plotting.plot_annual_returns(returns, ax=ax_annual_returns)
plotting.plot_monthly_returns_dist(returns, ax=ax_monthly_dist)
plotting.plot_return_quantiles(
returns, live_start_date=live_start_date, ax=ax_return_quantiles
)
if bootstrap and (benchmark_rets is not None):
ax_bootstrap = plt.subplot(gs[i, :])
plotting.plot_perf_stats(returns, benchmark_rets, ax=ax_bootstrap)
elif bootstrap:
raise ValueError("bootstrap requires passing of benchmark_rets.")
for ax in fig.axes:
ax.tick_params(
axis="x",
which="major",
bottom=True,
top=False,
labelbottom=True,
)
save_plot(fig,'Returns Tear Sheet')
if return_fig:
return fig
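The `cone_std` values passed around above size the out-of-sample forecast cone. Conceptually, the cone is centered on an extrapolation of in-sample performance and widens with the square root of the horizon; the sketch below illustrates that idea with the raw in-sample mean (pyfolio's actual cone is regression-based, so this is a simplification, and `forecast_cone` is a hypothetical name):

```python
import numpy as np

def forecast_cone(n_days, mu, sigma, cone_std=(1.0, 1.5, 2.0)):
    # Center line extrapolates the in-sample mean daily return; bounds
    # widen with sqrt(horizon), as for a random walk.
    t = np.arange(1, n_days + 1)
    center = mu * t
    bounds = {k: (center - k * sigma * np.sqrt(t),
                  center + k * sigma * np.sqrt(t)) for k in cone_std}
    return center, bounds

center, bounds = forecast_cone(5, mu=0.001, sigma=0.02)
lower, upper = bounds[2.0]
assert (upper >= center).all() and (lower <= center).all()
```

Passing several standard deviations, as the default `(1.0, 1.5, 2.0)` does, produces nested bands on the rolling-returns plot.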
@plotting.customize
def create_position_tear_sheet(
returns,
positions,
show_and_plot_top_pos=2,
hide_positions=False,
sector_mappings=None,
transactions=None,
estimate_intraday="infer",
return_fig=False,
):
"""
Generate a number of plots for analyzing a
strategy's positions and holdings.
- Plots: gross leverage, exposures, top positions, and holdings.
- Will also print the top positions held.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values.
- See full explanation in create_full_tear_sheet.
show_and_plot_top_pos : int, optional
By default, this is 2, and both prints and plots the
top 10 positions.
If this is 0, it will only plot; if 1, it will only print.
hide_positions : bool, optional
If True, will not output any symbol names.
Overrides show_and_plot_top_pos to 0 to suppress text output.
sector_mappings : dict or pd.Series, optional
Security identifier to sector mapping.
Security ids as keys, sectors as values.
transactions : pd.DataFrame, optional
Prices and amounts of executed trades. One row per trade.
- See full explanation in create_full_tear_sheet.
estimate_intraday: boolean or str, optional
Approximate returns for intraday strategies.
See description in create_full_tear_sheet.
return_fig : boolean, optional
If True, returns the figure that was plotted on.
"""
positions = utils.check_intraday(
estimate_intraday, returns, positions, transactions
)
if hide_positions:
show_and_plot_top_pos = 0
vertical_sections = 7 if sector_mappings is not None else 6
fig = plt.figure(figsize=(14, vertical_sections * 6))
gs = gridspec.GridSpec(vertical_sections, 3, wspace=0.5, hspace=0.5)
ax_exposures = plt.subplot(gs[0, :])
ax_top_positions = plt.subplot(gs[1, :], sharex=ax_exposures)
ax_max_median_pos = plt.subplot(gs[2, :], sharex=ax_exposures)
ax_holdings = plt.subplot(gs[3, :], sharex=ax_exposures)
ax_long_short_holdings = plt.subplot(gs[4, :])
ax_gross_leverage = plt.subplot(gs[5, :], sharex=ax_exposures)
positions_alloc = pos.get_percent_alloc(positions)
plotting.plot_exposures(returns, positions, ax=ax_exposures)
plotting.show_and_plot_top_positions(
returns,
positions_alloc,
show_and_plot=show_and_plot_top_pos,
hide_positions=hide_positions,
ax=ax_top_positions,
)
plotting.plot_max_median_position_concentration(positions, ax=ax_max_median_pos)
plotting.plot_holdings(returns, positions_alloc, ax=ax_holdings)
plotting.plot_long_short_holdings(
returns, positions_alloc, ax=ax_long_short_holdings
)
plotting.plot_gross_leverage(returns, positions, ax=ax_gross_leverage)
if sector_mappings is not None:
sector_exposures = pos.get_sector_exposures(positions, sector_mappings)
if len(sector_exposures.columns) > 1:
sector_alloc = pos.get_percent_alloc(sector_exposures)
sector_alloc = sector_alloc.drop("cash", axis="columns")
ax_sector_alloc = plt.subplot(gs[6, :], sharex=ax_exposures)
plotting.plot_sector_allocations(returns, sector_alloc, ax=ax_sector_alloc)
for ax in fig.axes:
ax.tick_params(
axis="x",
which="major",
bottom=True,
top=False,
labelbottom=True,
)
save_plot(fig,'Position Tear Sheet')
if return_fig:
return fig
@plotting.customize
def create_txn_tear_sheet(
returns,
positions,
transactions,
turnover_denom="AGB",
unadjusted_returns=None,
estimate_intraday="infer",
return_fig=False,
):
"""
Generate a number of plots for analyzing a strategy's transactions.
Plots: turnover, daily volume, and a histogram of daily volume.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values.
- See full explanation in create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in create_full_tear_sheet.
turnover_denom : str, optional
Either AGB or portfolio_value, default AGB.
- See full explanation in txn.get_turnover.
unadjusted_returns : pd.Series, optional
Daily unadjusted returns of the strategy, noncumulative.
Will plot additional slippage sweep analysis.
- See pyfolio.plotting.plot_slippage_sweep and
pyfolio.plotting.plot_slippage_sensitivity
estimate_intraday: boolean or str, optional
Approximate returns for intraday strategies.
See description in create_full_tear_sheet.
return_fig : boolean, optional
If True, returns the figure that was plotted on.
"""
positions = utils.check_intraday(
estimate_intraday, returns, positions, transactions
)
vertical_sections = 6 if unadjusted_returns is not None else 4
fig = plt.figure(figsize=(14, vertical_sections * 6))
gs = gridspec.GridSpec(vertical_sections, 3, wspace=0.5, hspace=0.5)
ax_turnover = plt.subplot(gs[0, :])
ax_daily_volume = plt.subplot(gs[1, :], sharex=ax_turnover)
ax_turnover_hist = plt.subplot(gs[2, :])
ax_txn_timings = plt.subplot(gs[3, :])
plotting.plot_turnover(
returns,
transactions,
positions,
turnover_denom=turnover_denom,
ax=ax_turnover,
)
plotting.plot_daily_volume(returns, transactions, ax=ax_daily_volume)
try:
plotting.plot_daily_turnover_hist(
transactions,
positions,
turnover_denom=turnover_denom,
ax=ax_turnover_hist,
)
except ValueError:
warnings.warn("Unable to generate turnover plot.", UserWarning)
plotting.plot_txn_time_hist(transactions, ax=ax_txn_timings)
if unadjusted_returns is not None:
ax_slippage_sweep = plt.subplot(gs[4, :])
plotting.plot_slippage_sweep(
unadjusted_returns, positions, transactions, ax=ax_slippage_sweep
)
ax_slippage_sensitivity = plt.subplot(gs[5, :])
plotting.plot_slippage_sensitivity(
unadjusted_returns,
positions,
transactions,
ax=ax_slippage_sensitivity,
)
for ax in fig.axes:
ax.tick_params(
axis="x",
which="major",
bottom=True,
top=False,
labelbottom=True,
)
save_plot(fig,'TXN Tear Sheet')
if return_fig:
return fig
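Slippage is specified in basis points and, per `txn.adjust_returns_for_slippage`, is charged in proportion to daily turnover. A simplified sketch of that relationship (not the library's exact code; `apply_slippage` is a hypothetical name):

```python
def apply_slippage(daily_return, daily_turnover, slippage_bps):
    # Each unit of turnover pays the full slippage penalty, so a day that
    # turns over half the book at 10 bps loses 5 bps of return.
    return daily_return - daily_turnover * (slippage_bps / 10_000)

adjusted = apply_slippage(0.0100, 0.5, 10)
assert abs(adjusted - 0.0095) < 1e-12
```

This is why the slippage path above requires `transactions` and `positions`: without them, daily turnover cannot be computed.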
@plotting.customize
def create_round_trip_tear_sheet(
returns,
positions,
transactions,
sector_mappings=None,
estimate_intraday="infer",
return_fig=False,
):
"""
Generate a number of figures and plots describing the duration,
frequency, and profitability of trade "round trips."
A round trip is started when a new long or short position is
opened and is only completed when the number of shares in that
position returns to or crosses zero.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values.
- See full explanation in create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in create_full_tear_sheet.
sector_mappings : dict or pd.Series, optional
Security identifier to sector mapping.
Security ids as keys, sectors as values.
estimate_intraday: boolean or str, optional
Approximate returns for intraday strategies.
See description in create_full_tear_sheet.
return_fig : boolean, optional
If True, returns the figure that was plotted on.
"""
positions = utils.check_intraday(
estimate_intraday, returns, positions, transactions
)
transactions_closed = round_trips.add_closing_transactions(positions, transactions)
# extract_round_trips requires BoD portfolio_value
trades = round_trips.extract_round_trips(
transactions_closed,
portfolio_value=positions.sum(axis="columns") / (1 + returns),
)
if len(trades) < 5:
warnings.warn(
"""Fewer than 5 round-trip trades made.
Skipping round trip tearsheet.""",
UserWarning,
)
return
round_trips.print_round_trip_stats(trades)
plotting.show_profit_attribution(trades)
if sector_mappings is not None:
sector_trades = round_trips.apply_sector_mappings_to_round_trips(
trades, sector_mappings
)
plotting.show_profit_attribution(sector_trades)
fig = plt.figure(figsize=(14, 3 * 6))
gs = gridspec.GridSpec(3, 2, wspace=0.5, hspace=0.5)
ax_trade_lifetimes = plt.subplot(gs[0, :])
ax_prob_profit_trade = plt.subplot(gs[1, 0])
ax_holding_time = plt.subplot(gs[1, 1])
ax_pnl_per_round_trip_dollars = plt.subplot(gs[2, 0])
ax_pnl_per_round_trip_pct = plt.subplot(gs[2, 1])
plotting.plot_round_trip_lifetimes(trades, ax=ax_trade_lifetimes)
plotting.plot_prob_profit_trade(trades, ax=ax_prob_profit_trade)
trade_holding_times = [x.days for x in trades["duration"]]
sns.histplot(trade_holding_times, ax=ax_holding_time)
ax_holding_time.set(xlabel="Holding time in days")
sns.histplot(trades.pnl, ax=ax_pnl_per_round_trip_dollars)
ax_pnl_per_round_trip_dollars.set(xlabel="PnL per round-trip trade in $")
sns.histplot(trades.returns.dropna() * 100, ax=ax_pnl_per_round_trip_pct)
ax_pnl_per_round_trip_pct.set(xlabel="Round-trip returns in %")
gs.tight_layout(fig)
save_plot(fig,'Round Trip Tear Sheet')
if return_fig:
return fig
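The round-trip definition in the docstring — a position opens, and completes when its share count returns to or crosses zero — can be sketched as a simple scan over share counts (a hypothetical helper for illustration, not `round_trips.extract_round_trips` itself):

```python
def round_trip_boundaries(share_counts):
    # Return the indices at which a position returns to or crosses zero,
    # i.e. where a round trip completes.
    closes = []
    prev = 0
    for i, shares in enumerate(share_counts):
        if prev != 0 and (shares == 0 or (shares > 0) != (prev > 0)):
            closes.append(i)
        prev = shares
    return closes

# Long flattened at index 2; a short then crosses through zero at index 5.
assert round_trip_boundaries([100, 100, 0, -50, -50, 25]) == [2, 5]
```

The real extractor additionally splits PnL across partial fills and attaches durations, which is what feeds the histograms above.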
@plotting.customize
def create_interesting_times_tear_sheet(
returns,
benchmark_rets=None,
periods=None,
legend_loc="best",
return_fig=False,
):
"""
Generate a number of returns plots around interesting points in time,
like the flash crash and 9/11.
Plots: returns around the dotcom bubble burst, Lehman Brothers' failure,
9/11, US downgrade and EU debt crisis, Fukushima meltdown, US housing
bubble burst, EZB IR, Great Recession (August 2007, March and September
of 2008, Q1 & Q2 2009), flash crash, April and October 2014.
benchmark_rets must be passed, as it is meaningless to analyze performance
during interesting times without some benchmark to refer to.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in create_full_tear_sheet.
benchmark_rets : pd.Series
Daily noncumulative returns of the benchmark.
- This is in the same style as returns.
periods: dict or OrderedDict, optional
Historical event dates that may have had a significant
impact on markets.
legend_loc : plt.legend_loc, optional
The legend's location.
return_fig : boolean, optional
If True, returns the figure that was plotted on.
"""
logging.info('Running create_interesting_times_tear_sheet')
rets_interesting = timeseries.extract_interesting_date_ranges(returns, periods)
if not rets_interesting:
warnings.warn(
"Passed returns do not overlap with any interesting times.",
UserWarning,
)
return
utils.print_table(
pd.DataFrame(rets_interesting)
.describe()
.transpose()
.loc[:, ["mean", "min", "max"]]
* 100,
name="Stress Events",
float_format="{0:.2f}%".format,
)
if benchmark_rets is not None:
returns = utils.clip_returns_to_benchmark(returns, benchmark_rets)
bmark_interesting = timeseries.extract_interesting_date_ranges(
benchmark_rets, periods
)
num_plots = len(rets_interesting)
# 2 plots, 1 row; 3 plots, 2 rows; 4 plots, 2 rows; etc.
num_rows = int((num_plots + 1) / 2.0)
fig = plt.figure(figsize=(14, num_rows * 6.0))
gs = gridspec.GridSpec(num_rows, 2, wspace=0.5, hspace=0.5)
for i, (name, rets_period) in enumerate(rets_interesting.items()):
# i=0 -> 0, i=1 -> 0, i=2 -> 1 ;; i=0 -> 0, i=1 -> 1, i=2 -> 0
ax = plt.subplot(gs[int(i / 2.0), i % 2])
ep.cum_returns(rets_period).plot(
ax=ax, color="forestgreen", label="algo", alpha=0.7, lw=2
)
if benchmark_rets is not None:
ep.cum_returns(bmark_interesting[name]).plot(
ax=ax, color="gray", label="benchmark", alpha=0.6
)
ax.legend(
["Algo", "benchmark"],
loc=legend_loc,
frameon=True,
framealpha=0.5,
)
else:
ax.legend(["Algo"], loc=legend_loc, frameon=True, framealpha=0.5)
ax.set_title(name)
ax.set_ylabel("Returns")
ax.set_xlabel("")
save_plot(fig,'Interesting Times Tear Sheet')
if return_fig:
return fig
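Under the hood, `extract_interesting_date_ranges` amounts to slicing the returns series to named historical windows. A minimal sketch with an illustrative window (the actual event list lives in pyfolio's timeseries module):

```python
import pandas as pd

idx = pd.date_range("2010-05-03", periods=10, freq="B")  # business days
returns = pd.Series(0.001, index=idx)

# Label-based slicing on a DatetimeIndex is inclusive on both ends,
# so this keeps May 5, 6, 7, and 10 (the 8th and 9th fall on a weekend).
flash_crash_window = returns.loc["2010-05-05":"2010-05-10"]
assert len(flash_crash_window) == 4
```

Each such slice is then cumulated with `ep.cum_returns` and plotted against the benchmark's slice for the same window, as in the loop above.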
@plotting.customize
def create_capacity_tear_sheet(
returns,
positions,
transactions,
market_data,
liquidation_daily_vol_limit=0.2,
trade_daily_vol_limit=0.05,
last_n_days=utils.APPROX_BDAYS_PER_MONTH * 6,
days_to_liquidate_limit=1,
estimate_intraday="infer",
return_fig=False,
):
"""
Generates a report detailing portfolio size constraints set by
least liquid tickers. Plots a "capacity sweep," a curve describing
projected sharpe ratio given the slippage penalties that are
applied at various capital bases.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values.
- See full explanation in create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in create_full_tear_sheet.
market_data : pd.DataFrame
Daily market_data
- DataFrame has a multi-index index, one level is dates and another is
market_data contains volume & price, equities as columns
liquidation_daily_vol_limit : float
Max proportion of a daily bar that can be consumed in the
process of liquidating a position in the
"days to liquidation" analysis.
trade_daily_vol_limit : float
Flag daily transaction totals that exceed proportion of
daily bar.
last_n_days : integer
Compute max position allocation and dollar volume for only
the last N days of the backtest
days_to_liquidate_limit : integer
Display all tickers with greater max days to liquidation.
estimate_intraday: boolean or str, optional
Approximate returns for intraday strategies.
See description in create_full_tear_sheet.
return_fig : boolean, optional
If True, returns the figure that was plotted on.
"""
positions = utils.check_intraday(
estimate_intraday, returns, positions, transactions
)
print(
"Max days to liquidation is computed for each traded name "
"assuming a 20% limit on daily bar consumption \n"
"and trailing 5 day mean volume as the available bar volume.\n\n"
"Tickers with >1 day liquidation time at a"
" constant $1m capital base:"
)
max_days_by_ticker = capacity.get_max_days_to_liquidate_by_ticker(
positions,
market_data,
max_bar_consumption=liquidation_daily_vol_limit,
capital_base=1e6,
mean_volume_window=5,
)
max_days_by_ticker.index = max_days_by_ticker.index.map(utils.format_asset)
print("Whole backtest:")
utils.print_table(
max_days_by_ticker[
max_days_by_ticker.days_to_liquidate > days_to_liquidate_limit
]
)
max_days_by_ticker_lnd = capacity.get_max_days_to_liquidate_by_ticker(
positions,
market_data,
max_bar_consumption=liquidation_daily_vol_limit,
capital_base=1e6,
mean_volume_window=5,
last_n_days=last_n_days,
)
max_days_by_ticker_lnd.index = max_days_by_ticker_lnd.index.map(utils.format_asset)
print("Last {} trading days:".format(last_n_days))
utils.print_table(
max_days_by_ticker_lnd[max_days_by_ticker_lnd.days_to_liquidate > 1]
)
llt = capacity.get_low_liquidity_transactions(transactions, market_data)
llt.index = llt.index.map(utils.format_asset)
print(
"Tickers with daily transactions consuming >{}% of daily bar \n"
"across the whole backtest:".format(trade_daily_vol_limit * 100)
)
utils.print_table(llt[llt["max_pct_bar_consumed"] > trade_daily_vol_limit * 100])
llt = capacity.get_low_liquidity_transactions(
transactions, market_data, last_n_days=last_n_days
)
print("Last {} trading days:".format(last_n_days))
utils.print_table(llt[llt["max_pct_bar_consumed"] > trade_daily_vol_limit * 100])
bt_starting_capital = positions.iloc[0].sum() / (1 + returns.iloc[0])
fig, ax_capacity_sweep = plt.subplots(figsize=(14, 6))
plotting.plot_capacity_sweep(
returns,
transactions,
market_data,
bt_starting_capital,
min_pv=100000,
max_pv=300000000,
step_size=1000000,
ax=ax_capacity_sweep,
)
save_plot(fig,'Capacity Tear Sheet')
if return_fig:
return fig
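The liquidation analysis above caps daily participation at `liquidation_daily_vol_limit` (20%) of the trailing average bar volume. The core arithmetic reduces to a one-liner; this is a simplified sketch of what `capacity.get_max_days_to_liquidate_by_ticker` computes per name, with a hypothetical helper name:

```python
def days_to_liquidate(position_value, mean_daily_dollar_volume,
                      max_bar_consumption=0.2):
    # Days needed to unwind a position if at most max_bar_consumption of
    # the average daily dollar volume may be consumed each day.
    return position_value / (max_bar_consumption * mean_daily_dollar_volume)

# A $1m position in a name trading $2.5m/day: 1e6 / (0.2 * 2.5e6) = 2 days.
assert days_to_liquidate(1_000_000, 2_500_000) == 2.0
```

Names exceeding `days_to_liquidate_limit` under this rule are the ones printed in the tables above, flagging where strategy capacity is constrained by illiquid tickers.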
@plotting.customize
def create_perf_attrib_tear_sheet(
returns,
positions,
factor_returns,
factor_loadings,
transactions=None,
pos_in_dollars=True,
factor_partitions=FACTOR_PARTITIONS,
return_fig=False,
):
"""
Generate plots and tables for analyzing a strategy's performance.
Parameters
----------
returns : pd.Series
Returns for each day in the date range.
positions: pd.DataFrame
Daily holdings (in dollars or percentages), indexed by date.
Will be converted to percentages if positions are in dollars.
Short positions show up as cash in the 'cash' column.
factor_returns : pd.DataFrame
Returns by factor, with date as index and factors as columns
factor_loadings : pd.DataFrame
Factor loadings for all days in the date range, with date
and ticker as index, and factors as columns.
transactions : pd.DataFrame, optional
Prices and amounts of executed trades. One row per trade.
- See full explanation in create_full_tear_sheet.
- Default is None.
pos_in_dollars : boolean, optional
Flag indicating whether `positions` are in dollars or percentages.
If True, positions are in dollars.
factor_partitions : dict
dict specifying how factors should be separated in factor returns
and risk exposures plots
- Example:
{'style': ['momentum', 'size', 'value', ...],
'sector': ['technology', 'materials', ... ]}
return_fig : boolean, optional
If True, returns the figure that was plotted on.
"""
portfolio_exposures, perf_attrib_data = perf_attrib.perf_attrib(
returns,
positions,
factor_returns,
factor_loadings,
transactions,
pos_in_dollars=pos_in_dollars,
)
display(Markdown("## Performance Relative to Common Risk Factors"))
# aggregate perf attrib stats and show summary table
perf_attrib.show_perf_attrib_stats(
returns,
positions,
factor_returns,
factor_loadings,
transactions,
pos_in_dollars,
)
# one section for the returns plot, and for each factor grouping
# one section for factor returns, and one for risk exposures
if factor_partitions is not None:
vertical_sections = 1 + 2 * max(len(factor_partitions), 1)
else:
vertical_sections = 1 + 2
current_section = 0
fig = plt.figure(figsize=[14, vertical_sections * 6])
gs = gridspec.GridSpec(vertical_sections, 1, wspace=0.5, hspace=0.5)
perf_attrib.plot_returns(perf_attrib_data, ax=plt.subplot(gs[current_section]))
current_section += 1
if factor_partitions is not None:
for factor_type, partitions in factor_partitions.items():
columns_to_select = perf_attrib_data.columns.intersection(partitions)
perf_attrib.plot_factor_contribution_to_perf(
perf_attrib_data[columns_to_select],
ax=plt.subplot(gs[current_section]),
title=("Cumulative common {} returns attribution").format(factor_type),
)
current_section += 1
for factor_type, partitions in factor_partitions.items():
columns_to_select = portfolio_exposures.columns.intersection(partitions)
perf_attrib.plot_risk_exposures(
portfolio_exposures[columns_to_select],
ax=plt.subplot(gs[current_section]),
title="Daily {} factor exposures".format(factor_type),
)
current_section += 1
else:
perf_attrib.plot_factor_contribution_to_perf(
perf_attrib_data, ax=plt.subplot(gs[current_section])
)
current_section += 1
perf_attrib.plot_risk_exposures(
portfolio_exposures, ax=plt.subplot(gs[current_section])
)
# gs.tight_layout(fig)
save_plot(fig, 'Perf Attribution Tear Sheet')
if return_fig:
return fig
timeseries.py
#
# Copyright 2018 Quantopian, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from collections import OrderedDict
from functools import partial
import empyrical as ep
import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats as stats
from sklearn import linear_model
from .deprecate import deprecated
from .interesting_periods import PERIODS
from .txn import get_turnover
from .utils import APPROX_BDAYS_PER_MONTH, APPROX_BDAYS_PER_YEAR
from .utils import DAILY
DEPRECATION_WARNING = (
"Risk functions in pyfolio.timeseries are deprecated "
"and will be removed in a future release. Please "
"install the empyrical package instead."
)
def var_cov_var_normal(P, c, mu=0, sigma=1):
"""
Variance-covariance calculation of daily Value-at-Risk in a
portfolio.
Parameters
----------
P : float
Portfolio value.
c : float
Confidence level.
mu : float, optional
Mean of the daily returns.
sigma : float, optional
Standard deviation of the daily returns.
Returns
-------
float
Daily Value-at-Risk.
"""
alpha = sp.stats.norm.ppf(1 - c, mu, sigma)
return P - P * (alpha + 1)
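A quick sanity check of the formula above: with `mu = 0`, the expression reduces to `-P * alpha`, i.e. the loss at the `(1 - c)` quantile of the assumed normal distribution. The portfolio value and volatility below are made-up illustration numbers.

```python
# Self-contained restatement of var_cov_var_normal for a worked example.
from scipy.stats import norm

def var_cov_var_normal(P, c, mu=0, sigma=1):
    # alpha is the (1 - c) quantile of N(mu, sigma); VaR = P - P * (alpha + 1)
    alpha = norm.ppf(1 - c, mu, sigma)
    return P - P * (alpha + 1)

# Hypothetical $1,000,000 portfolio, 99% confidence, daily sigma of 1%:
var = var_cov_var_normal(1e6, 0.99, mu=0.0, sigma=0.01)
```

With these inputs the 1% quantile sits about 2.33 daily standard deviations below the mean, so the one-day VaR comes out a little above $23,000.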
@deprecated(msg=DEPRECATION_WARNING)
def max_drawdown(returns):
"""
Determines the maximum drawdown of a strategy.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
Returns
-------
float
Maximum drawdown.
Note
-----
See https://en.wikipedia.org/wiki/Drawdown_(economics) for more details.
"""
return ep.max_drawdown(returns)
@deprecated(msg=DEPRECATION_WARNING)
def annual_return(returns, period=DAILY):
"""
Determines the mean annual growth rate of returns.
Parameters
----------
returns : pd.Series
Periodic returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
period : str, optional
Defines the periodicity of the 'returns' data for purposes of
annualizing. Can be 'monthly', 'weekly', or 'daily'.
- Defaults to 'daily'.
Returns
-------
float
Annual Return as CAGR (Compounded Annual Growth Rate).
"""
return ep.annual_return(returns, period=period)
@deprecated(msg=DEPRECATION_WARNING)
def annual_volatility(returns, period=DAILY):
"""
Determines the annual volatility of a strategy.
Parameters
----------
returns : pd.Series
Periodic returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
period : str, optional
Defines the periodicity of the 'returns' data for purposes of
annualizing volatility. Can be 'monthly' or 'weekly' or 'daily'.
- Defaults to 'daily'.
Returns
-------
float
Annual volatility.
"""
return ep.annual_volatility(returns, period=period)
@deprecated(msg=DEPRECATION_WARNING)
def calmar_ratio(returns, period=DAILY):
"""
Determines the Calmar ratio, or drawdown ratio, of a strategy.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
period : str, optional
Defines the periodicity of the 'returns' data for purposes of
annualizing. Can be 'monthly', 'weekly', or 'daily'.
- Defaults to 'daily'.
Returns
-------
float
Calmar ratio (drawdown ratio) as float. Returns np.nan if there is no
calmar ratio.
Note
-----
See https://en.wikipedia.org/wiki/Calmar_ratio for more details.
"""
return ep.calmar_ratio(returns, period=period)
@deprecated(msg=DEPRECATION_WARNING)
def omega_ratio(returns, annual_return_threshhold=0.0):
"""
Determines the Omega ratio of a strategy.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
annual_return_threshhold : float, optional
Minimum acceptable return of the investor. Annual threshold over which
returns are considered positive or negative. It is converted to a
value appropriate for the period of the returns for this ratio.
E.g. An annual minimum acceptable return of 100 translates to a daily
minimum acceptable return of 0.01848.
(1 + 100) ** (1. / 252) - 1 = 0.01848
Daily returns must exceed this value to be considered positive. The
daily return yields the desired annual return when compounded over
the average number of business days in a year.
(1 + 0.01848) ** 252 - 1 = 99.93
- Defaults to 0.0
Returns
-------
float
Omega ratio.
Note
-----
See https://en.wikipedia.org/wiki/Omega_ratio for more details.
"""
return ep.omega_ratio(returns, required_return=annual_return_threshhold)
@deprecated(msg=DEPRECATION_WARNING)
def sortino_ratio(returns, required_return=0, period=DAILY):
"""
Determines the Sortino ratio of a strategy.
Parameters
----------
returns : pd.Series or pd.DataFrame
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
required_return: float / series
minimum acceptable return
period : str, optional
Defines the periodicity of the 'returns' data for purposes of
annualizing. Can be 'monthly', 'weekly', or 'daily'.
- Defaults to 'daily'.
Returns
-------
depends on input type
series ==> float
DataFrame ==> np.array
Annualized Sortino ratio.
"""
return ep.sortino_ratio(returns, required_return=required_return, period=period)
@deprecated(msg=DEPRECATION_WARNING)
def downside_risk(returns, required_return=0, period=DAILY):
"""
Determines the downside deviation below a threshold
Parameters
----------
returns : pd.Series or pd.DataFrame
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
required_return: float / series
minimum acceptable return
period : str, optional
Defines the periodicity of the 'returns' data for purposes of
annualizing. Can be 'monthly', 'weekly', or 'daily'.
- Defaults to 'daily'.
Returns
-------
depends on input type
series ==> float
DataFrame ==> np.array
Annualized downside deviation
"""
return ep.downside_risk(returns, required_return=required_return, period=period)
@deprecated(msg=DEPRECATION_WARNING)
def sharpe_ratio(returns, risk_free=0, period=DAILY):
"""
Determines the Sharpe ratio of a strategy.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
risk_free : int, float
Constant risk-free return throughout the period.
period : str, optional
Defines the periodicity of the 'returns' data for purposes of
annualizing. Can be 'monthly', 'weekly', or 'daily'.
- Defaults to 'daily'.
Returns
-------
float
Sharpe ratio.
np.nan
If insufficient length of returns or if adjusted returns are 0.
Note
-----
See https://en.wikipedia.org/wiki/Sharpe_ratio for more details.
"""
return ep.sharpe_ratio(returns, risk_free=risk_free, period=period)
@deprecated(msg=DEPRECATION_WARNING)
def alpha_beta(returns, factor_returns):
"""
Calculates both alpha and beta.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
factor_returns : pd.Series
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- This is in the same style as returns.
Returns
-------
float
Alpha.
float
Beta.
"""
return ep.alpha_beta(returns, factor_returns=factor_returns)
@deprecated(msg=DEPRECATION_WARNING)
def alpha(returns, factor_returns):
"""
Calculates annualized alpha.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
factor_returns : pd.Series
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- This is in the same style as returns.
Returns
-------
float
Alpha.
"""
return ep.alpha(returns, factor_returns=factor_returns)
@deprecated(msg=DEPRECATION_WARNING)
def beta(returns, factor_returns):
"""
Calculates beta.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
factor_returns : pd.Series
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- This is in the same style as returns.
Returns
-------
float
Beta.
"""
return ep.beta(returns, factor_returns)
@deprecated(msg=DEPRECATION_WARNING)
def stability_of_timeseries(returns):
"""
Determines R-squared of a linear fit to the cumulative
log returns. Computes an ordinary least squares linear fit,
and returns R-squared.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
Returns
-------
float
R-squared.
"""
return ep.stability_of_timeseries(returns)
@deprecated(msg=DEPRECATION_WARNING)
def tail_ratio(returns):
"""
Determines the ratio between the right (95%) and left tail (5%).
For example, a ratio of 0.25 means that losses are four times
as bad as profits.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
Returns
-------
float
tail ratio
"""
return ep.tail_ratio(returns)
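To make the docstring's interpretation concrete, the tail ratio can be sketched with numpy alone as the 95th percentile of returns over the absolute 5th percentile (this mirrors empyrical's definition; the sample returns below are made up):

```python
import numpy as np

# Made-up daily returns with a slightly heavier left tail
returns = np.array([-0.04, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.03])

# Right tail (95th percentile) over absolute left tail (5th percentile)
tail = np.abs(np.percentile(returns, 95)) / np.abs(np.percentile(returns, 5))
# A value below 1 means the worst losses outweigh the best gains.
```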
def common_sense_ratio(returns):
"""
Common sense ratio is the product of the tail ratio and the
Gain-to-Pain ratio, computed here as tail_ratio * (1 + annual_return).
See http://bit.ly/1ORzGBk for more information on motivation of
this metric.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
Returns
-------
float
common sense ratio
"""
return ep.tail_ratio(returns) * (1 + ep.annual_return(returns))
def normalize(returns, starting_value=1):
"""
Normalizes a returns timeseries based on the first value.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
starting_value : float, optional
The value the first observation is normalized to (default 1).
Returns
-------
pd.Series
Normalized returns.
"""
return starting_value * (returns / returns.iloc[0])
@deprecated(msg=DEPRECATION_WARNING)
def cum_returns(returns, starting_value=0):
"""
Compute cumulative returns from simple returns.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
starting_value : float, optional
The starting value of the cumulative series (default 0).
Returns
-------
pandas.Series
Series of cumulative returns.
Notes
-----
For increased numerical accuracy, convert input to log returns
where it is possible to sum instead of multiplying.
"""
return ep.cum_returns(returns, starting_value=starting_value)
@deprecated(msg=DEPRECATION_WARNING)
def aggregate_returns(returns, convert_to):
"""
Aggregates returns by week, month, or year.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
convert_to : str
Can be 'weekly', 'monthly', or 'yearly'.
Returns
-------
pd.Series
Aggregated returns.
"""
return ep.aggregate_returns(returns, convert_to=convert_to)
def rolling_beta(returns, factor_returns, rolling_window=APPROX_BDAYS_PER_MONTH * 6):
"""
Determines the rolling beta of a strategy.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.Series or pd.DataFrame
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- If DataFrame is passed, computes rolling beta for each column.
- This is in the same style as returns.
rolling_window : int, optional
The size of the rolling window, in days, over which to compute
beta (default 6 months).
Returns
-------
pd.Series
Rolling beta.
Note
-----
See https://en.wikipedia.org/wiki/Beta_(finance) for more details.
"""
if factor_returns.ndim > 1:
# Apply column-wise
return factor_returns.apply(
partial(rolling_beta, returns), rolling_window=rolling_window
)
else:
out = pd.Series(index=returns.index, dtype="float64")
for beg, end in zip(
returns.index[0:-rolling_window], returns.index[rolling_window:]
):
out.loc[end] = ep.beta(returns.loc[beg:end], factor_returns.loc[beg:end])
return out
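Each iteration of the loop above fits a single-factor beta over one window; the same quantity can be written directly as cov(returns, factor) / var(factor). The series below are synthetic, constructed with a true beta of 1.5, so the rolling estimate should land close to that value:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
factor = pd.Series(rng.normal(0.0, 0.01, 300))
returns = 1.5 * factor + pd.Series(rng.normal(0.0, 0.002, 300))  # true beta = 1.5

window = 126  # roughly 6 months of business days
betas = pd.Series(index=returns.index, dtype="float64")
for end in range(window, len(returns)):
    r = returns.iloc[end - window:end]
    f = factor.iloc[end - window:end]
    # Beta over this window: covariance with the factor over factor variance
    betas.iloc[end] = np.cov(r, f)[0, 1] / np.var(f, ddof=1)
```

The first `window` entries stay NaN, matching how `rolling_beta` only fills dates with a full window behind them.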
def rolling_regression(
returns,
factor_returns,
rolling_window=APPROX_BDAYS_PER_MONTH * 6,
nan_threshold=0.1,
):
"""
Computes rolling factor betas using a multivariate linear regression
(separate linear regressions are problematic because the factors may be
confounded).
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.DataFrame
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- Computes rolling beta for each column.
- This is in the same style as returns.
rolling_window : int, optional
The days window over which to compute the beta. Defaults to 6 months.
nan_threshold : float, optional
If there are more than this fraction of NaNs, the rolling regression
for the given date will be skipped.
Returns
-------
pandas.DataFrame
DataFrame containing an alpha column plus rolling beta coefficients
for each column of factor_returns.
"""
# We need to drop NaNs to regress
ret_no_na = returns.dropna()
columns = ["alpha"] + factor_returns.columns.tolist()
rolling_risk = pd.DataFrame(columns=columns, index=ret_no_na.index)
rolling_risk.index.name = "dt"
for beg, end in zip(
ret_no_na.index[:-rolling_window], ret_no_na.index[rolling_window:]
):
returns_period = ret_no_na[beg:end]
factor_returns_period = factor_returns.loc[returns_period.index]
if np.all(factor_returns_period.isnull().mean() < nan_threshold):
factor_returns_period_dnan = factor_returns_period.dropna()
reg = linear_model.LinearRegression(fit_intercept=True).fit(
factor_returns_period_dnan,
returns_period.loc[factor_returns_period_dnan.index],
)
rolling_risk.loc[end, factor_returns.columns] = reg.coef_
rolling_risk.loc[end, "alpha"] = reg.intercept_
return rolling_risk
def gross_lev(positions):
"""
Calculates the gross leverage of a strategy.
Parameters
----------
positions : pd.DataFrame
Daily net position values.
- See full explanation in tears.create_full_tear_sheet.
Returns
-------
pd.Series
Gross leverage.
"""
exposure = positions.drop("cash", axis=1).abs().sum(axis=1)
return exposure / positions.sum(axis=1)
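`gross_lev` divides the sum of absolute non-cash exposures by net portfolio value. A hypothetical two-day positions frame (tickers and dollar amounts made up) makes the arithmetic visible:

```python
import pandas as pd

# Hypothetical daily position values in dollars; shorts are negative
positions = pd.DataFrame({
    "AAPL": [50_000.0, 60_000.0],
    "SPY": [-30_000.0, -20_000.0],
    "cash": [80_000.0, 60_000.0],
})

exposure = positions.drop("cash", axis=1).abs().sum(axis=1)  # gross exposure
gross = exposure / positions.sum(axis=1)                     # / net portfolio value
```

On both days the gross exposure is $80,000 against a $100,000 net portfolio value, so the gross leverage is 0.8.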
def value_at_risk(returns, period=None, sigma=2.0):
"""
Get value at risk (VaR).
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
period : str, optional
Period over which to calculate VaR. Set to 'weekly',
'monthly', or 'yearly', otherwise defaults to period of
returns (typically daily).
sigma : float, optional
Standard deviations of VaR, default 2.
Returns
-------
float
Value at risk.
"""
if period is not None:
returns_agg = ep.aggregate_returns(returns, period)
else:
returns_agg = returns.copy()
value_at_risk = returns_agg.mean() - sigma * returns_agg.std()
return value_at_risk
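Without the optional aggregation step, `value_at_risk` reduces to the mean minus `sigma` standard deviations of the return series. A sketch on synthetic daily returns (all numbers made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0005, 0.01, 252))  # synthetic daily returns

sigma = 2.0
var = returns.mean() - sigma * returns.std()  # the default (daily) branch above
```

With a small positive mean and ~1% daily volatility, the two-sigma VaR is a loss of roughly 2% per day.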
SIMPLE_STAT_FUNCS = [
ep.annual_return,
ep.cum_returns_final,
ep.annual_volatility,
ep.sharpe_ratio,
ep.calmar_ratio,
ep.stability_of_timeseries,
ep.max_drawdown,
ep.omega_ratio,
ep.sortino_ratio,
stats.skew,
stats.kurtosis,
ep.tail_ratio,
value_at_risk,
]
FACTOR_STAT_FUNCS = [
ep.alpha,
ep.beta,
]
STAT_FUNC_NAMES = {
"annual_return": "Annual return",
"cum_returns_final": "Cumulative returns",
"annual_volatility": "Annual volatility",
"sharpe_ratio": "Sharpe ratio",
"calmar_ratio": "Calmar ratio",
"stability_of_timeseries": "Stability",
"max_drawdown": "Max drawdown",
"omega_ratio": "Omega ratio",
"sortino_ratio": "Sortino ratio",
"skew": "Skew",
"kurtosis": "Kurtosis",
"tail_ratio": "Tail ratio",
"common_sense_ratio": "Common sense ratio",
"value_at_risk": "Daily value at risk",
"alpha": "Alpha",
"beta": "Beta",
}
def perf_stats(
returns,
factor_returns=None,
positions=None,
transactions=None,
turnover_denom="AGB",
):
"""
Calculates various performance metrics of a strategy, for use in
plotting.show_perf_stats.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.Series, optional
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- This is in the same style as returns.
- If None, do not compute alpha, beta, and information ratio.
positions : pd.DataFrame
Daily net position values.
- See full explanation in tears.create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in tears.create_full_tear_sheet.
turnover_denom : str
Either AGB or portfolio_value, default AGB.
- See full explanation in txn.get_turnover.
Returns
-------
pd.Series
Performance metrics.
"""
stats = pd.Series(dtype="float64")
for stat_func in SIMPLE_STAT_FUNCS:
stats[STAT_FUNC_NAMES[stat_func.__name__]] = stat_func(returns)
if not (positions is None or positions.empty):
stats["Gross leverage"] = gross_lev(positions).mean()
if not (transactions is None or transactions.empty):
stats["Daily turnover"] = get_turnover(
positions, transactions, turnover_denom
).mean()
if factor_returns is not None:
for stat_func in FACTOR_STAT_FUNCS:
res = stat_func(returns, factor_returns)
stats[STAT_FUNC_NAMES[stat_func.__name__]] = res
return stats
def perf_stats_bootstrap(returns, factor_returns=None, return_stats=True, **kwargs):
"""Calculates various bootstrapped performance metrics of a strategy.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.Series, optional
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- This is in the same style as returns.
- If None, do not compute alpha, beta, and information ratio.
return_stats : boolean (optional)
If True, returns a DataFrame of the mean, median, and 5th and 95th percentiles
for each perf metric.
If False, returns a DataFrame with the bootstrap samples for
each perf metric.
Returns
-------
pd.DataFrame
if return_stats is True:
- Distributional statistics of bootstrapped sampling
distribution of performance metrics.
if return_stats is False:
- Bootstrap samples for each performance metric.
"""
bootstrap_values = OrderedDict()
for stat_func in SIMPLE_STAT_FUNCS:
stat_name = STAT_FUNC_NAMES[stat_func.__name__]
bootstrap_values[stat_name] = calc_bootstrap(stat_func, returns)
if factor_returns is not None:
for stat_func in FACTOR_STAT_FUNCS:
stat_name = STAT_FUNC_NAMES[stat_func.__name__]
bootstrap_values[stat_name] = calc_bootstrap(
stat_func, returns, factor_returns=factor_returns
)
bootstrap_values = pd.DataFrame(bootstrap_values)
if return_stats:
stats = bootstrap_values.apply(calc_distribution_stats)
return stats.T[["mean", "median", "5%", "95%"]]
else:
return bootstrap_values
def calc_bootstrap(func, returns, *args, **kwargs):
"""Performs a bootstrap analysis on a user-defined function returning
a summary statistic.
Parameters
----------
func : function
Function that either takes a single array (commonly returns)
or two arrays (commonly returns and factor returns) and
returns a single value (commonly a summary
statistic). Additional args and kwargs are passed as well.
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.Series, optional
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- This is in the same style as returns.
n_samples : int, optional
Number of bootstrap samples to draw. Default is 1000.
Increasing this will lead to more stable / accurate estimates.
Returns
-------
numpy.ndarray
Bootstrapped sampling distribution of passed in func.
"""
n_samples = kwargs.pop("n_samples", 1000)
out = np.empty(n_samples)
factor_returns = kwargs.pop("factor_returns", None)
for i in range(n_samples):
idx = np.random.randint(len(returns), size=len(returns))
returns_i = returns.iloc[idx].reset_index(drop=True)
if factor_returns is not None:
factor_returns_i = factor_returns.iloc[idx].reset_index(drop=True)
out[i] = func(returns_i, factor_returns_i, *args, **kwargs)
else:
out[i] = func(returns_i, *args, **kwargs)
return out
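The resampling loop in `calc_bootstrap` is an ordinary i.i.d. bootstrap: draw row indices with replacement and re-evaluate the statistic on each resample. Applied to the sample mean on synthetic data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
returns = pd.Series(rng.normal(0.001, 0.01, 252))  # synthetic daily returns

n_samples = 1000
out = np.empty(n_samples)
for i in range(n_samples):
    # Sample len(returns) indices with replacement, as in calc_bootstrap
    idx = rng.integers(len(returns), size=len(returns))
    out[i] = returns.iloc[idx].mean()

# out approximates the sampling distribution of the mean
```

The bootstrap distribution is centered on the sample statistic; its spread is what `calc_distribution_stats` then summarizes.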
def calc_distribution_stats(x):
"""Calculate various summary statistics of data.
Parameters
----------
x : numpy.ndarray or pandas.Series
Array to compute summary statistics for.
Returns
-------
pandas.Series
Series containing mean, median, std, IQR, and the 5th, 25th,
75th and 95th percentiles of the passed-in values.
"""
return pd.Series(
{
"mean": np.mean(x),
"median": np.median(x),
"std": np.std(x),
"5%": np.percentile(x, 5),
"25%": np.percentile(x, 25),
"75%": np.percentile(x, 75),
"95%": np.percentile(x, 95),
"IQR": np.subtract.reduce(np.percentile(x, [75, 25])),
}
)
def get_max_drawdown_underwater(underwater):
"""
Determines peak, valley, and recovery dates given an 'underwater'
DataFrame.
An underwater DataFrame is a DataFrame that has precomputed
rolling drawdown.
Parameters
----------
underwater : pd.Series
Underwater returns (rolling drawdown) of a strategy.
Returns
-------
peak : datetime
The maximum drawdown's peak.
valley : datetime
The maximum drawdown's valley.
recovery : datetime
The maximum drawdown's recovery.
"""
valley = underwater.idxmin() # end of the period
# Find first 0
peak = underwater[:valley][underwater[:valley] == 0].index[-1]
# Find last 0
try:
recovery = underwater[valley:][underwater[valley:] == 0].index[0]
except IndexError:
recovery = np.nan # drawdown not recovered
return peak, valley, recovery
def get_max_drawdown(returns):
"""
Determines the maximum drawdown of a strategy.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in :func:`~pyfolio.timeseries.cum_returns`.
Returns
-------
float
Maximum drawdown.
Note
-----
See https://en.wikipedia.org/wiki/Drawdown_(economics) for more details.
"""
returns = returns.copy()
df_cum = ep.cum_returns(returns, 1.0)
running_max = np.maximum.accumulate(df_cum)
underwater = df_cum / running_max - 1
return get_max_drawdown_underwater(underwater)
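The underwater curve used by both functions above is cumulative wealth divided by its running maximum, minus one: zero at new highs and negative inside a drawdown. With pandas alone (the returns below are made up):

```python
import numpy as np
import pandas as pd

returns = pd.Series([0.10, -0.05, -0.10, 0.20, -0.02])  # made-up daily returns
df_cum = (1 + returns).cumprod()              # cumulative wealth from 1.0
running_max = np.maximum.accumulate(df_cum)   # high-water mark so far
underwater = df_cum / running_max - 1         # 0 at peaks, negative otherwise

max_dd = underwater.min()  # maximum drawdown, as a negative fraction
```

Here the two consecutive losing days compound to a 14.5% drawdown (0.95 × 0.90 = 0.855 of the prior peak), which the 20% day then fully recovers.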
def get_top_drawdowns(returns, top=10):
"""
Finds top drawdowns, sorted by drawdown amount.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
top : int, optional
The number of top drawdowns to find (default 10).
Returns
-------
drawdowns : list
List of drawdown peaks, valleys, and recoveries. See get_max_drawdown.
"""
returns = returns.copy()
df_cum = ep.cum_returns(returns, 1.0)
running_max = np.maximum.accumulate(df_cum)
underwater = df_cum / running_max - 1
drawdowns = []
for _ in range(top):
peak, valley, recovery = get_max_drawdown_underwater(underwater)
# Slice out draw-down period
if not pd.isnull(recovery):
underwater.drop(underwater[peak:recovery].index[1:-1], inplace=True)
else:
# drawdown has not ended yet
underwater = underwater.loc[:peak]
drawdowns.append((peak, valley, recovery))
if (len(returns) == 0) or (len(underwater) == 0) or (np.min(underwater) == 0):
break
return drawdowns
def gen_drawdown_table(returns, top=10):
"""
Places top drawdowns in a table.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
top : int, optional
The number of top drawdowns to find (default 10).
Returns
-------
df_drawdowns : pd.DataFrame
Information about top drawdowns.
"""
df_cum = ep.cum_returns(returns, 1.0)
drawdown_periods = get_top_drawdowns(returns, top=top)
df_drawdowns = pd.DataFrame(
index=list(range(top)),
columns=[
"Net drawdown in %",
"Peak date",
"Valley date",
"Recovery date",
"Duration",
],
)
for i, (peak, valley, recovery) in enumerate(drawdown_periods):
if pd.isnull(recovery):
df_drawdowns.loc[i, "Duration"] = np.nan
else:
df_drawdowns.loc[i, "Duration"] = len(
pd.date_range(peak, recovery, freq="B")
)
df_drawdowns.loc[i, "Peak date"] = peak.to_pydatetime().strftime("%Y-%m-%d")
df_drawdowns.loc[i, "Valley date"] = valley.to_pydatetime().strftime("%Y-%m-%d")
if isinstance(recovery, float):
df_drawdowns.loc[i, "Recovery date"] = recovery
else:
df_drawdowns.loc[i, "Recovery date"] = recovery.to_pydatetime().strftime(
"%Y-%m-%d"
)
df_drawdowns.loc[i, "Net drawdown in %"] = (
(df_cum.loc[peak] - df_cum.loc[valley]) / df_cum.loc[peak]
) * 100
df_drawdowns["Peak date"] = pd.to_datetime(df_drawdowns["Peak date"])
df_drawdowns["Valley date"] = pd.to_datetime(df_drawdowns["Valley date"])
df_drawdowns["Recovery date"] = pd.to_datetime(df_drawdowns["Recovery date"])
return df_drawdowns
def rolling_volatility(returns, rolling_vol_window):
"""
Determines the rolling volatility of a strategy.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
rolling_vol_window : int
Length of rolling window, in days, over which to compute.
Returns
-------
pd.Series
Rolling volatility.
"""
return returns.rolling(rolling_vol_window).std() * np.sqrt(APPROX_BDAYS_PER_YEAR)
def rolling_sharpe(returns, rolling_sharpe_window):
"""
Determines the rolling Sharpe ratio of a strategy.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
rolling_sharpe_window : int
Length of rolling window, in days, over which to compute.
Returns
-------
pd.Series
Rolling Sharpe ratio.
Note
-----
See https://en.wikipedia.org/wiki/Sharpe_ratio for more details.
"""
return (
returns.rolling(rolling_sharpe_window).mean()
/ returns.rolling(rolling_sharpe_window).std()
* np.sqrt(APPROX_BDAYS_PER_YEAR)
)
def simulate_paths(
is_returns, num_days, starting_value=1, num_samples=1000, random_seed=None
):
"""
Generate alternate paths using available values from in-sample returns.
Parameters
----------
is_returns : pandas.core.frame.DataFrame
Non-cumulative in-sample returns.
num_days : int
Number of days to project the probability cone forward.
starting_value : int or float
Starting value of the out of sample period.
num_samples : int
Number of samples to draw from the in-sample daily returns.
Each sample will be an array with length num_days.
A higher number of samples will generate a more accurate
bootstrap cone.
random_seed : int
Seed for the pseudorandom number generator used by the pandas
sample method.
Returns
-------
samples : numpy.ndarray
"""
samples = np.empty((num_samples, num_days))
seed = np.random.RandomState(seed=random_seed)
for i in range(num_samples):
samples[i, :] = is_returns.sample(num_days, replace=True, random_state=seed)
return samples
def summarize_paths(samples, cone_std=(1.0, 1.5, 2.0), starting_value=1.0):
"""
Generate the upper and lower bounds of an n standard deviation
cone of forecasted cumulative returns.
Parameters
----------
samples : numpy.ndarray
Alternative paths, or series of possible outcomes.
cone_std : list of int/float
Number of standard deviations to use in the boundaries of
the cone. If multiple values are passed, cone bounds will
be generated for each value.
Returns
-------
samples : pandas.core.frame.DataFrame
"""
cum_samples = ep.cum_returns(samples.T, starting_value=starting_value).T
cum_mean = cum_samples.mean(axis=0)
cum_std = cum_samples.std(axis=0)
if isinstance(cone_std, (float, int)):
cone_std = [cone_std]
cone_bounds = pd.DataFrame(columns=pd.Index([], dtype="float64"))
for num_std in cone_std:
cone_bounds.loc[:, float(num_std)] = cum_mean + cum_std * num_std
cone_bounds.loc[:, float(-num_std)] = cum_mean - cum_std * num_std
return cone_bounds
def forecast_cone_bootstrap(
is_returns,
num_days,
cone_std=(1.0, 1.5, 2.0),
starting_value=1,
num_samples=1000,
random_seed=None,
):
"""
Determines the upper and lower bounds of an n standard deviation
cone of forecasted cumulative returns. Future cumulative mean and
standard deviation are computed by repeatedly sampling from the
in-sample daily returns (i.e. bootstrap). This cone is non-parametric,
meaning it does not assume that returns are normally distributed.
Parameters
----------
is_returns : pd.Series
In-sample daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
num_days : int
Number of days to project the probability cone forward.
cone_std : int, float, or list of int/float
Number of standard deviations to use in the boundaries of
the cone. If multiple values are passed, cone bounds will
be generated for each value.
starting_value : int or float
Starting value of the out of sample period.
num_samples : int
Number of samples to draw from the in-sample daily returns.
Each sample will be an array with length num_days.
A higher number of samples will generate a more accurate
bootstrap cone.
random_seed : int
Seed for the pseudorandom number generator used by the pandas
sample method.
Returns
-------
pd.DataFrame
Contains upper and lower cone boundaries. Column names are
strings corresponding to the number of standard deviations
above (positive) or below (negative) the projected mean
cumulative returns.
"""
samples = simulate_paths(
is_returns=is_returns,
num_days=num_days,
starting_value=starting_value,
num_samples=num_samples,
random_seed=random_seed,
)
cone_bounds = summarize_paths(
samples=samples, cone_std=cone_std, starting_value=starting_value
)
return cone_bounds
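The bootstrap-cone idea above can be sketched without pyfolio or pandas. The version below is a minimal, illustrative pure-Python reimplementation (the function name `bootstrap_cone` and its defaults are assumptions, not pyfolio's API): resample in-sample daily returns with replacement, compound each sample into a cumulative path, then take the per-day cross-sectional mean plus/minus n standard deviations.

```python
import random
import statistics

def bootstrap_cone(daily_returns, num_days, num_samples=1000,
                   cone_std=(1.0, 2.0), seed=0):
    """Non-parametric forecast cone from bootstrapped cumulative returns."""
    rng = random.Random(seed)
    # Simulate cumulative-return paths by resampling with replacement.
    paths = []
    for _ in range(num_samples):
        value, path = 1.0, []
        for _ in range(num_days):
            value *= 1.0 + rng.choice(daily_returns)
            path.append(value)
        paths.append(path)
    # Per-day mean and std across paths give the cone boundaries.
    bounds = {s: [] for s in cone_std}
    bounds.update({-s: [] for s in cone_std})
    for day in range(num_days):
        day_vals = [p[day] for p in paths]
        mean = statistics.fmean(day_vals)
        std = statistics.pstdev(day_vals)
        for s in cone_std:
            bounds[s].append(mean + std * s)
            bounds[-s].append(mean - std * s)
    return bounds
```

Because the cone is built from the empirical return distribution rather than a normal assumption, fat tails in the in-sample returns widen the bounds automatically.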
def extract_interesting_date_ranges(returns, periods=None):
"""
Extracts returns based on interesting events. See
gen_date_range_interesting.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
Returns
-------
ranges : OrderedDict
Date ranges, with returns, of all valid events.
"""
if periods is None:
periods = PERIODS
returns_dupe = returns.copy()
returns_dupe.index = returns_dupe.index.map(pd.Timestamp)
ranges = OrderedDict()
for name, (start, end) in periods.items():
try:
period = returns_dupe.loc[start:end]
if len(period) == 0:
continue
ranges[name] = period
except BaseException:
continue
return ranges
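The event-window extraction above just slices the return series by named date ranges and skips events outside the track record. A stdlib-only sketch of the same pattern (the `PERIODS_EXAMPLE` windows and `extract_ranges` name are illustrative, not pyfolio's `PERIODS`):

```python
from collections import OrderedDict
from datetime import date

# Illustrative event windows; pyfolio ships its own PERIODS mapping.
PERIODS_EXAMPLE = {
    "Fall 2015": (date(2015, 8, 15), date(2015, 9, 30)),
    "New Normal": (date(2017, 1, 1), date(2017, 12, 31)),
}

def extract_ranges(returns_by_day, periods):
    """returns_by_day: {date: daily return}; keep days inside each window."""
    ranges = OrderedDict()
    for name, (start, end) in periods.items():
        window = {d: r for d, r in returns_by_day.items() if start <= d <= end}
        if window:  # skip events with no overlap, like the `continue` above
            ranges[name] = window
    return ranges
```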
plotting.py
#
# Copyright 2018 Quantopian, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import datetime
import calendar
from collections import OrderedDict
from functools import wraps
import empyrical as ep
import matplotlib
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pytz
import scipy as sp
from matplotlib import figure
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.ticker import FuncFormatter
import os
import seaborn as sns
from . import capacity
from . import pos
from . import timeseries
from . import txn
from . import utils
from .utils import APPROX_BDAYS_PER_MONTH, MM_DISPLAY_UNIT
def customize(func):
"""
Decorator to set plotting context and axes style during function call.
"""
@wraps(func)
def call_w_context(*args, **kwargs):
set_context = kwargs.pop("set_context", True)
if set_context:
with plotting_context(), axes_style():
return func(*args, **kwargs)
else:
return func(*args, **kwargs)
return call_w_context
def plotting_context(context="notebook", font_scale=1.5, rc=None):
"""
Create pyfolio default plotting style context.
Under the hood, calls and returns seaborn.plotting_context() with
some custom settings. Usually you would use this in a with-context.
Parameters
----------
context : str, optional
Name of seaborn context.
font_scale : float, optional
Scale font by factor font_scale.
rc : dict, optional
Config flags.
By default, {'lines.linewidth': 1.5}
is being used and will be added to any
rc passed in, unless explicitly overridden.
Returns
-------
seaborn plotting context
Example
-------
>>> with pyfolio.plotting.plotting_context(font_scale=2):
>>> pyfolio.create_full_tear_sheet(..., set_context=False)
See also
--------
For more information, see seaborn.plotting_context().
"""
if rc is None:
rc = {}
rc_default = {"lines.linewidth": 1.5}
# Add defaults if they do not exist
for name, val in rc_default.items():
rc.setdefault(name, val)
return sns.plotting_context(context=context, font_scale=font_scale, rc=rc)
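The key detail in `plotting_context` is that user-supplied `rc` flags win over pyfolio's defaults, merged via `dict.setdefault`. A tiny standalone sketch of that merge pattern (`merge_rc` is an illustrative name, not part of pyfolio):

```python
def merge_rc(rc=None, defaults=None):
    """Merge default rc flags into a user dict without clobbering user values."""
    rc = dict(rc or {})
    for name, val in (defaults or {}).items():
        rc.setdefault(name, val)  # only fills keys the caller did not set
    return rc
```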
def axes_style(style="darkgrid", rc=None):
"""
Create pyfolio default axes style context.
Under the hood, calls and returns seaborn.axes_style() with
some custom settings. Usually you would use this in a with-context.
Parameters
----------
style : str, optional
Name of seaborn style.
rc : dict, optional
Config flags.
Returns
-------
seaborn plotting context
Example
-------
>>> with pyfolio.plotting.axes_style(style='whitegrid'):
>>> pyfolio.create_full_tear_sheet(..., set_context=False)
See also
--------
For more information, see seaborn.axes_style().
"""
if rc is None:
rc = {}
rc_default = {}
# Add defaults if they do not exist
for name, val in rc_default.items():
rc.setdefault(name, val)
return sns.axes_style(style=style, rc=rc)
def plot_monthly_returns_heatmap(returns, ax=None, **kwargs):
"""
Plots a heatmap of returns by month.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to seaborn plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
monthly_ret_table = ep.aggregate_returns(returns, "monthly")
monthly_ret_table = monthly_ret_table.unstack().round(3)
monthly_ret_table.rename(
columns={i: m for i, m in enumerate(calendar.month_abbr)}, inplace=True
)
sns.heatmap(
monthly_ret_table.fillna(0) * 100.0,
annot=True,
annot_kws={"size": 9},
alpha=1.0,
center=0.0,
cbar=False,
cmap=matplotlib.cm.RdYlGn,
ax=ax,
**kwargs,
)
ax.set_ylabel("Year")
ax.set_xlabel("Month")
ax.set_title("Monthly returns (%)")
return ax
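Under the hood, the heatmap relies on `ep.aggregate_returns(returns, "monthly")`, which compounds daily returns within each calendar month. The compounding step can be sketched with the standard library alone (the `monthly_returns` helper is illustrative, not empyrical's implementation):

```python
from collections import OrderedDict
from datetime import date

def monthly_returns(daily):
    """daily: iterable of (date, return). Compound within each (year, month)."""
    growth = OrderedDict()
    for d, r in daily:
        key = (d.year, d.month)
        growth[key] = growth.get(key, 1.0) * (1.0 + r)
    # Convert growth factors back into period returns.
    return {k: v - 1.0 for k, v in growth.items()}
```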
def plot_annual_returns(returns, ax=None, **kwargs):
"""
Plots a bar graph of returns by year.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
x_axis_formatter = FuncFormatter(utils.percentage)
ax.xaxis.set_major_formatter(x_axis_formatter)
ax.tick_params(axis="x", which="major")
ann_ret_df = pd.DataFrame(ep.aggregate_returns(returns, "yearly"))
ax.axvline(
100 * ann_ret_df.values.mean(),
color="red",
linestyle="--",
lw=1,
alpha=0.7,
)
(100 * ann_ret_df.sort_index(ascending=False)).plot(
ax=ax, kind="barh", alpha=0.70, **kwargs
)
ax.axvline(0.0, color="black", linestyle="-", lw=2)
ax.set_ylabel("Year")
ax.set_xlabel("Returns")
ax.set_title("Annual returns")
ax.legend(["Mean"], frameon=True, framealpha=0.5)
return ax
def plot_monthly_returns_dist(returns, ax=None, **kwargs):
"""
Plots a distribution of monthly returns.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
x_axis_formatter = FuncFormatter(utils.percentage)
ax.xaxis.set_major_formatter(x_axis_formatter)
ax.tick_params(axis="x", which="major")
monthly_ret_table = ep.aggregate_returns(returns, "monthly")
ax.hist(
100 * monthly_ret_table,
color="steelblue",
alpha=0.80,
bins=20,
**kwargs,
)
ax.axvline(
100 * monthly_ret_table.mean(),
color="red",
linestyle="--",
lw=1,
alpha=1.0,
)
ax.axvline(0.0, color="black", linestyle="-", lw=1, alpha=0.75)
ax.legend(["Mean"], frameon=True, framealpha=0.5)
ax.set_ylabel("Number of months")
ax.set_xlabel("Returns")
ax.set_title("Distribution of monthly returns")
return ax
def plot_holdings(returns, positions, legend_loc="best", ax=None, **kwargs):
"""
Plots the total number of stocks with an active position, either short
or long. Displays daily total, daily average per month, and
all-time daily average.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
positions : pd.DataFrame, optional
Daily net position values.
- See full explanation in tears.create_full_tear_sheet.
legend_loc : matplotlib.loc, optional
The location of the legend on the plot.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
positions = positions.copy().drop("cash", axis="columns")
df_holdings = positions.replace(0, np.nan).count(axis=1)
df_holdings_by_month = df_holdings.resample("1M").mean()
df_holdings.plot(color="steelblue", alpha=0.6, lw=0.5, ax=ax, **kwargs)
df_holdings_by_month.plot(color="orangered", lw=2, ax=ax, **kwargs)
ax.axhline(df_holdings.values.mean(), color="steelblue", ls="--", lw=3)
ax.set_xlim((returns.index[0], returns.index[-1]))
leg = ax.legend(
[
"Daily holdings",
"Average daily holdings, by month",
"Average daily holdings, overall",
],
loc=legend_loc,
frameon=True,
framealpha=0.5,
)
leg.get_frame().set_edgecolor("black")
ax.set_title("Total holdings")
ax.set_ylabel("Holdings")
ax.set_xlabel("")
return ax
def plot_long_short_holdings(
returns, positions, legend_loc="upper left", ax=None, **kwargs
):
"""
Plots the total number of stocks with an active position, breaking out
short and long into transparent filled regions.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
positions : pd.DataFrame, optional
Daily net position values.
- See full explanation in tears.create_full_tear_sheet.
legend_loc : matplotlib.loc, optional
The location of the legend on the plot.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
positions = positions.drop("cash", axis="columns")
positions = positions.replace(0, np.nan)
df_longs = positions[positions > 0].count(axis=1)
df_shorts = positions[positions < 0].count(axis=1)
lf = ax.fill_between(
df_longs.index, 0, df_longs.values, color="g", alpha=0.5, lw=2.0
)
sf = ax.fill_between(
df_shorts.index, 0, df_shorts.values, color="r", alpha=0.5, lw=2.0
)
bf = patches.Rectangle([0, 0], 1, 1, color="darkgoldenrod")
leg = ax.legend(
[lf, sf, bf],
[
"Long (max: %s, min: %s)" % (df_longs.max(), df_longs.min()),
"Short (max: %s, min: %s)" % (df_shorts.max(), df_shorts.min()),
"Overlap",
],
loc=legend_loc,
frameon=True,
framealpha=0.5,
)
leg.get_frame().set_edgecolor("black")
ax.set_xlim((returns.index[0], returns.index[-1]))
ax.set_title("Long and short holdings")
ax.set_ylabel("Holdings")
ax.set_xlabel("")
return ax
def plot_drawdown_periods(returns, top=10, ax=None, **kwargs):
"""
Plots cumulative returns highlighting top drawdown periods.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
top : int, optional
Number of top drawdown periods to plot (default 10).
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
y_axis_formatter = FuncFormatter(utils.two_dec_places)
ax.yaxis.set_major_formatter(y_axis_formatter)
df_cum_rets = ep.cum_returns(returns, starting_value=1.0)
df_drawdowns = timeseries.gen_drawdown_table(returns, top=top)
df_cum_rets.plot(ax=ax, **kwargs)
lim = ax.get_ylim()
colors = sns.cubehelix_palette(len(df_drawdowns))[::-1]
for i, (peak, recovery) in df_drawdowns[["Peak date", "Recovery date"]].iterrows():
if pd.isnull(recovery):
recovery = returns.index[-1]
ax.fill_between((peak, recovery), lim[0], lim[1], alpha=0.4, color=colors[i])
ax.set_ylim(lim)
ax.set_title("Top %i drawdown periods" % top)
ax.set_ylabel("Cumulative returns")
ax.legend(["Portfolio"], loc="upper left", frameon=True, framealpha=0.5)
ax.set_xlabel("")
return ax
def plot_drawdown_underwater(returns, ax=None, **kwargs):
"""
Plots how far underwater returns are over time, i.e. the current
drawdown vs. date.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
y_axis_formatter = FuncFormatter(utils.percentage)
ax.yaxis.set_major_formatter(y_axis_formatter)
df_cum_rets = ep.cum_returns(returns, starting_value=1.0)
running_max = np.maximum.accumulate(df_cum_rets)
underwater = -100 * ((running_max - df_cum_rets) / running_max)
underwater.plot(ax=ax, kind="area", color="salmon", alpha=0.7, **kwargs)
ax.set_ylabel("Drawdown")
ax.set_title("Underwater plot")
ax.set_xlabel("")
return ax
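The underwater curve plotted above is the percent drop of cumulative returns from their running peak. A stdlib-only sketch of that computation (the `underwater` helper is illustrative; pyfolio does this vectorized with `np.maximum.accumulate`):

```python
def underwater(daily_returns):
    """Percent drawdown from the running peak of cumulative returns."""
    cum, peak, dd = 1.0, 1.0, []
    for r in daily_returns:
        cum *= 1.0 + r
        peak = max(peak, cum)            # running maximum so far
        dd.append(-100.0 * (peak - cum) / peak)  # 0 at a new high, negative below it
    return dd
```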
def plot_perf_stats(returns, factor_returns, ax=None):
"""
Create box plot of some performance metrics of the strategy.
The width of the box whiskers is determined by a bootstrap.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.Series
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- This is in the same style as returns.
ax : matplotlib.Axes, optional
Axes upon which to plot.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
bootstrap_values = timeseries.perf_stats_bootstrap(
returns, factor_returns, return_stats=False
)
bootstrap_values = bootstrap_values.drop("Kurtosis", axis="columns")
sns.boxplot(data=bootstrap_values, orient="h", ax=ax)
return ax
STAT_FUNCS_PCT = [
"Annual return",
"Cumulative returns",
"Annual volatility",
"Max drawdown",
"Daily value at risk",
"Daily turnover",
]
def show_perf_stats(
returns,
factor_returns=None,
positions=None,
transactions=None,
turnover_denom="AGB",
live_start_date=None,
bootstrap=False,
header_rows=None,
):
"""
Prints some performance metrics of the strategy.
- Shows amount of time the strategy has been run in backtest and
out-of-sample (in live trading).
- Shows Omega ratio, max drawdown, Calmar ratio, annual return,
stability, Sharpe ratio, annual volatility, alpha, and beta.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.Series, optional
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- This is in the same style as returns.
positions : pd.DataFrame, optional
Daily net position values.
- See full explanation in create_full_tear_sheet.
transactions : pd.DataFrame, optional
Prices and amounts of executed trades. One row per trade.
- See full explanation in tears.create_full_tear_sheet
turnover_denom : str, optional
Either AGB or portfolio_value, default AGB.
- See full explanation in txn.get_turnover.
live_start_date : datetime, optional
The point in time when the strategy began live trading, after
its backtest period.
bootstrap : boolean, optional
Whether to perform bootstrap analysis for the performance
metrics.
- For more information, see timeseries.perf_stats_bootstrap
header_rows : dict or OrderedDict, optional
Extra rows to display at the top of the displayed table.
"""
if bootstrap:
perf_func = timeseries.perf_stats_bootstrap
else:
perf_func = timeseries.perf_stats
perf_stats_all = perf_func(
returns,
factor_returns=factor_returns,
positions=positions,
transactions=transactions,
turnover_denom=turnover_denom,
)
date_rows = OrderedDict()
if len(returns.index) > 0:
date_rows["Start date"] = returns.index[0].strftime("%Y-%m-%d")
date_rows["End date"] = returns.index[-1].strftime("%Y-%m-%d")
if live_start_date is not None:
live_start_date = ep.utils.get_utc_timestamp(live_start_date)
returns_is = returns[returns.index < live_start_date]
returns_oos = returns[returns.index >= live_start_date]
positions_is = None
positions_oos = None
transactions_is = None
transactions_oos = None
if positions is not None:
positions_is = positions[positions.index < live_start_date]
positions_oos = positions[positions.index >= live_start_date]
if transactions is not None:
transactions_is = transactions[(transactions.index < live_start_date)]
transactions_oos = transactions[(transactions.index >= live_start_date)]
perf_stats_is = perf_func(
returns_is,
factor_returns=factor_returns,
positions=positions_is,
transactions=transactions_is,
turnover_denom=turnover_denom,
)
perf_stats_oos = perf_func(
returns_oos,
factor_returns=factor_returns,
positions=positions_oos,
transactions=transactions_oos,
turnover_denom=turnover_denom,
)
if len(returns.index) > 0:
date_rows["In-sample months"] = int(
len(returns_is) / APPROX_BDAYS_PER_MONTH
)
date_rows["Out-of-sample months"] = int(
len(returns_oos) / APPROX_BDAYS_PER_MONTH
)
perf_stats = pd.concat(
OrderedDict(
[
("In-sample", perf_stats_is),
("Out-of-sample", perf_stats_oos),
("All", perf_stats_all),
]
),
axis=1,
)
else:
if len(returns.index) > 0:
date_rows["Total months"] = int(len(returns) / APPROX_BDAYS_PER_MONTH)
perf_stats = pd.DataFrame(perf_stats_all, columns=["Backtest"])
for column in perf_stats.columns:
for stat, value in perf_stats[column].items():
if stat in STAT_FUNCS_PCT:
perf_stats.loc[stat, column] = str(np.round(value * 100, 3)) + "%"
if header_rows is None:
header_rows = date_rows
else:
header_rows = OrderedDict(header_rows)
header_rows.update(date_rows)
utils.print_table(
perf_stats,
float_format="{0:.2f}".format,
header_rows=header_rows,
)
def plot_returns(returns, live_start_date=None, ax=None):
"""
Plots raw returns over time.
Backtest returns are in green, and out-of-sample (live trading)
returns are in red.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
live_start_date : datetime, optional
The date when the strategy began live trading, after
its backtest period. This date should be normalized.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
ax.set_xlabel("")
ax.set_ylabel("Returns")
if live_start_date is not None:
live_start_date = ep.utils.get_utc_timestamp(live_start_date)
is_returns = returns.loc[returns.index < live_start_date]
oos_returns = returns.loc[returns.index >= live_start_date]
is_returns.plot(ax=ax, color="g")
oos_returns.plot(ax=ax, color="r")
else:
returns.plot(ax=ax, color="g")
return ax
def plot_rolling_returns(
returns,
factor_returns=None,
live_start_date=None,
logy=False,
cone_std=None,
legend_loc="best",
volatility_match=False,
cone_function=timeseries.forecast_cone_bootstrap,
ax=None,
**kwargs,
):
"""
Plots cumulative rolling returns versus a benchmark's.
Backtest returns are in green, and out-of-sample (live trading)
returns are in red.
Additionally, a non-parametric cone plot may be added to the
out-of-sample returns region.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.Series, optional
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- This is in the same style as returns.
live_start_date : datetime, optional
The date when the strategy began live trading, after
its backtest period. This date should be normalized.
logy : bool, optional
Whether to log-scale the y-axis.
cone_std : float, or tuple, optional
If float, The standard deviation to use for the cone plots.
If tuple, Tuple of standard deviation values to use for the cone plots
- See timeseries.forecast_cone_bounds for more details.
legend_loc : matplotlib.loc, optional
The location of the legend on the plot.
volatility_match : bool, optional
Whether to normalize the volatility of the returns to those of the
benchmark returns. This helps compare strategies with different
volatilities. Requires passing of benchmark_rets.
cone_function : function, optional
Function to use when generating forecast probability cone.
The function signature must follow the form:
def cone(in_sample_returns (pd.Series),
days_to_project_forward (int),
cone_std= (float, or tuple),
starting_value= (int, or float))
See timeseries.forecast_cone_bootstrap for an example.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
ax.set_xlabel("")
ax.set_ylabel("Cumulative returns")
ax.set_yscale("log" if logy else "linear")
if volatility_match and factor_returns is None:
raise ValueError("volatility_match requires passing of factor_returns.")
elif volatility_match and factor_returns is not None:
bmark_vol = factor_returns.loc[returns.index].std()
returns = (returns / returns.std()) * bmark_vol
cum_rets = ep.cum_returns(returns, 1.0)
y_axis_formatter = FuncFormatter(utils.two_dec_places)
ax.yaxis.set_major_formatter(y_axis_formatter)
if factor_returns is not None:
cum_factor_returns = ep.cum_returns(factor_returns[cum_rets.index], 1.0)
cum_factor_returns.plot(
lw=2,
color="gray",
label=factor_returns.name,
alpha=0.60,
ax=ax,
**kwargs,
)
if live_start_date is not None:
live_start_date = ep.utils.get_utc_timestamp(live_start_date)
is_cum_returns = cum_rets.loc[cum_rets.index < live_start_date]
oos_cum_returns = cum_rets.loc[cum_rets.index >= live_start_date]
else:
is_cum_returns = cum_rets
oos_cum_returns = pd.Series([], dtype="float64")
is_cum_returns.plot(
lw=2, color="forestgreen", alpha=0.6, label="Backtest", ax=ax, **kwargs
)
if len(oos_cum_returns) > 0:
oos_cum_returns.plot(
lw=2, color="red", alpha=0.6, label="Live", ax=ax, **kwargs
)
if cone_std is not None:
if isinstance(cone_std, (float, int)):
cone_std = [cone_std]
is_returns = returns.loc[returns.index < live_start_date]
cone_bounds = cone_function(
is_returns,
len(oos_cum_returns),
cone_std=cone_std,
starting_value=is_cum_returns.iloc[-1],
)
cone_bounds = cone_bounds.set_index(oos_cum_returns.index)
for std in cone_std:
ax.fill_between(
cone_bounds.index,
cone_bounds[float(std)],
cone_bounds[float(-std)],
color="steelblue",
alpha=0.5,
)
if legend_loc is not None:
ax.legend(loc=legend_loc, frameon=True, framealpha=0.5)
ax.axhline(1.0, linestyle="--", color="black", lw=1)
return ax
def plot_rolling_beta(returns, factor_returns, legend_loc="best", ax=None, **kwargs):
"""
Plots the rolling 6-month and 12-month beta versus date.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.Series
Daily noncumulative returns of the benchmark factor to which betas are
computed. Usually a benchmark such as market returns.
- This is in the same style as returns.
legend_loc : matplotlib.loc, optional
The location of the legend on the plot.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
y_axis_formatter = FuncFormatter(utils.two_dec_places)
ax.yaxis.set_major_formatter(y_axis_formatter)
ax.set_title("Rolling portfolio beta to " + str(factor_returns.name))
ax.set_ylabel("Beta")
rb_1 = timeseries.rolling_beta(
returns, factor_returns, rolling_window=APPROX_BDAYS_PER_MONTH * 6
)
rb_1.plot(color="steelblue", lw=2, alpha=0.6, ax=ax, **kwargs)
rb_2 = timeseries.rolling_beta(
returns, factor_returns, rolling_window=APPROX_BDAYS_PER_MONTH * 12
)
rb_2.plot(color="grey", lw=2, alpha=0.4, ax=ax, **kwargs)
ax.axhline(rb_1.mean(), color="steelblue", linestyle="--", lw=2)
ax.axhline(1.0, color="black", linestyle="--", lw=1)
ax.set_xlabel("")
ax.legend(
["6-mo", "12-mo", "6-mo Average"],
loc=legend_loc,
frameon=True,
framealpha=0.5,
)
# ax.set_ylim((-0.5, 1.5))
return ax
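The rolling beta plotted here is, for each trailing window, the slope of an ordinary least-squares fit of strategy returns on benchmark returns: cov(returns, factor) / var(factor). A stdlib-only sketch over plain lists (the `rolling_beta` and `_cov` names are illustrative, not pyfolio's pandas-based implementation):

```python
def _cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def rolling_beta(returns, factor, window):
    """Trailing OLS beta of returns against factor over each window."""
    betas = []
    for end in range(window, len(returns) + 1):
        r = returns[end - window:end]
        f = factor[end - window:end]
        betas.append(_cov(r, f) / _cov(f, f))  # var(f) == cov(f, f)
    return betas
```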
def plot_rolling_volatility(
returns,
factor_returns=None,
rolling_window=APPROX_BDAYS_PER_MONTH * 6,
legend_loc="best",
ax=None,
**kwargs,
):
"""
Plots the rolling volatility versus date.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.Series, optional
Daily noncumulative returns of the benchmark factor for which the
benchmark rolling volatility is computed. Usually a benchmark such
as market returns.
- This is in the same style as returns.
rolling_window : int, optional
The size of the rolling window, in days, over which to compute volatility.
legend_loc : matplotlib.loc, optional
The location of the legend on the plot.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
y_axis_formatter = FuncFormatter(utils.two_dec_places)
ax.yaxis.set_major_formatter(y_axis_formatter)
rolling_vol_ts = timeseries.rolling_volatility(returns, rolling_window)
rolling_vol_ts.plot(alpha=0.7, lw=2, color="orangered", ax=ax, **kwargs)
if factor_returns is not None:
rolling_vol_ts_factor = timeseries.rolling_volatility(
factor_returns, rolling_window
)
rolling_vol_ts_factor.plot(alpha=0.7, lw=2, color="grey", ax=ax, **kwargs)
ax.set_title("Rolling volatility (6-month)")
ax.axhline(rolling_vol_ts.mean(), color="steelblue", linestyle="--", lw=2)
ax.axhline(0.0, color="black", linestyle="--", lw=1, zorder=2)
ax.set_ylabel("Volatility")
ax.set_xlabel("")
if factor_returns is None:
ax.legend(
["Volatility", "Average volatility"],
loc=legend_loc,
frameon=True,
framealpha=0.5,
)
else:
ax.legend(
["Volatility", "Benchmark volatility", "Average volatility"],
loc=legend_loc,
frameon=True,
framealpha=0.5,
)
return ax
def plot_rolling_sharpe(
returns,
factor_returns=None,
rolling_window=APPROX_BDAYS_PER_MONTH * 6,
legend_loc="best",
ax=None,
**kwargs,
):
"""
Plots the rolling Sharpe ratio versus date.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
factor_returns : pd.Series, optional
Daily noncumulative returns of the benchmark factor for
which the benchmark rolling Sharpe is computed. Usually
a benchmark such as market returns.
- This is in the same style as returns.
rolling_window : int, optional
The size of the rolling window, in days, over which to compute the Sharpe ratio.
legend_loc : matplotlib.loc, optional
The location of the legend on the plot.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
y_axis_formatter = FuncFormatter(utils.two_dec_places)
ax.yaxis.set_major_formatter(y_axis_formatter)
rolling_sharpe_ts = timeseries.rolling_sharpe(returns, rolling_window)
rolling_sharpe_ts.plot(alpha=0.7, lw=2, color="orangered", ax=ax, **kwargs)
if factor_returns is not None:
rolling_sharpe_ts_factor = timeseries.rolling_sharpe(
factor_returns, rolling_window
)
rolling_sharpe_ts_factor.plot(alpha=0.7, lw=2, color="grey", ax=ax, **kwargs)
ax.set_title("Rolling Sharpe ratio (6-month)")
ax.axhline(rolling_sharpe_ts.mean(), color="steelblue", linestyle="--", lw=2)
ax.axhline(0.0, color="black", linestyle="--", lw=1, zorder=2)
ax.set_ylabel("Sharpe ratio")
ax.set_xlabel("")
if factor_returns is None:
ax.legend(["Sharpe", "Average"], loc=legend_loc, frameon=True, framealpha=0.5)
else:
ax.legend(
["Sharpe", "Benchmark Sharpe", "Average"],
loc=legend_loc,
frameon=True,
framealpha=0.5,
)
return ax
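The rolling Sharpe series itself is just the window mean divided by the window standard deviation, annualized by the square root of periods per year. A stdlib-only sketch (the `rolling_sharpe` name and the 252 trading-day default are assumptions mirroring common convention, not pyfolio's exact code, and it assumes a zero risk-free rate):

```python
import math

def rolling_sharpe(returns, window, periods_per_year=252):
    """Annualized Sharpe ratio over each trailing window (risk-free rate 0)."""
    out = []
    for end in range(window, len(returns) + 1):
        w = returns[end - window:end]
        mean = sum(w) / window
        # Sample variance of the window's daily returns.
        var = sum((r - mean) ** 2 for r in w) / (window - 1)
        out.append(mean / math.sqrt(var) * math.sqrt(periods_per_year))
    return out
```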
def plot_gross_leverage(returns, positions, ax=None, **kwargs):
"""
Plots gross leverage versus date.
Gross leverage is the sum of long and short exposure per share
divided by net asset value.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values.
- See full explanation in create_full_tear_sheet.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
gl = timeseries.gross_lev(positions)
gl.plot(lw=0.5, color="limegreen", legend=False, ax=ax, **kwargs)
ax.axhline(gl.mean(), color="g", linestyle="--", lw=3)
ax.set_title("Gross leverage")
ax.set_ylabel("Gross leverage")
ax.set_xlabel("")
return ax
def plot_exposures(returns, positions, ax=None, **kwargs):
"""
Plots an area chart of the long and short exposure.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values, including a cash column.
- See full explanation in tears.create_full_tear_sheet.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
pos_no_cash = positions.drop("cash", axis=1)
l_exp = pos_no_cash[pos_no_cash > 0].sum(axis=1) / positions.sum(axis=1)
s_exp = pos_no_cash[pos_no_cash < 0].sum(axis=1) / positions.sum(axis=1)
net_exp = pos_no_cash.sum(axis=1) / positions.sum(axis=1)
ax.fill_between(
l_exp.index, 0, l_exp.values, label="Long", color="green", alpha=0.5
)
ax.fill_between(s_exp.index, 0, s_exp.values, label="Short", color="red", alpha=0.5)
ax.plot(
net_exp.index,
net_exp.values,
label="Net",
color="black",
linestyle="dotted",
)
ax.set_xlim((returns.index[0], returns.index[-1]))
ax.set_title("Exposure")
ax.set_ylabel("Exposure")
ax.legend(loc="lower left", frameon=True, framealpha=0.5)
ax.set_xlabel("")
return ax
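The exposure arithmetic above drops the cash column, then divides the long, short, and net position sums by total portfolio value (positions plus cash). A single-day, stdlib-only sketch of the same ratios (the `exposures` helper is an illustrative name):

```python
def exposures(position_values, cash):
    """Long, short, and net exposure as fractions of total portfolio value."""
    total = sum(position_values) + cash
    long_exp = sum(v for v in position_values if v > 0) / total
    short_exp = sum(v for v in position_values if v < 0) / total  # negative
    return long_exp, short_exp, long_exp + short_exp
```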
def show_and_plot_top_positions(
returns,
positions_alloc,
show_and_plot=2,
hide_positions=False,
legend_loc="real_best",
ax=None,
**kwargs,
):
"""
Prints and/or plots the exposures of the top 10 held positions of
all time.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
positions_alloc : pd.DataFrame
Portfolio allocation of positions. See pos.get_percent_alloc.
show_and_plot : int, optional
By default, this is 2, and both prints and plots.
If this is 0, it will only plot; if 1, it will only print.
hide_positions : bool, optional
If True, will not output any symbol names.
legend_loc : matplotlib.loc, optional
The location of the legend on the plot.
By default, the legend will display below the plot.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes, conditional
The axes that were plotted on.
"""
positions_alloc = positions_alloc.copy()
positions_alloc.columns = positions_alloc.columns.map(utils.format_asset)
df_top_long, df_top_short, df_top_abs = pos.get_top_long_short_abs(positions_alloc)
if show_and_plot == 1 or show_and_plot == 2:
utils.print_table(
pd.DataFrame(df_top_long * 100, columns=["max"]),
float_format="{0:.2f}%".format,
name="Top 10 long positions of all time",
)
utils.print_table(
pd.DataFrame(df_top_short * 100, columns=["max"]),
float_format="{0:.2f}%".format,
name="Top 10 short positions of all time",
)
utils.print_table(
pd.DataFrame(df_top_abs * 100, columns=["max"]),
float_format="{0:.2f}%".format,
name="Top 10 positions of all time",
)
if show_and_plot == 0 or show_and_plot == 2:
if ax is None:
ax = plt.gca()
positions_alloc[df_top_abs.index].plot(
title="Portfolio allocation over time, only top 10 holdings",
alpha=0.5,
ax=ax,
**kwargs,
)
# Place legend below plot, shrink plot by 20%
if legend_loc == "real_best":
box = ax.get_position()
ax.set_position(
[
box.x0,
box.y0 + box.height * 0.1,
box.width,
box.height * 0.9,
]
)
# Put a legend below current axis
ax.legend(
loc="upper center",
frameon=True,
framealpha=0.5,
bbox_to_anchor=(0.5, -0.14),
ncol=5,
)
else:
ax.legend(loc=legend_loc)
ax.set_xlim((returns.index[0], returns.index[-1]))
ax.set_ylabel("Exposure by holding")
if hide_positions:
ax.legend_.remove()
return ax
def plot_max_median_position_concentration(positions, ax=None, **kwargs):
"""
Plots the max and median of long and short position concentrations
over time.
Parameters
----------
positions : pd.DataFrame
The positions that the strategy takes over time.
ax : matplotlib.Axes, optional
Axes upon which to plot.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
alloc_summary = pos.get_max_median_position_concentration(positions)
colors = ["mediumblue", "steelblue", "tomato", "firebrick"]
alloc_summary.plot(linewidth=1, color=colors, alpha=0.6, ax=ax)
ax.legend(loc="center left", frameon=True, framealpha=0.5)
ax.set_ylabel("Exposure")
ax.set_title("Long/short max and median position concentration")
return ax
def plot_sector_allocations(returns, sector_alloc, ax=None, **kwargs):
"""
Plots the sector exposures of the portfolio over time.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
sector_alloc : pd.DataFrame
Portfolio allocation of positions. See pos.get_sector_alloc.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
sector_alloc.plot(title="Sector allocation over time", alpha=0.5, ax=ax, **kwargs)
box = ax.get_position()
ax.set_position([box.x0, box.y0 + box.height * 0.1, box.width, box.height * 0.9])
# Put a legend below current axis
ax.legend(
loc="upper center",
frameon=True,
framealpha=0.5,
bbox_to_anchor=(0.5, -0.14),
ncol=5,
)
ax.set_xlim((sector_alloc.index[0], sector_alloc.index[-1]))
ax.set_ylabel("Exposure by sector")
ax.set_xlabel("")
return ax
def plot_return_quantiles(returns, live_start_date=None, ax=None, **kwargs):
"""
Creates a box plot of daily, weekly, and monthly return
distributions.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
live_start_date : datetime, optional
The point in time when the strategy began live trading, after
its backtest period.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to seaborn plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
is_returns = (
returns
if live_start_date is None
else returns.loc[returns.index < live_start_date]
)
is_weekly = ep.aggregate_returns(is_returns, "weekly")
is_monthly = ep.aggregate_returns(is_returns, "monthly")
sns.boxplot(
data=[is_returns, is_weekly, is_monthly],
palette=["#4c72B0", "#55A868", "#CCB974"],
ax=ax,
**kwargs,
)
if live_start_date is not None:
oos_returns = returns.loc[returns.index >= live_start_date]
oos_weekly = ep.aggregate_returns(oos_returns, "weekly")
oos_monthly = ep.aggregate_returns(oos_returns, "monthly")
sns.swarmplot(
data=[oos_returns, oos_weekly, oos_monthly],
ax=ax,
palette="dark:red",
marker="d",
**kwargs,
)
red_dots = matplotlib.lines.Line2D(
[],
[],
color="red",
marker="d",
label="Out-of-sample data",
linestyle="",
)
ax.legend(handles=[red_dots], frameon=True, framealpha=0.5)
ax.set_xticklabels(["Daily", "Weekly", "Monthly"])
ax.set_title("Return quantiles")
return ax
def plot_turnover(
returns,
transactions,
positions,
turnover_denom="AGB",
legend_loc="best",
ax=None,
**kwargs,
):
"""
Plots turnover vs. date.
Turnover is the number of shares traded for a period as a fraction
of total shares.
Displays daily total, daily average per month, and all-time daily
average.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in tears.create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values.
- See full explanation in tears.create_full_tear_sheet.
turnover_denom : str, optional
Either AGB or portfolio_value, default AGB.
- See full explanation in txn.get_turnover.
legend_loc : matplotlib.loc, optional
The location of the legend on the plot.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
y_axis_formatter = FuncFormatter(utils.two_dec_places)
ax.yaxis.set_major_formatter(y_axis_formatter)
df_turnover = txn.get_turnover(positions, transactions, turnover_denom)
df_turnover_by_month = df_turnover.resample("M").mean()
df_turnover.plot(color="steelblue", alpha=1.0, lw=0.5, ax=ax, **kwargs)
df_turnover_by_month.plot(color="orangered", alpha=0.5, lw=2, ax=ax, **kwargs)
ax.axhline(df_turnover.mean(), color="steelblue", linestyle="--", lw=3, alpha=1.0)
ax.legend(
[
"Daily turnover",
"Average daily turnover, by month",
"Average daily turnover, net",
],
loc=legend_loc,
frameon=True,
framealpha=0.5,
)
ax.set_title("Daily turnover")
ax.set_xlim((returns.index[0], returns.index[-1]))
ax.set_ylim((0, 2))
ax.set_ylabel("Turnover")
ax.set_xlabel("")
return ax
def plot_slippage_sweep(
returns,
positions,
transactions,
slippage_params=(3, 8, 10, 12, 15, 20, 50),
ax=None,
**kwargs,
):
"""
Plots equity curves at different per-dollar slippage assumptions.
Parameters
----------
returns : pd.Series
Timeseries of portfolio returns to be adjusted for various
degrees of slippage.
positions : pd.DataFrame
Daily net position values.
- See full explanation in tears.create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in tears.create_full_tear_sheet.
slippage_params : tuple
Slippage parameters to apply to the return time series (in
basis points).
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to seaborn plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
slippage_sweep = pd.DataFrame()
for bps in slippage_params:
adj_returns = txn.adjust_returns_for_slippage(
returns, positions, transactions, bps
)
label = str(bps) + " bps"
slippage_sweep[label] = ep.cum_returns(adj_returns, 1)
slippage_sweep.plot(alpha=1.0, lw=0.5, ax=ax)
ax.set_title("Cumulative returns given additional per-dollar slippage")
ax.set_ylabel("")
ax.legend(loc="center left", frameon=True, framealpha=0.5)
return ax
def plot_slippage_sensitivity(returns, positions, transactions, ax=None, **kwargs):
"""
Plots curve relating per-dollar slippage to average annual returns.
Parameters
----------
returns : pd.Series
Timeseries of portfolio returns to be adjusted for various
degrees of slippage.
positions : pd.DataFrame
Daily net position values.
- See full explanation in tears.create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in tears.create_full_tear_sheet.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to seaborn plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
avg_returns_given_slippage = pd.Series(dtype="float64")
for bps in range(1, 100):
adj_returns = txn.adjust_returns_for_slippage(
returns, positions, transactions, bps
)
avg_returns = ep.annual_return(adj_returns)
avg_returns_given_slippage.loc[bps] = avg_returns
avg_returns_given_slippage.plot(alpha=1.0, lw=2, ax=ax)
ax.set_title("Average annual returns given additional per-dollar slippage")
ax.set_xticks(np.arange(0, 100, 10))
ax.set_ylabel("Average annual return")
ax.set_xlabel("Per-dollar slippage (bps)")
return ax
def plot_capacity_sweep(
returns,
transactions,
market_data,
bt_starting_capital,
min_pv=100000,
max_pv=300000000,
step_size=1000000,
ax=None,
):
txn_daily_w_bar = capacity.daily_txns_with_bar_data(transactions, market_data)
capital_base_sweep = pd.Series(dtype="float64")
for start_pv in range(min_pv, max_pv, step_size):
adj_ret = capacity.apply_slippage_penalty(
returns, txn_daily_w_bar, start_pv, bt_starting_capital
)
sharpe = ep.sharpe_ratio(adj_ret)
if sharpe < -1:
break
capital_base_sweep.loc[start_pv] = sharpe
capital_base_sweep.index = capital_base_sweep.index / MM_DISPLAY_UNIT
if ax is None:
ax = plt.gca()
capital_base_sweep.plot(ax=ax)
ax.set_xlabel("Capital base ($mm)")
ax.set_ylabel("Sharpe ratio")
ax.set_title("Capital base performance sweep")
return ax
def plot_daily_turnover_hist(
transactions, positions, turnover_denom="AGB", ax=None, **kwargs
):
"""
Plots a histogram of daily turnover rates.
Parameters
----------
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in tears.create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values.
- See full explanation in tears.create_full_tear_sheet.
turnover_denom : str, optional
Either AGB or portfolio_value, default AGB.
- See full explanation in txn.get_turnover.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to seaborn plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
turnover = txn.get_turnover(positions, transactions, turnover_denom)
sns.histplot(turnover, ax=ax, **kwargs)
ax.set_title("Distribution of daily turnover rates")
ax.set_xlabel("Turnover rate")
return ax
def plot_daily_volume(returns, transactions, ax=None, **kwargs):
"""
Plots trading volume per day vs. date.
Also displays all-time daily average.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in tears.create_full_tear_sheet.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
daily_txn = txn.get_txn_vol(transactions)
daily_txn.txn_shares.plot(alpha=1.0, lw=0.5, ax=ax, **kwargs)
ax.axhline(
daily_txn.txn_shares.mean(),
color="steelblue",
linestyle="--",
lw=3,
alpha=1.0,
)
ax.set_title("Daily trading volume")
ax.set_xlim((returns.index[0], returns.index[-1]))
ax.set_ylabel("Amount of shares traded")
ax.set_xlabel("")
return ax
def plot_txn_time_hist(
transactions, bin_minutes=5, tz="America/New_York", ax=None, **kwargs
):
"""
Plots a histogram of transaction times, binning the times into
buckets of a given duration.
Parameters
----------
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in tears.create_full_tear_sheet.
bin_minutes : float, optional
Sizes of the bins in minutes, defaults to 5 minutes.
tz : str, optional
Time zone to plot against. Note that if the specified
zone does not observe daylight saving time, the distribution
may be partially offset.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.gca()
txn_time = transactions.copy()
txn_time.index = txn_time.index.tz_convert(pytz.timezone(tz))
txn_time.index = txn_time.index.map(lambda x: x.hour * 60 + x.minute)
txn_time["trade_value"] = (txn_time.amount * txn_time.price).abs()
txn_time = (
txn_time.groupby(level=0).sum(numeric_only=True).reindex(index=range(570, 961))
)
txn_time.index = (txn_time.index / bin_minutes).astype(int) * bin_minutes
txn_time = txn_time.groupby(level=0).sum(numeric_only=True)
txn_time["time_str"] = txn_time.index.map(
lambda x: str(datetime.time(int(x / 60), x % 60))[:-3]
)
trade_value_sum = txn_time.trade_value.sum()
txn_time.trade_value = txn_time.trade_value.fillna(0) / trade_value_sum
ax.bar(txn_time.index, txn_time.trade_value, width=bin_minutes, **kwargs)
ax.set_xlim(570, 960)
ax.set_xticks(txn_time.index[:: int(30 / bin_minutes)])
ax.set_xticklabels(txn_time.time_str[:: int(30 / bin_minutes)])
ax.set_title("Transaction time distribution")
ax.set_ylabel("Proportion")
ax.set_xlabel("")
return ax
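The bucketing step in `plot_txn_time_hist` reduces each trade timestamp to minutes past midnight and snaps it to a fixed-width bucket (570 is 9:30, 960 is 16:00). A minimal standalone sketch with a few hypothetical timestamps:

```python
import pandas as pd

# Toy trade timestamps (hypothetical); reduce each to minutes past
# midnight, then snap to the start of its 5-minute bucket.
times = pd.to_datetime(
    ["2024-01-02 09:31", "2024-01-02 09:33", "2024-01-02 15:58"]
)
bin_minutes = 5
minutes = times.hour * 60 + times.minute            # 9:31 -> 571
buckets = (minutes / bin_minutes).astype(int) * bin_minutes  # 571 -> 570
```

Truncating division maps every minute in [570, 575) to the same bucket label, which is what lets the function sum trade value per bucket afterwards.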
def show_worst_drawdown_periods(returns, top=5):
"""
Prints information about the worst drawdown periods.
Prints peak dates, valley dates, recovery dates, and net
drawdowns.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
top : int, optional
Number of top drawdown periods to display (default 5).
"""
drawdown_df = timeseries.gen_drawdown_table(returns, top=top)
utils.print_table(
drawdown_df.sort_values("Net drawdown in %", ascending=False),
name="Worst drawdown periods",
float_format="{0:.2f}".format,
)
def plot_monthly_returns_timeseries(returns, ax=None, **kwargs):
"""
Plots monthly returns as a timeseries.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in tears.create_full_tear_sheet.
ax : matplotlib.Axes, optional
Axes upon which to plot.
**kwargs, optional
Passed to seaborn plotting function.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
def cumulate_returns(x):
return ep.cum_returns(x).iloc[-1]
if ax is None:
ax = plt.gca()
monthly_rets = returns.resample("M").apply(lambda x: cumulate_returns(x))
monthly_rets = monthly_rets.to_period()
sns.barplot(x=monthly_rets.index, y=monthly_rets.values, color="steelblue", ax=ax)
_, labels = plt.xticks()
plt.setp(labels, rotation=90)
# only show x-labels on year boundary
xticks_coord = []
xticks_label = []
count = 0
for i in monthly_rets.index:
if i.month == 1:
xticks_label.append(i)
xticks_coord.append(count)
# plot yearly boundary line
ax.axvline(count, color="gray", ls="--", alpha=0.3)
count += 1
ax.axhline(0.0, color="darkgray", ls="-")
ax.set_xticks(xticks_coord)
ax.set_xticklabels(xticks_label)
return ax
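The monthly aggregation above compounds the daily returns inside each month. Written without the empyrical helper, and using `to_period` so it works across pandas versions, the same computation is (toy data, hypothetical dates):

```python
import pandas as pd

daily = pd.Series(
    [0.01, -0.005, 0.02, 0.0, 0.01],
    index=pd.to_datetime(
        ["2024-01-30", "2024-01-31", "2024-02-01", "2024-02-02", "2024-02-05"]
    ),
)
# Compound daily returns within each calendar month:
# (1 + r_1)(1 + r_2)...(1 + r_n) - 1
monthly = (1 + daily).groupby(daily.index.to_period("M")).prod() - 1
```

January compounds to `1.01 * 0.995 - 1` and February to `1.02 * 1.00 * 1.01 - 1`, matching what `cum_returns(x).iloc[-1]` returns per resampled month.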
def plot_round_trip_lifetimes(round_trips, disp_amount=16, lsize=18, ax=None):
"""
Plots timespans and directions of a sample of round trip trades.
Parameters
----------
round_trips : pd.DataFrame
DataFrame with one row per round trip trade.
- See full explanation in round_trips.extract_round_trips
disp_amount : int, optional
Maximum number of symbols to sample and display.
lsize : float, optional
Line width used for the plotted timespans.
ax : matplotlib.Axes, optional
Axes upon which to plot.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
if ax is None:
ax = plt.subplot()
symbols_sample = round_trips.symbol.unique()
np.random.seed(1)
sample = np.random.choice(
round_trips.symbol.unique(),
replace=False,
size=min(disp_amount, len(symbols_sample)),
)
sample_round_trips = round_trips[round_trips.symbol.isin(sample)]
symbol_idx = pd.Series(np.arange(len(sample)), index=sample)
for symbol, sym_round_trips in sample_round_trips.groupby("symbol"):
for _, row in sym_round_trips.iterrows():
c = "b" if row.long else "r"
y_ix = symbol_idx[symbol] + 0.05
ax.plot(
[row["open_dt"], row["close_dt"]],
[y_ix, y_ix],
color=c,
linewidth=lsize,
solid_capstyle="butt",
)
ax.set_yticks(range(len(sample)))
ax.set_yticklabels([utils.format_asset(s) for s in sample])
ax.set_ylim((-0.5, min(len(sample), disp_amount) - 0.5))
blue = patches.Rectangle([0, 0], 1, 1, color="b", label="Long")
red = patches.Rectangle([0, 0], 1, 1, color="r", label="Short")
leg = ax.legend(handles=[blue, red], loc="lower left", frameon=True, framealpha=0.5)
leg.get_frame().set_edgecolor("black")
ax.grid(False)
return ax
def show_profit_attribution(round_trips):
"""
Prints the share of total PnL contributed by each
traded name.
Parameters
----------
round_trips : pd.DataFrame
DataFrame with one row per round trip trade.
- See full explanation in round_trips.extract_round_trips
"""
total_pnl = round_trips["pnl"].sum()
pnl_attribution = round_trips.groupby("symbol")["pnl"].sum() / total_pnl
pnl_attribution.name = ""
pnl_attribution.index = pnl_attribution.index.map(utils.format_asset)
utils.print_table(
pnl_attribution.sort_values(
inplace=False,
ascending=False,
),
name="Profitability (PnL / PnL total) per name",
float_format="{:.2%}".format,
)
def plot_prob_profit_trade(round_trips, ax=None):
"""
Plots a probability distribution for the event of making
a profitable trade.
Parameters
----------
round_trips : pd.DataFrame
DataFrame with one row per round trip trade.
- See full explanation in round_trips.extract_round_trips
ax : matplotlib.Axes, optional
Axes upon which to plot.
Returns
-------
ax : matplotlib.Axes
The axes that were plotted on.
"""
x = np.linspace(0, 1.0, 500)
round_trips["profitable"] = round_trips.pnl > 0
dist = sp.stats.beta(round_trips.profitable.sum(), (~round_trips.profitable).sum())
y = dist.pdf(x)
lower_perc = dist.ppf(0.025)
upper_perc = dist.ppf(0.975)
lower_plot = dist.ppf(0.001)
upper_plot = dist.ppf(0.999)
if ax is None:
ax = plt.subplot()
ax.plot(x, y)
ax.axvline(lower_perc, color="0.5")
ax.axvline(upper_perc, color="0.5")
ax.set_xlabel("Probability of making a profitable decision")
ax.set_ylabel("Belief")
ax.set_xlim(lower_plot, upper_plot)
ax.set_ylim((0, y.max() + 1.0))
return ax
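The distribution plotted above is a Beta belief over the per-trade win probability, parameterized directly by the counts of profitable and unprofitable round trips. A small sketch with hypothetical counts (60 winners, 40 losers):

```python
import scipy.stats as stats

# Beta(wins, losses) belief over the probability of a profitable trade,
# mirroring plot_prob_profit_trade. Counts are hypothetical.
wins, losses = 60, 40
dist = stats.beta(wins, losses)
lower, upper = dist.ppf(0.025), dist.ppf(0.975)  # 95% credible interval
mean_win_prob = dist.mean()                      # wins / (wins + losses)
```

The vertical lines in the plot are exactly these 2.5% and 97.5% quantiles; the curve is `dist.pdf` evaluated on a grid.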
def plot_cones(
name,
bounds,
oos_returns,
num_samples=1000,
ax=None,
cone_std=(1.0, 1.5, 2.0),
random_seed=None,
num_strikes=3,
):
"""
Plots the upper and lower bounds of an n standard deviation
cone of forecasted cumulative returns. Redraws a new cone when
cumulative returns fall outside of last cone drawn.
Parameters
----------
name : str
Account name to be used as figure title.
bounds : pandas.core.frame.DataFrame
Contains upper and lower cone boundaries. Column names are
strings corresponding to the number of standard deviations
above (positive) or below (negative) the projected mean
cumulative returns.
oos_returns : pandas.core.frame.DataFrame
Non-cumulative out-of-sample returns.
num_samples : int
Number of samples to draw from the in-sample daily returns.
Each sample will be an array with length num_days.
A higher number of samples will generate a more accurate
bootstrap cone.
ax : matplotlib.Axes, optional
Axes upon which to plot.
cone_std : list of int/float
Number of standard deviations to use in the boundaries of
the cone. If multiple values are passed, cone bounds will
be generated for each value.
random_seed : int
Seed for the pseudorandom number generator used by the pandas
sample method.
num_strikes : int
Upper limit for number of cones drawn. Can be anything from 0 to 3.
Returns
-------
Returns either an axes or a figure object, but not both. If a
matplotlib.Axes instance is passed in as ax, it is modified and
returned, which allows interactive plotting in a Jupyter
notebook. When no ax is passed in, a matplotlib.figure instance
is generated and returned instead; that figure can then be used
to save the plot as an image without displaying it.
ax : matplotlib.Axes
The axes that were plotted on.
fig : matplotlib.figure
The figure instance which contains all the plot elements.
"""
if ax is None:
fig = figure.Figure(figsize=(10, 8))
FigureCanvasAgg(fig)
axes = fig.add_subplot(111)
else:
axes = ax
returns = ep.cum_returns(oos_returns, starting_value=1.0)
bounds_tmp = bounds.copy()
returns_tmp = returns.copy()
cone_start = returns.index[0]
colors = ["green", "orange", "orangered", "darkred"]
for c in range(num_strikes + 1):
if c > 0:
tmp = returns.loc[cone_start:]
bounds_tmp = bounds_tmp.iloc[0 : len(tmp)]
bounds_tmp = bounds_tmp.set_index(tmp.index)
crossing = tmp < bounds_tmp[float(-2.0)].iloc[: len(tmp)]
if crossing.sum() <= 0:
break
cone_start = crossing.loc[crossing].index[0]
returns_tmp = returns.loc[cone_start:]
bounds_tmp = bounds - (1 - returns.loc[cone_start])
for std in cone_std:
x = returns_tmp.index
y1 = bounds_tmp[float(std)].iloc[: len(returns_tmp)]
y2 = bounds_tmp[float(-std)].iloc[: len(returns_tmp)]
axes.fill_between(x, y1, y2, color=colors[c], alpha=0.5)
# Plot returns line graph
label = "Cumulative returns = {:.2f}%".format((returns.iloc[-1] - 1) * 100)
axes.plot(returns.index, returns.values, color="black", lw=2.0, label=label)
if name is not None:
axes.set_title(name)
axes.axhline(1, color="black", alpha=0.2)
axes.legend(frameon=True, framealpha=0.5)
if ax is None:
return fig
else:
return axes
utils.py
#
# Copyright 2018 Quantopian, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import warnings
from itertools import cycle
import empyrical.utils
import numpy as np
import pandas as pd
from IPython.display import display, HTML
from matplotlib.pyplot import cm
from packaging.version import Version
import os
import datetime
from . import pos
from . import txn
APPROX_BDAYS_PER_MONTH = 21
APPROX_BDAYS_PER_YEAR = 252
MONTHS_PER_YEAR = 12
WEEKS_PER_YEAR = 52
MM_DISPLAY_UNIT = 1000000.0
DAILY = "daily"
WEEKLY = "weekly"
MONTHLY = "monthly"
YEARLY = "yearly"
ANNUALIZATION_FACTORS = {
DAILY: APPROX_BDAYS_PER_YEAR,
WEEKLY: WEEKS_PER_YEAR,
MONTHLY: MONTHS_PER_YEAR,
}
COLORMAP = "Paired"
COLORS = [
"#e6194b",
"#3cb44b",
"#ffe119",
"#0082c8",
"#f58231",
"#911eb4",
"#46f0f0",
"#f032e6",
"#d2f53c",
"#fabebe",
"#008080",
"#e6beff",
"#aa6e28",
"#800000",
"#aaffc3",
"#808000",
"#ffd8b1",
"#000080",
"#808080",
]
pandas_version = Version(pd.__version__)
pandas_one_point_one_or_less = pandas_version < Version("1.2")
def one_dec_places(x, pos):
"""
Adds 1/10th decimal to plot ticks.
"""
return "%.1f" % x
def two_dec_places(x, pos):
"""
Adds 1/100th decimal to plot ticks.
"""
return "%.2f" % x
def percentage(x, pos):
"""
Adds percentage sign to plot ticks.
"""
return "%.0f%%" % x
def format_asset(asset):
"""
If zipline asset objects are used, we want to print them out prettily
within the tear sheet. This function should only be applied directly
before displaying.
"""
try:
import zipline.assets
except ImportError:
return asset
if isinstance(asset, zipline.assets.Asset):
return asset.symbol
else:
return asset
def vectorize(func):
"""
Decorator so that functions can be written to work on Series but
may still be called with DataFrames.
"""
def wrapper(df, *args, **kwargs):
if df.ndim == 1:
return func(df, *args, **kwargs)
elif df.ndim == 2:
return df.apply(func, *args, **kwargs)
return wrapper
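A quick self-contained demonstration of the `vectorize` pattern: a function written for a Series is applied column-wise when handed a DataFrame (the decorator and data here are reproduced standalone for illustration):

```python
import pandas as pd

def vectorize(func):
    """Apply a Series function column-wise when given a DataFrame."""
    def wrapper(df, *args, **kwargs):
        if df.ndim == 1:
            return func(df, *args, **kwargs)
        elif df.ndim == 2:
            return df.apply(func, *args, **kwargs)
    return wrapper

@vectorize
def demean(s):
    return s - s.mean()

ser = pd.Series([1.0, 2.0, 3.0])
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 6.0, 8.0]})
out_ser = demean(ser)  # Series, demeaned
out_df = demean(df)    # each column demeaned independently
```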
def extract_rets_pos_txn_from_zipline(backtest):
"""
Extract returns, positions, transactions and leverage from the
backtest data structure returned by zipline.TradingAlgorithm.run().
The returned data structures are in a format compatible with the
rest of pyfolio and can be directly passed to
e.g. tears.create_full_tear_sheet().
Parameters
----------
backtest : pd.DataFrame
DataFrame returned by zipline.TradingAlgorithm.run()
Returns
-------
returns : pd.Series
Daily returns of strategy.
- See full explanation in tears.create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values.
- See full explanation in tears.create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in tears.create_full_tear_sheet.
Example (on the Quantopian research platform)
---------------------------------------------
>>> backtest = my_algo.run()
>>> returns, positions, transactions = \
...     pyfolio.utils.extract_rets_pos_txn_from_zipline(backtest)
>>> pyfolio.tears.create_full_tear_sheet(returns,
...     positions, transactions)
"""
backtest.index = backtest.index.normalize()
if backtest.index.tzinfo is None:
backtest.index = backtest.index.tz_localize("UTC")
returns = backtest.returns
raw_positions = []
for dt, pos_row in backtest.positions.items():
df = pd.DataFrame(pos_row)
df.index = [dt] * len(df)
raw_positions.append(df)
if not raw_positions:
raise ValueError("The backtest does not have any positions.")
positions = pd.concat(raw_positions)
positions = pos.extract_pos(positions, backtest.ending_cash)
transactions = txn.make_transaction_frame(backtest.transactions)
if transactions.index.tzinfo is None:
transactions.index = transactions.index.tz_localize("utc")
return returns, positions, transactions
def print_table(table, name=None, float_format=None, formatters=None,
header_rows=None, save_to_file=True,
output_dir='/home/shared/algos/ml4t/plots/temp'):
"""
Pretty print a pandas DataFrame and optionally save it as an HTML file with a unique timestamped filename.
Parameters
----------
table : pandas.Series or pandas.DataFrame
Table to pretty-print.
name : str, optional
Table name to display in upper left corner.
float_format : function, optional
Formatter to use for displaying table elements, passed as the
`float_format` arg to pd.DataFrame.to_html.
formatters : list or dict, optional
Formatters to use by column, passed as the `formatters` arg to
pd.DataFrame.to_html.
header_rows : dict, optional
Extra rows to display at the top of the table.
save_to_file : bool, optional
If True, save the table to an HTML file.
output_dir : str, optional
Directory where the HTML file will be saved.
"""
if isinstance(table, pd.Series):
table = pd.DataFrame(table)
if name is not None:
table.columns.name = name
html = table.to_html(float_format=float_format, formatters=formatters)
if header_rows is not None:
n_cols = html.split("<thead>")[1].split("</thead>")[0].count("<th>")
rows = ""
for name, value in header_rows.items():
rows += '\n <tr style="text-align: right;"><th>%s</th>' % name
rows += "<td colspan=%d>%s</td></tr>" % (n_cols, value)
html = html.replace("<thead>", "<thead>" + rows)
if save_to_file:
# Generate a timestamped filename
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S%f")
filename = f"table_{timestamp}.html"
file_path = os.path.join(output_dir, filename)
# Create output directory if it doesn't exist
if not os.path.exists(output_dir):
os.makedirs(output_dir)
with open(file_path, 'w') as f:
f.write(html)
print(f"Table saved to {file_path}")
else:
display(HTML(html))
def standardize_data(x):
"""
Standardize an array with mean and standard deviation.
Parameters
----------
x : np.array
Array to standardize.
Returns
-------
np.array
Standardized array.
"""
return (x - np.mean(x)) / np.std(x)
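A standardized array has zero mean and unit (population) standard deviation, since `np.std` defaults to the population estimator. A minimal check on toy data:

```python
import numpy as np

def standardize_data(x):
    # z-score: subtract the mean, divide by the population std
    return (x - np.mean(x)) / np.std(x)

x = np.array([2.0, 4.0, 6.0, 8.0])
z = standardize_data(x)  # mean ~0, std ~1
```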
def detect_intraday(positions, transactions, threshold=0.25):
"""
Attempt to detect an intraday strategy. Get the number of
positions held at the end of the day, and divide that by the
number of unique stocks transacted every day. If the average quotient
is below a threshold, then an intraday strategy is detected.
Parameters
----------
positions : pd.DataFrame
Daily net position values.
- See full explanation in create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in create_full_tear_sheet.
Returns
-------
boolean
True if an intraday strategy is detected.
"""
daily_txn = transactions.copy()
daily_txn.index = daily_txn.index.date
txn_count = daily_txn.groupby(level=0).symbol.nunique().sum()
daily_pos = positions.drop("cash", axis=1).replace(0, np.nan)
return daily_pos.count(axis=1).sum() / txn_count < threshold
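The heuristic in `detect_intraday` compares end-of-day holdings against the number of distinct names traded each day. The sketch below reproduces that arithmetic on toy data (hypothetical symbols and dates) where the book is flat at every close, the classic intraday signature:

```python
import numpy as np
import pandas as pd

# Toy end-of-day positions: flat at every close, only cash held.
idx = pd.to_datetime(["2024-01-02", "2024-01-03"])
positions = pd.DataFrame(
    {"AAPL": [0.0, 0.0], "MSFT": [0.0, 0.0], "cash": [1e6, 1e6]}, index=idx
)
# Two trades per day in distinct symbols.
transactions = pd.DataFrame(
    {"symbol": ["AAPL", "MSFT", "AAPL", "MSFT"]},
    index=pd.to_datetime(
        ["2024-01-02 10:00", "2024-01-02 11:00",
         "2024-01-03 10:00", "2024-01-03 11:00"]
    ),
)

daily_txn = transactions.copy()
daily_txn.index = daily_txn.index.date
txn_count = daily_txn.groupby(level=0).symbol.nunique().sum()  # 2 + 2 = 4
daily_pos = positions.drop("cash", axis=1).replace(0, np.nan)
ratio = daily_pos.count(axis=1).sum() / txn_count  # 0 held / 4 traded
is_intraday = ratio < 0.25
```

Zero positions held versus four names traded gives a ratio of 0, well under the 0.25 threshold, so the strategy is flagged as intraday.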
def check_intraday(estimate, returns, positions, transactions):
"""
Logic for checking if a strategy is intraday and processing it.
Parameters
----------
estimate: boolean or str, optional
Approximate returns for intraday strategies.
See description in tears.create_full_tear_sheet.
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values.
- See full explanation in create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in create_full_tear_sheet.
Returns
-------
pd.DataFrame
Daily net position values, adjusted for intraday movement.
"""
if estimate == "infer":
if positions is not None and transactions is not None:
if detect_intraday(positions, transactions):
warnings.warn(
"Detected intraday strategy; inferring positi"
+ "ons from transactions. Set estimate_intraday"
+ "=False to disable."
)
return estimate_intraday(returns, positions, transactions)
else:
return positions
else:
return positions
elif estimate:
if positions is not None and transactions is not None:
return estimate_intraday(returns, positions, transactions)
else:
raise ValueError("Positions and txns needed to estimate intraday")
else:
return positions
def estimate_intraday(returns, positions, transactions, EOD_hour=23):
"""
Intraday strategies will often not hold positions at the day end.
This attempts to find the point in the day that best represents
the activity of the strategy on that day, and effectively resamples
the end-of-day positions with the positions at this point of day.
The point of day is found by detecting when our exposure in the
market is at its maximum point. Note that this is an estimate.
Parameters
----------
returns : pd.Series
Daily returns of the strategy, noncumulative.
- See full explanation in create_full_tear_sheet.
positions : pd.DataFrame
Daily net position values.
- See full explanation in create_full_tear_sheet.
transactions : pd.DataFrame
Prices and amounts of executed trades. One row per trade.
- See full explanation in create_full_tear_sheet.
Returns
-------
pd.DataFrame
Daily net position values, resampled for intraday behavior.
"""
# Construct DataFrame of transaction amounts
txn_val = transactions.copy()
txn_val.index.names = ["date"]
txn_val["value"] = txn_val.amount * txn_val.price
txn_val = (
txn_val.reset_index()
.pivot_table(index="date", values="value", columns="symbol")
.replace(np.nan, 0)
)
# Cumulate transaction amounts each day
txn_val = txn_val.groupby(txn_val.index.date).cumsum()
# Calculate exposure, then take peak of exposure every day
txn_val["exposure"] = txn_val.abs().sum(axis=1)
condition = txn_val["exposure"] == txn_val.groupby(pd.Grouper(freq="24H"))[
"exposure"
].transform(max)
txn_val = txn_val[condition].drop("exposure", axis=1)
# Compute cash delta
txn_val["cash"] = -txn_val.sum(axis=1)
# Shift EOD positions to positions at start of next trading day
positions_shifted = positions.copy().shift(1).fillna(0)
starting_capital = positions.iloc[0].sum() / (1 + returns.iloc[0])
positions_shifted.iloc[0, positions_shifted.columns.get_loc("cash")] = starting_capital
# Format and add start positions to intraday position changes
txn_val.index = txn_val.index.normalize()
corrected_positions = positions_shifted.add(txn_val, fill_value=0)
corrected_positions.index.name = "period_close"
corrected_positions.columns.name = "sid"
return corrected_positions
def clip_returns_to_benchmark(rets, benchmark_rets):
    """
    Drop entries from rets so that the start and end dates of rets match those
    of benchmark_rets.

    Parameters
    ----------
    rets : pd.Series
        Daily returns of the strategy, noncumulative.
         - See pf.tears.create_full_tear_sheet for more details.
    benchmark_rets : pd.Series
        Daily returns of the benchmark, noncumulative.

    Returns
    -------
    clipped_rets : pd.Series
        Daily noncumulative returns with index clipped to match that of the
        benchmark returns.
    """
    if (rets.index[0] < benchmark_rets.index[0]) or (
        rets.index[-1] > benchmark_rets.index[-1]
    ):
        clipped_rets = rets[benchmark_rets.index]
    else:
        clipped_rets = rets
    return clipped_rets
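A quick sketch of the clipping behavior, using made-up dates and constant returns (the function is re-declared here with the same logic so the snippet runs standalone; note it assumes every benchmark date also exists in the strategy's index — missing labels would raise):

```python
import pandas as pd


def clip_returns_to_benchmark(rets, benchmark_rets):
    # Same logic as above: reindex rets to the benchmark's index only when
    # rets extends beyond the benchmark on either end
    if (rets.index[0] < benchmark_rets.index[0]) or (
        rets.index[-1] > benchmark_rets.index[-1]
    ):
        return rets[benchmark_rets.index]
    return rets


dates = pd.date_range("2024-01-01", periods=6, freq="D")
rets = pd.Series(0.01, index=dates)             # 6 days of strategy returns
benchmark = pd.Series(0.005, index=dates[1:5])  # benchmark covers only 4 days

clipped = clip_returns_to_benchmark(rets, benchmark)
print(clipped.index.equals(benchmark.index))  # True
```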
def to_utc(df):
    """
    For use in tests; applies a UTC timezone to the DataFrame's index.
    """
    try:
        df.index = df.index.tz_localize("UTC")
    except TypeError:
        # Index is already tz-aware; convert instead of localizing
        df.index = df.index.tz_convert("UTC")
    return df


def to_series(df):
    """
    For use in tests; converts a DataFrame's first column to a Series.
    """
    return df[df.columns[0]]
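The try/except in `to_utc` handles both naive and tz-aware indexes: `tz_localize` raises `TypeError` on an index that already carries a timezone, in which case `tz_convert` is the right call. A small demonstration with invented data (the function is re-declared so the snippet runs standalone):

```python
import pandas as pd


def to_utc(df):
    # tz_localize raises TypeError on an already tz-aware index,
    # in which case we convert instead (same pattern as above)
    try:
        df.index = df.index.tz_localize("UTC")
    except TypeError:
        df.index = df.index.tz_convert("UTC")
    return df


naive = pd.DataFrame({"ret": [0.01, 0.02]},
                     index=pd.date_range("2024-01-01", periods=2))
aware = pd.DataFrame({"ret": [0.01, 0.02]},
                     index=pd.date_range("2024-01-01", periods=2, tz="US/Eastern"))

naive_utc = to_utc(naive)
aware_utc = to_utc(aware)
print(naive_utc.index.tz, aware_utc.index.tz)  # UTC UTC
```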
# This function is simply a passthrough to empyrical, but is
# required by register_return_func and get_symbol_rets.
default_returns_func = empyrical.utils.default_returns_func

# Settings dict to store functions/values that may
# need to be overridden depending on the user's environment
SETTINGS = {"returns_func": default_returns_func}


def register_return_func(func):
    """
    Registers the 'returns_func' that will be called for
    retrieving returns data.

    Parameters
    ----------
    func : function
        A function that returns a pandas Series of asset returns.
        The signature of the function must be as follows:

        >>> func(symbol)

        where symbol is an asset identifier.

    Returns
    -------
    None
    """
    SETTINGS["returns_func"] = func


def get_symbol_rets(symbol, start=None, end=None):
    """
    Calls the currently registered 'returns_func'.

    Parameters
    ----------
    symbol : object
        An identifier for the asset whose return
        series is desired, e.g. a ticker symbol or database ID.
    start : date, optional
        Earliest date to fetch data for.
        Defaults to the earliest date available.
    end : date, optional
        Latest date to fetch data for.
        Defaults to the latest date available.

    Returns
    -------
    pandas.Series
        Returned by the current 'returns_func'.
    """
    return SETTINGS["returns_func"](symbol, start=start, end=end)
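This settings-dict pattern lets you swap in any data source without touching the library's plotting code. A minimal sketch of registering a custom returns function (the `SETTINGS` dict and both helpers are re-created here, and `my_returns_func` is a made-up stand-in for a real data source):

```python
import pandas as pd

# Minimal re-creation of the registry pattern above
SETTINGS = {"returns_func": None}


def register_return_func(func):
    SETTINGS["returns_func"] = func


def get_symbol_rets(symbol, start=None, end=None):
    return SETTINGS["returns_func"](symbol, start=start, end=end)


def my_returns_func(symbol, start=None, end=None):
    # Stand-in for a real data source: constant daily returns
    idx = pd.date_range(start or "2024-01-01", periods=3, freq="D")
    return pd.Series(0.01, index=idx, name=symbol)


register_return_func(my_returns_func)
rets = get_symbol_rets("AAPL")
print(rets.name, len(rets))  # AAPL 3
```

Any replacement must accept `symbol` plus the optional `start`/`end` keywords, since `get_symbol_rets` always forwards them.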
def configure_legend(
    ax, autofmt_xdate=True, change_colors=False, rotation=30, ha="right"
):
    """
    Format the legend for perf attribution plots:
    - put the legend to the right of the plot instead of overlapping it
    - make the legend order match up with the graph lines
    - set colors according to the colormap
    """
    chart_box = ax.get_position()
    ax.set_position(
        [chart_box.x0, chart_box.y0, chart_box.width * 0.75, chart_box.height]
    )

    # Make legend order match the graph lines, sorted by final y-value
    handles, labels = ax.get_legend_handles_labels()
    handles_and_labels_sorted = sorted(
        zip(handles, labels), key=lambda x: x[0].get_ydata()[-1], reverse=True
    )
    handles_sorted = [h[0] for h in handles_and_labels_sorted]
    labels_sorted = [h[1] for h in handles_and_labels_sorted]

    if change_colors:
        for handle, color in zip(handles_sorted, cycle(COLORS)):
            handle.set_color(color)

    ax.legend(
        handles=handles_sorted,
        labels=labels_sorted,
        frameon=True,
        framealpha=0.5,
        loc="upper left",
        bbox_to_anchor=(1.05, 1),
        fontsize="small",
    )

    # Manually rotate xticklabels instead of using matplotlib's autofmt_xdate,
    # because autofmt_xdate disables xticklabels for all but the last plot
    if autofmt_xdate:
        for label in ax.get_xticklabels():
            label.set_ha(ha)
            label.set_rotation(rotation)
def sample_colormap(cmap_name, n_samples):
    """
    Sample n_samples evenly spaced colors from a matplotlib colormap.
    """
    # cm.cmap_d was removed in newer matplotlib releases; cm.get_cmap
    # (or the matplotlib.colormaps registry) is the supported lookup
    colormap = cm.get_cmap(cmap_name)
    return [colormap(i) for i in np.linspace(0, 1, n_samples)]