Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: merge_asof should support MultiIndex #51372

Open
2 of 3 tasks
ddrinka opened this issue Feb 14, 2023 · 0 comments
Open
2 of 3 tasks

ENH: merge_asof should support MultiIndex #51372

ddrinka opened this issue Feb 14, 2023 · 0 comments
Labels
Enhancement MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@ddrinka
Copy link

ddrinka commented Feb 14, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I have large quantities of stock data sorted by date, symbol, time. I'd like to produce the most recent quote that happened at or before every trade. This is what merge_asof was made for. However, the shape of the data merge_asof seems to require seems strange to me. I think it should work with the data in the form it's currently in.

Feature Description

Create some sample data:

import pandas as pd
import numpy as np

# sample data for trades dataframe
trades_data = {'date': ['2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02'],
               'symbol': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
               'time': [2200, 1200, 2200, 1200],
               'trade_price': [100, 200, 150, 50]}

# create trades dataframe
trades = pd.DataFrame(trades_data)

# set the multi-index for the trades dataframe
trades.set_index(['date', 'symbol', 'time'], inplace=True)
trades.sort_index(inplace=True)

# sample data for quotes dataframe
quotes_data = {'date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02'],
               'symbol': ['AAPL', 'MSFT', 'AAPL', 'MSFT', 'AAPL', 'MSFT', 'AAPL', 'MSFT'],
               'time': [2000, 1000, 2500, 1500, 2000, 1000, 2500, 1500],
               'quote_price': [90, 10, 95, 15, 105, 25, 40, 45]}

# create quotes dataframe
quotes = pd.DataFrame(quotes_data)

# set the multi-index for the quotes dataframe
quotes.set_index(['date', 'symbol', 'time'], inplace=True)
quotes.sort_index(inplace=True)

image
image

pd.merge_asof(trades, quotes, left_index=True, right_index=True)

MergeError: left can only have one index
pd.merge_asof(trades, quotes, on='time', by=['date', 'symbol'])

ValueError: left keys must be sorted

Alternative Solutions

trades.reset_index(inplace=True)
trades.sort_values(by=['time'], inplace=True)

quotes.reset_index(inplace=True)
quotes.sort_values(by=['time'], inplace=True)

pd.merge_asof(trades, quotes, on='time', by=['date', 'symbol'])

image

OR

result = trades.merge(quotes, on=['date', 'symbol', 'time'], how='outer', sort=True)
result['quote_price'] = result.groupby(['date', 'symbol'])['quote_price'].ffill()
result.dropna(subset=['trade_price'], inplace=True)

image

Additional Context

No response

@ddrinka ddrinka added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 14, 2023
@mroeschke mroeschke added Reshaping Concat, Merge/Join, Stack/Unstack, Explode MultiIndex and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 17, 2024
@mroeschke mroeschke changed the title ENH: merge_asof should support hierarchical data ENH: merge_asof should support MultiIndex Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

2 participants