Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge_asof on multiple columns? #20369

Open
lorenzori opened this issue Mar 15, 2018 · 17 comments
Open

merge_asof on multiple columns? #20369

lorenzori opened this issue Mar 15, 2018 · 17 comments
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@lorenzori
Copy link

Thank you very much for your work Pandas people! I have the error pandas.errors.MergeError: can only asof on a key for left when I run the code below:

df1.sort_values(['gpsLongitude','gpsLatitude'], inplace=True)
df2.sort_values(['x','y'], inplace=True)

res = pd.merge_asof(df1, df2, left_by=['gpsLongitude','gpsLatitude'], right_by=['x','y'] )

Problem description

it seems that merge_asof does not accept list of columns to merge on. Documentation seems to say that the left_on argument is a "label" and not a list.

So maybe what I am asking is for an enhancement? It just seems weird that it can only merge on 1 column.

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.4.final.0 python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None pandas: 0.22.0 pytest: None pip: 9.0.1 setuptools: 36.6.0 Cython: None numpy: 1.13.3 scipy: 0.19.1 pyarrow: None xarray: None IPython: 6.1.0 sphinx: 1.3.1 patsy: 0.4.1 dateutil: 2.6.1 pytz: 2017.2 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: 2.2.0 openpyxl: None xlrd: 1.0.0 xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: 0.9999999 sqlalchemy: 1.1.13 pymysql: None psycopg2: 2.7.3.1 (dt dec pq3 ext lo64) jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None
@vlad-tokarev
Copy link

Very actual

@werenike
Copy link

werenike commented Apr 27, 2018

I looked into this a little bit and by removing these checks, I was able to merge on multiple keys and it seems to work, also with direction and tolerance arguments. Here's my monkey patch code. However I'm not sure whether the results are correct/as you would expect. I think it may just use the last key in the list. I think that may be changed here but that results in an 'truth value of array is ambiguous' error down the line.

@lorenzori
Copy link
Author

mmm the checks seem pretty clear that it can only work with 1 key, wonder why but I guess there must be a reason! If you look into the tests for merge_asof they all use just 1 key per side.

@akilawickey
Copy link

Hi,

I also got same problem when trying to select multiple keys. Great if we can have a fix for this.

@tyronecragg
Copy link

I've just stumbled upon this issue as well. I wonder if it would be practical to allow for this?

@shabha7092
Copy link

Hi,

I have come across this issue as well. It will be great if we can fix for this.

@adsmaniotto
Copy link

+1 on this, I think it would be a great enhancement if possible.

@mroeschke mroeschke added Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 13, 2019
@tbaer-c7ks7s
Copy link

tbaer-c7ks7s commented Feb 4, 2019

how would this work? this seems like an impossible request

take the OP as an example: you need some way to trade off 'closeness' in one field and 'closeness' in the second if there are no exact matches.

I would suggest defining and calculating a common metric yourself. for example, the haversine distance between two lat/longs. of course, that would mean calculating the distance between every point possible, which kind of defeats the purpose of merge_asof

what might be more helpful is some combination of regular merge and asof, e.g. exact match on one pair of fields, nearest match on another

@timurhamzin
Copy link

There is no need to calculate distance between EVERY point if one specifies tolerance criteria for separate merge keys. The common metric could be added as another parameter (function taking arbitrary number of arguments).

@mvirts
Copy link
Contributor

mvirts commented Feb 27, 2020

Since the asof merge requires pre-sorted data, wouldn't a multi-key merge be the same as merging on a single column of sorted tuples?

@tianhuat
Copy link

bump into this, and need this feature...

@jreback
Copy link
Contributor

jreback commented Apr 6, 2021

If anyone has a good motivation example please indicate it with pseudo code and full inputs and outputs

@MayaSandler
Copy link

Bump here as well. I need this feature. Would love to know if there is a development since 2018.

@jreback
Copy link
Contributor

jreback commented Dec 21, 2021

pandas is open source and all volunteer

community contributions are how things are implemented

@galenweld
Copy link

It took me a bit to interpret the documentation, but as I understand it, what @tbaer-c7ks7s suggested:

what might be more helpful is some combination of regular merge and asof, e.g. exact match on one pair of fields, nearest match on another

Is already implemented using the by parameter in addition to the on parameter(s). The by columns are used with an exact match, and then the on column is used for the nearest match.

The "real world time-series example" from the documentation makes this clear:

>>> pd.merge_asof(trades, quotes, on="time", by="ticker")
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

And I have also confirmed that by works with multiple columns when passed as a list of column names.

@joshdunnlime
Copy link

pandas is open source and all volunteer

community contributions are how things are implemented

I think the OP has a great use case, which is the same as mine; namely joining/snapping lat, lon coorinate pairs to a lat, lon coordinate grid.

I think in the meantime, if it isn't possible, would adding some clearer documentation saying you can only join with a single column on by be more helpful?

@daviddavo
Copy link

Just to add another use case: I'm doing a metaheuristic search in a continuous hyperparameter space and I want to merge two experiments. Even though the experiment uses a subset of the hparams tried in the first experiment, because I used floats I can't use a normal join.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests