Skip to content

New feature/enhancement request: Merge dataframes where one value is between two others #10309

Open
@stoffprof

Description

In contrast to SQL, it is not currently possible to merge dataframes with inequality constraints. Could this be added?

As an example (which I previously posted on StackOverflow), I need to merge two pandas dataframes on an identifier and a condition where a date in one dataframe is between two dates in the other dataframe.

Dataframe A has a date ("fdate") and an ID ("cusip"):

I need to merge this with this dataframe B:

In SQL this would be trivial, but the only way I can see how to do this in pandas is to first merge unconditionally on the identifier, and then filter on the date condition:

df = pd.merge(A, B, how='inner', left_on='cusip', right_on='ncusip')
df = df[(df['fdate']>=df['namedt']) & (df['fdate']<=df['nameenddt'])]

The problem with this approach is that after the merge, but before the filtering step, df can become unnecessarily large. Presumably this could be avoided if the filtering were done as part of the merge process.

Metadata

Assignees

No one assigned

    Labels

    EnhancementReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions