Skip to content

kdb-like window join for pandas #13959

Open
@chrisaycock

Description

@chrisaycock

I would like a time-based aggregating function that also acts as a join:

http://code.kx.com/wiki/Reference/wj

In pandas, this might look something like:

ms = pd.Timedelta(1, 'ms')
merge_window(df1, df2, on='time', by='ticker', time_range=(-100*ms, 100*ms), how=(np.sum, 'volume'))

This would compute the total volume from df2 whose timestamps are within (-100, 100) ms of df1’s timestamps and whose tickers match.

I imagine I could specify different columns:

merge_window(df1, df2, left_on='trade_time', right_on='exchange_time', time_range=(0*ms, 500*ms),
             how={'total_volume':(np.sum, 'volume'), 'mean_volume':(np.mean, 'volume')})

By the way, this is a more general form of the .rolling() functions @jreback wrote recently since I can just use one DataFrame as both parameters:

merge_window(df, df, on='time', time_range=(-5*ms, 0*ms), how=[(np.median, 'price'), (np.sum, 'volume')])

My firm has custom business calendars for things like market hours as well as exchange holidays. The kdb version takes arrays of timestamps directly for the begin and end, which handles the general-purpose case of custom business calendars. So I imagine could get the five-day average of volume with:

# internal functions
cal = get_our_internal_business_calendar()
begin = adjust_our_time(df.time, -5, cal)
end = adjust_our_time(df.time, 0, cal)

# use values directly
merge_window(df1, df2, on='time', begin_times=begin, end_times=end, how=(np.mean, 'volume'))

I can get started on this right away if my proposal makes sense.

Metadata

Metadata

Assignees

No one assigned

    Labels

    DatetimeDatetime data dtypeEnhancementReshapingConcat, Merge/Join, Stack/Unstack, ExplodeWindowrolling, ewma, expanding

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions