Description
I would like a time-based aggregating function that also acts as a join:
http://code.kx.com/wiki/Reference/wj
In pandas, this might look something like:
ms = pd.Timedelta(1, 'ms')
merge_window(df1, df2, on='time', by='ticker', time_range=(-100*ms, 100*ms), how=(np.sum, 'volume'))
This would compute the total volume from df2 whose timestamps are within (-100, 100) ms of df1’s timestamps and whose tickers match.
I imagine I could specify different columns:
merge_window(df1, df2, left_on='trade_time', right_on='exchange_time', time_range=(0*ms, 500*ms),
how={'total_volume':(np.sum, 'volume'), 'mean_volume':(np.mean, 'volume')})
By the way, this is a more general form of the .rolling()
functions @jreback wrote recently since I can just use one DataFrame as both parameters:
merge_window(df, df, on='time', time_range=(-5*ms, 0*ms), how=[(np.median, 'price'), (np.sum, 'volume')])
My firm has custom business calendars for things like market hours as well as exchange holidays. The kdb version takes arrays of timestamps directly for the begin and end, which handles the general-purpose case of custom business calendars. So I imagine could get the five-day average of volume with:
# internal functions
cal = get_our_internal_business_calendar()
begin = adjust_our_time(df.time, -5, cal)
end = adjust_our_time(df.time, 0, cal)
# use values directly
merge_window(df1, df2, on='time', begin_times=begin, end_times=end, how=(np.mean, 'volume'))
I can get started on this right away if my proposal makes sense.