First of all: TimeGapSplit is a super useful feature!
Unfortunately, if I'm not mistaken, TimeGapSplit currently does not support monthly or yearly data. This follows from the design choice to use timedelta to construct the train and validation sets (timedelta does not support months or years). For example:
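import numpy as np
import pandas as pd
from datetime import timedelta
from sklego.model_selection import TimeGapSplit

# 30 days of random data, with dates in descending order
df = (
    pd.DataFrame(
        data=np.random.randint(0, 30, size=(30, 4)),
        columns=list('ABCy')
    )
    .assign(
        date=pd.date_range(start='1/1/2018', end='1/30/2018')[::-1]
    )
)
# timedelta has no 'months' argument, so this fails before TimeGapSplit is even called
cv = TimeGapSplit(
    df=df,
    date_col='date',
    train_duration=timedelta(months=1),
    valid_duration=timedelta(months=1),
)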
This raises TypeError: 'months' is an invalid keyword argument for __new__().
This could maybe be fixed by using pd.DateOffset over timedelta.
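To illustrate the difference, here is a minimal sketch of how pd.DateOffset handles calendar months where timedelta cannot. It only shows the date arithmetic; the variable names (start, train_end, valid_end, clipped) are made up for this example, and how TimeGapSplit would use the offsets internally is an assumption that still needs checking.

```python
from datetime import timedelta

import pandas as pd

start = pd.Timestamp("2018-01-01")

# timedelta only knows fixed-length units (days, seconds, ...),
# so a 'months' keyword is rejected with a TypeError.
try:
    timedelta(months=1)
    timedelta_error = None
except TypeError as exc:
    timedelta_error = str(exc)

# pd.DateOffset is calendar-aware: adding one month moves to the
# same day of the next month, whatever the month length is.
train_end = start + pd.DateOffset(months=1)
valid_end = train_end + pd.DateOffset(months=1)

# A day that does not exist in the target month is clipped to the
# last valid day instead of overflowing into the following month.
clipped = pd.Timestamp("2018-01-31") + pd.DateOffset(months=1)

print(timedelta_error)
print(train_end, valid_end, clipped)
```

Since pd.DateOffset also accepts days, weeks and years, it might work as a drop-in for the existing timedelta arguments, but that would need to be verified against the current tests.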
I will probably have time to look into this tomorrow, so any guidance or feedback would be very much appreciated @kayhoogland @stephanecollot.