[FEATURE] Add support for monthly and yearly data in TimeGapSplit #190

Open
@rubenvdg

First of all: TimeGapSplit is a super useful feature!

Unfortunately, if I'm not mistaken, TimeGapSplit currently does not support monthly or yearly data. This follows from the design choice to use timedelta to construct the train and validation sets: datetime.timedelta only accepts fixed-length units such as days and weeks, not months or years. For example:

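from datetime import timedelta

import numpy as np
import pandas as pd
from sklego.model_selection import TimeGapSplit

# 30 rows of random integers with a daily date column in descending order
df = (
    pd.DataFrame(
        data=np.random.randint(0, 30, size=(30, 4)),
        columns=list('ABCy')
    )
    .assign(
        date=pd.date_range(start='1/1/2018', end='1/30/2018')[::-1]
    )
)

# timedelta has no `months` argument, so the calls below fail
cv = TimeGapSplit(
    df=df,
    date_col='date',
    train_duration=timedelta(months=1),
    valid_duration=timedelta(months=1),
)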

This raises TypeError: 'months' is an invalid keyword argument for __new__().
It could maybe be fixed by using pd.DateOffset instead of timedelta.
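
To illustrate why pd.DateOffset might be a workable replacement, here is a minimal sketch that is independent of TimeGapSplit itself. It shows that timedelta rejects months while pd.DateOffset accepts both months and years and can be added to a timestamp. The window_mask helper is hypothetical and only stands in for whatever date arithmetic the splitter does internally; it is not part of scikit-lego.

```python
from datetime import timedelta

import pandas as pd

# timedelta only accepts fixed-length units (weeks, days, hours, ...).
timedelta_error = None
try:
    timedelta(months=1)
except TypeError as exc:
    timedelta_error = str(exc)

# pd.DateOffset is calendar-aware, so months and years are valid units.
start = pd.Timestamp("2018-01-31")
one_month_later = start + pd.DateOffset(months=1)  # clipped to the end of February
one_year_later = start + pd.DateOffset(years=1)


def window_mask(dates, window_start, duration):
    """Hypothetical helper: True for dates in [window_start, window_start + duration)."""
    return (dates >= window_start) & (dates < window_start + duration)


# Daily dates covering three months; select a one-month training window.
dates = pd.Series(pd.date_range("2018-01-01", "2018-03-31", freq="D"))
train_mask = window_mask(dates, pd.Timestamp("2018-01-01"), pd.DateOffset(months=1))
```

Because the helper only relies on `start + duration`, it accepts a timedelta as well as a pd.DateOffset, which suggests that existing day-based usage might keep working if the splitter were changed along these lines.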

Tomorrow, I probably have time to look into this, so any guidance or feedback would be very much appreciated @kayhoogland @stephanecollot.

Labels: enhancement (New feature or request)
