-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Add support for monthly and yearly data in TimeGapSplit #190
Comments
Hi,
Sounds good!
We could try to support both maybe.
Cheers
…On Thu, 19 Sep 2019 at 09:27, Ruben van de Geer ***@***.***> wrote:
First of all: TimeGapSplit is a super useful feature!
Unfortunately, if I'm not mistaken, TimeGapSplit currently does not
support monthly or yearly data. This follows from the design choice to use
timedelta to construct the train and validation sets (timedelta does not
support months or years). For example:
df = (
pd.DataFrame(
data=np.random.randint(0, 30, size=(30, 4)),
columns=list('ABCy')
)
.assign(
date=pd.date_range(start='1/1/2018', end='1/30/2018')[::-1]
)
)
cv = TimeGapSplit(
df=df,
date_col='date',
train_duration=timedelta(months=1),
valid_duration=timedelta(months=1),
)
raises TypeError: 'months' is an invalid keyword argument for __new__().
This could maybe be fixed by using pd.DateOffset over timedelta.
Tomorrow, I probably have time to look into this, so any guidance or
feedback would be very much appreciated @kayhoogland
<https://github.com/kayhoogland> @stephanecollot
<https://github.com/stephanecollot>.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#190?email_source=notifications&email_token=AAJBFDHOO3KWNLMOJQZ7NSLQKMSVPA5CNFSM4IYHOK7KYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HMKTBPA>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAJBFDDP7DYZHTMTAZRT2K3QKMSVPANCNFSM4IYHOK7A>
.
|
@koaning I think I have something similarly laying around somewhere (quarters in that case), I can add it here if it's still relevant? |
I'd say so! If you add quarters though, feel free to also add months/years. I think pandas also has a special syntax for timeslots (that thing they use in |
Alright! The current version does something like In my version I decided to opt for passing the dates to |
I recall there being a good reason for the current design on that front. I also think this is a feature folks tend to use a lot so I wouldn't want to deprecate it without giving it a good pause and consideration. Could you describe the api you've got in mind here? Using python-pseudo-code? |
Probably because that way it's easier to use in GridSearch? I think the biggest difference is that the current approach can do:
whereas my approach would be called like this:
where
In that sense the first approach is more similar to the I would understand if you want to keep it as close to the |
On another note: I noticed that the current version only supports dataframes as input, so I'll fix that as well |
Yes it is better to be able to give a CV splitter class. Plus there is |
Dates in init vs split Another solution might be to force Questions about design
Maybe a new splitter is better?
|
First of all:
TimeGapSplit
is a super useful feature!Unfortunately, if I'm not mistaken,
TimeGapSplit
currently does not support monthly or yearly data. This follows from the design choice to usetimedelta
to construct the train and validation sets (timedelta
does not support months or years). For example:raises
TypeError: 'months' is an invalid keyword argument for __new__()
.This could maybe be fixed by using
pd.DateOffset
overtimedelta
.Tomorrow, I probably have time to look into this, so any guidance or feedback would be very much appreciated @kayhoogland @stephanecollot.
The text was updated successfully, but these errors were encountered: