[R-package] add support for specifying training indices in lgb.cv() #3924

Closed

Summary

Adding a train_folds argument to lgb.cv() would allow finer-grained control over fold generation, which is useful in scenarios such as time series forecasting. The xgboost R package already offers this.

Motivation

The R function lgb.cv() currently lets you specify manual folds through the folds argument. This argument expects a list with one vector of indices per fold; those indices go to the test set for that fold, and all other indices go to the train set.

However, for some datasets and tasks (such as time series), you may want folds in which some indices are not used at all, in neither the train set nor the test set, to avoid leaking information from the future.

Description

The xgboost R package added this feature a while back. It consists of one more argument to xgb.cv(), called train_folds. If train_folds is specified, only the indices it lists go to the train set in each fold. If it is not specified, the train indices for each fold are the complement of that fold's entry in the folds argument, which is how lgb.cv() works right now.
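To make the proposed semantics concrete, here is a minimal sketch of how the train indices for each fold could be resolved. It is not LightGBM or xgboost code: the helper name resolve_train_indices is hypothetical, the length check is an assumption about reasonable validation, and the indices are 0-based because the sketch is in Python (the R packages use 1-based indices).

```python
def resolve_train_indices(n_rows, folds, train_folds=None):
    """Return one list of training indices per fold.

    folds: list of test-index lists, one per fold.
    train_folds: optional list of train-index lists (the proposed argument).
    Hypothetical helper for illustration only.
    """
    if train_folds is None:
        # Current behaviour: train on every row not in the fold's test set.
        result = []
        for test in folds:
            test_set = set(test)
            result.append([i for i in range(n_rows) if i not in test_set])
        return result

    # Assumed validation: one train-index list per test-index list.
    if len(train_folds) != len(folds):
        raise ValueError("train_folds and folds must have the same length")

    # Proposed behaviour: use exactly the indices the caller supplied.
    return [list(train) for train in train_folds]


# Ten time-ordered rows, two folds; each fold trains only on earlier rows.
folds = [[4, 5], [8, 9]]
train_folds = [[0, 1, 2, 3], [0, 1, 2, 3, 4, 5, 6, 7]]

default_train = resolve_train_indices(10, folds)
explicit_train = resolve_train_indices(10, folds, train_folds)
```

With only folds given, the first fold would train on rows 6 to 9, which come after its test rows 4 and 5. Passing train_folds keeps those later rows out of the first fold entirely.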

References

Please see the train_folds argument in xgb.cv() here; the relevant xgboost code can be found here.

Thank you!
