Summary
Adding a train_folds argument to lgb.cv() would allow finer-grained fold generation, which is useful in scenarios such as time series forecasting (the xgboost R package already offers this).
Motivation
The R function lgb.cv() currently lets you specify manual folds through the folds argument. This argument expects a list with one element per fold, each holding the indices that should go to the test set for that fold; all the other indices go to the train set.
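To make the current behaviour concrete, here is a minimal base R sketch of how the train indices follow from the folds list as described above. It does not call lightgbm; the data size and the fold contents are made up for illustration.

```r
# Ten observations, indexed 1..10 (illustrative size).
n <- 10L

# Manual folds in the shape described above:
# one vector of test-set indices per fold.
folds <- list(
  c(1L, 2L, 3L),
  c(4L, 5L, 6L),
  c(7L, 8L, 9L, 10L)
)

# Every index that is not in a fold's test set ends up in its train set.
train_sets <- lapply(folds, function(test_idx) setdiff(seq_len(n), test_idx))

print(train_sets[[1]])
```

In this sketch the first fold tests on indices 1 to 3 and therefore trains on indices 4 to 10, with no way to leave any index out.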
However, for some datasets and tasks (such as time series), you may want folds where some indices are not used at all for a given fold, in neither the train set nor the test set, to avoid leaking information from the future.
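The leakage problem can be sketched in base R. This assumes 12 observations ordered in time and test blocks that move forward; the numbers are illustrative and nothing here calls lightgbm.

```r
# Twelve observations ordered in time (illustrative size).
n <- 12L

# Test blocks that move forward in time, one per fold.
folds <- list(5:6, 7:8, 9:10)

# Train set for the first fold when it is the complement of the test block.
complement_train <- setdiff(seq_len(n), folds[[1]])

# Train set that respects time order: only observations before the test block.
past_only_train <- seq_len(min(folds[[1]]) - 1L)

# Indices that a leakage-free version of this fold uses in neither set.
unused <- setdiff(seq_len(n), c(past_only_train, folds[[1]]))

# TRUE: the complement-based train set contains observations from the future.
print(any(complement_train > max(folds[[1]])))
```

With the complement rule, the first fold trains on indices 7 to 12 even though it tests on 5 and 6. The time-ordered version trains on 1 to 4 only and leaves 7 to 12 unused, which the folds argument alone cannot express.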
Description
The xgboost R package added this feature a while back. It essentially consists of one more argument to the xgb.cv() function, called train_folds. If train_folds is specified, only the indices it lists go to the train set in each fold. If it is not specified, the train indices are simply the complement of the ones in the folds argument, which is how lgb.cv() works right now.
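The requested semantics could look roughly like the following base R sketch. The helper resolve_train_idx() is hypothetical and written only for this illustration; it is not part of lightgbm or xgboost, and the real implementation may differ.

```r
# Hypothetical helper (not a lightgbm or xgboost function): returns the
# train indices for each fold under the semantics described above.
resolve_train_idx <- function(n, folds, train_folds = NULL) {
  if (is.null(train_folds)) {
    # Fallback: train on everything outside the test fold.
    return(lapply(folds, function(test_idx) setdiff(seq_len(n), test_idx)))
  }
  # Assumption for this sketch: one train vector per test fold.
  stopifnot(length(train_folds) == length(folds))
  train_folds
}

n <- 12L
folds <- list(5:6, 7:8, 9:10)        # test indices per fold
train_folds <- list(1:4, 1:6, 1:8)   # expanding window of past observations

default_train <- resolve_train_idx(n, folds)
custom_train <- resolve_train_idx(n, folds, train_folds)

print(default_train[[1]])
print(custom_train[[1]])
```

Without train_folds the first fold trains on every index except 5 and 6; with it, the first fold trains on indices 1 to 4 only, so later observations stay out of both sets.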
References
Please see the train_folds argument in xgb.cv() here; the relevant code in xgboost can be found here.
Thank you!