[python-package] Allow using only the last dataset for early stopping #6360
Summary
I would like to add a parameter, `last_dataset_only` or similarly named, to `lightgbm.callback.early_stopping` that would make early stopping use only the last item of `eval_set`.
Motivation
There are situations where it's desirable to have multiple evaluation sets. Sometimes we want to record the evaluation results at each iteration for several datasets but use only one of them for early stopping. XGBoost handles this by using only the last item of `eval_set` to determine early stopping. We could re-score the model at each iteration afterwards to recreate the evaluation history, but this is inefficient.
This also matters for us when developing tools or pipelines that should work with both LightGBM and XGBoost, such as feature selection or model selection algorithms and utilities.
Description
Currently, LightGBM's early stopping callback uses all datasets provided for early stopping. This change would add a parameter, `last_dataset_only` or similarly named, to `lightgbm.callback.early_stopping` that would make the callback use only the last item of `eval_set` to decide when to stop.
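For context, LightGBM's Python callbacks see each iteration's scores through `env.evaluation_result_list`; in `lightgbm.train` each entry is a tuple `(data_name, eval_name, result, is_higher_better)`. A minimal sketch of what "the last item of `eval_set`" could mean at that level is below. `last_dataset_results` is a hypothetical helper, and it assumes the last dataset's entries are the ones sharing the `data_name` of the final tuple; that ordering should be checked against how LightGBM assembles the list before relying on it.

```python
from typing import List, Tuple

# Assumed shape of each entry in env.evaluation_result_list (lightgbm.train):
# (data_name, eval_name, result, is_higher_better)
EvalTuple = Tuple[str, str, float, bool]


def last_dataset_results(evaluation_result_list: List[EvalTuple]) -> List[EvalTuple]:
    """Keep only the entries that belong to the last evaluated dataset.

    Hypothetical helper: treats every entry whose data_name matches the
    final tuple's data_name as belonging to the last dataset.
    """
    if not evaluation_result_list:
        return []
    last_name = evaluation_result_list[-1][0]
    return [entry for entry in evaluation_result_list if entry[0] == last_name]


results = [
    ("valid_0", "l2", 0.10, False),
    ("valid_0", "l1", 0.20, False),
    ("valid_1", "l2", 0.30, False),
    ("valid_1", "l1", 0.40, False),
]
print(last_dataset_results(results))
# [('valid_1', 'l2', 0.3, False), ('valid_1', 'l1', 0.4, False)]
```

Every dataset is still scored; the filter only narrows which scores feed the stopping decision.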
I would like the following to create an early stopping callback that would use only the first metric from the last dataset in `eval_set` to early stop, but would still score on every dataset in `eval_set`:
```python
from lightgbm.callback import early_stopping

# last_dataset_only is the parameter proposed in this issue
es_cb = early_stopping(5, first_metric_only=True, last_dataset_only=True)
```
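The proposed flag could behave roughly like the library-independent sketch below. `LastDatasetEarlyStopper` and its `update` method are hypothetical names, not LightGBM API; the sketch assumes the same `(data_name, eval_name, result, is_higher_better)` tuples and, mirroring `first_metric_only`, treats the first entry for a dataset as its first metric. In a real callback, `update` would be fed `env.iteration` and `env.evaluation_result_list`, and a stop would be signalled by raising `lightgbm.callback.EarlyStopException`.

```python
from typing import Dict, List, Optional, Tuple

# Assumed entry shape: (data_name, eval_name, result, is_higher_better)
EvalTuple = Tuple[str, str, float, bool]


class LastDatasetEarlyStopper:
    """Hypothetical sketch of the proposed last_dataset_only behaviour.

    All datasets keep being scored elsewhere; only the last dataset
    (and optionally only its first metric) decides when to stop.
    """

    def __init__(self, stopping_rounds: int, first_metric_only: bool = False) -> None:
        self.stopping_rounds = stopping_rounds
        self.first_metric_only = first_metric_only
        self.best_score: Dict[str, float] = {}  # metric name -> best value so far
        self.best_iter: Dict[str, int] = {}     # metric name -> iteration of best value
        self.best_iteration: Optional[int] = None

    def update(self, iteration: int, evaluation_result_list: List[EvalTuple]) -> bool:
        """Record one iteration's results; return True if training should stop."""
        last_name = evaluation_result_list[-1][0]
        entries = [e for e in evaluation_result_list if e[0] == last_name]
        if self.first_metric_only:
            entries = entries[:1]  # first metric of the last dataset only
        for _, metric, value, higher_better in entries:
            best = self.best_score.get(metric)
            improved = best is None or (value > best if higher_better else value < best)
            if improved:
                self.best_score[metric] = value
                self.best_iter[metric] = iteration
            # Stop once any monitored metric has not improved for stopping_rounds.
            if iteration - self.best_iter[metric] >= self.stopping_rounds:
                self.best_iteration = self.best_iter[metric]
                return True
        return False


stopper = LastDatasetEarlyStopper(stopping_rounds=2, first_metric_only=True)
# valid_0 keeps improving but is only recorded; valid_1 (last) bottoms out at iteration 1.
history = [
    [("valid_0", "l2", 0.9, False), ("valid_1", "l2", 0.50, False)],
    [("valid_0", "l2", 0.8, False), ("valid_1", "l2", 0.40, False)],
    [("valid_0", "l2", 0.7, False), ("valid_1", "l2", 0.45, False)],
    [("valid_0", "l2", 0.6, False), ("valid_1", "l2", 0.47, False)],
]
for i, results in enumerate(history):
    if stopper.update(i, results):
        print(f"stop at iteration {i}, best iteration {stopper.best_iteration}")
        break
# prints: stop at iteration 3, best iteration 1
```

Iteration indices here are whatever the caller passes in; a real callback would need to match LightGBM's own convention for reporting the best iteration.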
I'm not super familiar with the LightGBM codebase or what, if anything, would need to change besides the early stopping callback, but for what it's worth, I have a working version of a modified early stopping callback that I'm happy to work with you to contribute.