Training fails when bagging_freq > 1 and bagging_fraction is very small #6622
Description
Hello,
We've recently encountered a problematic edge case with LightGBM.
When bagging is used while training on a single data point, model training fails.
We would have expected the model to simply disregard any bagging mechanism in this case.
While training a model on a single data point is admittedly questionable from an analytical point of view, we regularly train millions of models (all with the same hyper-parameter set) and cannot guarantee that every one of them has more than one training sample.
Is there a rationale behind this behaviour, and what would you recommend as the best way to handle it?
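For context, the stopgap we are considering on our side is to disable bagging whenever the bagged subsample would be empty. A minimal sketch (the helper name and the threshold logic are ours, not part of the LightGBM API):

def guard_bagging_params(params: dict, num_samples: int) -> dict:
    # Hypothetical helper (not part of the LightGBM API): returns a copy
    # of params with bagging disabled whenever the bagged subsample would
    # contain zero rows, which appears to trigger the num_data > 0 check.
    guarded = dict(params)
    if guarded.get("bagging_fraction", 1.0) * num_samples < 1:
        guarded["bagging_fraction"] = 1.0  # keep every row
        guarded["bagging_freq"] = 0        # disable bagging entirely
    return guarded

# e.g.: safe_params = guard_bagging_params(params, num_samples=len(data))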
Reproducible example
import pandas as pd
import lightgbm as lgbm
data = pd.DataFrame({"FEATURE_1": [0], "FEATURE_2": [1]})
label = pd.Series([1])
train_dataset = lgbm.Dataset(data=data, label=label)
params = {
    "seed": 1,
    "bagging_fraction": 0.5,
    "bagging_freq": 5,
}
lgbm.train(params=params, train_set=train_dataset)
Executing this code snippet leads to this error:
lightgbm.basic.LightGBMError: Check failed: (num_data) > (0)
Setting bagging_fraction to 1, however, lets the model train correctly (it ends up with a single leaf whose output is 1).
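For reference, this is the parameter set that trains without error on the same dataset:

params = {
    "seed": 1,
    "bagging_fraction": 1.0,  # keep all rows in every bagging iteration
    "bagging_freq": 5,
}
lgbm.train(params=params, train_set=train_dataset)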
Environment info
python=3.10
pandas=2.2.2
lightgbm=4.5.0
Additional Comments
It seems the error is raised whenever bagging_fraction * num_samples < 1, i.e. whenever the bagged subsample rounds down to zero rows, which would explain the failed num_data > 0 check.
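A quick illustration of that condition for the example above (the integer truncation mirrors what we assume the sampler does; we have not verified this against the LightGBM source):

num_samples = 1           # a single training row
bagging_fraction = 0.5
bag_size = int(bagging_fraction * num_samples)
print(bag_size)           # 0 -> consistent with Check failed: (num_data) > (0)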