
Training fails when bagging_freq > 1 and bagging_fraction is very small #6622

Open
@YovliDuvshani

Description

Hello,

We've recently encountered a problematic edge case with lightgbm: when bagging is enabled and the model is trained on a single data point, training fails. Our expectation would have been that the model simply disregards any bagging mechanism in that case.

While training a model on a single data point is surely questionable from an analytical point of view, we regularly train millions of models (with the same hyper-parameter set) and cannot guarantee that the number of training samples exceeds 1 for all of them.

Is there any rationale behind this behaviour? How would you recommend we handle it?

Reproducible example

import pandas as pd
import lightgbm as lgbm

# A training set containing a single data point.
data = pd.DataFrame({"FEATURE_1": [0], "FEATURE_2": [1]})
label = pd.Series([1])
train_dataset = lgbm.Dataset(data=data, label=label)

# Bagging enabled: resample 50% of the rows every 5 iterations.
params = {
    "seed": 1,
    "bagging_fraction": 0.5,
    "bagging_freq": 5,
}

lgbm.train(params=params, train_set=train_dataset)

Executing this code snippet leads to this error:

lightgbm.basic.LightGBMError: Check failed: (num_data) > (0)

Setting bagging_fraction to 1, however, trains the model correctly (it produces a single leaf with output 1).
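
For completeness, here is the passing variant (same data, with bagging_fraction raised to 1; the predict call at the end is our own addition to confirm the single-leaf output):

import pandas as pd
import lightgbm as lgbm

data = pd.DataFrame({"FEATURE_1": [0], "FEATURE_2": [1]})
label = pd.Series([1])
train_dataset = lgbm.Dataset(data=data, label=label)

# Same parameters as above, but bagging_fraction = 1 keeps the full dataset,
# so the bagged subset can never be empty.
params = {
    "seed": 1,
    "bagging_fraction": 1,
    "bagging_freq": 5,
}

booster = lgbm.train(params=params, train_set=train_dataset)
print(booster.predict(data))  # expected: [1.] (single leaf with output 1)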

Environment info

python=3.10
pandas=2.2.2
lightgbm=4.5.0

Additional Comments

It seems the error is raised whenever bagging_fraction * num_samples < 1, i.e. whenever the bagged subset would contain zero rows (consistent with the failed check num_data > 0).
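
One possible mitigation on the caller's side (a minimal sketch, assuming that condition is the trigger; safe_train is a hypothetical helper, not part of the LightGBM API) is to drop the bagging parameters whenever the bagged subset would be empty:

import pandas as pd
import lightgbm as lgbm

def safe_train(params: dict, data: pd.DataFrame, label: pd.Series) -> lgbm.Booster:
    # Hypothetical wrapper: disable bagging when bagging_fraction * num_samples < 1,
    # since in that case the bagged subset would contain zero rows.
    params = dict(params)  # avoid mutating the caller's dict
    fraction = params.get("bagging_fraction", 1.0)
    if fraction * len(data) < 1:
        params.pop("bagging_fraction", None)
        params.pop("bagging_freq", None)
    train_dataset = lgbm.Dataset(data=data, label=label)
    return lgbm.train(params=params, train_set=train_dataset)

# The reproducer from above now trains instead of raising LightGBMError.
data = pd.DataFrame({"FEATURE_1": [0], "FEATURE_2": [1]})
label = pd.Series([1])
booster = safe_train({"seed": 1, "bagging_fraction": 0.5, "bagging_freq": 5}, data, label)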
