
Training fails when bagging_freq > 1 and bagging_fraction is very small #6622

Open
@YovliDuvshani

Description

Hello,

We've recently encountered a problematic edge case with lightgbm: when bagging is enabled and the model is trained on a single data point, training fails. Our expectation would have been that the model simply disregards any bagging mechanism in that case.

While training a model on a single data point is surely questionable from an analytical point of view, we regularly train millions of models (with the same hyper-parameter set) and cannot guarantee that the number of training samples exceeds 1 for all of them.

Is there any rationale behind this behaviour? How would you recommend we handle it?

Reproducible example

import pandas as pd
import lightgbm as lgbm

# A training set containing a single data point.
data = pd.DataFrame({"FEATURE_1": [0], "FEATURE_2": [1]})
label = pd.Series([1])
train_dataset = lgbm.Dataset(data=data, label=label)

# Bagging enabled: resample 50% of the rows every 5 iterations.
params = {
    "seed": 1,
    "bagging_fraction": 0.5,
    "bagging_freq": 5,
}

lgbm.train(params=params, train_set=train_dataset)

Executing this code snippet leads to this error:

lightgbm.basic.LightGBMError: Check failed: (num_data) > (0)

Setting bagging_fraction to 1, however, trains the model correctly (it produces a single leaf with output 1).
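
For completeness, here is the passing variant (same data, with bagging_fraction raised to 1; the predict call at the end is our own addition to confirm the single-leaf output):

import pandas as pd
import lightgbm as lgbm

data = pd.DataFrame({"FEATURE_1": [0], "FEATURE_2": [1]})
label = pd.Series([1])
train_dataset = lgbm.Dataset(data=data, label=label)

# Same parameters as above, but bagging_fraction = 1 keeps the full dataset,
# so the bagged subset can never be empty.
params = {
    "seed": 1,
    "bagging_fraction": 1,
    "bagging_freq": 5,
}

booster = lgbm.train(params=params, train_set=train_dataset)
print(booster.predict(data))  # expected: [1.] (single leaf with output 1)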

Environment info

python=3.10
pandas=2.2.2
lightgbm=4.5.0

Additional Comments

It seems the error is raised whenever bagging_fraction * num_samples < 1, i.e. whenever the bagged subset would contain zero rows (consistent with the failed check num_data > 0).
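
One possible mitigation on the caller's side (a minimal sketch, assuming that condition is the trigger; safe_train is a hypothetical helper, not part of the LightGBM API) is to drop the bagging parameters whenever the bagged subset would be empty:

import pandas as pd
import lightgbm as lgbm

def safe_train(params: dict, data: pd.DataFrame, label: pd.Series) -> lgbm.Booster:
    # Hypothetical wrapper: disable bagging when bagging_fraction * num_samples < 1,
    # since in that case the bagged subset would contain zero rows.
    params = dict(params)  # avoid mutating the caller's dict
    fraction = params.get("bagging_fraction", 1.0)
    if fraction * len(data) < 1:
        params.pop("bagging_fraction", None)
        params.pop("bagging_freq", None)
    train_dataset = lgbm.Dataset(data=data, label=label)
    return lgbm.train(params=params, train_set=train_dataset)

# The reproducer from above now trains instead of raising LightGBMError.
data = pd.DataFrame({"FEATURE_1": [0], "FEATURE_2": [1]})
label = pd.Series([1])
booster = safe_train({"seed": 1, "bagging_fraction": 0.5, "bagging_freq": 5}, data, label)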
