[CUDA] SIGFPE (Floating point exception) with discrete data when n_unique_values * n_features exceeds threshold #7122

@hiromuhana

Description

LightGBM with device='cuda' crashes with SIGFPE (Floating point exception) when training on discrete data where the product of unique values and number of features exceeds a certain threshold.

Reproducible Example

import lightgbm as lgb
import numpy as np

# FAIL: 5 discrete values × 600 features
X = np.random.randint(0, 5, (50000, 600)).astype(np.float32)
y = np.random.uniform(0, 1, 50000).astype(np.float32)

model = lgb.LGBMRegressor(device='cuda', n_estimators=10, verbose=-1)
model.fit(X, y)  # SIGFPE: Floating point exception (core dumped)

Test Results

Tested with 50,000 rows:

Unique Values   500 cols   600 cols   700 cols
2               Pass       Pass       Pass
3               Pass       Pass       SIGFPE
4               Pass       Pass       SIGFPE
5               Pass       SIGFPE     SIGFPE
6               Pass       SIGFPE     SIGFPE
7               Pass       SIGFPE     SIGFPE
8               Pass       SIGFPE     SIGFPE
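
Because SIGFPE kills the Python process, a grid like this is easiest to collect by running each configuration in its own interpreter. The following is a minimal sketch of such a harness, not the script used for the table above: the names CHILD, classify, run_case and sweep are hypothetical, the data generation is assumed to match the reproducible example for every cell, and the signal check assumes a POSIX system where a negative return code means "killed by that signal".

```python
import signal
import subprocess
import sys

# Child script: one training run per process, so a crash cannot end the sweep.
# Parameters mirror the reproducible example (50,000 rows, n_estimators=10).
CHILD = """
import sys
import numpy as np
import lightgbm as lgb

n_unique, n_cols, device = int(sys.argv[1]), int(sys.argv[2]), sys.argv[3]
rng = np.random.default_rng(0)
X = rng.integers(0, n_unique, (50000, n_cols)).astype(np.float32)
y = rng.uniform(0, 1, 50000).astype(np.float32)
lgb.LGBMRegressor(device=device, n_estimators=10, verbose=-1).fit(X, y)
"""


def classify(returncode):
    """Map a child exit status to the labels used in the table."""
    if returncode == 0:
        return "Pass"
    if returncode == -signal.SIGFPE:  # POSIX: negative value = killed by signal
        return "SIGFPE"
    return f"other ({returncode})"


def run_case(n_unique, n_cols, device="cuda"):
    """Train once in a fresh interpreter and report how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, str(n_unique), str(n_cols), device],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return classify(proc.returncode)


def sweep(unique_values=range(2, 9), col_counts=(500, 600, 700), device="cuda"):
    """Return {(n_unique, n_cols): label} for every cell of the grid."""
    return {(u, c): run_case(u, c, device) for u in unique_values for c in col_counts}
```

Calling sweep() on a machine with a CUDA build of LightGBM should fill in the same grid; sweep(device="cpu") gives a CPU baseline for comparison.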

Observed Pattern

The crash threshold depends on both the number of unique values and the number of features:

Unique Values   Approximate Safe Column Limit
2               700+ (no failure observed)
3-4             600 (700 fails)
5+              500 (600 fails)

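This suggests a relationship between n_unique, n_features and the available CUDA histogram bins. The raw product n_unique * n_features does not separate the results on its own, though: 8 × 500 = 4000 passes while 3 × 700 = 2100 fails.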
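
To check how well a single threshold on n_unique * n_features fits the Test Results grid, the products can be tabulated per outcome. This is only a sketch over the 21 cells reported above; the names COLS, CRASHED and products_by_outcome are hypothetical.

```python
# Pass/fail grid copied from the Test Results table (True = SIGFPE).
COLS = (500, 600, 700)
CRASHED = {
    2: (False, False, False),
    3: (False, False, True),
    4: (False, False, True),
    5: (False, True, True),
    6: (False, True, True),
    7: (False, True, True),
    8: (False, True, True),
}


def products_by_outcome(crashed=CRASHED, cols=COLS):
    """Split n_unique * n_cols into products of passing and failing runs."""
    passing, failing = [], []
    for n_unique, row in crashed.items():
        for n_cols, failed in zip(cols, row):
            (failing if failed else passing).append(n_unique * n_cols)
    return sorted(passing), sorted(failing)


passing, failing = products_by_outcome()
largest_pass = max(passing)
smallest_fail = min(failing)

# A single product threshold would require every passing product
# to be smaller than every failing product.
separable = largest_pass < smallest_fail
print(largest_pass, smallest_fail, separable)
```

If separable comes out False, the crash condition is probably not a plain product limit, and the per-feature bin count and the feature count may enter in some other way.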

Workaround

Adding tiny noise converts discrete values to continuous and avoids the crash:

X = X.astype(np.float32)
X += np.random.uniform(-1e-6, 1e-6, X.shape).astype(np.float32)
# Training now succeeds
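
The effect of the jitter on the data can be checked without a GPU. The sketch below uses a small stand-in matrix (10,000 × 4, an assumption made to keep it fast) and a hypothetical helper unique_per_column; it shows only that the number of distinct values per column grows while the original levels stay recoverable, not why the CUDA path then avoids the crash.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-in for the failing matrix: 5 discrete levels stored as float32.
X = rng.integers(0, 5, (10000, 4)).astype(np.float32)
X_noisy = X + rng.uniform(-1e-6, 1e-6, X.shape).astype(np.float32)


def unique_per_column(a):
    """Count distinct values in each column."""
    return [len(np.unique(a[:, j])) for j in range(a.shape[1])]


before = unique_per_column(X)
after = unique_per_column(X_noisy)

# The jitter is far smaller than the gap between levels, so rounding
# recovers the original values and the ordering of levels is unchanged.
levels_preserved = np.array_equal(np.round(X_noisy), X)
print(before, after, levels_preserved)
```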

Real-World Impact

This bug affects Numerai tournament data:

  • 2.7M rows × 2376 features
  • int8 dtype with values {0, 1, 2, 3, 4} (5 discrete values)
  • Always triggers SIGFPE with CUDA

Environment

  • LightGBM version: 4.6.0.99 (source-built with GCC 10)
  • CUDA version: 12.6
  • GPU: NVIDIA RTX 5000 Ada Generation (Compute Capability 8.9)
  • Driver: 572.16
  • OS: Windows 11 + WSL2 (Ubuntu) + Docker (nvidia-docker)
  • Python: 3.10

Build Command

cmake -B build -S . \
    -DUSE_CUDA=1 \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=gcc-10 \
    -DCMAKE_CXX_COMPILER=g++-10
cmake --build build -j$(nproc)
cd python-package && pip install .

Notes

  • CPU training works fine with the same data
  • The issue appears to be in CUDA histogram binning logic
  • Row count does not significantly affect the threshold

Labels

bug, gpu (CUDA)
