Description
LightGBM with device='cuda' crashes with SIGFPE (Floating point exception) when training on discrete data where the product of unique values and number of features exceeds a certain threshold.
Reproducible Example
import lightgbm as lgb
import numpy as np
# FAIL: 5 discrete values × 600 features
X = np.random.randint(0, 5, (50000, 600)).astype(np.float32)
y = np.random.uniform(0, 1, 50000).astype(np.float32)
model = lgb.LGBMRegressor(device='cuda', n_estimators=10, verbose=-1)
model.fit(X, y)  # SIGFPE: Floating point exception (core dumped)
Test Results
Tested with 50,000 rows (a sweep sketch that reproduces this grid follows the table):
| Unique Values | 500 cols | 600 cols | 700 cols |
|---|---|---|---|
| 2 | Pass | Pass | Pass |
| 3 | Pass | Pass | SIGFPE |
| 4 | Pass | Pass | SIGFPE |
| 5 | Pass | SIGFPE | SIGFPE |
| 6 | Pass | SIGFPE | SIGFPE |
| 7 | Pass | SIGFPE | SIGFPE |
| 8 | Pass | SIGFPE | SIGFPE |
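For reference, a sweep along these lines reproduces the pass/fail grid above. This is a sketch, not the exact harness used for the table; it assumes a CUDA-enabled LightGBM build and runs each combination in a subprocess so a SIGFPE in one cell does not kill the whole sweep.
import subprocess
import sys
import textwrap

n_rows = 50_000
for n_unique in [2, 3, 4, 5, 6, 7, 8]:
    for n_cols in [500, 600, 700]:
        # Build a small child script for one (n_unique, n_cols) combination.
        child_code = textwrap.dedent(f"""
            import numpy as np
            import lightgbm as lgb
            X = np.random.randint(0, {n_unique}, ({n_rows}, {n_cols})).astype(np.float32)
            y = np.random.uniform(0, 1, {n_rows}).astype(np.float32)
            lgb.LGBMRegressor(device='cuda', n_estimators=10, verbose=-1).fit(X, y)
        """)
        result = subprocess.run([sys.executable, "-c", child_code])
        status = "Pass" if result.returncode == 0 else f"Crash (return code {result.returncode})"
        print(f"n_unique={n_unique}, n_cols={n_cols}: {status}")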
Observed Pattern
The crash threshold depends on both the number of unique values and the number of features:
| Unique Values | Approximate Safe Column Limit |
|---|---|
| 2 | 700+ |
| 3-4 | 600-700 |
| 5+ | 500 |
This suggests a relationship between n_unique * n_features and available CUDA histogram bins.
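To make the suspected product explicit, here is a quick illustrative computation of n_unique * n_features at the boundary of the results table. It only restates the measurements above; the actual bin-allocation logic in the CUDA code has not been inspected here.
# Products n_unique * n_features at the pass/fail boundary observed above.
boundary = {
    (3, 600): "Pass", (3, 700): "SIGFPE",
    (5, 500): "Pass", (5, 600): "SIGFPE",
    (8, 500): "Pass", (8, 600): "SIGFPE",
}
for (n_unique, n_cols), outcome in boundary.items():
    print(f"{n_unique} x {n_cols} = {n_unique * n_cols}: {outcome}")
# Passing products: 1800, 2500, 4000. Failing products: 2100, 3000, 4800.
# The passing and failing products overlap, so the dependence is probably not a
# single fixed product threshold, even though both factors clearly matter.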
Workaround
Adding tiny noise converts discrete values to continuous and avoids the crash:
X = X.astype(np.float32)  # cast first if X is integer-typed (e.g. int8)
X += np.random.uniform(-1e-6, 1e-6, X.shape).astype(np.float32)
# Training now succeeds
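For integer-typed inputs such as the int8 data described in the next section, the same idea can be wrapped in a small helper. This is an illustrative sketch only; add_jitter is a hypothetical name, not part of the LightGBM API.
import numpy as np

def add_jitter(X, scale=1e-6, seed=0):
    # Cast to float32 and add tiny uniform noise so the values are no longer
    # exactly discrete; purely a workaround for the crash described above.
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=np.float32)
    return X + rng.uniform(-scale, scale, X.shape).astype(np.float32)

# Example with int8 features, as in the real-world case below:
X_int8 = np.random.randint(0, 5, (1000, 600)).astype(np.int8)
X_train = add_jitter(X_int8)
The noise only needs to be small relative to the spacing between the discrete values (1 here), so it should have a negligible effect on the fitted trees.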
Real-World Impact
This bug affects Numerai tournament data:
- 2.7M rows × 2376 features
- int8 dtype with values {0, 1, 2, 3, 4} (5 discrete values)
- Always triggers SIGFPE with CUDA
Environment
- LightGBM version: 4.6.0.99 (source-built with GCC 10)
- CUDA version: 12.6
- GPU: NVIDIA RTX 5000 Ada Generation (Compute Capability 8.9)
- Driver: 572.16
- OS: Windows 11 + WSL2 (Ubuntu) + Docker (nvidia-docker)
- Python: 3.10
Build Command
cmake -B build -S . \
-DUSE_CUDA=1 \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=gcc-10 \
-DCMAKE_CXX_COMPILER=g++-10
cmake --build build -j$(nproc)
cd python-package && pip install .
Notes
- CPU training works fine with the same data (a CPU counterpart of the repro is sketched after this list)
- The issue appears to be in the CUDA histogram binning logic
- Row count does not significantly affect the threshold
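For completeness, a CPU counterpart of the reproducible example above, which trains without error (a minimal sketch; only the device parameter differs from the failing CUDA run):
import numpy as np
import lightgbm as lgb

# Same data as the reproducible example, but with device='cpu'.
X = np.random.randint(0, 5, (50000, 600)).astype(np.float32)
y = np.random.uniform(0, 1, 50000).astype(np.float32)
model = lgb.LGBMRegressor(device='cpu', n_estimators=10, verbose=-1)
model.fit(X, y)  # completes normally on CPU
print("CPU training finished")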