Description
Describe the bug
The process dies with SIGABRT, seemingly from an out-of-bounds memory access on the GPU.
Steps/Code to reproduce bug
Unknown, but the data was just Kaggle Paribas with various frequency-encoding features added, bringing it to size (91457, 331).
Parameters:
OrderedDict([('output_type', 'numpy'), ('random_state', 840607124), ('verbose', False), ('n_estimators', 200), ('n_bins', 128), ('split_criterion', 1), ('max_depth', 18), ('max_leaves', 1024), ('max_features', 'auto'), ('min_samples_leaf', 1), ('min_samples_split', 10), ('min_impurity_decrease', 0.0)])
For a binary classification problem.
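A rough sketch of how a repro attempt could look with these parameters. The real data is not attached, so random data of the same shape stands in for it (an assumption, and it may well not trigger the crash). `make_data` and `fit_once` are hypothetical helper names, and newer cuML releases may reject some of these values (for example `max_features='auto'` or an integer `split_criterion`).

```python
from collections import OrderedDict

import numpy as np

# Parameters copied verbatim from the report.
PARAMS = OrderedDict([
    ("output_type", "numpy"),
    ("random_state", 840607124),
    ("verbose", False),
    ("n_estimators", 200),
    ("n_bins", 128),
    ("split_criterion", 1),
    ("max_depth", 18),
    ("max_leaves", 1024),
    ("max_features", "auto"),
    ("min_samples_leaf", 1),
    ("min_samples_split", 10),
    ("min_impurity_decrease", 0.0),
])


def make_data(n_rows=91457, n_cols=331, seed=0):
    """Random stand-in for the real data: float32 features, int32 binary labels."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_rows, n_cols), dtype=np.float32)
    y = rng.integers(0, 2, size=n_rows).astype(np.int32)
    return X, y


def fit_once(X, y, params=PARAMS):
    """Fit one cuML random forest. Needs a GPU and cuML installed."""
    # Imported lazily so the data helpers work on machines without cuML.
    from cuml.ensemble import RandomForestClassifier

    model = RandomForestClassifier(**params)
    model.fit(X, y)
    return model


if __name__ == "__main__":
    X, y = make_data()
    fit_once(X, y)
```

If the synthetic data does not crash, swapping `make_data` for the real Paribas features would be the next thing to try.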
No messages appeared in the console at all, even though I ran in debug mode with verbose=4. All I got was the SIGABRT, plus this in dmesg:
[Sun Jul 11 21:15:41 2021] NVRM: GPU at PCI:0000:01:00: GPU-0bb167f8-b3cd-8df7-9644-d5f95716e554
[Sun Jul 11 21:15:41 2021] NVRM: GPU Board Serial Number:
[Sun Jul 11 21:15:41 2021] NVRM: Xid (PCI:0000:01:00): 13, pid=2041, Graphics SM Warp Exception on (GPC 3, TPC 3, SM 0): Out Of Range Address
[Sun Jul 11 21:15:41 2021] NVRM: Xid (PCI:0000:01:00): 13, pid=2041, Graphics SM Global Exception on (GPC 3, TPC 3, SM 0): Multiple Warp Errors
[Sun Jul 11 21:15:41 2021] NVRM: Xid (PCI:0000:01:00): 13, pid=2041, Graphics Exception: ESR 0x51df30=0xc13000e 0x51df34=0x24 0x51df28=0x4c1eb72 0x51df2c=0x174
[Sun Jul 11 21:15:41 2021] NVRM: Xid (PCI:0000:01:00): 43, pid=6304, Ch 00000088
[Sun Jul 11 21:15:54 2021] NVRM: Xid (PCI:0000:01:00): 13, pid=6304, Graphics SM Warp Exception on (GPC 4, TPC 2, SM 1): Out Of Range Address
[Sun Jul 11 21:15:54 2021] NVRM: Xid (PCI:0000:01:00): 13, pid=6304, Graphics SM Global Exception on (GPC 4, TPC 2, SM 1): Multiple Warp Errors
[Sun Jul 11 21:15:54 2021] NVRM: Xid (PCI:0000:01:00): 13, pid=6304, Graphics Exception: ESR 0x5257b0=0xc12000e 0x5257b4=0x24 0x5257a8=0x4c1eb72 0x5257ac=0x174
[Sun Jul 11 21:15:54 2021] NVRM: Xid (PCI:0000:01:00): 43, pid=8874, Ch 00000088
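For triage, a small sketch that pulls the Xid code, pid and message out of dmesg lines like the ones above. The regex and the `parse_xid_lines` name are my own assumptions based only on the line format shown here. As far as I understand NVIDIA's Xid documentation, 13 is a graphics engine exception and 43 means the GPU stopped processing.

```python
import re
from collections import Counter

# Matches lines such as:
#   NVRM: Xid (PCI:0000:01:00): 13, pid=2041, <message>
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): "
    r"(?P<xid>\d+), pid=(?P<pid>\d+), (?P<msg>.*)"
)

# Three lines taken from the dmesg output in this report.
SAMPLE = """\
[Sun Jul 11 21:15:41 2021] NVRM: Xid (PCI:0000:01:00): 13, pid=2041, Graphics SM Warp Exception on (GPC 3, TPC 3, SM 0): Out Of Range Address
[Sun Jul 11 21:15:41 2021] NVRM: Xid (PCI:0000:01:00): 43, pid=6304, Ch 00000088
[Sun Jul 11 21:15:54 2021] NVRM: Xid (PCI:0000:01:00): 13, pid=6304, Graphics SM Warp Exception on (GPC 4, TPC 2, SM 1): Out Of Range Address
"""


def parse_xid_lines(text):
    """Return one dict per Xid line found in the given dmesg text."""
    events = []
    for line in text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append({
                "pci": match["pci"],
                "xid": int(match["xid"]),
                "pid": int(match["pid"]),
                "msg": match["msg"].strip(),
            })
    return events


if __name__ == "__main__":
    events = parse_xid_lines(SAMPLE)
    # Count how often each Xid code appears.
    print(Counter(event["xid"] for event in events))
```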
Expected behavior
It should not crash; it should be more stable.
Environment details (please complete the following information):
- Environment location: Bare-metal
- Linux Distro/Architecture: Ubuntu 18.04 LTS
- GPU Model/Driver: RTX 2080 / 460.80
- CUDA: 11.2.2
- Method of cuDF & cuML install: conda nightly 21.08 -- nightly as of 7 days ago.
Additional context
If I hit this again I will try to produce a repro, but I expect that routine testing on NVIDIA's side will reveal it. I've only been using cuML RF for a day and already hit this after (maybe) 200 fits on small data.
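Since the crash only showed up after roughly 200 fits, one way to hunt for a repro is to run a single-fit script over and over in a child process and stop at the first SIGABRT. This is only a sketch for POSIX systems: `run_until_abort` is a hypothetical helper, and `repro_fit.py` is a placeholder for whatever script performs one fit.

```python
import signal
import subprocess
import sys


def run_until_abort(cmd, max_runs=200):
    """Run cmd repeatedly; return the 1-based index of the first run
    killed by SIGABRT, or None if no run aborted."""
    for run in range(1, max_runs + 1):
        proc = subprocess.run(cmd)
        # On POSIX, a negative return code -N means the child died from signal N.
        if proc.returncode == -signal.SIGABRT:
            return run
    return None


if __name__ == "__main__":
    # repro_fit.py is a placeholder name for a script that does one fit.
    failed_at = run_until_abort([sys.executable, "repro_fit.py"])
    if failed_at is None:
        print("no SIGABRT seen")
    else:
        print(f"SIGABRT on run {failed_at}")
```

Running each fit in its own process keeps the harness alive when the fit aborts, so the failing run number (and the random seed, if the script logs it) can be recorded.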