[RFC] [python-package] remove h2o datatable support? #6662
Summary
Support for h2o's datatable library was added to LightGBM 5.5+ years ago, in #1970.
Proposing here that lightgbm:
- issue a deprecation warning for the next 2-3 releases whenever datatable is used
- permanently remove datatable support 2-3 releases from now
Motivation
That project seems to be abandoned:
- last commit and PyPI release were 10 months ago
- seems that h2o has moved on to something new called H2OFrame in the h2o-py package
In those 5.5 years since #1970, the only bug reports / feature requests received about datatable support have been from one person working for h2o... and the last of those was 4 years ago:
- (Sep 2020) Support h2o datatable and numpy types, including for categorical types #3386
- (Feb 2019) implement datatable ingest directly into lightgbm #2003
- (Jan 2019) memory leak with H2O's DataTable #1968
And in all that time, I don't think we have ever tested against datatable in CI.
Description
Doing this would simplify the Python package, making it easier for others to contribute.
It'd also make it more manageable to add support for newer, more popular input formats like polars (#6204).
See @trivialfis's summary of the current state of supporting data frame libraries at dmlc/xgboost#10554 (comment) ... I agree with it.
References
I am not proposing here that lightgbm should support H2OFrame... Dask doesn't, XGBoost doesn't, scikit-learn doesn't... and I think our limited time and attention here would be better spent on more widely-used input formats, like polars.