
[RFC] [python-package] remove h2o datatable support? #6662

Open
@jameslamb

Description

Summary

Support for h2o's datatable library was added to LightGBM 5.5+ years ago, in #1970.

Proposing here that lightgbm:

  • issue a deprecation warning for the next 2-3 releases whenever datatable is used
  • permanently remove datatable support 2-3 releases from now

Motivation

That project seems to be abandoned.

In the 5.5 years since #1970, the only bug reports / feature requests about datatable support have come from one person working for h2o, and the last of those was 4 years ago.

And in all that time, I don't think we have ever tested against datatable in CI.

Description

Doing this would simplify the Python package, making it easier for others to contribute.

It'd also make it more manageable to add support for newer, more popular input formats like polars (#6204).

See @trivialfis's summary of the current state of data frame library support in dmlc/xgboost#10554 (comment). I agree with it.

References

I am not proposing here that lightgbm should support H2OFrame... Dask doesn't, XGBoost doesn't, scikit-learn doesn't... and I think our limited time and attention here would be better spent on more widely-used input formats, like polars.
