Opened on Aug 1, 2019
This issue maintains all feature requests on one page.
Note to contributors: If you want to work on a requested feature, re-open the linked issue. Everyone is welcome to work on any of the issues below.
Note to maintainers: All feature requests should be consolidated on this page. When new feature request issues are opened, close them and add new entries here, with links to the original issues. The one exception is issues marked good first issue; these should be left open so they are discoverable by new contributors.
Call for Voting
We would like to hold a vote here to prioritize these requests.
If a feature request is important to you, you can vote for it by the following process:
- Get the issue (feature request) number.
- Search for that number in this issue to check whether a voting entry already exists.
- If the voting entry exists, add 👍 to it.
- If it doesn't, create a new voting entry by replying to this thread, including the issue number.
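As a rough illustration (not an official tool), the tally could be automated from comment data fetched via the GitHub API. The helper below is a hypothetical sketch: it assumes each comment is a plain dict with a "body" string and a "reactions" dict holding a "+1" count, as returned by the GitHub REST issue-comments endpoint.

```python
import re
from collections import Counter

def tally_votes(comments):
    """Count 👍 ("+1") reactions per feature-request issue number.

    `comments` is a list of dicts shaped like GitHub REST API
    issue-comment objects (hypothetical local data here; a real
    script would fetch them from the API).
    """
    votes = Counter()
    for comment in comments:
        # A voting entry mentions the feature request as "#<number>".
        numbers = re.findall(r"#(\d+)", comment["body"])
        for number in set(numbers):
            votes[int(number)] += comment["reactions"].get("+1", 0)
    return votes

# Example: two voting comments, with 5 and 2 thumbs-up respectively.
comments = [
    {"body": "Vote for faster lambdarank #2701", "reactions": {"+1": 5}},
    {"body": "Vote for #3128", "reactions": {"+1": 2}},
]
print(tally_votes(comments).most_common(1))  # [(2701, 5)]
```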
Discussions
- Efficiency improvements ([Discussion] efficiency improvements #2791)
- Accuracy improvements ([Discussion] accuracy improvements #2790)
Efficiency related
- Lock-free Inference (Lock-free Inference #4290)
- Faster Split (data partition) ([feature request] Faster Split (data partition) #2782)
- NUMA-aware (LightGBM does not like NUMA (large performance impact on servers) #1441)
- Enable MM_PREFETCH and MM_MALLOC on aarch64 (Enable MM_PREFETCH and MM_MALLOC on aarch64 #4124)
- Optimisations for Apple Silicon (Optimisations for Apple Silicon #3606)
- Continue accelerating ConstructHistogram ([feature request] continued accerelate ConstructHistogram #2786)
- Accelerate the data loading from file ([feature request] accelerate the data loading from file #2788)
- Accelerate the data loading from Python/R object ([feature request] accelerate the data loading from Python/R object #2789)
- Fast feature grouping when the number of features is large (Slow dataset creation #4037)
- Allow training without loading full dataset into memory (Allow training without loading full dataset into memory #5094)
- Random number generation on CUDA ([CUDA] Random number generation on CUDA #5471)
- Faster LambdaRank ([feature request] faster lambdarank #2701)
- Improve efficiency of tree output renew for L1 regression with new CUDA version ([CUDA] Improve efficiency of tree output renew for L1 regression with new CUDA version #5459)
Effectiveness related
- Better Regularization for Categorical features ([Enhancement] Better Regularization for Categorical features #1934)
- Raise a warning when missing values are found in label (Warn when passing labels with missing values #4483)
- Support monotone constraints with quantile objective (Monotone constraint with Quantile distribution does not work correctly #3371)
- Pairwise Ranking/Scoring in LambdaMART (Introduce Pairwise Ranking/Scoring in LambdaMART #6147)
Distributed platform and GPU (OpenCL-based and CUDA)
- YARN support (Distribution with YARN #790)
- Multiple GPU support (OpenCL version) ([GPU] multi-gpu #620)
- GPU performance improvement ([GPU] further improving GPU performance #768)
- Implement workaround for required folder permissions to compile kernels (LightGBM GPU requires folder permission to compile kernels #2955)
- Support for single precision float in CUDA version (Support for single precision float in CUDA version #3836)
- Support Windows in CUDA version (Support Windows in CUDA version #3837)
- Support LightGBM on macOS with real (possibly external) GPU device (Support LightGBM on macOS with real (possibly external) GPU device #4333)
- Multi-node, multi-GPU training with CUDA version ([CUDA] add multi-node multi-GPU support for new CUDA version and improve efficiency #5993)
- Build Python wheels that support both GPU and CPU versions out of the box for non-Windows (Build Python wheels that support both GPU and CPU versions out of the box for non-Windows #4684)
- GPU binaries release (Suggestion: Also create a lightgbm-gpu release #2263)
- Support GPU with the conda-forge package ([python-package] support GPU training with the conda-forge package #5419)
Maintenance
- Run tests with ClangCL and Visual Studio on our CI services (Run tests with ClangCL and Visual Studio on our CI services #5280)
- Missing values during prediction should throw an Exception if missing data wasn't present during training (missing values during prediction should throw an Exception if missing data wasn't present during training #4040)
- Better document missing values behavior during prediction (What happens with missing values during prediction? #2921)
- Code refactoring (Code Refactoring #2341)
- Support int64_t data_size_t ([feature request] Support int64_t data_size_t #2818)
- Unify output results of LGBM_BoosterDumpModel and LGBM_BoosterSaveModel (Unify out results of LGBM_BoosterDumpModel and LGBM_BoosterSaveModel #2604)
- More tests (Need much more tests #261, Write tests for parallel code #3841)
- Publish lib_lightgbm.dll symbols to Microsoft Symbols Server (Publish lib_lightgbm.dll symbols to Microsoft Symbols Server #1725)
- Enhance parameter tuning guide with more params and suggested ranges for different tasks/datasets (Documentation Request: More information on finding the optimal ranges for parameter tuning? #2617)
- Better documentation for loss functions (Better documentation for loss functions #4790)
- Add a page for listing related projects and links ([doc] A page for listing related projects and links #4576, original discussion: [doc] Add link to Neptune hyperparam tuning guide #4529)
- Add BoostFromAverage value as an attribute to LightGBM models (Add BoostFromAverage value as an attribute to LightGBM models. #4313)
- Regression tests on Dataset format ([ci] regression tests: binary Dataset format #4406)
- Regression tests on model files ([ci] regression tests: model files #4407)
- Better warning information when splitting of a tree is stopped early ([LightGBM] [Warning] No further splits with positive gain, best gain: -inf #4649)
- Adding checks for nullptr in the code ([c++] Adding checks for nullptr in the code #5085)
- Add support for CRLF line endings or improve documentation and error message (Add support for CRLF line endings or improve documentation and error message #5508)
- Support more build customizations in Conan recipe (Support more build customizations in Conan recipe #5770)
- Support C++20 (Support C++20 #6033)
- Add ability to predict on Dataset ([R-package] Add the ability to predict on lgb.Dataset in Predictor$predict() #2666, Being able to do Prediction (task=prediction) on bin files. #6613, [python-package] How do I use lgb.Dataset() with lgb.Predict() without using pandas df or np array? #6285)
- Add support for deploying to Android / iOS (Request for Native Android Apps Support for LightGBM: Android SDK or JNI Integration #6592)
- Refactor CMakeLists.txt so that it will be possible to build cpp tests with different options, e.g. with OpenMP support (Refactor CMakeLists.txt so that it will be possible to build cpp tests with different options, e.g. with OpenMP support #4125)
- Ensure consistent behavior when multiple parameter aliases given (ensure consistent, reproducible choice when multiple parameter aliases are given #5304)
- Remove unused-command-line-argument warning with Apple Clang (unused-command-line-argument warning with Apple Clang #1805)
- CI via GitHub actions (CI via GitHub actions #2353)
- Debug flag in CMake configuration (Debug flag in cmake #1588)
- Fix cpp lint problems (fix cpp lint problems #1990)
Python package:
- Refine pandas support ([python] refine pandas support #960)
- Refine categorical feature support ([python] refine categorical feature support #1021)
- Auto early stopping in Sklearn API ([Feature Request] Auto early stopping in Sklearn API #3313)
- Refactor sklearn wrapper after stabilizing upstream API, public API compatibility tests and official documentation (also after maturing HistGradientBoosting) (The sklearn wrapper is not really compatible with the sklearn ecosystem #2966, [RFC] compatibility with scikit-learn #2628)
- Keep constants in sync with C++ library ([python-package] Keep constants in sync with C++ library #4321)
- Allowing custom objective / metric function with "objective" and "metric" parameters ([Python] Allowing custom objective / metric function with "objective" and "metric" parameters #3244)
- Replace calls of POINTER() by byref() in Python interface to pass data arrays (Replace calls of POINTER() by byref() in Python interface to pass data arrays #4298)
- staged_predict() in the scikit-learn API (Staged predict function as in scikit-learn #5031)
- Make Dataset pickleable ([python-package] make Dataset pickleable #5098)
- Accept polars input ([python-package] Adding support for polars for input data #6204)
- Add feature_names_in_ and related APIs to scikit-learn estimators ([python-package] Support feature_names_in_ attribute via sklearn API #6279)
- Load back saved parameters with save_model to Booster object (Load back saved parameters with save_model to Booster object #2613)
- Check input for prediction ([python package]: suggestion: lgb.Booster.predict() should check that the input X data makes sense #812, While predicting the model doesn't check if data dtypes have changed #3626)
- Migrate to parametrize_with_checks for scikit-learn integration tests ([python] Migrate to parametrize_with_checks for scikit-learn integration tests #2947)
R package:
- Rewrite R demos ([R-package] Rewrite R demos, replace with vignettes #1944)
- lgb.convert_with_rules() should validate rules ([R-package] lgb.convert_with_rules() should validate rules #2682)
- Reduce duplication in Makevars.in, Makevars.win ([R-package] Reduce duplication in Makevars.in, Makevars.win #3249)
- Add an R GPU job in CI ([R-package] Add an R GPU job in CI #3780)
- Improve portability of OpenMP checks in R-package configure on macOS ([R-package] Improve portability of OpenMP checks in R-package configure on macOS #4537)
- Add CI job testing R package on Windows with UCRT toolchain ([R-package] Support R 4.2 #4881)
- Load back saved parameters with save_model to Booster object (Load back saved parameters with save_model to Booster object #2613)
- Use macOS 11.x in R 4.x CI jobs ([ci] [R-package] use macOS 11.x in R 4.x CI jobs #4990)
- Add a CI job running rchk ([R-package] [ci] Add a CI job testing the R package with rchk #4400)
- Factor out custom R interface to lib_lightgbm ([R-package] Factor out custom R interface to lib_lightgbm #3016)
- Use commandArgs instead of hardcoded stuff in the installation script ([R-package] use commandArgs instead of hardcoded stuff in the installation script #2441)
- lgb.convert functions should convert columns of type 'logical' ([R-package] lgb.convert() functions should convert columns of type 'logical' #2678)
- lgb.convert functions should warn on unconverted columns of unsupported types ([R-package] lgb.convert functions should warn on unconverted columns of unsupported types #2681)
- lgb.prepare() and lgb.prepare2() should be simplified ([R-package] lgb.prepare() and lgb.prepare2() should be simplified #2683)
- lgb.prepare_rules() and lgb.prepare_rules2() should be simplified ([R-package] lgb.prepare() and lgb.prepare2() should be simplified #2684)
- Remove lgb.prepare() and lgb.prepare_rules() ([R-package] Remove lgb.prepare() and lgb.prepare_rules() #3075)
- CRAN-compliant installation configuration ([R-package] Create portable configuration with 'configure' scripts #2960)
- Add tests on R 4.0 ([R-package] [ci] Add tests on R 4.0 compatibility #3024)
- Add pkgdown documentation support ([R-package] Add pkgdown documentation support #1143)
- Cover 100% of R-to-C++ calls in R unit tests ([R-package] [ci] Cover 100% of R-to-C++ calls with unit tests #2944)
- Bump version of pkgdown (Bump version of pkgdown #3036)
- Run R CI in Windows environment (R unit tests are not run in CI #2335)
- Add unit tests for best metric iteration/value ([R-package] Add unit tests for best metric iteration/value #2525)
- Standardize R code on comma-first ([R-package] Standardize R code on comma-first #2373)
- Add additional linters to CI ([R-package] Add additional linters to CI #2477)
- Support roxygen 7.0.0+ ([R-package] package documentation using roxygen2 7.0.0+ #2569)
- Run R CI in Linux and Mac environments (R unit tests are not run in CI #2335)
New features
- More platforms support (Support more platforms (32-bit, arm, etc) #1129, Add support for ppc64le architecture #4736)
- CoreML support (CoreML support for LightGBM models #1074)
- Object importance (object importance (LeafInfluence) #1460)
- Include init_score in predict function (include init_score in predict method #1978)
- Hyper-parameter per feature/column (lightgbm parameters by feature? ex: categoricals/high cardinality #1938)
- Extracting decision path (Extracting decision path #2187)
- Support for extremely large model (OverflowError when training with 100k+ iterations #2265, Error saving very large LightGBM models #3858)
- Allow LightGBM to be easily used in external projects via modern CMake style with find_package and target_link_libraries (Allow LightGBM to be easily used in external projects via modern CMake style with find_package and target_link_libraries #4067, fatal error: ../../../external_libs/fmt/include/fmt/format.h: No such file or directory #3925)
- Recalculate feature importance during the update process of a tree model (Recalculate feature importance during the update process of a tree model #2413)
- Merge Dataset objects on condition that they hold same binmapper (Merge Dataset objects on condition that they hold same binmapper #2579)
- Spike and slab feature sampling priors (feature weighted sampling) (Spike and slab feature sampling priors (feature weighted sampling) #2542)
- Customizable early stopping tolerance (Feature Request: customizable early_stopping_tolerance #2526)
- Stop training branch of tree once a specific feature is used (Stop training branch of tree once a specific feature is used #2518)
- Subsampling rows with replacement (Subsampling rows with replacement #1038)
- Arbitrary base learner ([Feature request] Arbitrary base learner #3180)
- Different quantization techniques (Different quantization techniques #3707)
- SHAP feature contribution for linear trees (SHAP feature contribution for linear trees #4002)
- [SWIG] Add support for int64_t ChunkedArray ([SWIG] Add support for int64_t ChunkedArray #4091)
- Monotonicity in quantile regression (Quantile LightGBM - inconsistent deciles #3447, Multiple Quantile Regression #4201)
- Add approx_contrib option for feature contributions (Add approx_contrib option for feature contributions #4219)
- Support forced splits with data and voting parallel versions of LightGBM (Support forced splits with data and voting parallel versions of LightGBM #4260)
- Support ignoring some features during training on constructed dataset (Support ignoring some features during training on constructed dataset #4317)
- Using random uniform sentinel features to avoid overfitting (Add random uniform sentinels to avoid overfitting #4622)
- Allow specifying probability measure for features (Probability measure for features #4605)
- extra_trees by feature (Allow extra_trees by variable #4700)
- Compute partial dependencies from learned trees (Compute partial dependencies from learned trees #4578)
- Boosting a linear model (Single-leaf trees with one-variable linear models in roots, like gblinear) (Could we add a booster similar to gblinear used by xgboost? #4459)
- Exact control of min_child_samples (min_child_samples plays bad with weights #5236)
- WebAssembly support (Compile LightGBM to WebAssembly output #5372)
- Support custom objective in refit (Booster.refit() fails when the booster used a custom objective function fobj #5609)
- Multiple trees in a single boosting round (boosted random forest #6294)
- Allow JSON special characters in feature names (Lift restrictions on feature names ("LightGBMError: Do not support special JSON characters in feature name") #6202)
- Decouple boosting types (Decouple boosting types #3128, use dart and goss at the same time #2991)
- Expose number of bins used by the model while binning continuous features to C API (Expose number of bins used by the model while binning continuous features to C API #3406)
- Add C API function that returns all parameter names with their aliases (Add C API function that returns all parameter names with their aliases #2633)
- Pre-defined bin_upper_bounds (Even when forced_splits is set, the threshold is chosen from the bin_upper_bounds. #1829)
- Setup editorconfig (Setup editorconfig #2401)
- Colsample by node (Feature request: colsample by node #2315)
- Smarter Backoffs for MPI ring connection (Add Smarter Backoffs for MPI ring connection #2348)
- UTF-8 support for model file ([feature requests] support utf-8 characters in feature name #2478)
New algorithms:
- Regularized Greedy Forest (Regularized Greedy Forest is not in the lightgbm #315)
- Accelerated Gradient Boosting (Accelerated Gradient Boosting #1257)
- Multi-Layered Gradient Boosting Decision Trees (Multi-Layered Gradient Boosting Decision Trees #1423)
- Adaptive neural tree (Adaptive Neural Trees (ANT) #1542)
- Probabilistic Forecasting (Probabilistic Forecasting #3200)
- Probabilistic Random Forest ([Feature] Probabilistic Random Forest #1946)
- Sparrow (Sparrow #2001)
- Minimal Variance Sampling (MVS) in Stochastic Gradient Boosting (Minimal Variance Sampling in Stochastic Gradient Boosting #2644)
- Investigate possibility of borrowing some features/ideas from Explainable Boosted Machines (Investigate possibility of borrowing some features/ideas from Explainable Boosted Machines #3905)
- Feature Cycling as an option instead of Random Feature Sampling (Feature Cycling as an option instead of Random Feature Sampling #4066)
- Periodic Features (Periodic Features #4281)
- GPBoost ([Discussion] accuracy improvements #2790)
- Piece-wise linear tree (Piecewise linear trees #1315)
- Extremely randomized trees (Extremely randomized trees #2583)
Objective and metric functions:
- Multi-output regression (Support multi-output regression/classification #524)
- Earth Mover Distance (LightGBM Earth Mover's Distance #1256)
- Cox Proportional Hazard Regression (Cox Proportional Hazard Regression #1837)
- Native support for Focal Loss (Native support for Focal Loss #3706)
- Ranking metric for regression objective (objective = regression and metric = ndcg #1911)
- Density estimation (Feature request: density estimation #2056)
- Adding correlation metrics (Adding correlation metrics #4209)
- Add parameter to control maximum group size for Lambdarank (feature request: Add parameter to control maximum group size for Lambdarank #5053)
- Precision recall AUC (Add Precision Recall AUC as an metric for binary classification #3026)
- AUC Mu (Add AUC mu metric for multiclass training #2344)
Python package:
- Support access to constructed Dataset ([python-package] provide access to constructed Dataset in numpy format #5191)
- Support complex data types in categorical columns of pandas DataFrame (Support complex data types in categorical columns of pandas DataFrame #2134)
- First-class support for different data types (do not convert everything to float32/64 to save memory) (Support different data types (when load data from Python) #3459, Support h2o datatable and numpy types, including for categorical types #3386)
- Efficient native support of pandas.DataFrame with mixed dense and sparse columns (Efficient native support of pandas.DataFrame with mixed dense and sparse columns #4153)
- Include init_score on the Python Booster class (Include init_score on the Python Booster class #4065)
- Better support for Tree Plot with multi class (Feature Request & Question: Better Support for Tree Plot with multi class booster. #3061)
- Support specifying number of iterations in dataset evaluation (Support specifying number of iterations in dataset evaluation #4210)
- Compute metrics not on each iteration but with some fixed step (Compute metrics not on each iteration but with some fixed step #4107)
- Support saving and loading CVBooster ([python] support saving and loading CVBooster #3556)
- Add a function to plot tree with a case (Add a function to plot tree with a case #4784)
- Allow custom loggers that don't inherit from logging.Logger (Relax constraint on logger class in register_logger #4783)
- Ensure all callbacks are pickleable ([python-package] ensure that all callbacks are pickleable #5080)
- Add support for pandas nullable types to the sklearn api (Add support for pandas nullable types to the sklearn api #4173)
- Support weight in refit (refit in Python does not support weights #3038)
- Keep cv predicted values (Add option to keep cv predicted values #283)
- Feature importance in CV (Lightgbm cv feature importance python #1445)
- Log redirect in python (Capture LGBM model's .fit messages and redirect them to python logger #1493)
- Make _CVBooster public for better stacking experience (Why is _CVBooster object hidden class? #2105)
Dask:
- Investigate how the gap between local and Dask predictions can be decreased (Investigate how the gap between local and Dask predictions can be decreased #3835)
- Allow customization of num_threads ([dask] allow customization of num_threads #3714)
- Add support for early stopping ([dask] Add support for early stopping in Dask interface #3712)
- Support init_model ([dask] Support init_model #4063)
- Make Dask training resilient to worker restarts during network setup ([dask] make Dask training resilient to worker restarts during network setup #3775)
- GPU support ([dask] GPU support #3776)
- Support MPI in Dask (Support MPI in Dask #3831)
- Support more operating systems ([dask] Support more operating systems #3782)
- Add LGBMModel ([dask] Add LGBMModel #3845)
- Add train() function ([dask] add train() function #3846)
- Add cv() function ([dask] add cv() function #3847)
- Support asynchronous workflows ([dask] Support asynchronous workflows #3929)
- Add DaskDataset ([dask] add a DaskDataset #3944)
- Enable larger pred_contrib results for multiclass classification with sparse matrices ([dask] preserve chunks in results of multi-class pred_contrib predictions on sparse matrices #4438)
- Use or return all workers eval_set evaluation data (Use or return all workers eval_set evaluation data #4392)
- Drop 'not evaluated' placeholder from dask.py (Drop 'not evaluated' placeholder from dask.py #4393)
- Support custom objective functions ([dask] Support custom objective functions #3934)
- Resolve differences in result shape between DaskLGBMClassifier.predict() and LGBMClassifier.predict() ([dask] Result shape from DaskLGBMClassifier.predict(pred_contrib=True) for CSC matrices is inconsistent with LGBMClassifier #3881)
- Support custom evaluation metrics ([dask] Support custom metric functions #3956)
- Support all LightGBM parallel tree learners (Support all LightGBM parallel tree learners in Dask #3834)
- Support raw_score in predict() ([dask] support 'raw_score' in predict() #3793)
- Support all LightGBM boosting types ([dask] Support all LightGBM boosting types #3896)
- Tutorial documentation ([dask] tutorial documentation #3814)
- Document how to save a Dask model ([docs] [dask] Document how to save a Dask model #3838)
- Support init_score ([dask] support init_score #3807)
- Search for ports only once per IP ([dask] Search for open ports only once per IP #3768)
- Support pred_leaf in predict() ([dask] support 'pred_leaf' in predict() #3792)
- Decide and document how users should provide a Dask client at training time ([RFC] [dask] decide and document how users should provide a Dask client at training time #3808)
- Use dictionaries instead of tuples for parts ([dask] use dictionaries instead of tuples for parts in Dask training #3795)
- Remove testing dependency on dask-ml ([dask] remove testing dependency on dask-ml #3796)
- Support 'pred_contrib' in predict() ([dask] support 'pred_contrib' in predict() #3713)
- Add support for LGBMRanker ([python-package] [dask] Add DaskLGBMRanker #3708)
- Support DataTable in Dask (Support DataTable in Dask #3830)
R package:
- Add support for specifying training indices in lgb.cv() ([R-package] add support for specifying training indices in lgb.cv() #3924)
- Export callback functions ([R-package] Export callback functions #2479)
- Plotting in R-package (Plot trees in R #1222)
- Add support for saving weight values of a node in the R-package (Add support for saving weight values of a node in the R-package #2281)
- Check parameters in cb.reset.parameters() ([R-package] Check parameters in cb.reset.parameters() #2665)
- Refit method for R-package (Refit method for R-package #2369)
- Add the ability to predict on lgb.Dataset in Predictor$predict() ([R-package] Add the ability to predict on lgb.Dataset in Predictor$predict() #2666)
- Allow use of MPI from the R package ([R-package] allow use of MPI for distributed training #3364)
- Allow data to live in memory mapped file ([R-package] Allow data to live in memory mapped file #2184)
- Add GPU support for CRAN package ([R-package] Add GPU support for CRAN package #3206)
- Add CUDA support for CRAN package ([R-package] Add CUDA support for CRAN package #3465)
- Add CUDA support for CMake-based package ([R-package] R build for CUDA #5378)
- Add function to generate a list of parameters ([R-package] Add function to generate a list of parameters #4195)
- Accept data frames as inputs ([R-package] Accept data frames as inputs #4323)
- Upgrade documentation site to pkgdown >2.0 ([docs] [R-package] upgrade R documentation to {pkgdown} 2.0 #4859)
- Check size of custom objective function output ([R-package] Check size of custom objective function outputs #4905)
- Support CSR-format sparse matrices ([R-package] Cannot pass CSR single-row for prediction #4966)
- Add flag of displaying train loss for lgb.cv() ([R-package] add flag of displaying train loss for lgb.cv() #4911)
- Work directly with readRDS() and saveRDS() ([R-package] Request: work with R serialization functions #4296)
- Support trees with linear models at leaves ([R-package] Support trees with linear models at leaves #3319)
- Add support for non-ASCII feature names ([R-package] Add support for non-ASCII feature names #2983)
- Release to CRAN ([R-Package] CRAN issues #629)
- Exclude training data from being checked for early stopping ([R-package] exclude training data from being checked for early stopping #2472)
- first_metric_only parameter for R-package (first_metric_only parameter for R-package #2368)
- Build a 32-bit version of LightGBM for the R package ([R-package] 32-bit library support #3187)
- Ability to control the printed messages ([R-package] Ability to control the printed messages? #1440)
New language wrappers:
- MATLAB support (MATLAB Support / Wrapper #743)
- Java support (like xgboost4j) (Do you have the plan to support Java like xgboost4j? #909)
- Go support (predict part can be already found in https://github.com/dmitryikh/leaves package) (Are there plans for a go aka golang wrappers or package? #2515)
- Ruby support (Ruby Library #2367)
Input enhancements:
- Streaming data allocation (improve the sparse streaming support and expose ChunkedArray in C API) ([feature] Streaming data allocation #3995, [SWIG] Add streaming data support + cpp tests #3997 (comment))
- String as categorical input directly ([CLI] Categorical: Read string and convert to int on the fly #789)
- AWS S3 support (Read from AWS S3 #1039)
- H2O datatable direct support (not via to_numpy() method as it currently is) (implement datatable ingest directly into lightgbm #2003)
- Multiple files as input (Multi file as input for LightGBM #2031)
- Parquet file support ([hdfs] support parquet file #1286)
- Enable use of constructed Dataset in predict() methods (Enable use of constructed Dataset in predict() methods #4546, [R] Allow predictions on lgb.Dataset objects #1939, [python-package] How do I use lgb.Dataset() with lgb.Predict() without using pandas df or np array? #6285)
- support scipy sparse arrays ([python-package] [c++] support scipy sparse arrays #6352)
- Apache Arrow support (Create Dataset from Arrow format #3369)
- Validation dataset creation via Sequence (Validation dataset creation via Sequence #4184)