[RFC] provide Python/R implementations of all the built-in objectives? #6440
Summary
Should we provide example Python (and maybe R) implementations of LightGBM's objective functions which exactly match the behavior of the builtin objectives from the C++ side?
Motivation
Over the years of maintaining LightGBM, I've seen significant interest in implementing LightGBM's built-in objective functions in Python, for purposes like:
- learning how LightGBM works (for people who are not comfortable with C++)
- making it easier to measure the difference between custom objectives and LightGBM builtin ones
    - (e.g. if you have a Python function that exactly matches the builtin, then you can modify it and know any performance differences are due to your modifications)
See "References" for evidence.
Description
I am NOT proposing adding such implementations to any library that we publish.
Instead, I'm thinking of something like the following:
- new directory in `examples/` containing these implementations
- tests that run in CI which compare the results to those calculated by the C++ side
- those implementations accounting for the main concerns that confuse people (see the sketch after this list):
    - calculating an `init_score` if the `Dataset` doesn't have one
    - correctly using sample weights
    - correctly respecting `boost_from_average`
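To make those concerns concrete, here is a minimal sketch (not a proposed final implementation) of what such an example could look like for the L2 / `regression` objective with a recent lightgbm 4.x. The names `l2_init_score` and `l2_objective` are hypothetical, and the assumption that `boost_from_average` for L2 corresponds to starting from the (weighted) label mean is exactly the kind of detail the real examples would need to verify against the C++ code.

```python
import numpy as np
import lightgbm as lgb


def l2_init_score(y, weight=None):
    # Assumption: boost_from_average for L2 means starting from the (weighted) label mean.
    return np.average(y, weights=weight)


def l2_objective(preds, train_data):
    # Gradient and hessian of 0.5 * (pred - y)^2, scaled by sample weights
    # when the Dataset has them, in the spirit of the built-in objective.
    y = train_data.get_label()
    weight = train_data.get_weight()
    grad = preds - y
    hess = np.ones_like(preds)
    if weight is not None:
        grad = grad * weight
        hess = hess * weight
    return grad, hess


# Usage sketch: supply the init_score explicitly, because custom objectives
# do not get boost_from_average applied for them on the C++ side.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=100)
init_score = np.full(y.shape, l2_init_score(y))
dtrain = lgb.Dataset(X, label=y, init_score=init_score)
booster = lgb.train({"objective": l2_objective, "verbose": -1}, dtrain, num_boost_round=10)
```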
Things that do not necessarily need to be in scope for the first versions of implementations:
- distributed training / collective operations
- respect for the `deterministic` parameter
- anything related to quantized training
- exact numerical precision (being within, say, `1e-6`, would probably be good enough to start)
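As a rough illustration of the CI comparison mentioned above, a test could train once with the built-in objective and once with the Python re-implementation, then compare predictions within a tolerance like `1e-6`. This is only a sketch: the objective and test names are hypothetical, and whether this tolerance holds as-is is exactly what the proposed tests would pin down.

```python
import numpy as np
import lightgbm as lgb


def l2_objective(preds, train_data):
    # Plain L2 gradient/hessian (no sample weights in this test).
    y = train_data.get_label()
    return preds - y, np.ones_like(preds)


def test_l2_python_matches_builtin():
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 5))
    y = X[:, 0] + rng.normal(scale=0.1, size=200)
    params = {"num_leaves": 7, "learning_rate": 0.1, "verbose": -1}

    # Reference run with the built-in objective (boost_from_average handled in C++).
    builtin = lgb.train(
        {**params, "objective": "regression"},
        lgb.Dataset(X, label=y),
        num_boost_round=20,
    )

    # Run with the Python objective; boost_from_average is reproduced manually
    # via init_score, since custom objectives do not get it for free.
    init_score = np.full(y.shape, np.average(y))
    custom = lgb.train(
        {**params, "objective": l2_objective},
        lgb.Dataset(X, label=y, init_score=init_score),
        num_boost_round=20,
    )

    # predict() does not include the Dataset init_score, so add it back before comparing.
    np.testing.assert_allclose(
        builtin.predict(X),
        custom.predict(X) + init_score,
        atol=1e-6,
    )
```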
References
GitHub posts that could be summarized as "how do I replicate a built-in LightGBM objective in Python"?
- [Question] Large difference between builtin softmax and custom softmax objective #6219
- Custom loss reconstruction for Tweedie/Regression_l1 loss #6160
- Custom loss with dependent samples #6145
- I cannot reproduce results of quantile regression when using a custom metric or objective. #6062
- [python-package] How do I reproduce LightGBM's huber loss with a custom objective? #6041
- [python-package] How do I reproduce LightGBM's L2 loss with a custom objective? #6040
- Multiclass classification doesn't improve with custom objective #5839
- Custom LambdaRank NDCG not matching built-in code #5735
- Cannot replicate regression via custom loss for `colsample_bytree != 1` #5543
- [python-package] Where is the code for the 'quantile' objective, and how do I pass a custom objective to LGBMRegressor? #5524
- Custom Log Loss does not have the same loss curves #5373
- Custom objective does not have the same loss curves. #5350
- Custom Loss Function for LGBMRegressor #5256
- [python-package] custom objective function returns strange leaf node values #5114
- Documentation for custom objective functions (hessian) #5043
- [python-package] Custom multiclass loss function doesn't work #4981
- multi_logloss differs between native and custom objective function (identical to native objective function) #4211
- Different result when using self-defined objective #4077
- Custom huber loss in LightGBM #3532
- Reproducing log loss with custom objective #3312
- Unable to replicate regression objective with LGBM_BoosterUpdateOneIterCustom #3052
- Weighted Custom Loss Function Different Training Loss #2834
- performance of lambdarank using custom objective is poor when compared with in-built lambdarank #2239
- Support for multiple custom eval metrics #2182
And Stack Overflow: