Summary
I'd like to start a discussion on this question:
Should we do a v3.3.0 release of LightGBM some time in June?
Motivation
My primary motivation for this proposal is that user reports in the last two months have revealed critical issues with the R package (#4007, #4045, #4216, #4259, #4305), and these are fixed by recent PRs related to #3016 (especially #4155 and #4247).
But looking at the PRs that have been merged since LightGBM 3.2.1, there are a lot of other useful fixes, including:
- fix for issue in data parallel learning when feature distributions on partitions don't overlap ([fix] Fix bug in data distributed learning with local empty leaf #4185)
- more accurate handoffs between C++ side and wrapper packages ([R-package] Handle integer types more accurate in R-to-C interface #4291, [python] Handle integer types more accurate in Python-to-C interface #4292)
- changes to MRO to match scikit-learn's preferences ([python][scikit-learn] change MRO #3192)
- fix warnings from CUDA builds ([CUDA] Add CUDA_ARCHITECTURES to fix CMake warnings (#3754) #4268)
There has only been one technically "breaking" PR merged, and I think it's ok to include in a minor release (#4197).
Proposal
I'd like to get other maintainers' thoughts on the following proposal.
- Do not merge Target and Count encodings for categorical features #3234 yet
- Prepare a 3.3.0 release, to be released some time in June
- Complete any fixes / feature requests that maintainers suggest before that release
Personally, I'd like to include at least the following in a 3.3.0 release:
- C++
  - ArrayArgs::Partition causes UB when called with interval of size 1 #4272
  - incorrect calculation of gamma metric in weighted training (Regression Gamma Loss Gradient calculation is incorrect with weight #4174 / fix calculation of weighted gamma loss (fixes #4174) #4283)
- Dask
  - eval sets ([dask] add support for eval sets and custom eval functions #4101)
  - (not critical) inconsistencies in leaf counts when performing multiclass classification ([dask] multiclass classification gives different samples for same split #4220)
- Python package
  - fix unexplained Jupyter kernel restarts (Running fit method on LGBMRegressor kills Jupyter Kernel #4301)
- R package
  - prevent segfaults when objects are serialized / deserialized ([R-package] R handles produce segmentation faults when de-serialized #4208)
  - a deprecation warning for `...` in the R package ([RFC] [R-package] Remove support for passing parameters through '...' #4226 (comment))
    - [R-package] introduce Dataset methods set_field() and get_field() #4571
    - [R-package] preserve uses of '...' in Dataset slice() method #4581
    - [R-package] add deprecation warnings about some uses of '...' #4522
    - [R-package] remove unused '...' in Booster constructor #4523
    - [R-package] add deprecation warnings on uses of '...' in predict() and reset_parameter() #4548
    - [R-package] allow construction of Dataset from CSV without header (fixes #4553) #4554
    - [R-package] deprecate uses of '...' in Dataset slice() method #4572
    - [R-package] deprecate the use of 'info' in Dataset #4573
  - fix memory leaks ([R-package] R memory leaks #4282)
  - prevent session crashes on Windows ([R-package] R package crashes on windows when loaded together with {fansi} or anything that depends on it #4464)
  - fix `predict()` being broken when using a file ([R-package] predict() breaks when using a Dataset stored in a file #4034)
  - update how autoconf finds libomp on Mac ([R Package] autoconf.ac OpenMP test ignores LDFLAGS (fix provided). #4131)
I also think this release could be a good opportunity for maintainers to think carefully about what breaking changes might be made (in addition to #3234) in a 4.0.0 release, and to add deprecation warnings for them in the 3.3.0 release.
Thanks for your time and consideration!