-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update lgbm to v2.3.1 #5851
Update lgbm to v2.3.1 #5851
Conversation
No luck.... It seems that there're some API changes in native lgbm v3.* so we need to update Microsoft.ML.LightGbm correspondingly. Feel free to set this as low priority or close it if there's no recent plan for that. |
Yeah. I dont think they would be hard to do, they are being tracked and are summarized here #5447. There are only like 4 methods we need to update. Biggest issue is that The docs say:
I'm honestly not sure of the difference. But I dont think the other changes will be too bad honestly (from a 10 minute quick glance) |
Well, looked a bit more, it will be harder then I originally thought. You are right it will probably be a decent re-write of our Light GBM code. Some methods have been removed, some have been changed, so we would have to figure it all out again. I'm going to close this PR for now as we have the issue that is tracking it. @briacht for visibility and priority planning. |
Synced with @LittleLittleCloud offline and we decided to try updating to the latest of version 2, 2.3.1. Reopening to try it. |
Changed to latest of version 2
@LittleLittleCloud looks like only 1 test is failing with this version. TestLightGbmRanking. This is the failure
I don't have time to look into it now. If you do feel free to do so. If not let me know and I can close this PR for now. |
Looks like @artidoro is the one who creates Hi @artidoro, could you share more info on how that file generated? We are working on an upgrade of LGBM (from v2.2.3 to v2.3.1) but one of the lgbm-related test fails. We'd like to know how that baseline file created and is it safe to simply update that baseline with latest content from lgbm v2.3.1 And maybe @Ivanidzo4ka also knows some circumstances? |
The main thing I am worried about is how far apart the values are. 0.67846173 and 24 aren't really close. It depends on what the numbers are meaning I guess. |
You can run test locally, or get artifacts from test run and just update baseline file with file generated by test. In ML world no one can guarantee you what algorithm from version 1 and version 2 would yield same results. File/Test which gives you trouble is feature contribution and it's checks out of 4 features which one is important and which one is not. From your messages it looks like difference happens in feature contributions section, and I would say it something expecting. New algorithm picks features differently, so their importance is differ from previous algo. |
I did not look at this in great depth, but the outputs should not change dramatically. You should check what those numbers mean. In particular, if the statistics of the model outputs (accuracy, etc.) remain the same but other numbers such as features, tree structure or something else change it might be due to the newer version of the package. |
@artidoro @Ivanidzo4ka @michaelgsharp After closely looking at the FeatureContributions between the old (v2.2.3) and new lgbm(v2.3.1), I find out the difference between these two vectors is not huge. And it's quite reasonable to have this difference when updating lgbm binary. The final output of FeatureContributions of v2.2.3 and v2.3.1 are
From that table, we can find out that in both v2.2.3 and v2.3.1, of all the 6 features used to train the ranking model, the last feature's weigh is always the largest, which can be explained by the actual object function used in all feature contribution tests.
Therefore, I think it's safe to just overwrite the baseline with the result from the newer lgbm trainer, as the error between the old and new lgbm trainer falls into an acceptable range. Let me know if you have any question or other opinion though |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report
@@ Coverage Diff @@
## main #5851 +/- ##
==========================================
- Coverage 68.35% 68.26% -0.09%
==========================================
Files 1134 1134
Lines 241910 242028 +118
Branches 25289 25306 +17
==========================================
- Hits 165347 165230 -117
- Misses 69919 70156 +237
+ Partials 6644 6642 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
|
* remotes/official/main: Update lgbm to v2.3.1 (dotnet#5851) Speed-up bitmap operations on images. Fixes dotnet#5856 (dotnet#5857) Onnx recursion limit (dotnet#5840) Speed up the inference of the saved_model(s). Fixes dotnet#5847 (dotnet#5848) Signed-off-by: darth-vader-lg <luigi.generale@gmail.com>
Hopefully this is the only change need to be made
So we can do the best job, please check:
Fixes #nnnn
in your description to cause GitHub to automatically close the issue(s) when your PR is merged.