
Update lgbm to v2.3.1 #5851

Merged: merged 4 commits into main on Jun 25, 2021

Conversation

LittleLittleCloud
Contributor

@LittleLittleCloud LittleLittleCloud commented Jun 18, 2021

Hopefully this is the only change that needs to be made.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There are associated issues. All PRs should have issue(s) associated, unless the change is trivial and self-evident, such as fixing a typo. You can use the format Fixes #nnnn in your description to have GitHub automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

@LittleLittleCloud LittleLittleCloud changed the title from "Update lgbm to v3.2.1" to "Update lgbm to v3.1.1" on Jun 19, 2021
@LittleLittleCloud
Contributor Author

No luck... It seems there are some API changes in native lgbm v3.*, so we need to update Microsoft.ML.LightGbm correspondingly. Feel free to set this as low priority or close it if there's no recent plan for that.

@michaelgsharp
Member

Yeah, I don't think they would be hard to do; they are being tracked and summarized in #5447. There are only about 4 methods we need to update. The biggest issue is that LGBM_BoosterSaveModelToString now needs an int feature_importance_type parameter.

The docs say:

Type of feature importance, can be C_API_FEATURE_IMPORTANCE_SPLIT or C_API_FEATURE_IMPORTANCE_GAIN

I'm honestly not sure of the difference, but I don't think the other changes will be too bad (from a 10-minute quick glance).
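For context, the two modes differ in what they count: split importance tallies how many times a feature is used in a split, while gain importance sums the loss reduction contributed by those splits. A minimal pure-Python sketch of the distinction (this is an illustration over a hand-written toy split list, not the LightGBM C API or its actual model format):

```python
# Illustration of "split" vs. "gain" feature importance, computed over a
# hypothetical list of tree splits. Each split is (feature_index, gain).

def feature_importance(splits, importance_type):
    """Aggregate per-feature importance from a list of (feature, gain) splits."""
    importance = {}
    for feature, gain in splits:
        if importance_type == "split":
            # Count how many times each feature is split on.
            importance[feature] = importance.get(feature, 0) + 1
        elif importance_type == "gain":
            # Sum the loss reduction each feature's splits achieved.
            importance[feature] = importance.get(feature, 0.0) + gain
        else:
            raise ValueError(f"unknown importance_type: {importance_type!r}")
    return importance

# Feature 0 splits twice with small gains; feature 1 splits once with a big gain.
splits = [(0, 1.0), (0, 1.5), (1, 10.0)]
print(feature_importance(splits, "split"))  # feature 0 ranks higher by count
print(feature_importance(splits, "gain"))   # feature 1 ranks higher by total gain
```

The two rankings can disagree, which is why the C API now makes the caller choose explicitly when serializing a model.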

@michaelgsharp
Member

Well, I looked a bit more, and it will be harder than I originally thought. You are right that it will probably require a decent rewrite of our LightGBM code. Some methods have been removed and some have been changed, so we would have to figure it all out again. I'm going to close this PR for now, as we have the issue that is tracking it.

@briacht for visibility and priority planning.

@michaelgsharp
Member

Synced with @LittleLittleCloud offline, and we decided to try updating to the latest release of version 2, which is 2.3.1. Reopening to try it.

@michaelgsharp michaelgsharp reopened this Jun 23, 2021
Changed to latest of version 2
@michaelgsharp
Member

@LittleLittleCloud looks like only 1 test is failing with this version: TestLightGbmRanking. This is the failure:

    *** Failure #1: Values to compare are 0.67846173 and 24
        AllowedVariance: 1E-06
        delta: -23.321538
        delta2: -23.321538
I don't have time to look into it now. If you do feel free to do so. If not let me know and I can close this PR for now.
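The failing check is, in essence, a baseline tolerance comparison. A hedged Python sketch of that style of assertion (the names AllowedVariance and delta follow the log above; the actual ML.NET baseline comparer logic is more involved and may differ):

```python
# Sketch of a baseline tolerance check in the style of the failure log above.
# Names are illustrative, not the real ML.NET comparer implementation.

def compare_values(expected, actual, allowed_variance=1e-6):
    """Return (passed, delta) where delta is expected - actual."""
    delta = expected - actual
    return (abs(delta) <= allowed_variance, delta)

ok, delta = compare_values(0.67846173, 24)
print(ok, round(delta, 6))  # prints: False -23.321538
```

With an allowed variance of 1E-06, a delta of -23.32 is roughly seven orders of magnitude outside tolerance, which is why the discussion below focuses on whether the baseline values themselves should change.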

@LittleLittleCloud LittleLittleCloud changed the title from "Update lgbm to v3.1.1" to "Update lgbm to v2.3.1" on Jun 24, 2021
@LittleLittleCloud
Contributor Author

LittleLittleCloud commented Jun 24, 2021

Looks like @artidoro is the one who created LightGbmRanking.tsv under test\BaselineOutput\Common\FeatureContribution\LightGbmRanking.tsv in this PR (a long, long time ago).

Hi @artidoro, could you share more info on how that file was generated? We are working on an upgrade of LGBM (from v2.2.3 to v2.3.1), but one of the lgbm-related tests fails. We'd like to know how that baseline file was created, and whether it is safe to simply update the baseline with the latest output from lgbm v2.3.1.

And maybe @Ivanidzo4ka also knows the circumstances?

@michaelgsharp
Member

The main thing I am worried about is how far apart the values are: 0.67846173 and 24 aren't really close. It depends on what the numbers mean, I guess.

@Ivanidzo4ka
Contributor

You can run the test locally, or get the artifacts from the test run, and just update the baseline file with the file generated by the test.

In the ML world, no one can guarantee that an algorithm from version 1 and version 2 will yield the same results, but the expectation is that they will produce the same metrics.

The file/test giving you trouble is the feature contribution one: it checks which of the 4 features are important and which are not. From your messages it looks like the difference is in the feature contributions section, and I would say that is expected. The new algorithm picks features differently, so their importance differs from the previous algorithm's.

@artidoro
Contributor

I did not look at this in great depth, but the outputs should not change dramatically. You should check what those numbers mean. In particular, if the statistics of the model outputs (accuracy, etc.) remain the same but other numbers, such as features, tree structure, or something else, change, it might be due to the newer version of the package.

@LittleLittleCloud
Contributor Author

LittleLittleCloud commented Jun 25, 2021

@artidoro @Ivanidzo4ka @michaelgsharp

After closely comparing the FeatureContributions between the old lgbm (v2.2.3) and the new one (v2.3.1), I find that the difference between these two sets of vectors is not huge, and it's quite reasonable to see this difference when updating the lgbm binary.

The final outputs of FeatureContributions for v2.2.3 and v2.3.1 are:

| FeatureContributions | lgbm v2.2.3             | lgbm v2.3.1              |
|----------------------|-------------------------|--------------------------|
| row 1                | 0, 0, 0, 0, -0.08, 0.85 | 0, 0, 0, 0, -0.57, 2.16  |
| row 2                | 0, 0, 0, 0.43, 0, -8.97 | 1.25, 0, 0, 0, 0, -11.36 |
| row 3                | 0, -0.44, 0, 0, 0, 4.2  | 0, 0, 0, 0, -0.74, 7.25  |
| row 4                | 0, 0, 0, 0.43, 0, -9.1  | 0, 0, 0, 0.11, 0, -12    |

From that table, we can see that in both v2.2.3 and v2.3.1, of all 6 features used to train the ranking model, the last feature's weight is always the largest in magnitude, which can be explained by the actual objective function used in all feature contribution tests:

y = 10*x1 + 10*x2vBuff + 20*x3 + e

where x1 and x3 are scalars and x2vBuff is an array of length four.

Therefore, I think it's safe to simply overwrite the baseline with the result from the newer lgbm trainer, as the difference between the old and new lgbm trainers falls within an acceptable range. Let me know if you have any questions or other opinions, though.
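As a quick sanity check on this argument, one can verify from the quoted numbers alone that the dominant feature is stable across versions. A small illustrative script using the values from the table above (the vectors are transcribed from this thread; nothing here runs LightGBM itself):

```python
# Contribution vectors transcribed from the comparison table above.
v223 = [
    [0, 0, 0, 0, -0.08, 0.85],
    [0, 0, 0, 0.43, 0, -8.97],
    [0, -0.44, 0, 0, 0, 4.2],
    [0, 0, 0, 0.43, 0, -9.1],
]
v231 = [
    [0, 0, 0, 0, -0.57, 2.16],
    [1.25, 0, 0, 0, 0, -11.36],
    [0, 0, 0, 0, -0.74, 7.25],
    [0, 0, 0, 0.11, 0, -12],
]

def dominant_feature(row):
    """Index of the feature with the largest absolute contribution."""
    return max(range(len(row)), key=lambda i: abs(row[i]))

# In every row, for both versions, the 6th feature (index 5) dominates,
# matching the 20*x3 term having the largest coefficient in the objective.
for old, new in zip(v223, v231):
    assert dominant_feature(old) == dominant_feature(new) == 5

print("last feature dominates in every row of both versions")
```

So while the raw magnitudes shifted between v2.2.3 and v2.3.1, the qualitative feature ranking the test is meant to guard did not, which supports overwriting the baseline.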

@michaelgsharp
Member

@michaelgsharp michaelgsharp left a comment

:shipit:

@codecov

codecov bot commented Jun 25, 2021

Codecov Report

Merging #5851 (9f3d83b) into main (ff01708) will decrease coverage by 0.08%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #5851      +/-   ##
==========================================
- Coverage   68.35%   68.26%   -0.09%     
==========================================
  Files        1134     1134              
  Lines      241910   242028     +118     
  Branches    25289    25306      +17     
==========================================
- Hits       165347   165230     -117     
- Misses      69919    70156     +237     
+ Partials     6644     6642       -2     
| Flag       | Coverage Δ              |
|------------|-------------------------|
| Debug      | 68.26% <ø> (-0.09%) ⬇️ |
| production | 62.94% <ø> (+0.01%) ⬆️ |
| test       | 88.82% <ø> (-0.48%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|----------------|------------|
| ...est/Microsoft.ML.Tests/FeatureContributionTests.cs | 98.68% <ø> (ø) |
| ...ft.ML.Core.Tests/UnitTests/TestResourceDownload.cs | 0.00% <0.00%> (-75.52%) ⬇️ |
| ...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs | 79.48% <0.00%> (-20.52%) ⬇️ |
| ...osoft.ML.Recommender/SafeTrainingAndModelBuffer.cs | 61.97% <0.00%> (-16.91%) ⬇️ |
| ...osoft.ML.Recommender/MatrixFactorizationTrainer.cs | 58.10% <0.00%> (-13.97%) ⬇️ |
| ...ests/TrainerEstimators/MatrixFactorizationTests.cs | 83.59% <0.00%> (-13.48%) ⬇️ |
| test/Microsoft.ML.FSharp.Tests/SmokeTests.fs | 77.77% <0.00%> (-10.46%) ⬇️ |
| src/Microsoft.ML.Data/Commands/TrainTestCommand.cs | 82.64% <0.00%> (-9.10%) ⬇️ |
| ....ML.Data/Scorers/SchemaBindablePredictorWrapper.cs | 67.82% <0.00%> (-7.18%) ⬇️ |
| src/Microsoft.ML.Core/Prediction/TrainerInfo.cs | 93.75% <0.00%> (-6.25%) ⬇️ |
| ... and 32 more | |

@LittleLittleCloud LittleLittleCloud merged commit 1b3cb77 into main Jun 25, 2021
darth-vader-lg added a commit to darth-vader-lg/ML-NET that referenced this pull request Jun 26, 2021
* remotes/official/main:
  Update lgbm to v2.3.1 (dotnet#5851)
  Speed-up bitmap operations on images. Fixes dotnet#5856 (dotnet#5857)
  Onnx recursion limit (dotnet#5840)
  Speed up the inference of the saved_model(s). Fixes dotnet#5847 (dotnet#5848)

Signed-off-by: darth-vader-lg <luigi.generale@gmail.com>
@michaelgsharp michaelgsharp deleted the LittleLittleCloud-patch-1 branch July 13, 2021 22:56
@ghost ghost locked as resolved and limited conversation to collaborators Mar 17, 2022