Add deterministic option for LightGBM by michaelgsharp · Pull Request #7415 · dotnet/machinelearning

michaelgsharp · 2025-03-12T00:11:07Z

Adds the LightGBM deterministic option to the LightGBM Options.

Copilot

Pull Request Overview

This PR adds a deterministic option along with related options for the LightGBM trainer to ensure reproducible training outcomes. Key changes include:

Adding new options (Deterministic, ForceRowWise, and ForceColumnWise) in the LightGBM trainer options.
Updating the options mapping dictionary in the trainer base.
Updating tests to set the new options for LightGBM estimators.
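
As a rough illustration of how the new options might be set from user code, here is a minimal sketch. It assumes the Microsoft.ML and Microsoft.ML.LightGbm packages at a version that includes this PR. The option names Deterministic and ForceRowWise come from the PR itself, and pairing them mirrors what the updated tests do. The seed value and the single-thread setting are illustrative assumptions, not requirements stated in the PR.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers.LightGbm;

public static class Program
{
    public static void Main()
    {
        // The seed value is an arbitrary choice for this sketch.
        var mlContext = new MLContext(seed: 1);

        var options = new LightGbmBinaryTrainer.Options
        {
            // Options added by this PR.
            Deterministic = true,
            ForceRowWise = true,

            // Existing option; a single thread is an assumption of this
            // sketch, not something the PR requires.
            NumberOfThreads = 1,
        };

        // Build the trainer from the options object; fitting it to data is
        // omitted here.
        var trainer = mlContext.BinaryClassification.Trainers.LightGbm(options);
    }
}
```

The sketch sets only ForceRowWise; ForceColumnWise is left at its default, because the two options select alternative histogram-building strategies.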

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
src/Microsoft.ML.LightGbm/LightGbmTrainerBase.cs	Added dictionary entries and properties for deterministic options.
test/Microsoft.ML.Tests/TrainerEstimators/TreeEstimators.cs	Updated tests to initialize Deterministic and ForceRowWise options.

Comments suppressed due to low confidence (2)

src/Microsoft.ML.LightGbm/LightGbmTrainerBase.cs:243

[nitpick] Consider clarifying the help text for 'Deterministic' to state that it ensures reproducible training outcomes, rather than mentioning 'stable results'.

/// Setting this to true should ensure the stable results when using the same data and the same parameters and different num_threads.

test/Microsoft.ML.Tests/TrainerEstimators/TreeEstimators.cs:69

Consider adding a test case that explicitly sets and verifies the behavior of the 'ForceColumnWise' option, as it is a new addition not covered in the existing tests.

Deterministic = true,

tarekgh · 2025-03-12T00:40:30Z

src/Microsoft.ML.LightGbm/LightGbmTrainerBase.cs

+            /// Setting this to true should ensure the stable results when using the same data and the same parameters and different num_threads.
+            /// </summary>
+            [Argument(ArgumentType.AtMostOnce, HelpText = "Whether to use deterministic algorithm.")]
+            public bool Deterministic = false;


public bool Deterministic = false;

I know this class exposes fields directly, but could the newly added members be properties instead?

I believe we use reflection to instantiate these (for example, when you run this from the ML.NET command line), and if it's expecting fields, properties wouldn't work. I will double-check that before I merge this in.

If it is doing that, it's possible that we could change things to either support both or use only properties, and update all the existing options as well.
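
The concern about fields versus properties can be illustrated with a small, self-contained sketch. DemoOptions is a hypothetical stand-in, not the real LightGBM options class, and the scan below is only an assumption about how a field-based reflection loader might behave. It is not ML.NET's actual argument-parsing code.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for an options class.
public class DemoOptions
{
    // Public field, matching the style used in the PR.
    public bool Deterministic = false;

    // Hypothetical property variant of a second option.
    public bool ForceRowWise { get; set; }
}

public static class Program
{
    public static void Main()
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
        Type type = typeof(DemoOptions);

        // A field-only scan reports Deterministic but not ForceRowWise,
        // because the auto-property's backing field is private.
        foreach (FieldInfo field in type.GetFields(flags))
        {
            Console.WriteLine($"field: {field.Name}");
        }

        // A property scan is needed to see ForceRowWise.
        foreach (PropertyInfo property in type.GetProperties(flags))
        {
            Console.WriteLine($"property: {property.Name}");
        }
    }
}
```

A loader that enumerates only public fields would skip any option declared as a property, so switching to properties would mean updating the loader as well.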

ericstj · 2025-03-12T01:20:42Z

Microsoft.ML.RunTests.TestEntryPoints.EntryPointCatalog is failing on all legs. https://helixr1107v0xdeko0k025g8.blob.core.windows.net/dotnet-machinelearning-refs-pull-7415-merge-504196c4ce1c401a92/Microsoft.ML.Core.Tests/1/console.55b4bee6.log?helixlogtype=result

      Output:
        Comparing /private/tmp/helix/working/A4860978/w/ADD50A36/e/TestOutput/../Common/EntryPoints/core_ep-list.tsv and /tmp/helix/working/A4860978/p/test/BaselineOutput/Common/EntryPoints/core_ep-list.tsv
        Output matches baseline: '../Common/EntryPoints/core_ep-list.tsv'
        Comparing /private/tmp/helix/working/A4860978/w/ADD50A36/e/TestOutput/../Common/EntryPoints/core_manifest.json and /tmp/helix/working/A4860978/p/test/BaselineOutput/Common/EntryPoints/netcoreapp/core_manifest.json
        *** Failure #1: Output and baseline mismatch at line 11895, expected '          "Name": "ParallelTrainer",' but got '          "Name": "Deterministic",' : '../Common/EntryPoints/core_manifest.json'

Looks like you need to update core_ep-list.tsv @michaelgsharp

michaelgsharp · 2025-03-12T03:42:18Z

@ericstj good catch. It's been updated.

codecov · 2025-03-12T05:10:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.96%. Comparing base (c36975c) to head (cd49ab2).
Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7415      +/-   ##
==========================================
- Coverage   68.97%   68.96%   -0.01%     
==========================================
  Files        1481     1481              
  Lines      273696   273708      +12     
  Branches    28285    28285              
==========================================
- Hits       188782   188769      -13     
- Misses      77526    77546      +20     
- Partials     7388     7393       +5

Flag	Coverage Δ
Debug	`68.96% <100.00%> (-0.01%)`	⬇️
production	`63.26% <100.00%> (-0.01%)`	⬇️
test	`89.46% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
src/Microsoft.ML.LightGbm/LightGbmTrainerBase.cs	`80.30% <100.00%> (-0.03%)`	⬇️
...osoft.ML.Tests/TrainerEstimators/TreeEstimators.cs	`97.85% <100.00%> (+<0.01%)`	⬆️

... and 10 files with indirect coverage changes

michaelgsharp · 2025-03-14T04:35:29Z

/ba-g failed tests are known failures and build analysis still isn't configured correctly to go green.

add deterministic option

7186fa2

michaelgsharp requested review from LittleLittleCloud and tarekgh March 12, 2025 00:11

michaelgsharp self-assigned this Mar 12, 2025

Copilot AI review requested due to automatic review settings March 12, 2025 00:11

Copilot AI reviewed Mar 12, 2025

LittleLittleCloud approved these changes Mar 12, 2025

tarekgh reviewed Mar 12, 2025

tarekgh approved these changes Mar 12, 2025

updated core manifest

4778f16

build-analysis bot mentioned this pull request Mar 12, 2025

SdcaLogisticRegression failing with LogLoss value above 0.5 on Apple M1 #7343

Open

updated netfx core manifest

cd49ab2

build-analysis bot mentioned this pull request Mar 12, 2025

AutoMLExperiment_return_current_best_trial_when_ct_is_canceled_with_trial_completed_Async fails in CI #7418

Open

michaelgsharp merged commit adad40c into dotnet:main Mar 14, 2025
23 of 25 checks passed

michaelgsharp deleted the light-gbm-deterministic branch March 14, 2025 04:37

github-actions bot locked and limited conversation to collaborators Apr 13, 2025

michaelgsharp merged 3 commits into dotnet:main from michaelgsharp:light-gbm-deterministic
