
Change default EvaluationMetric for LightGbm trainers to conform to d… #3859

Merged (1 commit) on Jul 1, 2019
2 changes: 1 addition & 1 deletion src/Microsoft.ML.LightGbm/LightGbmBinaryTrainer.cs
@@ -162,7 +162,7 @@ public enum EvaluateMetricType
[Argument(ArgumentType.AtMostOnce,
HelpText = "Evaluation metrics.",
ShortName = "em")]
-public EvaluateMetricType EvaluationMetric = EvaluateMetricType.Logloss;
+public EvaluateMetricType EvaluationMetric = EvaluateMetricType.Default;

> Default

Isn't this a breaking change?
cc @eerhardt

Member Author:

Yes, but the other option is to have an inconsistent user experience. I talked to @ebarsoumMS about this. Let's discuss and reach a conclusion.

Member:

It's not an "API breaking change". I think it falls into the scenarios that @TomFinley listed here #3602 (comment).

> However even many years later sometimes we still have somewhat troublesome defaults running around

Here, if there is a better default value, I think it is acceptable to change the default.

Member Author:

Also, this doesn't actually change training behavior, nor the metrics calculated by ML.NET evaluators; it only changes the metric that LightGBM calculates internally.

In ML.NET, when we do the following (e.g. for binary classification)

    var transformedTestData = model.Transform(testData);
    var metrics = mlContext.BinaryClassification.Evaluate(transformedTestData);

the evaluator computes all relevant metrics for binary classification regardless of what is specified by LightGbm's EvaluationMetric parameter.

Contributor:

It may control LightGBM's early stopping, but otherwise I think this is a no-op change. ML.NET doesn't relay the stdout from LightGBM to the user, and ML.NET uses its own evaluators for computing the final metrics.
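(For context on the early-stopping interaction: LightGBM stops training when the chosen evaluation metric fails to improve for a given number of rounds. A minimal sketch of where the metric choice matters, assuming the standard ML.NET LightGBM options; `mlContext` and the chosen values are placeholders:)

```csharp
// Sketch: EvaluationMetric can influence early stopping, because LightGBM
// halts when this metric stops improving for EarlyStoppingRound rounds.
var trainer = mlContext.BinaryClassification.Trainers.LightGbm(
    new LightGbmBinaryTrainer.Options
    {
        NumberOfIterations = 200,
        EarlyStoppingRound = 10,   // stop after 10 rounds without improvement
        EvaluationMetric =
            LightGbmBinaryTrainer.Options.EvaluateMetricType.Logloss
    });
```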

Users could benefit from ML.NET relaying this info back to them. This would allow a GUI to show the learning curves in real time (or as text output from a CLI):

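One hypothetical shape for such a relay is sketched below. Note that `IterationCallback` does not exist in ML.NET today; it is purely illustrative of the proposal above, not an actual API.

```csharp
// Hypothetical sketch only: an option that surfaces LightGBM's per-iteration
// metric so a host GUI or CLI could plot a learning curve as training runs.
// "IterationCallback" is NOT an existing ML.NET option.
var options = new LightGbmBinaryTrainer.Options
{
    NumberOfIterations = 100,
    // Invoked once per boosting iteration with the metric LightGBM computed.
    IterationCallback = (iteration, metricName, metricValue) =>
        Console.WriteLine($"iter {iteration}: {metricName} = {metricValue:F6}")
};
var pipeline = mlContext.BinaryClassification.Trainers.LightGbm(options);
```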

static Options()
{
2 changes: 1 addition & 1 deletion src/Microsoft.ML.LightGbm/LightGbmMulticlassTrainer.cs
@@ -95,7 +95,7 @@ public enum EvaluateMetricType
[Argument(ArgumentType.AtMostOnce,
HelpText = "Evaluation metrics.",
ShortName = "em")]
-public EvaluateMetricType EvaluationMetric = EvaluateMetricType.Error;
+public EvaluateMetricType EvaluationMetric = EvaluateMetricType.Default;

static Options()
{
2 changes: 1 addition & 1 deletion src/Microsoft.ML.LightGbm/LightGbmRankingTrainer.cs
@@ -143,7 +143,7 @@ public enum EvaluateMetricType
[Argument(ArgumentType.AtMostOnce,
HelpText = "Evaluation metrics.",
ShortName = "em")]
-public EvaluateMetricType EvaluationMetric = EvaluateMetricType.NormalizedDiscountedCumulativeGain;
+public EvaluateMetricType EvaluationMetric = EvaluateMetricType.Default;

static Options()
{
2 changes: 1 addition & 1 deletion src/Microsoft.ML.LightGbm/LightGbmRegressionTrainer.cs
@@ -133,7 +133,7 @@ public enum EvaluateMetricType
[Argument(ArgumentType.AtMostOnce,
HelpText = "Evaluation metrics.",
ShortName = "em")]
-public EvaluateMetricType EvaluationMetric = EvaluateMetricType.RootMeanSquaredError;
+public EvaluateMetricType EvaluationMetric = EvaluateMetricType.Default;

static Options()
{
8 changes: 4 additions & 4 deletions test/BaselineOutput/Common/EntryPoints/core_manifest.json
@@ -11297,7 +11297,7 @@
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
-"Default": "Logloss"
+"Default": "Default"
},
{
"Name": "MaximumBinCountPerFeature",
@@ -11782,7 +11782,7 @@
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
-"Default": "Error"
+"Default": "Default"
},
{
"Name": "MaximumBinCountPerFeature",
@@ -12279,7 +12279,7 @@
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
-"Default": "NormalizedDiscountedCumulativeGain"
+"Default": "Default"
},
{
"Name": "MaximumBinCountPerFeature",
@@ -12737,7 +12737,7 @@
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
-"Default": "RootMeanSquaredError"
+"Default": "Default"
},
{
"Name": "MaximumBinCountPerFeature",
@@ -35,10 +35,10 @@ Virtual memory usage(MB): %Number%
[1] 'Loading data for LightGBM' started.
[1] 'Loading data for LightGBM' finished in %Time%.
[2] 'Training with LightGBM' started.
-[2] (%Time%) Iteration: 50 Training-rmse: 6.09160118577349
+[2] (%Time%) Iteration: 50 Training-: 37.107605006517
[2] 'Training with LightGBM' finished in %Time%.
[3] 'Loading data for LightGBM #2' started.
[3] 'Loading data for LightGBM #2' finished in %Time%.
[4] 'Training with LightGBM #2' started.
-[4] (%Time%) Iteration: 50 Training-rmse: 5.26343689176522
+[4] (%Time%) Iteration: 50 Training-: 27.7037679135951
[4] 'Training with LightGBM #2' finished in %Time%.
@@ -26,7 +26,7 @@ Virtual memory usage(MB): %Number%
[1] 'Loading data for LightGBM' started.
[1] 'Loading data for LightGBM' finished in %Time%.
[2] 'Training with LightGBM' started.
-[2] (%Time%) Iteration: 50 Training-rmse: 5.10533343749577
+[2] (%Time%) Iteration: 50 Training-: 26.0644295080124
[2] 'Training with LightGBM' finished in %Time%.
[3] 'Saving model' started.
[3] 'Saving model' finished in %Time%.
@@ -1,4 +1,4 @@
LightGBMR
-L1(avg) L2(avg) RMS(avg) Loss-fn(avg) R Squared /iter /lr /nl /mil /v /nt Learner Name Train Dataset Test Dataset Results File Run Time Physical Memory Virtual Memory Command Line Settings
-26.59978 1393.326 37.32081 1393.326 0.923402 50 0.2 20 10 + 1 LightGBMR %Data% %Output% 99 0 0 maml.exe CV tr=LightGBMR{nt=1 iter=50 em=RootMeanSquaredError v=+ lr=0.2 mil=10 nl=20} threads=- dout=%Output% loader=Text{col=Label:R4:11 col=Features:R4:0-10 sep=; header+} data=%Data% seed=1 /iter:50;/lr:0.2;/nl:20;/mil:10;/v:+;/nt:1
+L1(avg) L2(avg) RMS(avg) Loss-fn(avg) R Squared /em /iter /lr /nl /mil /v /nt Learner Name Train Dataset Test Dataset Results File Run Time Physical Memory Virtual Memory Command Line Settings
+26.59978 1393.326 37.32081 1393.326 0.923402 RootMeanSquaredError 50 0.2 20 10 + 1 LightGBMR %Data% %Output% 99 0 0 maml.exe CV tr=LightGBMR{nt=1 iter=50 em=RootMeanSquaredError v=+ lr=0.2 mil=10 nl=20} threads=- dout=%Output% loader=Text{col=Label:R4:11 col=Features:R4:0-10 sep=; header+} data=%Data% seed=1 /em:RootMeanSquaredError;/iter:50;/lr:0.2;/nl:20;/mil:10;/v:+;/nt:1

@@ -1,4 +1,4 @@
LightGBMR
-L1(avg) L2(avg) RMS(avg) Loss-fn(avg) R Squared /iter /lr /nl /mil /v /nt Learner Name Train Dataset Test Dataset Results File Run Time Physical Memory Virtual Memory Command Line Settings
-3.428896 25.23601 5.023546 25.23601 0.998616 50 0.2 20 10 + 1 LightGBMR %Data% %Data% %Output% 99 0 0 maml.exe TrainTest test=%Data% tr=LightGBMR{nt=1 iter=50 em=RootMeanSquaredError v=+ lr=0.2 mil=10 nl=20} dout=%Output% loader=Text{col=Label:R4:11 col=Features:R4:0-10 sep=; header+} data=%Data% out=%Output% seed=1 /iter:50;/lr:0.2;/nl:20;/mil:10;/v:+;/nt:1
+L1(avg) L2(avg) RMS(avg) Loss-fn(avg) R Squared /em /iter /lr /nl /mil /v /nt Learner Name Train Dataset Test Dataset Results File Run Time Physical Memory Virtual Memory Command Line Settings
+3.428896 25.23601 5.023546 25.23601 0.998616 RootMeanSquaredError 50 0.2 20 10 + 1 LightGBMR %Data% %Data% %Output% 99 0 0 maml.exe TrainTest test=%Data% tr=LightGBMR{nt=1 iter=50 em=RootMeanSquaredError v=+ lr=0.2 mil=10 nl=20} dout=%Output% loader=Text{col=Label:R4:11 col=Features:R4:0-10 sep=; header+} data=%Data% out=%Output% seed=1 /em:RootMeanSquaredError;/iter:50;/lr:0.2;/nl:20;/mil:10;/v:+;/nt:1