-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Upgrade all regressors to use TT #3319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3319 +/- ##
==========================================
- Coverage 72.7% 72.69% -0.01%
==========================================
Files 807 807
Lines 145172 145172
Branches 16225 16225
==========================================
- Hits 105545 105536 -9
- Misses 35215 35221 +6
- Partials 4412 4415 +3
|
// Label: 0.155, Prediction: 0.164 | ||
// Label: 0.515, Prediction: 0.470 | ||
// Label: 0.566, Prediction: 0.501 | ||
// Label: 0.096, Prediction: 0.138"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @rogancarr tried to make sure we have some distinction between real output and just comment, and he put three (3) spaces for actual output.
Would be nice you can keep that pattern.
EVERYWHERE #Closed
// Create testing examples. Use different random seed to make it different from training data. | ||
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123)); | ||
// Create testing data. Use different random seed to make it different from training data. | ||
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed: 123)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
500 [](start = 86, length = 3)
is this too much for testing? Would 100 be better? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt it's great idea to have this lines in our samples. In reply to: 483336891 [](ancestors = 483336891) Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OnlineGradientDescent.cs:43 in 6721259. [](commit_id = 6721259, deletion_comment = False) |
|
||
namespace Samples.Dynamic.Trainers.Regression | ||
{ | ||
public static class OrdinaryLeastSquaresAdvanced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OrdinaryLeastSquaresAdvanced [](start = 24, length = 28)
is there a tt for this? #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// MedianHomeValue CrimesPerCapita PercentResidental PercentNonRetail CharlesRiver NitricOxides RoomsPerDwelling PercentPre40s | ||
// 24.00 0.00632 18.00 2.310 0 0.5380 6.5750 65.20 | ||
// 21.60 0.02731 00.00 7.070 0 0.4690 6.4210 78.90 | ||
// 34.70 0.02729 00.00 7.070 0 0.4690 7.1850 61.10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this! #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your code no one printing actual content of the dataview/file, which is not align with how we write samples right now (right now we have code which prints something to console and //Expected output comments).
No one prints this lines, right?
So can we remove them.
And in general can we get rid of calling Download file and run in-memory sample instead?
You spend good chunk of time in attempts to convince me what everything should be IN-MEMORY NO EXCEPTION.
It's weird to see what you actually have references on SampleUtils.
Make sense?
In reply to: 275581463 [](ancestors = 275581463,275458742)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is not an API sample at all. This is a record of a meaningful pipeline.
In reply to: 275582648 [](ancestors = 275582648,275581463,275458742)
LabelColumnName = nameof(DataPoint.Label), | ||
FeatureColumnName = nameof(DataPoint.Features), | ||
L2Regularization = 0.1f, | ||
CalculateStatistics = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your one-line comments for the options in the other files were so nice :) #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. But I add them only if I don't need to dig into their code to understand their meanings.
In reply to: 275458769 [](ancestors = 275458769)
@@ -39,7 +42,7 @@ namespace Samples.Dynamic.Trainers.Regression | |||
var model = pipeline.Fit(trainingData); | |||
|
|||
// Create testing data. Use different random seed to make it different from training data. | |||
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123)); | |||
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed: 123)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
500 [](start = 86, length = 3)
100, maybe? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Label: 0.155, Prediction: 0.164 | ||
// Label: 0.515, Prediction: 0.470 | ||
// Label: 0.566, Prediction: 0.501 | ||
// Label: 0.096, Prediction: 0.138 | ||
|
||
// Evaluate the overall metrics | ||
var metrics = mlContext.Regression.Evaluate(transformedTestData); | ||
Microsoft.ML.SamplesUtils.ConsoleUtils.PrintMetrics(metrics); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SamplesUtils [](start = 25, length = 12)
please remove SamplesUtils and just print the metric here with regular console.writeline. we're deprecating SamplesUtils altogether. #Closed
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra lines #WontFix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's caused by TT system. There is no such a line in my ttinclude and tt files.
In reply to: 275562731 [](ancestors = 275562731)
/// <format type="text/markdown"> | ||
/// <] | ||
/// ]]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't include this sample until we fix #2425 #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two major meanings to have a sample.
- Learn how to call it
- Explore from that sample.
A sample with bad prediction ability doesn't break any of them.
In reply to: 275564038 [](ancestors = 275564038)
/// <format type="text/markdown"> | ||
/// <] | ||
/// ]]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also needs #2425 first #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Shahab. Have documentation which states "this learner is broken here is tracker number" is worse than no documentation at all.
I would probably even make OGD internal until we figure out why it's so bad.
In reply to: 275592181 [](ancestors = 275592181,275564113)
|
||
namespace Samples.Dynamic.Trainers.Regression | ||
{ | ||
public static class StochasticDualCoordinateAscent | ||
public static class Sdca |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we kept going back and forth for sdca. The final version is using the acronym. please also change the filenames to Sdca (don't forget to update xmls referring this file) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code won't compile if we do In reply to: 483331666 [](ancestors = 483331666) Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/RegressionSamplesTemplate.ttinclude:69 in 6721259. [](commit_id = 6721259, deletion_comment = False) |
Yes. It'd be much worse if we show numbers. Let me spend another 10 mins on tuning its parameters. [Update] I gave up. This is the worest linear trainer I have ever seen. In reply to: 483337067 [](ancestors = 483337067,483336891) Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OnlineGradientDescent.cs:43 in 6721259. [](commit_id = 6721259, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with everything except OGD.
That thing makes me worry. I doubt it's great idea to have documentation which claims it's currently under construction. I would rather prefer to hide whole trainer from users until we figure out problem/ in what conditions it works fine. We have plenty of options to for users anyway.
Part of #2522.