Skip to content

Upgrade all regressors to use TT #3319

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Apr 16, 2019
Merged

Upgrade all regressors to use TT #3319

merged 27 commits into from
Apr 16, 2019

Conversation

wschin
Copy link
Member

@wschin wschin commented Apr 12, 2019

Part of #2522.

@wschin wschin added the documentation Related to documentation of ML.NET label Apr 12, 2019
@wschin wschin self-assigned this Apr 12, 2019
@wschin wschin changed the title [WIP] Upgrade all regressors to use TT Upgrade all regressors to use TT Apr 13, 2019
@wschin wschin requested review from shmoradims, Ivanidzo4ka, zeahmed and artidoro and removed request for shmoradims and Ivanidzo4ka April 13, 2019 00:00
@codecov
Copy link

codecov bot commented Apr 13, 2019

Codecov Report

Merging #3319 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3319      +/-   ##
==========================================
- Coverage    72.7%   72.69%   -0.01%     
==========================================
  Files         807      807              
  Lines      145172   145172              
  Branches    16225    16225              
==========================================
- Hits       105545   105536       -9     
- Misses      35215    35221       +6     
- Partials     4412     4415       +3
Flag Coverage Δ
#Debug 72.69% <ø> (-0.01%) ⬇️
#production 68.23% <ø> (-0.01%) ⬇️
#test 88.97% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
...oft.ML.StandardTrainers/StandardTrainersCatalog.cs 92.34% <ø> (ø) ⬆️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 89.26% <0%> (-0.63%) ⬇️
...soft.ML.TestFramework/DataPipe/TestDataPipeBase.cs 73.7% <0%> (-0.34%) ⬇️
...StandardTrainers/Standard/LinearModelParameters.cs 60.05% <0%> (-0.27%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️

// Label: 0.155, Prediction: 0.164
// Label: 0.515, Prediction: 0.470
// Label: 0.566, Prediction: 0.501
// Label: 0.096, Prediction: 0.138";
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @rogancarr tried to make sure we have some distinction between real output and just comment, and he put three (3) spaces for actual output.
Would be nice you can keep that pattern.
EVERYWHERE #Closed

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Apr 15, 2019

        float randomFloat() => (float)random.NextDouble();

it used to be var.
what is wrong with var? #Resolved


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/RegressionSamplesTemplate.ttinclude:69 in 6721259. [](commit_id = 6721259, deletion_comment = False)

// Create testing examples. Use different random seed to make it different from training data.
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123));
// Create testing data. Use different random seed to make it different from training data.
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed: 123));
Copy link
Member

@sfilipi sfilipi Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

500 [](start = 86, length = 3)

is this too much for testing? Would 100 be better? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do 5.


In reply to: 275452646 [](ancestors = 275452646)

@Ivanidzo4ka
Copy link
Contributor

        // TODO #2425: OGD is missing baseline tests and seems numerically unstable

I doubt it's great idea to have this lines in our samples.


In reply to: 483336891 [](ancestors = 483336891)


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OnlineGradientDescent.cs:43 in 6721259. [](commit_id = 6721259, deletion_comment = False)


namespace Samples.Dynamic.Trainers.Regression
{
public static class OrdinaryLeastSquaresAdvanced
Copy link
Member

@sfilipi sfilipi Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OrdinaryLeastSquaresAdvanced [](start = 24, length = 28)

is there a tt for this? #ByDesign

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No for the reason to LightGbmAdvanced.cs.


In reply to: 275458236 [](ancestors = 275458236)

// MedianHomeValue CrimesPerCapita PercentResidental PercentNonRetail CharlesRiver NitricOxides RoomsPerDwelling PercentPre40s
// 24.00 0.00632 18.00 2.310 0 0.5380 6.5750 65.20
// 21.60 0.02731 00.00 7.070 0 0.4690 6.4210 78.90
// 34.70 0.02729 00.00 7.070 0 0.4690 7.1850 61.10
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this! #ByDesign

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm, I don't quite understand this comment.


In reply to: 275458742 [](ancestors = 275458742)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your code no one printing actual content of the dataview/file, which is not align with how we write samples right now (right now we have code which prints something to console and //Expected output comments).
No one prints this lines, right?
So can we remove them.
And in general can we get rid of calling Download file and run in-memory sample instead?

You spend good chunk of time in attempts to convince me what everything should be IN-MEMORY NO EXCEPTION.
It's weird to see what you actually have references on SampleUtils.

Make sense?


In reply to: 275581463 [](ancestors = 275581463,275458742)

Copy link
Member Author

@wschin wschin Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is not an API sample at all. This is a record of a meaningful pipeline.


In reply to: 275582648 [](ancestors = 275582648,275581463,275458742)

LabelColumnName = nameof(DataPoint.Label),
FeatureColumnName = nameof(DataPoint.Features),
L2Regularization = 0.1f,
CalculateStatistics = false
Copy link
Member

@sfilipi sfilipi Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your one-line comments for the options in the other files were so nice :) #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. But I add them only if I don't need to dig into their code to understand their meanings.


In reply to: 275458769 [](ancestors = 275458769)

@@ -39,7 +42,7 @@ namespace Samples.Dynamic.Trainers.Regression
var model = pipeline.Fit(trainingData);

// Create testing data. Use different random seed to make it different from training data.
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123));
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed: 123));
Copy link
Member

@sfilipi sfilipi Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

500 [](start = 86, length = 3)

100, maybe? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do 5 everywhere.


In reply to: 275458878 [](ancestors = 275458878)

// Label: 0.155, Prediction: 0.164
// Label: 0.515, Prediction: 0.470
// Label: 0.566, Prediction: 0.501
// Label: 0.096, Prediction: 0.138

// Evaluate the overall metrics
var metrics = mlContext.Regression.Evaluate(transformedTestData);
Microsoft.ML.SamplesUtils.ConsoleUtils.PrintMetrics(metrics);
Copy link

@shmoradims shmoradims Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SamplesUtils [](start = 25, length = 12)

please remove SamplesUtils and just print the metric here with regular console.writeline. we're deprecating SamplesUtils altogether. #Closed

}

Copy link

@shmoradims shmoradims Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra lines #WontFix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's caused by TT system. There is no such a line in my ttinclude and tt files.


In reply to: 275562731 [](ancestors = 275562731)

/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[OnlineGradientDescent](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OnlineGradientDescent.cs)]
/// ]]>
Copy link

@shmoradims shmoradims Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't include this sample until we fix #2425 #ByDesign

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two major meanings to have a sample.

  1. Learn how to call it
  2. Explore from that sample.
    A sample with bad prediction ability doesn't break any of them.

In reply to: 275564038 [](ancestors = 275564038)

/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[OnlineGradientDescent](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OnlineGradientDescentWithOptions.cs)]
/// ]]>
Copy link

@shmoradims shmoradims Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also needs #2425 first #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2425 is independent to this PR, right? Goal of this PR is building template. Afterwards, we can still do what we want.


In reply to: 275564113 [](ancestors = 275564113)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Shahab. Have documentation which states "this learner is broken here is tracker number" is worse than no documentation at all.
I would probably even make OGD internal until we figure out why it's so bad.


In reply to: 275592181 [](ancestors = 275592181,275564113)


namespace Samples.Dynamic.Trainers.Regression
{
public static class StochasticDualCoordinateAscent
public static class Sdca
Copy link

@shmoradims shmoradims Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we kept going back and forth for sdca. The final version is using the acronym. please also change the filenames to Sdca (don't forget to update xmls referring this file) #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.


In reply to: 275564867 [](ancestors = 275564867)

@wschin
Copy link
Member Author

wschin commented Apr 16, 2019

        float randomFloat() => (float)random.NextDouble();

Code won't compile if we do var. I will remove randomFloat and just use random.NextDouble.


In reply to: 483331666 [](ancestors = 483331666)


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/RegressionSamplesTemplate.ttinclude:69 in 6721259. [](commit_id = 6721259, deletion_comment = False)

@wschin
Copy link
Member Author

wschin commented Apr 16, 2019

        // TODO #2425: OGD is missing baseline tests and seems numerically unstable

Yes. It'd be much worse if we show numbers. Let me spend another 10 mins on tuning its parameters.

[Update] I gave up. This is the worest linear trainer I have ever seen.


In reply to: 483337067 [](ancestors = 483337067,483336891)


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OnlineGradientDescent.cs:43 in 6721259. [](commit_id = 6721259, deletion_comment = False)

Copy link

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with everything except OGD.
That thing makes me worry. I doubt it's great idea to have documentation which claims it's currently under construction. I would rather prefer to hide whole trainer from users until we figure out problem/ in what conditions it works fine. We have plenty of options to for users anyway.

@wschin wschin merged commit 2e99197 into dotnet:master Apr 16, 2019
@wschin wschin deleted the tt-reg branch April 16, 2019 19:32
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Related to documentation of ML.NET
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants