Upgrade all regressors to use TT #3319

wschin · 2019-04-12T22:55:40Z

Part of #2522.

codecov · 2019-04-13T01:16:32Z

Codecov Report

Merging #3319 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3319      +/-   ##
==========================================
- Coverage    72.7%   72.69%   -0.01%     
==========================================
  Files         807      807              
  Lines      145172   145172              
  Branches    16225    16225              
==========================================
- Hits       105545   105536       -9     
- Misses      35215    35221       +6     
- Partials     4412     4415       +3

| Flag | Coverage Δ | |
|---|---|---|
| #Debug | `72.69% <ø> (-0.01%)` | ⬇️ |
| #production | `68.23% <ø> (-0.01%)` | ⬇️ |
| #test | `88.97% <ø> (-0.01%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...oft.ML.StandardTrainers/StandardTrainersCatalog.cs | `92.34% <ø> (ø)` | ⬆️ |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | `89.26% <0%> (-0.63%)` | ⬇️ |
| ...soft.ML.TestFramework/DataPipe/TestDataPipeBase.cs | `73.7% <0%> (-0.34%)` | ⬇️ |
| ...StandardTrainers/Standard/LinearModelParameters.cs | `60.05% <0%> (-0.27%)` | ⬇️ |
| ...ML.Transforms/Text/StopWordsRemovingTransformer.cs | `86.1% <0%> (-0.16%)` | ⬇️ |

Ivanidzo4ka · 2019-04-15T16:49:26Z

docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/FastForest.tt

+            // Label: 0.155, Prediction: 0.164
+            // Label: 0.515, Prediction: 0.470
+            // Label: 0.566, Prediction: 0.501
+            // Label: 0.096, Prediction: 0.138";


I think @rogancarr tried to distinguish real output from ordinary comments by indenting actual output with three (3) spaces.
It would be nice if you could keep that pattern.
EVERYWHERE #Closed

Ivanidzo4ka · 2019-04-15T16:50:58Z

        float randomFloat() => (float)random.NextDouble();

It used to be var.
What is wrong with var? #Resolved

Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/RegressionSamplesTemplate.ttinclude:69 in 6721259.

sfilipi · 2019-04-15T16:52:08Z

docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/FastForest.cs

-            // Create testing examples. Use different random seed to make it different from training data.
-            var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123));
+            // Create testing data. Use different random seed to make it different from training data.
+            var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed: 123));


500

Is this too much for testing? Would 100 be better? #Resolved

Let's do 5.

In reply to: 275452646

Ivanidzo4ka · 2019-04-15T17:06:08Z

        // TODO #2425: OGD is missing baseline tests and seems numerically unstable

I doubt it's a great idea to have these lines in our samples.

In reply to: 483336891

Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OnlineGradientDescent.cs:43 in 6721259.

sfilipi · 2019-04-15T17:07:10Z

docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OrdinaryLeastSquaresAdvanced.cs

+
+namespace Samples.Dynamic.Trainers.Regression
+{
+    public static class OrdinaryLeastSquaresAdvanced


OrdinaryLeastSquaresAdvanced

Is there a tt for this? #ByDesign

No, for the same reason as LightGbmAdvanced.cs.

In reply to: 275458236

Ivanidzo4ka · 2019-04-15T17:08:34Z

docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OrdinaryLeastSquaresAdvanced.cs

+            // MedianHomeValue    CrimesPerCapita    PercentResidental    PercentNonRetail    CharlesRiver    NitricOxides    RoomsPerDwelling    PercentPre40s
+            // 24.00              0.00632            18.00                2.310               0               0.5380          6.5750              65.20
+            // 21.60              0.02731            00.00                7.070               0               0.4690          6.4210              78.90
+            // 34.70              0.02729            00.00                7.070               0               0.4690          7.1850              61.10


And this! #ByDesign

Hmm, I don't quite understand this comment.

In reply to: 275458742

In your code nothing prints the actual content of the data view or file, which does not match how we write samples right now (currently we have code that prints something to the console, followed by // Expected output comments).
No one prints these lines, right?
So can we remove them?
And in general, can we get rid of the file download and run an in-memory sample instead?

You spent a good chunk of time trying to convince me that everything should be IN-MEMORY, NO EXCEPTION.
It's weird to see that you actually have references to SampleUtils.

Does that make sense?

In reply to: 275581463

This file is not an API sample at all. This is a record of a meaningful pipeline.

In reply to: 275582648

sfilipi · 2019-04-15T17:08:37Z

.../samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OrdinaryLeastSquaresWithOptions.tt

+                LabelColumnName = nameof(DataPoint.Label),
+                FeatureColumnName = nameof(DataPoint.Features),
+                L2Regularization = 0.1f,
+                CalculateStatistics = false


Your one-line comments for the options in the other files were so nice :) #Pending

Thanks. But I add them only when I don't need to dig into the code to understand what the options mean.

In reply to: 275458769

sfilipi · 2019-04-15T17:08:52Z

...samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/RegressionSamplesTemplate.ttinclude

@@ -39,7 +42,7 @@ namespace Samples.Dynamic.Trainers.Regression
            var model = pipeline.Fit(trainingData);

            // Create testing data. Use different random seed to make it different from training data.
-            var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123));
+            var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed: 123));


500

100, maybe? #Resolved

I will do 5 everywhere.

In reply to: 275458878

shmoradims · 2019-04-15T22:04:27Z

docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/FastForest.cs

+            // Label: 0.155, Prediction: 0.164
+            // Label: 0.515, Prediction: 0.470
+            // Label: 0.566, Prediction: 0.501
+            // Label: 0.096, Prediction: 0.138

            // Evaluate the overall metrics
            var metrics = mlContext.Regression.Evaluate(transformedTestData);
            Microsoft.ML.SamplesUtils.ConsoleUtils.PrintMetrics(metrics);


SamplesUtils

Please remove SamplesUtils and just print the metrics here with a regular Console.WriteLine. We're deprecating SamplesUtils altogether. #Closed
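
A minimal sketch of what that request could look like, assuming the metrics object returned by `mlContext.Regression.Evaluate` exposes `MeanAbsoluteError`, `MeanSquaredError`, `RootMeanSquaredError`, and `RSquared` (property names may differ between ML.NET versions). `FormatRegressionMetrics` is a hypothetical helper, not an ML.NET API; it takes plain doubles so the snippet depends only on the base class library.

```csharp
using System;
using System.Globalization;

public static class MetricsPrinting
{
    // Hypothetical helper (not part of ML.NET): formats four regression
    // metrics, one per line, with two decimal places.
    public static string FormatRegressionMetrics(
        double meanAbsoluteError,
        double meanSquaredError,
        double rootMeanSquaredError,
        double rSquared)
    {
        // InvariantCulture keeps the decimal separator stable across machines.
        return string.Format(
            CultureInfo.InvariantCulture,
            "Mean Absolute Error: {0:F2}\n" +
            "Mean Squared Error: {1:F2}\n" +
            "Root Mean Squared Error: {2:F2}\n" +
            "RSquared: {3:F2}",
            meanAbsoluteError, meanSquaredError, rootMeanSquaredError, rSquared);
    }
}
```

In a sample, the call site would be something like `Console.WriteLine(MetricsPrinting.FormatRegressionMetrics(metrics.MeanAbsoluteError, metrics.MeanSquaredError, metrics.RootMeanSquaredError, metrics.RSquared));`, again assuming those property names.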

shmoradims · 2019-04-15T22:05:46Z

docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/FastTreeTweedie.cs

+}
+


Extra lines #WontFix

It's caused by the TT system. There is no such line in my ttinclude and tt files.

In reply to: 275562731

shmoradims · 2019-04-15T22:11:21Z

src/Microsoft.ML.StandardTrainers/StandardTrainersCatalog.cs

+        /// <format type="text/markdown">
+        /// <![CDATA[
+        /// [!code-csharp[OnlineGradientDescent](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OnlineGradientDescent.cs)]
+        /// ]]>


Please don't include this sample until we fix #2425 #ByDesign

There are two major purposes for a sample:

1. Learn how to call it.
2. Explore from that sample.

A sample with poor predictive ability doesn't defeat either of them.

In reply to: 275564038

shmoradims · 2019-04-15T22:11:38Z

src/Microsoft.ML.StandardTrainers/StandardTrainersCatalog.cs

+        /// <format type="text/markdown">
+        /// <![CDATA[
+        /// [!code-csharp[OnlineGradientDescent](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OnlineGradientDescentWithOptions.cs)]
+        /// ]]>


Also needs #2425 first #Pending

#2425 is independent of this PR, right? The goal of this PR is to build the template. Afterwards, we can still do what we want.

In reply to: 275564113

I agree with Shahab. Having documentation that states "this learner is broken, here is the tracker number" is worse than no documentation at all.
I would probably even make OGD internal until we figure out why it's so bad.

In reply to: 275592181

shmoradims · 2019-04-15T22:14:34Z

docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/StochasticDualCoordinateAscent.cs


 namespace Samples.Dynamic.Trainers.Regression
 {
-    public static class StochasticDualCoordinateAscent
+    public static class Sdca


We kept going back and forth on SDCA. The final version uses the acronym. Please also change the filenames to Sdca (don't forget to update the XML docs that refer to this file). #Closed

Sure.

In reply to: 275564867

wschin · 2019-04-16T00:13:59Z

        float randomFloat() => (float)random.NextDouble();

Code won't compile if we do var. I will remove randomFloat and just use random.NextDouble.

In reply to: 483331666

Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/RegressionSamplesTemplate.ttinclude:69 in 6721259.
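
The thread does not show the earlier `var` version, so the following is only one possible reading of the compile error: before C# 10, a lambda assigned to `var` has no inferred type and is rejected by the compiler. The sketch below, with a hypothetical `RandomFloatForms.Generate` method, shows an explicit delegate type that compiles on older language versions, and notes the inline form that needs no helper at all.

```csharp
using System;

public static class RandomFloatForms
{
    // Hypothetical method for illustration; not taken from the PR.
    public static float[] Generate(int count, int seed)
    {
        var random = new Random(seed);

        // var randomFloat = () => (float)random.NextDouble();
        // The commented-out line above is rejected before C# 10, because
        // the compiler cannot infer a delegate type for the lambda.

        // An explicit delegate type compiles on older language versions.
        Func<float> randomFloat = () => (float)random.NextDouble();

        var values = new float[count];
        for (int i = 0; i < count; i++)
        {
            // Equivalent inline form with no helper:
            // values[i] = (float)random.NextDouble();
            values[i] = randomFloat();
        }
        return values;
    }
}
```

Dropping the helper and casting `random.NextDouble()` inline, as proposed in the reply, sidesteps the question of delegate types entirely.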

wschin · 2019-04-16T00:20:15Z

        // TODO #2425: OGD is missing baseline tests and seems numerically unstable

Yes. It'd be much worse if we showed numbers. Let me spend another 10 minutes tuning its parameters.

[Update] I gave up. This is the worst linear trainer I have ever seen.

In reply to: 483337067

Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OnlineGradientDescent.cs:43 in 6721259.

shmoradims

Ivanidzo4ka

I'm fine with everything except OGD.
That thing worries me. I doubt it's a great idea to have documentation which says the trainer is currently under construction. I would rather hide the whole trainer from users until we figure out the problem, or under what conditions it works fine. We have plenty of options for users anyway.

wschin added 8 commits April 12, 2019 14:52

Add Gam simple samples

4020dfd

Add advanced Gam samples back

746e323

Update FastForest samples

2a696f0

Remove items added by accident

c3da068

Update FastTree

4bb7f85

Replace tab with spaces

d8c48f4

Rename advanced sample files

8083107

Update LightGbm

21c4876

wschin added the documentation label Apr 12, 2019

wschin self-assigned this Apr 12, 2019

wschin added 5 commits April 12, 2019 16:23

Add FastTreeTweedie and fix format

c6d6b18

Rename two files

6864142

Update OLS and fix build

9c246d1

Fix csproj

3cfe768

Upgrade SDCA

06f67b7

wschin changed the title ~~[WIP] Upgrade all regressors to use TT~~ Upgrade all regressors to use TT Apr 13, 2019

wschin requested review from shmoradims, Ivanidzo4ka, zeahmed and artidoro and removed request for shmoradims and Ivanidzo4ka April 13, 2019 00:00

shmoradims mentioned this pull request Apr 13, 2019

Docs and samples for the API reference site (P0 & P1 Trainers) #2522

Closed

wschin added 2 commits April 12, 2019 17:28

Merge branch 'master' into tt-reg

e64a6df

Add missing links to docs

6721259

Ivanidzo4ka reviewed Apr 15, 2019

sfilipi reviewed Apr 15, 2019

Ivanidzo4ka reviewed Apr 15, 2019

sfilipi reviewed Apr 15, 2019

shmoradims reviewed Apr 15, 2019

Add spaces back

02eea47

shmoradims reviewed Apr 15, 2019

wschin added 4 commits April 15, 2019 15:33

Reduce test set's size

53a2beb

Move class comment to example()

29b556d

Fix LightGBM nuget reference

775c3b6

Fix some comments

5ef7b7b

wschin added 3 commits April 16, 2019 09:24

Address comments

8364916

Avoid local function in ttinclude

6e038f1

Rename sample files

952ee58

wschin mentioned this pull request Apr 16, 2019

We have zero tests for OnlineGradientDescent #2425

Open

shmoradims approved these changes Apr 16, 2019

wschin added 4 commits April 16, 2019 10:14

Update file names

8e6386c

Update OGD doc strings

efe91ed

Merge branch 'master' into tt-reg

04c99cf

Fix csproj

aadb4ff

Ivanidzo4ka approved these changes Apr 16, 2019

wschin merged commit 2e99197 into dotnet:master Apr 16, 2019

wschin deleted the tt-reg branch April 16, 2019 19:32

ghost locked as resolved and limited conversation to collaborators Mar 22, 2022
