-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Added samples for tree regression trainers. #2999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
var trainingData = mlContext.Data.LoadFromEnumerable(examples); | ||
|
||
// Define the trainer. | ||
var pipeline = mlContext.Regression.Trainers.FastForest(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe user wants to learn some little configuration. Could you somehow make it non-default? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought about this, and decided to keep it as defaults for the following reasons:
-
ML.NET has done some work with tuning the defaults so that users can have a good starting point. If we add parameters here, users would think that they have to worry about coming up with those parameters for their own scenario. And that's a negative. It's much easier for users to think that they don't need any parameter, until they actually know what they're doing.
-
Any .NET developer can easily find out what parameters are available. The real value is not to show them the parameters, but to explain why they might need to change say number of trees or leaves. We're already doing that in the parameter description, and it would be too much to go over those details in the sample body.
In reply to: 266665195 [](ancestors = 266665195)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is another example in FastForestWithOptions.cs
that shows the configuration.
In reply to: 267017735 [](ancestors = 267017735,266665195)
public static class FastForest | ||
{ | ||
// This example requires installation of additional NuGet package | ||
// <a href="https://www.nuget.org/packages/Microsoft.ML.FastTree/">Microsoft.ML.FastTree</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be xml doc. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xml looks busy, and it won't render as xml inside the example. This code goes in the example block.
In reply to: 266665386 [](ancestors = 266665386)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// xml are not used in .net examples, so we're sticking to // comments inside samples' code.
In reply to: 266722954 [](ancestors = 266722954,266665386)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
var model = pipeline.Fit(trainingData); | ||
|
||
// Create testing examples. Use different random seed to make it different from training data. | ||
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seed:123 [](start = 91, length = 8)
why are you setting the seed here, but not above? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've explained it in the comment line above. We want training and test set to be different. The first one uses default seed of 0. Here I use a different seed.
In reply to: 266723117 [](ancestors = 266723117)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we be explicit to show the seed:0
in the training case as well. Also, can seed:123
be changed to seed:1
? or maybe I should better ask if there is any significance of using seed:123
?
In reply to: 267018416 [](ancestors = 267018416,266723117)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we're splitting hairs here. 123 is just a random number like saying ABC.
In reply to: 267029054 [](ancestors = 267029054,267018416,266723117)
var gamModel = trainedPipeline.LastTransformer.Model; | ||
|
||
// Now investigate the bias and shape functions of the GAM model. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the space below. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
namespace Microsoft.ML.Samples.Dynamic.Trainers.Regression | ||
{ | ||
public static class GamWithOptions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GamWithOptions [](start = 24, length = 14)
Just an opinion, but for GAMs I'd use the other example here as well. Feels easier to understand with actual features and scenarios. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other sample is more of a GAM tutorial that already existed. Each model type (linear, tree, GAM, etc) has specific model inspection code (inspecting coefficients, bins, etc) that we're not covering in the API reference. The better way is to cover them in a tutorial "Making sense of GAM models" similar to the rest of cookbook, that all GAM APIs can use.
In reply to: 266723489 [](ancestors = 266723489)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should be consistent in both examples Gam
and GamWithOptions
?
In reply to: 267022906 [](ancestors = 267022906,266723489)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GAM sample has useful information so it's better to keep it, but I don't want to use it as a template for new GAM samples for the reasons above (we'll have two more GAM coming in classification).
In reply to: 267034698 [](ancestors = 267034698,267022906,266723489)
public float Label { get; set; } | ||
// Predicted score from the trainer. | ||
public float Score { get; set; } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be common piece of code in all the example (I assume). Did we think of templating it in some way so that in future if need to change it we can change it at one place? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static FastTreeRegressionTrainer FastTree(this RegressionCatalog.RegressionTrainers catalog, | ||
FastTreeRegressionTrainer.Options options) | ||
FastTreeRegressionTrainer.Options options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 0, length = 8)
nit: one more tab??? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report
@@ Coverage Diff @@
## master #2999 +/- ##
==========================================
+ Coverage 72.26% 72.35% +0.08%
==========================================
Files 802 803 +1
Lines 142901 143295 +394
Branches 16134 16155 +21
==========================================
+ Hits 103272 103678 +406
+ Misses 35210 35191 -19
- Partials 4419 4426 +7
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to #2522.
Samples for tree regression trainers: GAM, FastTree, FastForest, FastTreeTweedie.