Skip to content

Added samples for tree regression trainers. #2999

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 19, 2019

Conversation

shmoradims
Copy link

Related to #2522.

Samples for tree regression trainers: GAM, FastTree, FastForest, FastTreeTweedie.

var trainingData = mlContext.Data.LoadFromEnumerable(examples);

// Define the trainer.
var pipeline = mlContext.Regression.Trainers.FastForest();
Copy link
Member

@wschin wschin Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe user wants to learn some little configuration. Could you somehow make it non-default? #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought about this, and decided to keep it as defaults for the following reasons:

  1. ML.NET has done some work with tuning the defaults so that users can have a good starting point. If we add parameters here, users would think that they have to worry about coming up with those parameters for their own scenario. And that's a negative. It's much easier for users to think that they don't need any parameter, until they actually know what they're doing.

  2. Any .NET developer can easily find out what parameters are available. The real value is not to show them the parameters, but to explain why they might need to change say number of trees or leaves. We're already doing that in the parameter description, and it would be too much to go over those details in the sample body.


In reply to: 266665195 [](ancestors = 266665195)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is another example in FastForestWithOptions.cs that shows the configuration.


In reply to: 267017735 [](ancestors = 267017735,266665195)

public static class FastForest
{
// This example requires installation of additional NuGet package
// <a href="https://www.nuget.org/packages/Microsoft.ML.FastTree/">Microsoft.ML.FastTree</a>.
Copy link
Member

@wschin wschin Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be xml doc. #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xml looks busy, and it won't render as xml inside the example. This code goes in the example block.


In reply to: 266665386 [](ancestors = 266665386)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// xml are not used in .net examples, so we're sticking to // comments inside samples' code.


In reply to: 266722954 [](ancestors = 266722954,266665386)

Copy link
Member

@wschin wschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.

var model = pipeline.Fit(trainingData);

// Create testing examples. Use different random seed to make it different from training data.
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123));
Copy link
Member

@sfilipi sfilipi Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seed:123 [](start = 91, length = 8)

why are you setting the seed here, but not above? #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've explained it in the comment line above. We want training and test set to be different. The first one uses default seed of 0. Here I use a different seed.


In reply to: 266723117 [](ancestors = 266723117)

Copy link
Contributor

@zeahmed zeahmed Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we be explicit to show the seed:0 in the training case as well. Also, can seed:123 be changed to seed:1? or maybe I should better ask if there is any significance of using seed:123?


In reply to: 267018416 [](ancestors = 267018416,266723117)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we're splitting hairs here. 123 is just a random number like saying ABC.


In reply to: 267029054 [](ancestors = 267029054,267018416,266723117)

var gamModel = trainedPipeline.LastTransformer.Model;

// Now investigate the bias and shape functions of the GAM model.

Copy link
Member

@sfilipi sfilipi Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the space below. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


In reply to: 266723313 [](ancestors = 266723313)


namespace Microsoft.ML.Samples.Dynamic.Trainers.Regression
{
public static class GamWithOptions
Copy link
Member

@sfilipi sfilipi Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GamWithOptions [](start = 24, length = 14)

Just an opinion, but for GAMs I'd use the other example here as well. Feels easier to understand with actual features and scenarios. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other sample is more of a GAM tutorial that already existed. Each model type (linear, tree, GAM, etc) has specific model inspection code (inspecting coefficients, bins, etc) that we're not covering in the API reference. The better way is to cover them in a tutorial "Making sense of GAM models" similar to the rest of cookbook, that all GAM APIs can use.


In reply to: 266723489 [](ancestors = 266723489)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should be consistent in both examples Gam and GamWithOptions?


In reply to: 267022906 [](ancestors = 267022906,266723489)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GAM sample has useful information so it's better to keep it, but I don't want to use it as a template for new GAM samples for the reasons above (we'll have two more GAM coming in classification).


In reply to: 267034698 [](ancestors = 267034698,267022906,266723489)

public float Label { get; set; }
// Predicted score from the trainer.
public float Score { get; set; }
}
Copy link
Contributor

@zeahmed zeahmed Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be common piece of code in all the example (I assume). Did we think of templating it in some way so that in future if need to change it we can change it at one place? #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're adding boilerplate code in order to make the samples self-contained without dependency on external files that users have to look up. please see #2726.


In reply to: 267029777 [](ancestors = 267029777)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this: #3017
Obviously not going to be in this PR.


In reply to: 267038999 [](ancestors = 267038999,267029777)

public static FastTreeRegressionTrainer FastTree(this RegressionCatalog.RegressionTrainers catalog,
FastTreeRegressionTrainer.Options options)
FastTreeRegressionTrainer.Options options)
Copy link
Contributor

@zeahmed zeahmed Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    [](start = 0, length = 8)

nit: one more tab??? #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes


In reply to: 267035106 [](ancestors = 267035106)

@codecov
Copy link

codecov bot commented Mar 19, 2019

Codecov Report

Merging #2999 into master will increase coverage by 0.08%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #2999      +/-   ##
==========================================
+ Coverage   72.26%   72.35%   +0.08%     
==========================================
  Files         802      803       +1     
  Lines      142901   143295     +394     
  Branches    16134    16155      +21     
==========================================
+ Hits       103272   103678     +406     
+ Misses      35210    35191      -19     
- Partials     4419     4426       +7
Flag Coverage Δ
#Debug 72.35% <0%> (+0.08%) ⬆️
#production 68.06% <0%> (+0.06%) ⬆️
#test 88.52% <ø> (+0.04%) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.FastTree/FastTreeArguments.cs 85.38% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs 94.18% <ø> (ø) ⬆️
src/Microsoft.ML.SamplesUtils/ConsoleUtils.cs 0% <0%> (ø) ⬆️
...oft.ML.Transforms/Text/TextFeaturizingEstimator.cs 83.2% <0%> (-7.12%) ⬇️
.../Microsoft.ML.Data/Model/ModelOperationsCatalog.cs 97.27% <0%> (-2.73%) ⬇️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
...enarios/Api/Estimators/TrainSaveModelAndPredict.cs 95.45% <0%> (-0.2%) ⬇️
...Microsoft.ML.Data/DataLoadSave/TransformerChain.cs 87.01% <0%> (-0.11%) ⬇️
...s/Api/CookbookSamples/CookbookSamplesDynamicApi.cs 93.53% <0%> (-0.09%) ⬇️
...s/Scenarios/Api/CookbookSamples/CookbookSamples.cs 99.49% <0%> (-0.01%) ⬇️
... and 24 more

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@shmoradims shmoradims merged commit 00a5b35 into dotnet:master Mar 19, 2019
@shmoradims shmoradims deleted the tree_regression_samples branch April 2, 2019 23:41
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants