Skip to content

Updated xml docs for tree-based trainers. #2970

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 15, 2019

Conversation

shmoradims
Copy link

Updated XML documentation for tree-based trainers (FastTree, FastForest, GAM, etc). Related to #2522.

Samples to come in a separate PR.

@shmoradims shmoradims requested review from sfilipi, rogancarr and zeahmed and removed request for sfilipi March 15, 2019 00:05
@@ -24,7 +24,7 @@
The output of the ensemble produced by MART on a given instance is the sum of the tree outputs.
</para>
<list type='bullet'>
<item><description>In case of a binary classification problem, the output is converted to a probability by using some form of calibration.</description></item>
Copy link
Contributor

@zeahmed zeahmed Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case [](start = 32, length = 4)

is the deletion of word case by mistake? #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch


In reply to: 265813438 [](ancestors = 265813438)

This learner is a generalization of Poisson, compound Poisson, and gamma regression.
</summary>
<!--
The following text describes the FastForest algorithm details.
Copy link
Contributor

@zeahmed zeahmed Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FastForest [](start = 37, length = 10)

GAMs??? #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, copy-paste error


In reply to: 265813553 [](ancestors = 265813553)

@@ -594,7 +643,7 @@ public abstract class BoostedTreeOptions : TreeOptions
public bool BestStepRankingRegressionTrees = false;

/// <summary>
/// Should we use line search for a step size.
/// Determines whether to we use line search for a step size.
Copy link
Contributor

@zeahmed zeahmed Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to we use [](start = 31, length = 9)

to we use -> to use #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 265814516 [](ancestors = 265814516)

@@ -611,11 +660,17 @@ public abstract class BoostedTreeOptions : TreeOptions
[Argument(ArgumentType.LastOccurenceWins, HelpText = "Minimum line search step size", ShortName = "minstep")]
public Double MinimumStepSize;

/// <summary>
/// The type of optimizer algorithm for setting <see cref="OptimizationAlgorithm"/>.
Copy link
Contributor

@zeahmed zeahmed Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of optimizer algorithm for setting . [](start = 12, length = 80)

I think only Types of optimization algorithms. will make more sense here. #Resolved

@codecov
Copy link

codecov bot commented Mar 15, 2019

Codecov Report

Merging #2970 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2970      +/-   ##
==========================================
+ Coverage   72.29%   72.29%   +<.01%     
==========================================
  Files         796      796              
  Lines      142349   142349              
  Branches    16051    16051              
==========================================
+ Hits       102905   102909       +4     
+ Misses      35063    35061       -2     
+ Partials     4381     4379       -2
Flag Coverage Δ
#Debug 72.29% <100%> (ø) ⬆️
#production 68.01% <100%> (ø) ⬆️
#test 88.48% <ø> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.FastTree/GamClassification.cs 89% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeArguments.cs 85.38% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeRegression.cs 54.5% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs 94.18% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeRanking.cs 48.19% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeTweedie.cs 56.29% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/GamTrainer.cs 90.38% <ø> (ø) ⬆️
...rc/Microsoft.ML.FastTree/FastTreeClassification.cs 78.19% <ø> (ø) ⬆️
...rc/Microsoft.ML.FastTree/RandomForestRegression.cs 59.9% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/GamRegression.cs 89.09% <ø> (ø) ⬆️
... and 5 more

Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it LGTM. I have left a few comments. Hope you will address those before merging.

@singlis
Copy link
Member

singlis commented Mar 15, 2019

/// The base class for all unsupervised learner inputs that support a weight column.

is it worth changing all "learner" instances to trainer in this file?
#Resolved


Refers to: src/Microsoft.ML.Data/Training/TrainerInputBase.cs:85 in 0966dca. [](commit_id = 0966dca, deletion_comment = False)

namespace Microsoft.ML.Samples.Dynamic.Trainers.Regression
{
public static class FastTree
{
Copy link
Member

@singlis singlis Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ [](start = 4, length = 1)

So do we need a Binary and Ranking examples for FastTree? #Resolved

Copy link
Author

@shmoradims shmoradims Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the samples will come in my next PR. I added this one as template for discussion. please see my email about in-memory samples.


In reply to: 265818226 [](ancestors = 265818226)

@singlis
Copy link
Member

singlis commented Mar 15, 2019

    /// <param name="featureToInputMap">A map from the feature shape functions (as described by the binUpperBounds and BinEffects)

see crefs for binUpperBounds and binEffects. I know this is internal -- but it would bound to the variable name if they ever change. #Resolved


Refers to: src/Microsoft.ML.FastTree/GamClassification.cs:189 in 0966dca. [](commit_id = 0966dca, deletion_comment = False)

Copy link
Member

@singlis singlis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@shmoradims
Copy link
Author

    /// <param name="featureToInputMap">A map from the feature shape functions (as described by the binUpperBounds and BinEffects)

done


In reply to: 473121003 [](ancestors = 473121003)


Refers to: src/Microsoft.ML.FastTree/GamClassification.cs:189 in 0966dca. [](commit_id = 0966dca, deletion_comment = False)

@shmoradims
Copy link
Author

/// The base class for all unsupervised learner inputs that support a weight column.

done


In reply to: 473118846 [](ancestors = 473118846)


Refers to: src/Microsoft.ML.Data/Training/TrainerInputBase.cs:85 in 0966dca. [](commit_id = 0966dca, deletion_comment = False)

var pipeline = mlContext.BinaryClassification.Trainers.FastTree();

// Train the model.
var model = pipeline.Fit(data);
Copy link
Member

@wschin wschin Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is minimal version without prediction. I personally like to see in-memory prediction, which is what will happen immediately in production.

Why in-memory prediction is important?
(1) User have no idea about the IDataView produced by the train model. If we don't tell them how to extract data into C# data structure, they will have to look for tutorials of IDataVIew, ITransformer, IDataView-C# bridge.
(2) Prediction format varies from different models and are ML.NET-specific, so it's also hard to figure out which one should be used.
(3) Prediction is how the trained model will be used. One might think scikit-learn doesn't do so, so we shouldn't. My suggestion is we should! Here is my reason ---- scikit-learn produces numpy data structures and everyone know how to manipulate them (by Googling for Numpy), but IDataView is not at that stage yet.

}
}

private class DataPoint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good! An example with 50 features!

@shmoradims shmoradims merged commit b6f94bc into dotnet:master Mar 15, 2019
@shauheen shauheen added this to the 0319 milestone Mar 15, 2019
@shmoradims shmoradims deleted the tree_docs4 branch March 15, 2019 18:13
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants