-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Updated xml docs for tree-based trainers. #2970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/Microsoft.ML.FastTree/doc.xml
Outdated
@@ -24,7 +24,7 @@ | |||
The output of the ensemble produced by MART on a given instance is the sum of the tree outputs. | |||
</para> | |||
<list type='bullet'> | |||
<item><description>In case of a binary classification problem, the output is converted to a probability by using some form of calibration.</description></item> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case [](start = 32, length = 4)
is the deletion of word case
by mistake? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Microsoft.ML.FastTree/doc.xml
Outdated
This learner is a generalization of Poisson, compound Poisson, and gamma regression. | ||
</summary> | ||
<!-- | ||
The following text describes the FastForest algorithm details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FastForest [](start = 37, length = 10)
GAMs??? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -594,7 +643,7 @@ public abstract class BoostedTreeOptions : TreeOptions | |||
public bool BestStepRankingRegressionTrees = false; | |||
|
|||
/// <summary> | |||
/// Should we use line search for a step size. | |||
/// Determines whether to we use line search for a step size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to we use [](start = 31, length = 9)
to we use
-> to use
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -611,11 +660,17 @@ public abstract class BoostedTreeOptions : TreeOptions | |||
[Argument(ArgumentType.LastOccurenceWins, HelpText = "Minimum line search step size", ShortName = "minstep")] | |||
public Double MinimumStepSize; | |||
|
|||
/// <summary> | |||
/// The type of optimizer algorithm for setting <see cref="OptimizationAlgorithm"/>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of optimizer algorithm for setting . [](start = 12, length = 80)
I think only Types of optimization algorithms.
will make more sense here. #Resolved
Codecov Report
@@ Coverage Diff @@
## master #2970 +/- ##
==========================================
+ Coverage 72.29% 72.29% +<.01%
==========================================
Files 796 796
Lines 142349 142349
Branches 16051 16051
==========================================
+ Hits 102905 102909 +4
+ Misses 35063 35061 -2
+ Partials 4381 4379 -2
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it LGTM. I have left a few comments. Hope you will address those before merging.
namespace Microsoft.ML.Samples.Dynamic.Trainers.Regression | ||
{ | ||
public static class FastTree | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ [](start = 4, length = 1)
So do we need a Binary and Ranking examples for FastTree? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all the samples will come in my next PR. I added this one as template for discussion. please see my email about in-memory samples.
In reply to: 265818226 [](ancestors = 265818226)
see crefs for binUpperBounds and binEffects. I know this is internal -- but it would bound to the variable name if they ever change. #Resolved Refers to: src/Microsoft.ML.FastTree/GamClassification.cs:189 in 0966dca. [](commit_id = 0966dca, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done In reply to: 473121003 [](ancestors = 473121003) Refers to: src/Microsoft.ML.FastTree/GamClassification.cs:189 in 0966dca. [](commit_id = 0966dca, deletion_comment = False) |
var pipeline = mlContext.BinaryClassification.Trainers.FastTree(); | ||
|
||
// Train the model. | ||
var model = pipeline.Fit(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is minimal version without prediction. I personally like to see in-memory prediction, which is what will happen immediately in production.
Why in-memory prediction is important?
(1) User have no idea about the IDataView produced by the train model. If we don't tell them how to extract data into C# data structure, they will have to look for tutorials of IDataVIew, ITransformer, IDataView-C# bridge.
(2) Prediction format varies from different models and are ML.NET-specific, so it's also hard to figure out which one should be used.
(3) Prediction is how the trained model will be used. One might think scikit-learn doesn't do so, so we shouldn't. My suggestion is we should! Here is my reason ---- scikit-learn produces numpy data structures and everyone know how to manipulate them (by Googling for Numpy), but IDataView is not at that stage yet.
} | ||
} | ||
|
||
private class DataPoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really good! An example with 50 features!
Updated XML documentation for tree-based trainers (FastTree, FastForest, GAM, etc). Related to #2522.
Samples to come in a separate PR.