Documentation for Multiclass SDCA #3433
Conversation
Seems good, just make sure the catalog extension methods are correctly documented, such as "Create ..." and "Create with advanced options ..." A sketch of what those entry points look like is below.
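For reference, a minimal sketch of the two catalog entry points being discussed, assuming the ML.NET 1.x API surface; the column names and regularization values are illustrative, not taken from this PR:

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext();

// "Create ...": the simple overload with default settings.
var sdca = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(
    labelColumnName: "Label",
    featureColumnName: "Features");

// "Create with advanced options ...": the overload that takes an options object.
var sdcaWithOptions = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(
    new SdcaMaximumEntropyMulticlassTrainer.Options
    {
        LabelColumnName = "Label",
        FeatureColumnName = "Features",
        L1Regularization = 0.01f,   // illustrative value
        L2Regularization = 0.1f     // illustrative value
    });

// The non-calibrated variant (raw scores instead of probabilities) follows the same pattern.
var sdcaNonCalibrated = mlContext.MulticlassClassification.Trainers.SdcaNonCalibrated();
```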
///
/// | Output Column Name | Column Type | Description|
/// | -- | -- | -- |
/// | `Score` | array of<xref:System.Single> | The scores of all classes.Higher value means higher probability to fall into the associated class. If the i-th element has the lagest value, the predicted label index would be i.Note that i is zero-based index. |
minor - space between . and Higher #Resolved
///
/// | Output Column Name | Column Type | Description|
/// | -- | -- | -- |
/// | `Score` | array of<xref:System.Single> | The scores of all classes.Higher value means higher probability to fall into the associated class. If the i-th element has the lagest value, the predicted label index would be i.Note that i is zero-based index. |
lagest
largest #Resolved
///
/// | Output Column Name | Column Type | Description|
/// | -- | -- | -- |
/// | `Score` | array of<xref:System.Single> | The scores of all classes.Higher value means higher probability to fall into the associated class. If the i-th element has the lagest value, the predicted label index would be i.Note that i is zero-based index. |
another space after . #Resolved
/// The optimization algorithm is an extension of [stochastic dual coordinate ascent](http://jmlr.org/papers/volume14/shalev-shwartz13a/shalev-shwartz13a.pdf), following a similar path proposed in an earlier [paper](https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf).
/// It is usually much faster than [L-BFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS) and [truncated Newton methods](https://en.wikipedia.org/wiki/Truncated_Newton_method) for large-scale and sparse data sets.
///
/// Regularization is a method that can render an ill-posed problem more tractable by imposing constraints that provide information to supplement the data and that prevent overfitting by penalizing the model's magnitude, usually measured by some norm function.
norm
is it worth expanding norm functions to normalization functions? #Resolved
Nope. They are independent things. Norm is a name, like sin/cos.
In reply to: 277115309
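For context, the norms being referred to (standard definitions, not taken from this PR):

$$\|\boldsymbol{w}\|_1 = \sum_{i=1}^{n} |w_i|, \qquad \|\boldsymbol{w}\|_2^2 = \sum_{i=1}^{n} w_i^2.$$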
These are my general recommendations after reviewing this PR and speaking with @wschin:
- Describe any general properties of the algorithm in the base class.
- Add documentation for each derived class (in the code comments of the derived class) that specifies:
  - which output columns are produced (we should already have this)
  - the interpretation of the output columns, i.e. for SdcaMulticlass:
    - SdcaMaximumEntropy: the score column is a genuine probability
    - SdcaNonCalibrated: the score column is a raw value (the highest value of which indicates the class)
/// | Required NuGet in addition to Microsoft.ML | None |
///
/// ### Scoring Function
/// This model trains linear model to solve multiclass classification problems. |
trains a linear model #Resolved
/// It assigns the $c$-th class a coefficient vector $\boldsymbol{w}_c \in {\mathbb R}^n$ and a bias $b_c \in {\mathbb R}$, for $c=1,\dots,m$.
/// Given a feature vector $\boldsymbol{x} \in {\mathbb R}^n$, the $c$-th class's score would be $\hat{y}^c = \boldsymbol{w}_c^T \boldsymbol{x} + b_c$.
///
/// If and only if the trained model is maximum entropy classifier, user can interpret the output score vector as the predicted class probabilities because [softmax function](https://en.wikipedia.org/wiki/Softmax_function) may be applied to post-process all classes' scores. |
is a maximum entropy classifier #Resolved
user --> you #Resolved
/// It assigns the $c$-th class a coefficient vector $\boldsymbol{w}_c \in {\mathbb R}^n$ and a bias $b_c \in {\mathbb R}$, for $c=1,\dots,m$.
/// Given a feature vector $\boldsymbol{x} \in {\mathbb R}^n$, the $c$-th class's score would be $\hat{y}^c = \boldsymbol{w}_c^T \boldsymbol{x} + b_c$.
///
/// If and only if the trained model is maximum entropy classifier, user can interpret the output score vector as the predicted class probabilities because [softmax function](https://en.wikipedia.org/wiki/Softmax_function) may be applied to post-process all classes' scores. |
How do you interpret the score otherwise? #Resolved
Add
/// If $\boldsymbol{x}$ belongs to class $c$, then $\hat{y}^c$ should be much larger than 0.
/// In contrast, a $\hat{y}^c$ much smaller than 0 means the desired label should not be $c$.
when explaining the scoring function.
In reply to: 277018452
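For reference, the softmax post-processing mentioned above maps the raw scores to class probabilities (standard definition, not taken from this PR):

$$P(c \mid \boldsymbol{x}) = \frac{e^{\hat{y}^c}}{\sum_{k=1}^{m} e^{\hat{y}^k}}, \qquad \hat{y}^c = \boldsymbol{w}_c^T \boldsymbol{x} + b_c,$$

so a large positive $\hat{y}^c$ pushes $P(c \mid \boldsymbol{x})$ toward 1 and a large negative one pushes it toward 0, which matches the interpretation suggested in the reply above.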
/// Regularization works by adding the penalty on the magnitude of $\boldsymbol{w}_c$, $c=1,\dots,m$ to the error of the hypothesis.
/// An accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less.
///
/// This learner supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization): a linear combination of L1-norm (LASSO), $|| \boldsymbol{w}_c ||_1$, and L2-norm (ridge), $|| \boldsymbol{w}_c ||_2^2$ regularizations. |
This learner --> this trainer or algorithm #Resolved
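For reference, a minimal sketch of how the elastic-net penalty enters the training objective; $\lambda_1$ and $\lambda_2$ stand for the trainer's L1/L2 regularization settings, and the exact form of the loss term is an assumption, not taken from this PR:

$$\min_{\{\boldsymbol{w}_c,\, b_c\}} \; \sum_{i} \mathrm{loss}\left(y_i, \hat{y}_i\right) \;+\; \lambda_1 \sum_{c=1}^{m} \|\boldsymbol{w}_c\|_1 \;+\; \lambda_2 \sum_{c=1}^{m} \|\boldsymbol{w}_c\|_2^2$$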
/// | Required NuGet in addition to Microsoft.ML | None |
///
/// ### Scoring Function
/// This model trains linear model to solve multiclass classification problems. |
trains a linear model #Resolved
/// An accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less.
///
/// This learner supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization): a linear combination of L1-norm (LASSO), $|| \boldsymbol{w}_c ||_1$, and L2-norm (ridge), $|| \boldsymbol{w}_c ||_2^2$ regularizations.
/// L1-nrom and L2-norm regularizations have different effects and uses that are complementary in certain respects. |
nrom -> norm #Resolved
Codecov Report
@@ Coverage Diff @@
## master #3433 +/- ##
=========================================
Coverage ? 72.76%
=========================================
Files ? 808
Lines ? 145452
Branches ? 16244
=========================================
Hits ? 105838
Misses ? 35192
Partials ? 4422
Toward #2522.