Documentation for Multiclass SDCA #3433

Merged
merged 7 commits into from
Apr 20, 2019

Conversation

Member

@wschin wschin commented Apr 19, 2019

Toward #2522.

@wschin wschin added the documentation Related to documentation of ML.NET label Apr 19, 2019
@wschin wschin self-assigned this Apr 19, 2019
@wschin wschin requested a review from natke April 19, 2019 15:23
Member

codemzs commented Apr 19, 2019

Seems good; just make sure the catalog extension methods are correctly documented, such as "Create ..." and "Create with advanced options ...".

///
/// | Output Column Name | Column Type | Description|
/// | -- | -- | -- |
/// | `Score` | array of<xref:System.Single> | The scores of all classes.Higher value means higher probability to fall into the associated class. If the i-th element has the lagest value, the predicted label index would be i.Note that i is zero-based index. |
Member

@singlis singlis Apr 20, 2019

minor - space between . and Higher #Resolved

///
/// | Output Column Name | Column Type | Description|
/// | -- | -- | -- |
/// | `Score` | array of<xref:System.Single> | The scores of all classes.Higher value means higher probability to fall into the associated class. If the i-th element has the lagest value, the predicted label index would be i.Note that i is zero-based index. |
Member

@singlis singlis Apr 20, 2019

"lagest" should be "largest". #Resolved

///
/// | Output Column Name | Column Type | Description|
/// | -- | -- | -- |
/// | `Score` | array of<xref:System.Single> | The scores of all classes.Higher value means higher probability to fall into the associated class. If the i-th element has the lagest value, the predicted label index would be i.Note that i is zero-based index. |
Member

@singlis singlis Apr 20, 2019

another space after . #Resolved

/// The optimization algorithm is an extension of (http://jmlr.org/papers/volume14/shalev-shwartz13a/shalev-shwartz13a.pdf) following a similar path proposed in an earlier [paper](https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf).
/// It is usually much faster than [L-BFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS) and [truncated Newton methods](https://en.wikipedia.org/wiki/Truncated_Newton_method) for large-scale and sparse data set.
///
/// Regularization is a method that can render an ill-posed problem more tractable by imposing constraints that provide information to supplement the data and that prevents overfitting by penalizing model's magnitude usually measured by some norm functions.
Member

@singlis singlis Apr 20, 2019

Regarding "norm": is it worth expanding "norm functions" to "normalization functions"? #Resolved

Member Author

@wschin wschin Apr 20, 2019

No. They are independent things. "Norm" is the name of a mathematical function, like sin/cos.

Member

@singlis singlis left a comment

:shipit:

Contributor

@natke natke left a comment

These are my general recommendations after reviewing this PR and speaking with @wschin:

  • Describe any general properties of the algorithm in the base class.
  • Add documentation for each derived class (in the code comments of the derived class) that specifies:
    • which output columns are produced (we should already have this)
    • the interpretation of the output columns, e.g. for SdcaMulticlass:
      • SdcaMaximumEntropy
        • score column is a genuine probability
      • SdcaNonCalibrated
        • score column is a raw value (the highest value of which indicates the class)
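To make the distinction in the list above concrete, here is a minimal Python sketch (not ML.NET code; Python is used only to keep it short) of how a raw score vector relates to a probability vector. It assumes, as the doc comment under review states, that softmax is the post-processing step in the maximum entropy case. The score values and all names are hypothetical.

```python
import math


def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores, one per class (the "raw value" case).
raw_scores = [2.0, -1.0, 0.5]

# The same scores after softmax (the "genuine probability" case).
probabilities = softmax(raw_scores)

# The zero-based index of the largest element is the predicted class
# in both cases, because softmax preserves the ordering of the scores.
predicted_from_raw = raw_scores.index(max(raw_scores))
predicted_from_prob = probabilities.index(max(probabilities))
```

Either vector yields the same predicted class; only the probability vector can be read as class probabilities.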

/// | Required NuGet in addition to Microsoft.ML | None |
///
/// ### Scoring Function
/// This model trains linear model to solve multiclass classification problems.
Contributor

@natke natke Apr 19, 2019

trains a linear model #Resolved

/// It assigns the $c$-th class a coefficient vector $\boldsymbol{w}_c \in {\mathbb R}^n$ and a bias $b_c \in {\mathbb R}$, for $c=1,\dots,m$.
/// Given a feature vector $\boldsymbol{x} \in {\mathbb R}^n$, the $c$-th class's score would be $\hat{y}^c = \boldsymbol{w}_c^T \boldsymbol{x} + b_c$.
///
/// If and only if the trained model is maximum entropy classifier, user can interpret the output score vector as the predicted class probabilities because [softmax function](https://en.wikipedia.org/wiki/Softmax_function) may be applied to post-process all classes' scores.
Contributor

@natke natke Apr 19, 2019

is a maximum entropy classifier #Resolved

Contributor

@natke natke Apr 19, 2019

user --> you #Resolved

/// It assigns the $c$-th class a coefficient vector $\boldsymbol{w}_c \in {\mathbb R}^n$ and a bias $b_c \in {\mathbb R}$, for $c=1,\dots,m$.
/// Given a feature vector $\boldsymbol{x} \in {\mathbb R}^n$, the $c$-th class's score would be $\hat{y}^c = \boldsymbol{w}_c^T \boldsymbol{x} + b_c$.
///
/// If and only if the trained model is maximum entropy classifier, user can interpret the output score vector as the predicted class probabilities because [softmax function](https://en.wikipedia.org/wiki/Softmax_function) may be applied to post-process all classes' scores.
Contributor

@natke natke Apr 19, 2019

How do you interpret the score otherwise? #Resolved

Member Author

Add

    /// If $\boldsymbol{x}$ belongs to class $c$, then $\hat{y}^c$ should be much larger than 0.
    /// In contrast, a $\hat{y}^c$ much smaller than 0 means the desired label should not be $c$.

when explaining the scoring function.
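As a rough illustration of the scoring function discussed here, the following Python sketch (not ML.NET code) evaluates $\hat{y}^c = \boldsymbol{w}_c^T \boldsymbol{x} + b_c$ for each class and picks the class with the largest score. The weights, biases, and feature vector are made-up values, and `class_scores` is a hypothetical helper name.

```python
def class_scores(weights, biases, x):
    """Return one linear score per class: dot(w_c, x) + b_c."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(w_c, x)) + b_c
        for w_c, b_c in zip(weights, biases)
    ]


# Hypothetical model with m = 3 classes and n = 2 features.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, -0.5, 0.0]
x = [3.0, 0.5]

scores = class_scores(weights, biases, x)

# Zero-based index of the largest score is the predicted label index.
predicted = scores.index(max(scores))
```

With these values the first class gets a score well above 0 and the third a score well below 0, which matches the proposed wording: a large positive score favors the class, a large negative score argues against it.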


/// Regularization works by adding the penalty on the magnitude of $\boldsymbol{w}_c$, $c=1,\dots,m$ to the error of the hypothesis.
/// An accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less.
///
/// This learner supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization): a linear combination of L1-norm (LASSO), $|| \boldsymbol{w}_c ||_1$, and L2-norm (ridge), $|| \boldsymbol{w}_c ||_2^2$ regularizations.
Contributor

@natke natke Apr 19, 2019

This learner --> this trainer or algorithm #Resolved

/// | Required NuGet in addition to Microsoft.ML | None |
///
/// ### Scoring Function
/// This model trains linear model to solve multiclass classification problems.
Contributor

@natke natke Apr 19, 2019

trains a linear model #Resolved

Member Author

Thanks. I will also fix the same issue in a related API below.

/// An accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less.
///
/// This learner supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization): a linear combination of L1-norm (LASSO), $|| \boldsymbol{w}_c ||_1$, and L2-norm (ridge), $|| \boldsymbol{w}_c ||_2^2$ regularizations.
/// L1-nrom and L2-norm regularizations have different effects and uses that are complementary in certain respects.
Contributor

@natke natke Apr 20, 2019

nrom -> norm #Resolved
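For reference, the elastic net penalty described in the doc comment above can be sketched in a few lines of Python (not ML.NET code). The coefficients `l1` and `l2` are hypothetical names; how they map to the trainer's actual options and scaling is not stated in this thread, so treat the numbers as illustrative only.

```python
def elastic_net_penalty(weights, l1, l2):
    """Sum over classes of l1 * ||w_c||_1 + l2 * ||w_c||_2^2."""
    l1_term = sum(abs(w) for w_c in weights for w in w_c)
    l2_term = sum(w * w for w_c in weights for w in w_c)
    return l1 * l1_term + l2 * l2_term


# Two hypothetical single-class models: extreme vs. conservative coefficients.
extreme = [[10.0, -10.0]]
conservative = [[1.0, -1.0]]

penalty_extreme = elastic_net_penalty(extreme, l1=0.1, l2=0.01)
penalty_conservative = elastic_net_penalty(conservative, l1=0.1, l2=0.01)
```

The model with extreme coefficients incurs the larger penalty, which is the behavior the doc comment describes.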

@wschin wschin merged commit 27b6cf3 into dotnet:master Apr 20, 2019
@wschin wschin deleted the mcsdca-doc branch April 20, 2019 07:24

codecov bot commented Apr 20, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@082ab77).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master    #3433   +/-   ##
=========================================
  Coverage          ?   72.76%           
=========================================
  Files             ?      808           
  Lines             ?   145452           
  Branches          ?    16244           
=========================================
  Hits              ?   105838           
  Misses            ?    35192           
  Partials          ?     4422
| Flag | Coverage Δ |
| -- | -- |
| #Debug | 72.76% <ø> (?) |
| #production | 68.27% <ø> (?) |
| #test | 89.04% <ø> (?) |

| Impacted Files | Coverage Δ |
| -- | -- |
| ...oft.ML.StandardTrainers/StandardTrainersCatalog.cs | 92.34% <ø> (ø) |
| ...oft.ML.StandardTrainers/Standard/SdcaMulticlass.cs | 91.12% <ø> (ø) |

@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022