-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Xml docs for trainers and a minor infrastructure changes #455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ted Summary class XML. Adding documentation details and references for another batch of the trainers.
This is a first commit, to allow starting to review the changes. |
…is no official documentation on using href with those tags.
internal const string Summary = "K-means is a popular clustering algorithm. With K-means, the data is clustered into a specified " | ||
+ "number of clusters in order to minimize the within-cluster sum of squares. K-means++ improves upon K-means by using a better " | ||
+ "method for choosing the initial cluster centers."; | ||
internal const string Summary = "KMeans++ clustering algorithm"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal const string Summary = "KMeans++ clustering algorithm"; [](start = 8, length = 64)
This summary isn't quite as useful. I understand that we're adding more descriptive text on remarks for the API, but this is a temporary solution as noted elsewhere, so why are we degrading the text that we do have? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other summaries are just one liner, like: ''Averaged Perceptron Binary Classifier." or "Train a field-aware factorization machine for binary classification".
This felt like the outlier. Restoring it.
In reply to: 199548768 [](ancestors = 199548768)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I don't know. I think people don't write good summaries because they're lazy, I wouldn't want to normalize that laziness when someone actually did a good job writing a summary. :) #Resolved
@@ -11,7 +11,7 @@ | |||
using Microsoft.ML.Runtime.LightGBM; | |||
using Microsoft.ML.Runtime.Model; | |||
|
|||
[assembly: LoadableClass(LightGbmBinaryTrainer.Summary, typeof(LightGbmBinaryTrainer), typeof(LightGbmArguments), | |||
[assembly: LoadableClass(LightGbmBinaryTrainer.UserName, typeof(LightGbmBinaryTrainer), typeof(LightGbmArguments), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UserName [](start = 47, length = 8)
The first thing I believe is the summary, or should be. If it was badly written, that's fine, we ought to improve it, but to use the user-name as the first argument is wrong.
Also the thing that is actually being filled in as the user name is a string literal (that somehow differs from the field declared as UserName
. #Resolved
Elastic net regularization can be specified by the l2_weight and l1_weight parameters. Note that the l2_weight has an effect on the rate of convergence. | ||
In general, the larger the l2_weight, the faster SDCA converges."; | ||
The results depends on the order of the training data. For reproducible results, it is recommended that one sets <paramref>shuffle</paramref> to | ||
False and <paramref>NumThreads</paramref> to 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NumThreads [](start = 10, length = 31)
Is the correct form not <paramref href=
?
Also this seems like kind of a strange solution... #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -82,6 +82,31 @@ public abstract class FastTreeTrainerBase<TArgs, TPredictor> : | |||
|
|||
protected string InnerArgs => CmdParser.GetSettings(Host, Args, new TArgs()); | |||
|
|||
internal const string Remarks = @"<remarks> | |||
<para>FastTrees is an implementation of FastRank. FastRank is an efficient implementation of the <a href='https://arxiv.org/abs/1505.01866'>MART</a> gradient boosting algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"FastTrees is" -> "FastTree is"... Also they will have no idea what FastRank is. Bringing in FastRank just complicates the picture a bit. The long history of Microsoft's internal tools, while possibly interesting to some people, does not belong in the first few sentences of user document. #Resolved
In general, the larger the l2_weight, the faster SDCA converges."; | ||
The results depends on the order of the training data. For reproducible results, it is recommended that one sets <paramref>shuffle</paramref> to | ||
False and <paramref>NumThreads</paramref> to 1. | ||
Elastic net regularization can be specified by the l2_weight and l1_weight parameters. Note that the <paramref>l2_weight</paramref> has an effect on the rate of convergence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are l2_weight
and l1_weight
parameters? Those do not conform to .NET naming guidelines. Are they really named that? Also same note about <paramref href
, maybe serach for <paramref>
in your code to find and fix. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, spotted the l2_weight and l1_weight and changed them to the actual L2Const and L1Threeshold.
The usage of paramref in the codebase is correct.
In reply to: 199558256 [](ancestors = 199558256)
So, just curious @sfilipi , if I had this task of generating temporary XML comments for these classes, I wouldn't have done it this way. I would have written a separate file in the API itself, with the classes I wanted to document as partial classes (they already are partial to allow for these sorts of shenanigans), with comments directly on the partial classes. We already use the partial trick class for this sort of thing. That will have a few advantages:
What do you think? |
The documentation that i was aiming at improving was in the generated CSharpApi; the classes in Microsoft.ML.Trainers and Microsoft.ML.Transforms namespaces. Writting another class with all the partial classes by hand just for the comments seems like prone to oversight when it comes to adding the documentation. Reading more about the C# docs, there is an tag that we can leverage, we'd keep the XML docs in the docs side, in XML, and just use this tag to wire the path to those files to the CSharp Api. In reply to: 401866820 [](ancestors = 401866820) |
Fixing new line character in the ep_list and manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accepted provisionally, that we work on making this better at a later time.
internal const string DetailedSummary = @"Perceptron is a classification algorithm that makes its predictions based on a linear function. | ||
internal const string Summary = "Averaged Perceptron Binary Classifier."; | ||
internal const string Remarks = @"<remarks> | ||
Perceptron is a classification algorithm that makes its predictions based on a linear function. | ||
I.e., for an instance with feature values f0, f1,..., f_D-1, , the prediction is given by the sign of sigma[0,D-1] ( w_i * f_i), where w_0, w_1,...,w_D-1 are the weights computed by the algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e., for an instance [](start = 0, length = 21)
This is very hard to read. No need to fix now though. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* updating the C# api generator to append the Remarks XML to the generated Summary class XML. Adding documentation details and references for another batch of the trainers. * correcting test trigger condition * typo * More documentation * substituting the <see> and <seealso> tags with <a> tags, since there is no official documentation on using href with those tags. * adressing PR comments. Fixing new line character in the ep_list and manifest. * Fixing the list and code generation merges from master
Addresses: #388
This PR adds an EntryPointInfo attribute that will contain the XML documentation to append to the summary: "Description", "References", "See also" sections.
It also modifies the C# generation code to append the content of this new attribute to the summary.