-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[Part 3] Added convenience constructors for set of transforms. #520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -79,6 +79,18 @@ public sealed class Arguments : TransformInputBase | |||
|
|||
private readonly SchemaImpl _schemaImpl; | |||
|
|||
/// <summary> | |||
/// Convenience constructor for public facing API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convenience constructor for public facing API. [](start = 12, length = 46)
We might want to improve this description, since from the point of view of the user it doesn't really help them much to know that they're using the public facing API really. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... I see this was mostly done via copy-paste, I'm afraid that an actual description will be necessary. Look on the bright side, I think that an understanding of how it could be used can only help you imagine the best way in which the convenience should be structured.
In reply to: 201910848 [](ancestors = 201910848)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably obvious, but just in case not clear this is intended as a general comment.
In reply to: 201912669 [](ancestors = 201912669,201910848)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking about adding more description to the constructors and wanted to open a separate issue to include clear description to each of the convenience constructors. What do you say?
In reply to: 201912748 [](ancestors = 201912748,201912669,201910848)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Convenience constructor for public facing API. | ||
/// </summary> | ||
/// <param name="env">Host Environment.</param> | ||
/// <param name="input">Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Input . This is the output from previous transform or loader [](start = 32, length = 83)
This on the other hand might be over-description. Certainly this is not the best place for people to learn how to instantiate IDataView
s if they don't already know how to do so. in It's also not quite correct, since there are certainly other data views other than loaders and transforms. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Descriptions of these might be good. I see that the summary comment on the class itself already has some documentation. The relevant section where it explains the Refers to: src/Microsoft.ML.Transforms/UngroupTransform.cs:69 in d9c4a29. [](commit_id = d9c4a29, deletion_comment = False) |
/// <param name="input">Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param> | ||
/// <param name="name">Name of the output column.</param> | ||
/// <param name="source">Name of the column to be transformed. If this is null '<paramref name="name"/>' will be used.</param> | ||
/// <param name="newDim">The number of random Fourier features to create.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The number of random Fourier features to create. [](start = 8, length = 81)
This ought to be required. #Closed
/// <param name="name">Name of the output column.</param> | ||
/// <param name="source">Name of the column to be transformed. If this is null '<paramref name="name"/>' will be used.</param> | ||
/// <param name="newDim">The number of random Fourier features to create.</param> | ||
/// <param name="useSin">create two features for every random Fourier frequency? (one for cos and one for sin).</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create two features for every random Fourier frequency? [](start = 33, length = 55)
So, if newDim
was 10, and I set this to true
, does the output become 20 or something?
This guy is a little complex. I wonder if we can get away with not including it in the convenience? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it creates 2 * newDim features when useSin=true. Yes, we can remove useSin
parameter as it's default value is also false
.
In reply to: 201911445 [](ancestors = 201911445)
My thinking is that transforms that are hidden probably don't need convenience constructors. This thing in particular is mostly something I have for debugging. #Closed Refers to: src/Microsoft.ML.Transforms/ProduceIdTransform.cs:12 in d9c4a29. [](commit_id = d9c4a29, deletion_comment = False) |
} | ||
|
||
internal static string RegistrationName = "MutualInformationFeatureSelectionTransform"; | ||
|
||
/// <summary> | ||
/// A helper method to create <see cref="MutualInformationFeatureSelectionTransform"/> for public facing API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MutualInformationFeatureSelectionTransform [](start = 49, length = 42)
You can't be creating it, it's a static class. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comments as before, when documenting we ought to describe what it is actually useful for, not just describing its return values (which are part of the method signature, and we aren't even doing that quite right).
In reply to: 201912038 [](ancestors = 201912038)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, this is hidden, and mostly deprecated in favor of the Refers to: src/Microsoft.ML.Transforms/MissingValueIndicatorTransform.cs:16 in d9c4a29. [](commit_id = d9c4a29, deletion_comment = False) |
/// <param name="input">Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param> | ||
/// <param name="groupKey">Columns to group by</param> | ||
/// <param name="columns">Columns to group together</param> | ||
public GroupTransform(IHostEnvironment env, IDataView input, string[] groupKey, params string[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string[] groupKey [](start = 69, length = 17)
Considering the cases I'm aware of where it is used, I have to think that a single string would be preferable for the convenience constructor. #Closed
After a long discussion yesterday on a separate thread I got to know the purpose of this transform.
I am not sure why its not sealed. Looking at all other transforms, I assume it should be sealed as well. I am not changing it right now unless you see it feasible to do so. In reply to: 404392058 [](ancestors = 404392058) Refers to: src/Microsoft.ML.Transforms/OptionalColumnTransform.cs:29 in d9c4a29. [](commit_id = d9c4a29, deletion_comment = False) |
private static class Defaults | ||
{ | ||
public const string Column = "Id"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably don't need this Defaults
class any longer. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we might as well revert the whole file probably.
In reply to: 202426903 [](ancestors = 202426903)
@@ -34,12 +34,30 @@ namespace Microsoft.ML.Runtime.Data | |||
/// </summary> | |||
public static class NAHandleTransform | |||
{ | |||
/// <summary> | |||
/// | |||
/// </summary> | |||
public enum ReplacementKind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty? #Closed
May as well. And since you've gone to the effort of digging up what it's actually useful for maybe just paste that into an XML comment. In reply to: 404611599 [](ancestors = 404611599,404392058) Refers to: src/Microsoft.ML.Transforms/OptionalColumnTransform.cs:29 in d9c4a29. [](commit_id = d9c4a29, deletion_comment = False) |
public enum UngroupMode | ||
{ | ||
/// <summary> | ||
/// A number of output rows are equal to the minimum length of pivot columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these "A number" would read better as "The number". #Closed
Getting really close thanks @zeahmed! #Closed |
/// <summary> | ||
/// This transform is used to mark some of the columns (e.g. Label) optional during training So that the columns is not required during scoring.. | ||
/// At scoring time, it is checked if the data schema for scoring matches the data used for training except for the optional columns. | ||
/// </summary> | ||
public class OptionalColumnTransform : RowToRowMapperTransformBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public class [](start = 4, length = 12)
Sealed? :) #Closed
Outer, | ||
|
||
/// <summary> | ||
/// The number of output rows are equal to the length of the first pivot column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are [](start = 42, length = 3)
is equal #Closed
Inner, | ||
|
||
/// <summary> | ||
/// The number of output rows are equal to the maximum length of pivot columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are [](start = 42, length = 3)
is equal #Closed
public enum UngroupMode | ||
{ | ||
/// <summary> | ||
/// The number of output rows are equal to the minimum length of pivot columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are [](start = 42, length = 3)
is equal #Closed
@@ -26,6 +26,10 @@ | |||
|
|||
namespace Microsoft.ML.Runtime.DataPipe | |||
{ | |||
/// <summary> | |||
/// This transform is used to mark some of the columns (e.g. Label) optional during training So that the columns is not required during scoring.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So [](start = 97, length = 2)
lower case "so" since this is in the middle of a sentence? #Closed
@@ -26,6 +26,10 @@ | |||
|
|||
namespace Microsoft.ML.Runtime.DataPipe | |||
{ | |||
/// <summary> | |||
/// This transform is used to mark some of the columns (e.g. Label) optional during training So that the columns is not required during scoring.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
columns is [](start = 109, length = 10)
"columns is" => "columns are" #Closed
@@ -26,6 +26,10 @@ | |||
|
|||
namespace Microsoft.ML.Runtime.DataPipe | |||
{ | |||
/// <summary> | |||
/// This transform is used to mark some of the columns (e.g. Label) optional during training So that the columns is not required during scoring.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. [](start = 147, length = 2)
One period instead of two. #Closed
@@ -26,6 +26,10 @@ | |||
|
|||
namespace Microsoft.ML.Runtime.DataPipe | |||
{ | |||
/// <summary> | |||
/// This transform is used to mark some of the columns (e.g. Label) optional during training So that the columns is not required during scoring.. | |||
/// At scoring time, it is checked if the data schema for scoring matches the data used for training except for the optional columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At scoring time, it is checked if the data schema for scoring matches the data used for training except for the optional columns. [](start = 8, length = 129)
I'm not sure I actually understand this. What it really does is, if the column is not there, it will make it appear as if it is there with metadata filled in, and if the column is requested will return entirely default values. Or so I understand the code in the cursor. Is there some way to describe it in a way that communicates that? I'm not sure this sentence here actually communicates that. #Closed
public class OptionalColumnTransform : RowToRowMapperTransformBase | ||
/// <summary> | ||
/// This transform is used to mark some of the columns (e.g. Label) optional during training so that the columns are not required during scoring. | ||
/// When applied to new data, if optional columns are not present a meta column is created having the same properties (e.g. 'name', 'type' etc.) as used during training. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta column [](start = 72, length = 11)
The phrase meta-column is a bit odd, I'm not sure that's a good description. Maybe dummy column? Mock column? #Closed
/// - scalar for scalar column | ||
/// - totally sparse vector for vector column. | ||
/// If value of the column is requested the default value will be returned. | ||
/// </summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This last line seems redundant. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @TomFinley and @codemzs! |
This PR fixes #518. The convenience constructors were added for following transforms.