Skip to content

Added convenience constructors for set of transforms. #491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 11, 2018

Conversation

zeahmed
Copy link
Contributor

@zeahmed zeahmed commented Jul 4, 2018

This PR fixes #487.

Convenience constructors are added for the following transforms.

  • ChooseColumnsTransform.cs
  • ConvertTransform.cs
  • DropSlotsTransform.cs
  • GenerateNumberTransform.cs
  • HashTransform.cs
  • KeyToValueTransform.cs
  • KeyToVectorTransform.cs
  • LabelConvertTransform.cs
  • LabelIndicatorTransform.cs
  • RangeFilter.cs
  • ShuffleTransform.cs
  • SkipTakeFilter.cs
  • TermTransform.cs

string name,
string source = null,
DataKind? resultType = null,
KeyRange keyRange = null)
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyRange keyRange = null [](start = 12, length = 24)

I don't think conversions among key ranges are common enough that they are worthy of a convenience constructor. #Closed

IDataView input,
string name,
string source = null,
DataKind? resultType = null,
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataKind? [](start = 12, length = 9)

Hi @zeahmed thanks for doing this. Quick question, why is this nullable? What happens if it is null? #Closed

Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is interesting. Unlike some other transforms, this transform is one of those where we kind of do have a required argument. Hmmm. We need to think carefully about how we want to phrase the constructor here. We may need to refine my prior idea from #380 of having the signature be string name, string source = null. That may have been an error. #Closed

Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe not. Maybe just putting them before name is fine, maybe. But that's kind of weird too, since then you have the input data view, then required parameters, then names, then parameters with defaults. But maybe. #Closed

Copy link
Contributor Author

@zeahmed zeahmed Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding resultType being null. I see following happening in the code. The decision is being made based on resultType, KeyRange and Range parameters. At the end, if everything is null DataKind.Num is the value.

So, I would suggest setting resultType = DataKind.Num or make it required.
#Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make it required myself... I sort of view API usage as similar to programmatic usage, and if we view this "convert" as similar to a cast or something, the idea that there is a default type to cast to just seems silly. Even if maybe we want to allow it in command-line/GUI world.


In reply to: 200468398 [](ancestors = 200468398)

/// <param name="input">Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param>
/// <param name="name">Name of the output column.</param>
/// <param name="source">Name of the selected column. If this is null '<paramref name="name"/>' will be used.</param>
public ChooseColumnsTransform(IHostEnvironment env, IDataView input, string name, string source = null)
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose column transform is one of those transforms that is pretty much completely useless unless it supports multiple columns. It's not like other transforms where you can just apply the transform again to get more columns. If something isn't in the choose list, it is simply dropped -- no second chances. We need to support multiple column specification in the constructor.

However, choose columns generally as far as I know could just do with a params string[] name or something, and be done with it. Generally it isn't used to rename columns. (That is, I see xf=choose{col=Foo,Bar,Biz}, I don't see much xf=Choose{col=Foo:Bar} or something like this.) #Closed

Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually wait. What is the relationship between "choose columns transform" and "keep columns transform"? I think there is a subtle difference, but it escapes me at the moment. #Closed

Copy link
Contributor Author

@zeahmed zeahmed Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, renaming of chosen column is the difference between ChooseColumnsTransform and KeepColumnsTransform. KeepColumnsTransform does not support renaming and drops the rest of the columns. ChooseColumnsTransform does renaming as well as has other parameters like Hidden which has following options.

public enum HiddenColumnOption : byte
{
    Drop = 1,
    Keep = 2,
    Rename = 3
}

What I suggest that for convenience this transform operate in similar way as KeepColumnsTransform.

 public ChooseColumnsTransform(IHostEnvironment env, IDataView input, param string[] columnNames)
            : this(env, new Arguments(columnNames) {  }, input)
{
}

For other usage, user can invoke constructor with Arguments object. What do you suggest? #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably just keep the non-renaming case convenient, and nuke the constructor on arguments.


In reply to: 200471726 [](ancestors = 200471726)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that current changes are fine? If not can you please explain bit more?


In reply to: 200727095 [](ancestors = 200727095,200471726)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are good, thank you @zeahmed.


In reply to: 200743746 [](ancestors = 200743746,200727095,200471726)


}

public Arguments(string name, string source)
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see that in some circumstances, maybe having a specialized constructor on arguments could be useful, but why is it helpful here? #Closed

IDataView input,
string name,
string source = null,
int classIndex = Defaults.ClassIndex)
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This classIndex ought to have been a required parameter. #Closed

Copy link
Contributor Author

@zeahmed zeahmed Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that zero is being used as default value. Do you suggest making it required? #Resolved

Copy link
Contributor

@TomFinley TomFinley Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. So, in these so-called Arguments classes things have default values just because it's easier to have them have defaults, than to make them required, just by the nature of how the structure is used in our existing tooling -- that is, there was a cultural pressure to make the default construction with no arguments work, actually primarily due to the GUI because people thought it was odd that you could add a constructor via the GUI and have it immediately yield an error. That is not a concern here in this case, considering what this is used for, and making it default makes absolutely no sense. This really is something a user should be explicit about.

I do not suggest changing arguments (though you could if you want and I think it would be a great improvement, I think it's really silly that it doesn't require usage to be explicit) but I would certainly require it in the helper constructor. There's nothing fundamentally privileged about the first value... #Closed

/// <param name="input">>Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param>
/// <param name="count">Number of rows to skip</param>
/// <returns></returns>
public static SkipTakeFilter CreateSkipFilter(IHostEnvironment env, IDataView input, long count = Arguments.DefaultSkip)
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another case where having static classes like SkipFilter and TakeFilter both with Create methods might make for a more sensible API. So instead of usage being

SkipTakeFilter.CreateSkipFilter
SkipTakeFilter.CreateTakeFilter

it might instead look like

SkipFilter.Create
TakeFilter.Create

#Closed

Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning is similar to that expressed in this comment on #405. #Closed

/// <param name="input">Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param>
/// <param name="name">Name of the output column.</param>
/// <param name="source">Name of the input column. If this is null '<paramref name="name"/>' will be used.</param>
public DropSlotsTransform(IHostEnvironment env, IDataView input, string name, string source = null)
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is a transform that works well for conveniences. Certainly this specific constructor, which really does nothing, is not something we need right now. I'd rather not do drop slots for now. #Closed


}

public Arguments(string name, string source)
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. What's this constructor for? #Closed

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zeahmed thanks for doing this.

/// <param name="input">Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param>
/// <param name="name">Name of the output column.</param>
/// <param name="source">Name of the selected column. If this is null '<paramref name="name"/>' will be used.</param>
public ChooseColumnsTransform(IHostEnvironment env, IDataView input, string name, string source = null)
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually wait. What is the relationship between "choose columns transform" and "keep columns transform"? I think there is a subtle difference, but it escapes me at the moment. #Closed

/// <param name="input">>Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param>
/// <param name="count">Number of rows to skip</param>
/// <returns></returns>
public static SkipTakeFilter CreateSkipFilter(IHostEnvironment env, IDataView input, long count = Arguments.DefaultSkip)
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning is similar to that expressed in this comment on #405. #Closed

IDataView input,
string name,
string source = null,
int maxNumTerms = Defaults.MaxNumTerms)
Copy link
Contributor

@TomFinley TomFinley Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of feel like the sort order is something we ought to allow to be parameterized. #Closed

public static class SkipFilter
{
/// <summary>
/// A helper method to create <see cref="SkipFilter"/> for public facing API.
Copy link
Contributor

@TomFinley TomFinley Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A helper method to create for public facing API. [](start = 12, length = 73)

You cannot create a SkipFilter since there is no such instantiable type. I might rather describe what it does. #Closed



/// <summary>
/// A helper method to create <see cref="TakeFilter"/> for public facing API.
Copy link
Contributor

@TomFinley TomFinley Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A helper method to create for public facing API. [](start = 12, length = 73)

Similar comment here. #Closed


}

public Arguments(params string[] columns)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 12, length = 6)

should it be private/internal? #Resolved

Copy link
Contributor Author

@zeahmed zeahmed Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be if you don't foresee anyone else interested in using it? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My own thinking is there's no purpose in it given that the constructor on ChooseColumnsTransform itself does this. internal sounds good to me.


In reply to: 200723744 [](ancestors = 200723744)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a higher level point, we always have a tendency to make things as private as they reasonably can be, both because it makes the code much easier to understand, and also we will eventually have the problem that something "unhidden" can never be hidden again.


In reply to: 201022055 [](ancestors = 201022055,200723744)

/// <param name="input">Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param>
/// <param name="name">Name of the output column.</param>
/// <param name="source">Name of the input column. If this is null '<paramref name="name"/>' will be used.</param>
public KeyToValueTransform(IHostEnvironment env, IDataView input, string name, string source = null)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IHostEnvironment env, IDataView input, string name, string source = null [](start = 35, length = 72)

half of your files formatted in this way, half is one parameter for each line, why? #Closed

Copy link
Contributor Author

@zeahmed zeahmed Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the line is going to be long enough to fit into the view, I format every parameter on separate line otherwise not.

For this particular, its fine to have it on one line. #Closed

@zeahmed
Copy link
Contributor Author

zeahmed commented Jul 9, 2018

@TomFinley and @Ivanidzo4ka, can you please give it a final review and approve it OR let me know of your other feedback? #Closed

{


/// <summary>
Copy link
Contributor

@TomFinley TomFinley Jul 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get rid of this. #Closed

public static class SkipFilter
{
/// <summary>
/// A helper method to create'SkipFilter' transform by skipping the number of rows defined by the <paramref name="count"/> parameter.
Copy link
Contributor

@TomFinley TomFinley Jul 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SkipFilter [](start = 38, length = 10)

There is no such thing as a SkipFilter transform per se. Let's just clarify that this is a SkipTakeFilter (since it is), and use a proper <see tag of course. Here and below. #Resolved

@TomFinley
Copy link
Contributor

Hi @zeahmed sorry for the delay, I am a sick, sick man. :D


In reply to: 403630508 [](ancestors = 403630508)

@zeahmed
Copy link
Contributor Author

zeahmed commented Jul 10, 2018

Thanks @TomFinley! #Closed

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

{


/// <summary>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXTRA LINES! #Resolved

string source = null,
int maxNumTerms = Defaults.MaxNumTerms,
SortOrder sort = Defaults.Sort)
: this(env, new Arguments() { Column = new[] { new Column() { Name = name, Source = source ?? name } }, MaxNumTerms = maxNumTerms, Sort = sort }, input)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name = name, Source = source ?? name [](start = 74, length = 37)

just nitpicking, everywhere else you use Source = source?? name, Name = name, and here you have different order! #Resolved

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zeahmed zeahmed merged commit 268ebbc into dotnet:master Jul 11, 2018
ChiragRupani pushed a commit to ChiragRupani/machinelearning that referenced this pull request Jul 12, 2018
@zeahmed zeahmed deleted the convenience_constructor branch July 17, 2018 17:42
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Part 2] Create convenience constructor for the listed Transforms.
3 participants