Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tiny] Use string[] instead of IEnumerable<string> in column names #2815

Merged
merged 2 commits into from
Mar 5, 2019

Conversation

wschin
Copy link
Member

@wschin wschin commented Mar 1, 2019

There are three possible ways to denote input column names, string[], IEnumerable<string>, and params. I personally like string[] because

  • IEnumerable<string> is not used elsewhere for the same purpose.
  • params makes signature less typed and it forces input column names to be the last argument group.

Fix #2801, Fix #2460.

@wschin wschin added the API Issues pertaining the friendly API label Mar 1, 2019
@wschin wschin self-assigned this Mar 1, 2019
@wschin wschin requested review from abgoswam, sfilipi and Ivanidzo4ka and removed request for abgoswam March 1, 2019 17:03
@wschin wschin changed the title Use string[] instead of IEnumerable<string> in column names [Tiny] Use string[] instead of IEnumerable<string> in column names Mar 1, 2019
@TomFinley
Copy link
Contributor

TomFinley commented Mar 1, 2019

I understand that we wrote our samples this way, but writing it this way means we could not, for instance, take things that are the results of linq queries. It seems like we're just giving up capabilities for no particular benefit.

Why not just standardize elsewhere on IEnumerable?

I think it is important in some cases that we continue to support params when the input column names are reasonably the last item. The syntax just winds up looking cleaner. But those can be overloads I think. #Resolved

@@ -45,7 +45,7 @@ public static class TextCatalog
/// </example>
public static TextFeaturizingEstimator FeaturizeText(this TransformsCatalog.TextTransforms catalog,
string outputColumnName,
IEnumerable<string> inputColumnNames,
Copy link
Member

@sfilipi sfilipi Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IEnumerable inputColumnNames, [](start = 10, length = 39)

my vote, and the original issues is to have this be the last parameter of the method, and be a params, to optimize for both cases where the users has one or more columns and wants to provied options.

Having the IEumerable, or with the string[] will require building an Enumerable/Array to hold just one element, which is unnecessary since params exists.

Why would the users have to do either:

var customized_pipeline = ml.Transforms.Text.FeaturizeText(customizedColumnName, options, new List<string> { "SentimentText" }, var customized_pipeline = ml.Transforms.Text.FeaturizeText(customizedColumnName, options, new string[] { "SentimentText" },

when they can just go:

var customized_pipeline = ml.Transforms.Text.FeaturizeText(customizedColumnName, options, "SentimentText"

I do think the signature should be:

public static TextFeaturizingEstimator FeaturizeText(this TransformsCatalog.TextTransforms catalog,
string outputColumnName,
TextFeaturizingEstimator.Options options
params string[] inputColumnNames,
)

and we are doing that everywhere else...

#Resolved

Copy link
Member Author

@wschin wschin Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption here is that output column is always single, but why can't it be multiple columns? #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be both, hence params is the cleanest


In reply to: 261709072 [](ancestors = 261709072)

@codecov
Copy link

codecov bot commented Mar 1, 2019

Codecov Report

Merging #2815 into master will increase coverage by <.01%.
The diff coverage is 18.18%.

@@            Coverage Diff             @@
##           master    #2815      +/-   ##
==========================================
+ Coverage   71.67%   71.68%   +<.01%     
==========================================
  Files         809      808       -1     
  Lines      142416   142392      -24     
  Branches    16120    16112       -8     
==========================================
- Hits       102076   102073       -3     
+ Misses      35910    35886      -24     
- Partials     4430     4433       +3
Flag Coverage Δ
#Debug 71.68% <18.18%> (ø) ⬆️
#production 67.92% <ø> (+0.01%) ⬆️
#test 85.87% <18.18%> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.Transforms/Text/TextCatalog.cs 30.43% <ø> (ø) ⬆️
...s/StochasticDualCoordinateAscentClassifierBench.cs 0% <0%> (ø) ⬆️
...ft.ML.Tests/Scenarios/Api/Estimators/Visibility.cs 100% <100%> (ø) ⬆️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
src/Microsoft.ML.TensorFlow/TensorflowTransform.cs 79.76% <0%> (-0.44%) ⬇️
...StandardLearners/Standard/LinearModelParameters.cs 60.63% <0%> (-0.27%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 85.53% <0%> (-0.17%) ⬇️
...cenariosWithDirectInstantiation/TensorflowTests.cs 91.67% <0%> (-0.09%) ⬇️
...est/Microsoft.ML.StaticPipelineTesting/Training.cs 99.25% <0%> (-0.01%) ⬇️
.../Scenarios/Api/Estimators/IntrospectiveTraining.cs 100% <0%> (ø) ⬆️
... and 14 more

@wschin wschin requested a review from sfilipi March 2, 2019 00:02
@wschin
Copy link
Member Author

wschin commented Mar 2, 2019

I switch to params now.


In reply to: 468742828 [](ancestors = 468742828)

@wschin wschin requested a review from TomFinley March 2, 2019 00:03
Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@wschin wschin merged commit c2fabb8 into dotnet:master Mar 5, 2019
@wschin wschin deleted the change-one-arg-type branch March 5, 2019 01:06
@sharwell
Copy link
Member

sharwell commented Mar 5, 2019

@wschin This commit appears to have broken the repo build

@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants