Skip to content

Move Static API extensions to separate assembly #1930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jan 3, 2019

Conversation

abgoswam
Copy link
Member

@abgoswam abgoswam commented Dec 19, 2018

Towards #1695 . Also fixes #1480

  • Static extensions for TextLoader moved to the Microsoft.ML.StaticPipe assembly
  • Static extension related files moved to the Microsoft.ML.StaticPipe assembly
  • Moved static extensions for some transforms / learners that were missed in previous PR Separate assemblies for static extensions #1914

@abgoswam abgoswam changed the title WIP : Static extensions for TextLoader WIP : Place static extensions for TextLoader in Microsoft.ML.StaticPipe assembly Dec 19, 2018
{
public sealed partial class TextLoader
public static class TextLoaderStatic
Copy link
Contributor

@Zruty0 Zruty0 Dec 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TextLoaderStatic [](start = 24, length = 16)

I expect this will go into the M.ML.StaticPipe assembly, right? #Resolved

Copy link
Member Author

@abgoswam abgoswam Dec 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap. When I move it to the Microsoft.ML.StaticPipe assembly I see two tests fail ReturnTypeIsSchemaShape and ReturnTypeIsSchemaShapeChained.

Error : Message: Mismatch between number of diagnostics returned

@TomFinley . Any insights on what might be going on here ?


In reply to: 243091763 [](ancestors = 243091763)

@Zruty0
Copy link
Contributor

Zruty0 commented Dec 19, 2018

    public static DataReader<IMultiStreamSource, TShape> CreateReader<[IsShape] TShape>(

Can we make it an extension to MLContext.Data, like the other CreateTextReader?


Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:40 in 2775152. [](commit_id = 2775152, deletion_comment = False)

@Zruty0
Copy link
Contributor

Zruty0 commented Dec 19, 2018

    public static DataReader<IMultiStreamSource, TShape> CreateReader<[IsShape] TShape>(

And name it CreateTextReader, I guess.


In reply to: 448760552 [](ancestors = 448760552)


Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:40 in 2775152. [](commit_id = 2775152, deletion_comment = False)

@abgoswam
Copy link
Member Author

    public static DataReader<IMultiStreamSource, TShape> CreateReader<[IsShape] TShape>(

I believe an mlcontext extension exists for this already. CreateTextReader inside DataLoadSaveOperationsExtensions. I see usages for this in the cookbook samples for static API

Kindly let me know if I am missing something.


In reply to: 448760615 [](ancestors = 448760615,448760552)


Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:40 in 2775152. [](commit_id = 2775152, deletion_comment = False)

@abgoswam abgoswam changed the title WIP : Place static extensions for TextLoader in Microsoft.ML.StaticPipe assembly Move Static API extensions to separate assembly Dec 20, 2018
internal class OnlineDefaultArgs
{
[BestFriend]
internal const int NumIterations = 1;
Copy link
Contributor

@artidoro artidoro Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not making this public? If the outer class is internal, that should be sufficient right? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question applies everywhere else you have argument defaults


In reply to: 243470421 [](ancestors = 243470421)

Copy link
Contributor

@TomFinley TomFinley Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @abgoswam, incidentally this issue raised by @artidoro is the one thing that is preventing me from signing off (well, that and whatever is going on with multistream source). While the mistake of whoever wrote these default classes (making members of an internal class itself internal, which is utterly pointless and counterproductive) was a regrettable mistake, by sprinkling this attribute everywhere instead of fixing the original problem (that is, these members should have the public access modifier), we are making the code appreciably more difficult to read. While I know you did not commit that original error, since you're now relying on them we should fix it now. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. thanks for the details.


In reply to: 243577623 [](ancestors = 243577623)

@@ -376,7 +376,7 @@ private void CategoricalFeaturizationOn(params string[] dataPath)
);

// Read the data.
var data = reader.Read(dataPath);
var data = reader.Read(new MultiFileSource(dataPath));
Copy link
Contributor

@artidoro artidoro Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MultiFileSource [](start = 39, length = 15)

I don't see why just the string dataPath is not sufficient? #Resolved

Copy link
Contributor

@TomFinley TomFinley Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the extension of #1281 not being picked up for some reason? We should continue to use it. It makes the sample so much more pleasant to read. Back before we had that, it was something that came up perennially from the .NET team, users, our own developers, so it would be a shame if we had somehow regressed on that. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to add the method Read(params string[] path) in TextLoader.cs to get it working.

The regression in this PR is because TextLoaderStatic used to be a partial class TextLoader


In reply to: 243578060 [](ancestors = 243578060)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to fix a bug in TextLoader.cs to get it to work.


In reply to: 243471085 [](ancestors = 243471085)

@@ -376,7 +376,7 @@ private void CategoricalFeaturizationOn(params string[] dataPath)
);

// Read the data.
var data = reader.Read(dataPath);
var data = reader.Read(new MultiFileSource(dataPath));

Copy link
Contributor

@artidoro artidoro Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. So this change is not needed anymore.


In reply to: 243471218 [](ancestors = 243471218)

@TomFinley
Copy link
Contributor

TomFinley commented Dec 21, 2018

    public static DataReader<IMultiStreamSource, TShape> CreateReader<[IsShape] TShape>(

What I believe @Zruty0 is suggesting is that we should not be creating the static pipelines in this fashion, but rather similarly to how we create dynamic pipelines, by using MLContext directly.

I had previously resisted doing that, since the static pipeline work was experimental. Therefore putting them on something as central as the MLContext might lead to confusion. However, your work here puts them in a separate assembly. This is actually an interesting side benefit to your changes I had hitherto not considered -- by moving these extension methods to a separate assembly (and, ultimately a separate experimental nuget), we will have free license to add creation methods like this directly on the catalogs themselves. Which is a very attractive proposition! However, I would not make that part of your PR right now. It is definitely something we want to write up in an issue.

Of course, if you felt like doing that work now for this specific component (just to see what it would look like), that might be nice. I would not make it a requirement of your PR though.


In reply to: 448765345 [](ancestors = 448765345,448760615,448760552)


Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:40 in 2775152. [](commit_id = 2775152, deletion_comment = False)

@abgoswam
Copy link
Member Author

    public static DataReader<IMultiStreamSource, TShape> CreateReader<[IsShape] TShape>(

Thanks for the details. Yes, I will write up a separate issue for this, and keep it outside the purview of this PR

Could you kindly give an example of what you mean by -
"...we should not be creating the static pipelines in this fashion, but rather similarly to how we create dynamic pipelines, by using MLContext directly."

To me the pipeline creation seems similar between the static and dynamic APIs. Maybe I am missing some subtlety here. An example would be very helpful.


In reply to: 449246594 [](ancestors = 449246594,448765345,448760615,448760552)


Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:40 in 2775152. [](commit_id = 2775152, deletion_comment = False)

@abgoswam
Copy link
Member Author

abgoswam commented Jan 2, 2019

Hi @TomFinley @artidoro . Could you kindly review this PR. I have addressed all the existing comments.

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

Copy link
Contributor

@artidoro artidoro Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need the license in the files in the samples folder. I don't see it in other files, maybe better to remove it for consistency. #Resolved

@@ -7,6 +7,7 @@
<ItemGroup>
<ProjectReference Include="..\Microsoft.ML.Data\Microsoft.ML.Data.csproj" />
<ProjectReference Include="..\Microsoft.ML.HalLearners\Microsoft.ML.HalLearners.csproj" />
<ProjectReference Include="..\Microsoft.ML.StaticPipe\Microsoft.ML.StaticPipe.csproj" />
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StaticPipe [](start = 71, length = 10)

why we have this? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this VectorWhiteningStaticExtensions.cs would complain of missing references.

Note previously the static extensions were part of the Microsoft.ML.Data assembly. In this PR we moved them to a separate assembly Microsoft.ML.StaticPipe


In reply to: 244826453 [](ancestors = 244826453)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense if this StaticPipe project


In reply to: 244829980 [](ancestors = 244829980,244826453)

{
public sealed partial class TextLoader
public static class TextLoaderStatic
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TextLoaderStatic [](start = 24, length = 16)

out of curiosity, why we have to have this suffix Static for TextLoader? It's already in different namespace. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of reasons:

  1. In general I found it more convenient to understand which TextLoader I was using (the Dynamic TextLoader or its static counterpart TextLoaderStatic)
  2. Wanted the class to match the filename, esp since I was moving files around.

In reply to: 244827020 [](ancestors = 244827020)

@@ -722,64 +720,4 @@ public SchemaShape GetOutputSchema(SchemaShape inputSchema)
return new SchemaShape(result.Values);
}
}

public static class PcaEstimatorExtensions
Copy link
Contributor

@zeahmed zeahmed Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PcaEstimatorExtensions [](start = 24, length = 22)

Where did this code move? I don't find it elsewhere. Is it not needed? #Closed

Copy link
Member Author

@abgoswam abgoswam Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to Microsoft.ML.StaticPipe\TransformsStatic.cs

Line # 1625


In reply to: 244831747 [](ancestors = 244831747)

@zeahmed
Copy link
Contributor

zeahmed commented Jan 2, 2019

public static class FactorizationMachineExtensions

FactorizationMachineStaticExtensions???

Just a note: I know its not your work. You just moved the code into this assembly but I see that naming of the files and class names are not consistent. Do you think this should be corrected too (maybe in another issue)? #Pending


Refers to: src/Microsoft.ML.StandardLearners/FactorizationMachine/FactorizationMachineStatic.cs:19 in b108d08. [](commit_id = b108d08, deletion_comment = False)

@abgoswam
Copy link
Member Author

abgoswam commented Jan 2, 2019

public static class FactorizationMachineExtensions

I think this is an issue across the codebase where the naming of files and class names is not consistent.

@shmoradims . If I recall correctly, you had brought up this issue . is there already an issue that tracks this ? If not I will create a separate issue.


In reply to: 450965658 [](ancestors = 450965658)


Refers to: src/Microsoft.ML.StandardLearners/FactorizationMachine/FactorizationMachineStatic.cs:19 in b108d08. [](commit_id = b108d08, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Jan 3, 2019

public static class LocalPathReader

Can you tag #1480 issue, so it closes after you merge your PR? #Resolved


Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:314 in 6169d13. [](commit_id = 6169d13, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Jan 3, 2019

namespace Microsoft.ML.StaticPipe.Runtime

#1697
we have this issue which state what we should remove all "Runtime", but I'm not sure it's actually mean ALL namespace reference including this one.

Anyway. I would ask @TomFinley do we still need "Runtime" in namespace, or we can have one namespace since it's now part of one assembly?


Refers to: src/Microsoft.ML.Data/StaticPipe/Reconciler.cs:10 in 6169d13. [](commit_id = 6169d13, deletion_comment = False)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I would still ask @TomFinley regarding runtime namespace (maybe classes need to be make internal, in separate PR, or maybe it's fine to have runtime namespace)

@abgoswam abgoswam merged commit 25f3287 into dotnet:master Jan 3, 2019
@abgoswam
Copy link
Member Author

abgoswam commented Jan 3, 2019

@Ivanidzo4ka . I have created a separate issue #2003 to track how to deal with the Microsoft.ML.StaticPipe.Runtime namespace.

I will investigate what it takes to consolidate the code related to the Static API within the Microsoft.ML.StaticPipe namespace itself.

@abgoswam abgoswam deleted the abgoswam/textloader_static branch January 13, 2019 18:12
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LocalPathReader should live in Microsoft.ML.StaticPipe
6 participants