-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Move Static API extensions to separate assembly #1930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
{ | ||
public sealed partial class TextLoader | ||
public static class TextLoaderStatic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TextLoaderStatic [](start = 24, length = 16)
I expect this will go into the M.ML.StaticPipe assembly, right? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap. When I move it to the Microsoft.ML.StaticPipe assembly I see two tests fail ReturnTypeIsSchemaShape
and ReturnTypeIsSchemaShapeChained
.
Error : Message: Mismatch between number of diagnostics returned
@TomFinley . Any insights on what might be going on here ?
In reply to: 243091763 [](ancestors = 243091763)
Can we make it an extension to Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:40 in 2775152. [](commit_id = 2775152, deletion_comment = False) |
And name it In reply to: 448760552 [](ancestors = 448760552) Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:40 in 2775152. [](commit_id = 2775152, deletion_comment = False) |
I believe an mlcontext extension exists for this already. Kindly let me know if I am missing something. In reply to: 448760615 [](ancestors = 448760615,448760552) Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:40 in 2775152. [](commit_id = 2775152, deletion_comment = False) |
internal class OnlineDefaultArgs | ||
{ | ||
[BestFriend] | ||
internal const int NumIterations = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not making this public? If the outer class is internal, that should be sufficient right? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question applies everywhere else you have argument defaults
In reply to: 243470421 [](ancestors = 243470421)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @abgoswam, incidentally this issue raised by @artidoro is the one thing that is preventing me from signing off (well, that and whatever is going on with multistream source). While the mistake of whoever wrote these default classes (making members of an internal class itself internal, which is utterly pointless and counterproductive) was a regrettable mistake, by sprinkling this attribute everywhere instead of fixing the original problem (that is, these members should have the public access modifier), we are making the code appreciably more difficult to read. While I know you did not commit that original error, since you're now relying on them we should fix it now. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -376,7 +376,7 @@ private void CategoricalFeaturizationOn(params string[] dataPath) | |||
); | |||
|
|||
// Read the data. | |||
var data = reader.Read(dataPath); | |||
var data = reader.Read(new MultiFileSource(dataPath)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MultiFileSource [](start = 39, length = 15)
I don't see why just the string dataPath is not sufficient? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the extension of #1281 not being picked up for some reason? We should continue to use it. It makes the sample so much more pleasant to read. Back before we had that, it was something that came up perennially from the .NET team, users, our own developers, so it would be a shame if we had somehow regressed on that. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to add the method Read(params string[] path)
in TextLoader.cs to get it working.
The regression in this PR is because TextLoaderStatic
used to be a partial class TextLoader
In reply to: 243578060 [](ancestors = 243578060)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to fix a bug in TextLoader.cs to get it to work.
In reply to: 243471085 [](ancestors = 243471085)
@@ -376,7 +376,7 @@ private void CategoricalFeaturizationOn(params string[] dataPath) | |||
); | |||
|
|||
// Read the data. | |||
var data = reader.Read(dataPath); | |||
var data = reader.Read(new MultiFileSource(dataPath)); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you change this file you also need to change the CookBook. https://github.com/dotnet/machinelearning/blob/7faa46c41fcd1a214fac7bfbdc8a9d1adbcc8d41/docs/code/MlNetCookBook.md#mlnet-cookbook #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I believe @Zruty0 is suggesting is that we should not be creating the static pipelines in this fashion, but rather similarly to how we create dynamic pipelines, by using MLContext directly. I had previously resisted doing that, since the static pipeline work was experimental. Therefore putting them on something as central as the MLContext might lead to confusion. However, your work here puts them in a separate assembly. This is actually an interesting side benefit to your changes I had hitherto not considered -- by moving these extension methods to a separate assembly (and, ultimately a separate experimental nuget), we will have free license to add creation methods like this directly on the catalogs themselves. Which is a very attractive proposition! However, I would not make that part of your PR right now. It is definitely something we want to write up in an issue. Of course, if you felt like doing that work now for this specific component (just to see what it would look like), that might be nice. I would not make it a requirement of your PR though. In reply to: 448765345 [](ancestors = 448765345,448760615,448760552) Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:40 in 2775152. [](commit_id = 2775152, deletion_comment = False) |
Thanks for the details. Yes, I will write up a separate issue for this, and keep it outside the purview of this PR Could you kindly give an example of what you mean by - To me the pipeline creation seems similar between the static and dynamic APIs. Maybe I am missing some subtlety here. An example would be very helpful. In reply to: 449246594 [](ancestors = 449246594,448765345,448760615,448760552) Refers to: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderStatic.cs:40 in 2775152. [](commit_id = 2775152, deletion_comment = False) |
Hi @TomFinley @artidoro . Could you kindly review this PR. I have addressed all the existing comments. |
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need the license in the files in the samples folder. I don't see it in other files, maybe better to remove it for consistency. #Resolved
@@ -7,6 +7,7 @@ | |||
<ItemGroup> | |||
<ProjectReference Include="..\Microsoft.ML.Data\Microsoft.ML.Data.csproj" /> | |||
<ProjectReference Include="..\Microsoft.ML.HalLearners\Microsoft.ML.HalLearners.csproj" /> | |||
<ProjectReference Include="..\Microsoft.ML.StaticPipe\Microsoft.ML.StaticPipe.csproj" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StaticPipe [](start = 71, length = 10)
why we have this? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this VectorWhiteningStaticExtensions.cs
would complain of missing references.
Note previously the static extensions were part of the Microsoft.ML.Data
assembly. In this PR we moved them to a separate assembly Microsoft.ML.StaticPipe
In reply to: 244826453 [](ancestors = 244826453)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
public sealed partial class TextLoader | ||
public static class TextLoaderStatic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TextLoaderStatic [](start = 24, length = 16)
out of curiosity, why we have to have this suffix Static for TextLoader? It's already in different namespace. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of reasons:
- In general I found it more convenient to understand which TextLoader I was using (the Dynamic
TextLoader
or its static counterpartTextLoaderStatic
) - Wanted the class to match the filename, esp since I was moving files around.
In reply to: 244827020 [](ancestors = 244827020)
@@ -722,64 +720,4 @@ public SchemaShape GetOutputSchema(SchemaShape inputSchema) | |||
return new SchemaShape(result.Values); | |||
} | |||
} | |||
|
|||
public static class PcaEstimatorExtensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PcaEstimatorExtensions [](start = 24, length = 22)
Where did this code move? I don't find it elsewhere. Is it not needed? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to Microsoft.ML.StaticPipe\TransformsStatic.cs
Line # 1625
In reply to: 244831747 [](ancestors = 244831747)
FactorizationMachineStaticExtensions??? Just a note: I know its not your work. You just moved the code into this assembly but I see that naming of the files and class names are not consistent. Do you think this should be corrected too (maybe in another issue)? #Pending Refers to: src/Microsoft.ML.StandardLearners/FactorizationMachine/FactorizationMachineStatic.cs:19 in b108d08. [](commit_id = b108d08, deletion_comment = False) |
I think this is an issue across the codebase where the naming of files and class names is not consistent. @shmoradims . If I recall correctly, you had brought up this issue . is there already an issue that tracks this ? If not I will create a separate issue. In reply to: 450965658 [](ancestors = 450965658) Refers to: src/Microsoft.ML.StandardLearners/FactorizationMachine/FactorizationMachineStatic.cs:19 in b108d08. [](commit_id = b108d08, deletion_comment = False) |
#1697 Anyway. I would ask @TomFinley do we still need "Runtime" in namespace, or we can have one namespace since it's now part of one assembly? Refers to: src/Microsoft.ML.Data/StaticPipe/Reconciler.cs:10 in 6169d13. [](commit_id = 6169d13, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I would still ask @TomFinley regarding runtime namespace (maybe classes need to be make internal, in separate PR, or maybe it's fine to have runtime namespace)
@Ivanidzo4ka . I have created a separate issue #2003 to track how to deal with the I will investigate what it takes to consolidate the code related to the Static API within the |
Towards #1695 . Also fixes #1480