Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to execute only the last transform in TransformWrapper and have WordBagEstimator return transformer chain #3700

Merged
merged 13 commits into from
May 16, 2019

Conversation

codemzs
Copy link
Member

@codemzs codemzs commented May 10, 2019

fixes #3699

@codecov
Copy link

codecov bot commented May 10, 2019

Codecov Report

Merging #3700 into master will increase coverage by 0.12%.
The diff coverage is 97.97%.

@@            Coverage Diff             @@
##           master    #3700      +/-   ##
==========================================
+ Coverage   72.77%    72.9%   +0.12%     
==========================================
  Files         808      809       +1     
  Lines      145588   145599      +11     
  Branches    16250    16243       -7     
==========================================
+ Hits       105956   106144     +188     
+ Misses      35207    35035     -172     
+ Partials     4425     4420       -5
Flag Coverage Δ
#Debug 72.9% <97.97%> (+0.12%) ⬆️
#production 68.41% <92.59%> (+0.13%) ⬆️
#test 89.05% <100%> (+0.02%) ⬆️
Impacted Files Coverage Δ
test/Microsoft.ML.Tests/Scenarios/WordBagTest.cs 100% <100%> (ø)
...c/Microsoft.ML.Transforms/Text/WordBagTransform.cs 79.68% <100%> (-1.81%) ⬇️
...L.Transforms/Text/WordHashBagProducingTransform.cs 61.08% <100%> (+0.58%) ⬆️
...soft.ML.Transforms/Text/WrappedTextTransformers.cs 98.18% <100%> (+4.54%) ⬆️
test/Microsoft.ML.Functional.Tests/ModelFiles.cs 96.22% <100%> (+0.14%) ⬆️
...crosoft.ML.Core.Tests/UnitTests/TestEntryPoints.cs 98.08% <100%> (ø) ⬆️
...Microsoft.ML.Data/DataLoadSave/TransformWrapper.cs 61.76% <80%> (+39.76%) ⬆️
...luators/Metrics/MulticlassClassificationMetrics.cs 100% <0%> (ø) ⬆️
src/Microsoft.ML.LightGbm/LightGbmTrainerBase.cs 62.44% <0%> (ø) ⬆️
... and 10 more

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented May 10, 2019

if you can add some models to https://github.com/dotnet/machinelearning/tree/master/test/data/backcompat like ep_modelX.zip and update respectful test that would be nice thing to have. #Resolved

@yaeldekel
Copy link

yaeldekel commented May 10, 2019

    private TransformWrapper(IHostEnvironment env, ModelLoadContext ctx)

There are two estimators in our code base that use TransformWrapper on Fit, WordBagEstimator and WordHashBagEstimator. Can you add unit tests to make sure saving and loading works? I would add other estimators before them in the pipeline, to exercise the new _useLastTransformOnly logic. #Resolved


Refers to: src/Microsoft.ML.Data/DataLoadSave/TransformWrapper.cs:98 in 0879374. [](commit_id = 0879374, deletion_comment = False)

@codemzs
Copy link
Member Author

codemzs commented May 13, 2019

I have added a test.


In reply to: 491164237 [](ancestors = 491164237)

@codemzs
Copy link
Member Author

codemzs commented May 13, 2019

    private TransformWrapper(IHostEnvironment env, ModelLoadContext ctx)

Yes, done. thanks this was a great suggestion, turns out we had an existing bug in the code where if you use the above estimators more than once in the pipeline it will crash. I have changed them to return transformer chain as per your suggestion.


In reply to: 491371956 [](ancestors = 491371956)


Refers to: src/Microsoft.ML.Data/DataLoadSave/TransformWrapper.cs:98 in 0879374. [](commit_id = 0879374, deletion_comment = False)

@codemzs codemzs changed the title Add option to execute only the last transform in TransformWrapper. Add option to execute only the last transform in TransformWrapper and have WordBagEstimator return transformer chain May 13, 2019
@yaeldekel
Copy link

The test you added doesn't test the TransformWrapper though. You can add a model that contains the OptionalColumnTransform in it to the repository that Ivan mentioned, and then add a test that loads it and tries to apply it to some data.


In reply to: 491938454 [](ancestors = 491938454,491164237)

@yaeldekel
Copy link

yaeldekel commented May 13, 2019

Sorry, Ivan didn't mention the repo actually. Here it is: https://github.com/dotnet/machinelearning-testdata.

@eerhardt has permissions to push a new nuget package containing the models to myget, maybe he can help.


In reply to: 491993425 [](ancestors = 491993425,491938454,491164237)

@codemzs
Copy link
Member Author

codemzs commented May 15, 2019

We can check-in models in the current repo too.


In reply to: 491994611 [](ancestors = 491994611,491993425,491938454,491164237)

@yaeldekel
Copy link

yaeldekel commented May 15, 2019

[assembly: LoadableClass(typeof(TransformWrapper), null, typeof(SignatureLoadModel),

This and the ctor creating a TransformWrapper from a ModelLoadContext can be deleted. #Resolved


Refers to: src/Microsoft.ML.Data/DataLoadSave/TransformWrapper.cs:12 in 0e3dc6a. [](commit_id = 0e3dc6a, deletion_comment = False)

Copy link

@yaeldekel yaeldekel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codemzs codemzs merged commit 2c1cfca into dotnet:master May 16, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TransfromWrapper should only apply the last transform in some cases
4 participants