Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addresses #4226 . Fixes problem when loading NormalizerTransformer from disk. #4321

Merged
merged 22 commits into from
Oct 28, 2019

Conversation

antoniovs1029
Copy link
Member

@antoniovs1029 antoniovs1029 commented Oct 9, 2019

Addresses and actually solves issue #4226 by fixing the problem with loading a NormalizerTransformer from disk, where the NormalizerTransformer is supposed to work with multidimensional vectors as input.

Although this PR actually fixes that issue, another problem is described in one comment of the issue. This other problem is related to Resnet18, and is independent of the problem here fixed. Solving any of these two problems actually solve this issue, but perhaps the issue shall remain open until both problems are fixed.

The solution

  • I changed the way NormalizerTransformers are saved and loaded, so that it included the information of the dimensions of the vector it has as input. This changes maintain backward compatibility in that it is still possible to load models that were saved with previous binary format . But notice that the original problem described in the issue will persist for models that were saved before the changes made in this PR... it would be necessary to save the model with the updated save method in order to avoid that specific problem when working with multidimensional vectors.

  • I added a 3 test cases: one reproducing a very similar problem as in the original issue, another that tests a simplified version of the original problem, and another that checks that backward compatibility is still maintained.

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner October 9, 2019 22:11
@codecov
Copy link

codecov bot commented Oct 9, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@0b20a99). Click here to learn what that means.
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master    #4321   +/-   ##
=========================================
  Coverage          ?   74.66%           
=========================================
  Files             ?      883           
  Lines             ?   155225           
  Branches          ?    16939           
=========================================
  Hits              ?   115905           
  Misses            ?    34574           
  Partials          ?     4746
Flag Coverage Δ
#Debug 74.66% <100%> (?)
#production 70.22% <100%> (?)
#test 89.58% <100%> (?)
Impacted Files Coverage Δ
...t.ML.OnnxTransformerTest/DnnImageFeaturizerTest.cs 97.91% <100%> (ø)
...Microsoft.ML.Tests/Transformers/NormalizerTests.cs 100% <100%> (ø)
src/Microsoft.ML.Data/Transforms/Normalizer.cs 86.25% <100%> (ø)

// *** New Binary format ***
// - int: number of dimensions
// - ints: as many as dimensions, each one represent the size of each dimension
// - byte: ItemKind of input column (only R4 and R8 are valid)
Copy link

@yaeldekel yaeldekel Oct 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byte [](start = 23, length = 4)

If you read (and write) this byte first, then in the !isVector case you can return itemType without reading anything else. #Resolved

Copy link
Member Author

@antoniovs1029 antoniovs1029 Oct 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done this. Please review if it was what you were expecting. Thanks! #Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thanks!


In reply to: 334615811 [](ancestors = 334615811)

@antoniovs1029 antoniovs1029 changed the title Addresses #4226 . Fixes problem when loading NormalizerTransformer from disk. WIP Addresses #4226 . Fixes problem when loading NormalizerTransformer from disk. Oct 11, 2019
Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had 2 minor comments. Once those are addressed, the change looks good to me.

@codemzs
Copy link
Member

codemzs commented Oct 14, 2019

Can you please remove “WIP” from the title? #Resolved

@antoniovs1029 antoniovs1029 changed the title WIP Addresses #4226 . Fixes problem when loading NormalizerTransformer from disk. Addresses #4226 . Fixes problem when loading NormalizerTransformer from disk. Oct 14, 2019
@justinormont
Copy link
Contributor

justinormont commented Oct 28, 2019

@eerhardt / @antoniovs1029: Is this ready for merge? The CodeCoverage is failing; does that matter?

@antoniovs1029 antoniovs1029 removed the request for review from yaeldekel October 28, 2019 06:36
@eerhardt
Copy link
Member

Is this ready for merge?

I believe so.

The CodeCoverage is failing; does that matter?

No.


var dimensions = new int[ndimensions];
for (int i = 0; i < ndimensions; i++)
dimensions[i] = ctx.Reader.ReadInt32();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ctx.Reader.ReadIntArray() function, its a util extension function that is designed for reading arrays.

var dims = vectorType.Dimensions;
ctx.Writer.Write(dims.Length);
for (int i = 0; i < dims.Length; i++)
ctx.Writer.Write(dims[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ctx.Writer.WriteIntArray() function, its a util extension function that is designed for writing arrays.

@codemzs
Copy link
Member

codemzs commented Oct 28, 2019

@justinormont @eerhardt It's not ready to be merged I have few comments that need to be addressed but it should be merged EOD.

@antoniovs1029 antoniovs1029 merged commit f0fb720 into dotnet:master Oct 28, 2019
frank-dong-ms-zz pushed a commit to frank-dong-ms-zz/machinelearning that referenced this pull request Nov 4, 2019
…mer from disk. (dotnet#4321)

* Fixed problem when saving and loading normalizer transformers that work on multidimensional vectors

* Added test of normalizer loaded from disk for a multidimensional vector

* Added test for backward compatibility with normalizer transformer with verWrittenCur: 0x00010001

* Added test reproducing the original scenario

* Changed visibility of classes made for test
frank-dong-ms-zz pushed a commit to frank-dong-ms-zz/machinelearning that referenced this pull request Nov 4, 2019
…mer from disk. (dotnet#4321)

* Fixed problem when saving and loading normalizer transformers that work on multidimensional vectors

* Added test of normalizer loaded from disk for a multidimensional vector

* Added test for backward compatibility with normalizer transformer with verWrittenCur: 0x00010001

* Added test reproducing the original scenario

* Changed visibility of classes made for test
@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants