Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed OnnxTransformer output column mapping. #5192

Merged

Conversation

michaelgsharp
Copy link
Member

Our OnnxTransformer has overrides that let you specify a subset of output columns, all the output columns, or let it figure it out from the onnx model. The issue was that if you manually specified either a subset of the columns, or all the columns, and did it in an order different then the onnx model, our transformer would not do the mapping correctly and it would try and access the wrong column. This usually resulted in an error that the types didn't match, but when types matched it just returned wrong data.

This PR fixes that issue by doing the correct mapping regardless of the order they are provided in.

@michaelgsharp michaelgsharp requested review from harishsk and a team June 2, 2020 19:43
@michaelgsharp michaelgsharp self-assigned this Jun 2, 2020
Comment on lines +335 to +337
[OnnxFact]
public void OnnxModelOutputDifferentOrder()
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can refactor both tests into one, as they're very similar. And include an [InlineData(true) with OnnxModelOutputDifferentOrder(bool checkSubset)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll combine them. I dont think the [InlineData] is needed since I'm testing the output ordering and not the checkSubset` itself though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood what you were saying first. I get it now. The issue with that is that with the subset I dont compare all the columns and with the full column set I do. I just combined the tests and moved the subset pipeline and checking below the full one.

@codecov
Copy link

codecov bot commented Jun 2, 2020

Codecov Report

Merging #5192 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           master    #5192    +/-   ##
========================================
  Coverage   73.32%   73.32%            
========================================
  Files        1007     1007            
  Lines      187911   188015   +104     
  Branches    20241    20243     +2     
========================================
+ Hits       137790   137871    +81     
- Misses      44595    44622    +27     
+ Partials     5526     5522     -4     
Flag Coverage Δ
#Debug 73.32% <100.00%> (+<0.01%) ⬆️
#production 69.07% <100.00%> (-0.01%) ⬇️
#test 87.46% <100.00%> (+0.01%) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.OnnxTransformer/OnnxTransform.cs 91.97% <100.00%> (+0.06%) ⬆️
...osoft.ML.OnnxTransformerTest/OnnxTransformTests.cs 97.67% <100.00%> (+0.19%) ⬆️
...rosoft.ML.Data/DataView/BatchDataViewMapperBase.cs 68.67% <0.00%> (-0.84%) ⬇️
...rosoft.ML.Data/DataView/RowToRowMapperTransform.cs 90.65% <0.00%> (-0.82%) ⬇️
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs 62.25% <0.00%> (-0.67%) ⬇️
...rosoft.ML.Data/Transforms/PerGroupTransformBase.cs 76.33% <0.00%> (-0.46%) ⬇️
...Microsoft.ML.Transforms/OptionalColumnTransform.cs 77.68% <0.00%> (-0.46%) ⬇️
...rc/Microsoft.ML.Data/Scorers/RowToRowScorerBase.cs 85.95% <0.00%> (-0.36%) ⬇️
src/Microsoft.ML.Data/Transforms/NAFilter.cs 85.77% <0.00%> (-0.31%) ⬇️
src/Microsoft.ML.Data/DataView/Transposer.cs 83.92% <0.00%> (-0.27%) ⬇️
... and 26 more

/// <summary>
/// In the case that the ML.Net user wants a subset of columns or lists the columns in a different order then specified in the ONNX model,
/// we need to map from the ONNX model index to the ML.Net column index. This method does that mapping for us.
/// </summary>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is mapping the dataview column index to onnx output index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@michaelgsharp michaelgsharp merged commit 3d848a9 into dotnet:master Jun 3, 2020
@michaelgsharp michaelgsharp deleted the onnx-transformer-mapping-fix branch June 3, 2020 21:58
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants