
[WIP] Avoid propagating some input columns when applying Onnx models #4971


Closed

Conversation

antoniovs1029
Member

@antoniovs1029 antoniovs1029 commented Mar 25, 2020

The goal of this PR is to:

  1. Fix #4970 (ColumnSelectingTransformer exported onnx model doesn't drop input columns coming from input dataview), where an ONNX model exported from a ColumnSelectingTransformer doesn't work as expected when applied back with ML.NET.
  2. Avoid propagating the input columns through an OnnxTransformer. This is a new requirement that was discussed offline with @harishsk. The idea is that the output schema of the OnnxTransformer should contain only one column for each output of the ONNX model. Until now, since OnnxTransformer is a RowToRowTransformer, the input schema has been propagated to the output.

Fixing point 2 would actually fix point 1, given the way the ONNX export works today for ColumnSelectingTransformer (the dropped columns are removed from the ONNX model's outputs inside the ONNX graph).
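To make the intended change concrete, here is a minimal sketch of applying an ONNX model and inspecting the output schema. The file "model.onnx", its input "Features", its output "Score", and the ModelInput class are all hypothetical stand-ins, not taken from this PR:

using System;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();
var data = mlContext.Data.LoadFromEnumerable(new[]
{
    new ModelInput { Features = new float[] { 1f, 2f, 3f }, Label = true }
});

// Assumes "model.onnx" exposes a single input "Features" and a single output "Score".
var transformed = mlContext.Transforms.ApplyOnnxModel("model.onnx")
    .Fit(data)
    .Transform(data);

// Today the printed schema contains "Features", "Label", and "Score"; under
// this PR's proposal it would contain only the model's output, "Score".
foreach (var column in transformed.Schema)
    Console.WriteLine(column.Name);

// Hypothetical input type; [VectorType] gives the vector size.
public class ModelInput
{
    [VectorType(3)]
    public float[] Features { get; set; }

    public bool Label { get; set; }
}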

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner March 25, 2020 13:17
@antoniovs1029 antoniovs1029 changed the title from "Is25onnx col selector harish" to "Avoid propagating input columns when applying Onnx models" on Mar 25, 2020
@antoniovs1029
Member Author

antoniovs1029 commented Mar 25, 2020

I think the main problem here is that OnnxTransformer is a RowToRowTransformer, and RTRTs don't allow transformers to remove columns from the output schema, only to add columns to it (see the sketch after the list below). Because of this, I believe there are 2 possible solutions:

  1. Change OnnxTransformer to stop inheriting from RTRT, and add any necessary behavior to it.
  2. Change RTRT to make it possible for OnnxTransformer to remove columns through some mechanism.
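
For context on why a RTRT can't remove columns, here is a simplified sketch of the append-only schema construction. This illustrates the pattern only; it is not the actual RowToRowTransformerBase source:

using Microsoft.ML.Data;

internal static class SchemaSketch
{
    // The output schema starts from the input schema, and the mapper's
    // columns can only be appended at the end, never removed.
    public static DataViewSchema BuildOutputSchema(
        DataViewSchema inputSchema,
        DataViewSchema.DetachedColumn[] mapperColumns)
    {
        var builder = new DataViewSchema.Builder();

        // Every input column is carried over unchanged...
        foreach (var column in inputSchema)
            builder.AddColumn(column.Name, column.Type, column.Annotations);

        // ...and the mapper can only append its new output columns.
        foreach (var column in mapperColumns)
            builder.AddColumn(column.Name, column.Type, column.Annotations);

        return builder.ToSchema();
    }
}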

Currently, this PR uses the second approach, and it's based on a previous attempt that @harishsk made to solve these problems some months ago, but that wasn't included in his final commit because other solutions seemed more appropriate at the time.

Following his approach, this PR adds an OnnxDataTransform that actually transforms the input into the output, while still using OnnxTransformer's mapper. It is also necessary to modify RowToRowTransformer so that OnnxTransformer can let OnnxDataTransform take responsibility for doing the transform.

Personally, I am still not convinced that this is the best way to go, particularly after fixing some issues that appeared in our tests while trying this approach. I feel that the correct solution would be to stop inheriting from RTRT, in which case OnnxDataTransform might not be necessary. I still want to explore that option, perhaps in another PR, but I haven't managed to fully accomplish it, so I'll leave this PR here and see if anyone has comments on it.

Also, there are still things that I might be able to remove from OnnxTransformer and OnnxDataTransform, but I will need to look into it further.

@@ -416,7 +419,7 @@ public void OnnxModelInMemoryImage()
// Convert IDataView back to IEnumerable<ImageDataPoint> so that user can inspect the output, column "softmaxout_1", of the ONNX transform.
// Note that Column "softmaxout_1" would be stored in ImageDataPont.Scores because the added attributed [ColumnName("softmaxout_1")]
// tells that ImageDataPont.Scores is equivalent to column "softmaxout_1".
var transformedDataPoints = ML.Data.CreateEnumerable<ImageDataPoint>(onnx, false).ToList();
Member Author


This test needed to be changed because now we're not propagating the input Image column to the output.

Comment on lines 343 to 359
protected override IRowToRowMapper GetRowToRowMapperCore(DataViewSchema inputSchema)
{
    Host.CheckValue(inputSchema, nameof(inputSchema));
    return new OnnxDataTransform(Host, new EmptyDataView(Host, inputSchema), new Mapper(this, inputSchema));
}

protected override DataViewSchema GetOutputSchemaCore(DataViewSchema inputSchema)
{
    return new OnnxDataTransform(Host, new EmptyDataView(Host, inputSchema), new Mapper(this, inputSchema)).OutputSchema;
}

private protected override IDataView MakeDataTransformCore(IDataView input)
{
    Host.CheckValue(input, nameof(input));
    return new OnnxDataTransform(Host, input, new Mapper(this, input.Schema));
}


Member Author

@antoniovs1029 antoniovs1029 Mar 25, 2020


Besides the fact that I don't fully like modifying RowToRowTransformer to make these methods overridable, I don't like the way this looks either. It makes it somewhat tricky to have a different transform set the output of this mapper, but it's the only way to go with this approach... and it gets weirder because OnnxDataTransform simply gets the output by accessing members that exist on this mapper.

But it's necessary that OnnxTransformer, OnnxTransformer.Mapper, OnnxDataTransform, and OnnxScoringEstimator all produce the same output schema, since different code paths will request the output schema from any of these classes. So the entanglement here seems to be necessary. The only alternative would be to have each class determine the output schema by itself, but then it might become more difficult to keep the same logic consistent across different places.

@antoniovs1029 antoniovs1029 changed the title from "Avoid propagating input columns when applying Onnx models" to "[WIP] Avoid propagating input columns when applying Onnx models" on Mar 25, 2020
@yaeldekel

In the vast majority of cases (including the test that Antonio mentioned needs to change) this will do the wrong thing. ML.NET pipelines (unless they explicitly contain a DropColumns transformer) only add columns; they don't do anything to the existing columns of the data. A pipeline can contain columns that are not related to the OnnxTransformer, and removing them (especially without giving the user a way to keep them) would make the OnnxTransformer unusable in these cases.

For example, imagine that a pipeline has two feature columns: one that the user wants to featurize using an ONNX model, and one that doesn't need featurization, where the user then wants to concatenate the unchanged features column with the output of the OnnxTransformer. Another example of when this would do the wrong thing is a pipeline that contains more than one OnnxTransformer: the first one would remove the input columns and make the second one unusable.
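
To make the first example concrete, here is a hypothetical pipeline sketch; the column names, the featurizer.onnx file, and its assumed output column "OnnxOutput" are all made up for illustration:

using Microsoft.ML;

var mlContext = new MLContext();

// "Features1" feeds a hypothetical featurizer.onnx (assumed output column
// "OnnxOutput"); "Features2" is expected to pass through untouched.
var pipeline = mlContext.Transforms.ApplyOnnxModel(
        outputColumnName: "OnnxOutput",
        inputColumnName: "Features1",
        modelFile: "featurizer.onnx")
    // If ApplyOnnxModel dropped every input column, "Features2" would no
    // longer exist at this point and Concatenate would fail.
    .Append(mlContext.Transforms.Concatenate("Features", "OnnxOutput", "Features2"));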

In general, I think that the problem this PR is trying to solve is not a bug. The contract of the OnnxTransformer is to compute the outputs of the ONNX model; it does not say anything about the transformer's output schema matching the output schema of the ONNX model, so I am not sure there is a change that needs to be made. But I could be misunderstanding the problem this PR is trying to fix?



@antoniovs1029
Member Author

antoniovs1029 commented Mar 25, 2020

> In the vast majority of cases (including the test that Antonio mentioned needs to change) this will do the wrong thing.

You're right. In fact, the test that is failing right now in CI demonstrates this. It is failing because it tries to apply a DNN featurizer .onnx model, which has only 1 input and 1 output, to a feature column before using a classifier... and, with the current implementation in this PR, applying the featurizer also drops the "Label" column, so the classifier then throws an exception because it can't find that column.

I have discussed this offline with Harish, and I had misunderstood which columns were expected to be propagated. So this was my mistake.

In general, the OnnxTransformer should only drop input columns that are actually mentioned as inputs inside the .onnx model but aren't connected to any output of the .onnx model. This way it will actually make the ColumnSelectingTransformer work as expected (i.e., with the OnnxTransformer actually dropping the columns); a rough sketch of the rule follows.
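
Something like the following, where OnnxGraphInfo and its members are hypothetical stand-ins rather than actual ML.NET APIs:

using System.Collections.Generic;

// Hypothetical stand-in for ONNX graph metadata; not an ML.NET type.
public sealed class OnnxGraphInfo
{
    public HashSet<string> InputNames { get; } = new HashSet<string>();

    // Graph inputs that reach at least one graph output (precomputed from the graph).
    public HashSet<string> InputsConnectedToOutputs { get; } = new HashSet<string>();
}

public static class ColumnDropRule
{
    // Drop a dataview column only when it is an input of the ONNX graph AND it
    // does not feed, directly or transitively, any of the graph's outputs.
    public static bool ShouldDropColumn(string columnName, OnnxGraphInfo graph) =>
        graph.InputNames.Contains(columnName) &&
        !graph.InputsConnectedToOutputs.Contains(columnName);
}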

I will update the PR and my comments here to fit the actual expected behavior.

@antoniovs1029 antoniovs1029 changed the title from "[WIP] Avoid propagating input columns when applying Onnx models" to "[WIP] Avoid propagating some input columns when applying Onnx models" on Apr 7, 2020
@antoniovs1029
Member Author

I've decided to restart this PR to avoid excessive clutter.

So I will close this one. Further work will be done in #5012.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022