[WIP] Avoid propagating some input columns when applying Onnx models #4971
Conversation
I think the main problem here is that …

Currently, this PR uses the second approach. It is based on a previous attempt that @harishsk tried in order to solve these problems some months ago, but that wasn't included in his final commit because other solutions seemed more appropriate at the time. With his approach, this PR adds a …

Personally, I am still not convinced that this is the best way to go, particularly after fixing some issues that appeared in our tests while trying this approach. I feel that the correct solution would be to stop inheriting from RowToRowTransformer, and then OnnxDataTransform might not be necessary. I still want to explore that other option, perhaps in another PR, but I haven't managed to fully accomplish it, so I'll just leave this PR here and see if anyone has comments on it. Also, there are still things that I might be able to remove from OnnxTransformer and OnnxDataTransform, but I will need to look into it further.
@@ -416,7 +419,7 @@ public void OnnxModelInMemoryImage()
// Convert IDataView back to IEnumerable<ImageDataPoint> so that the user can inspect the output, column "softmaxout_1", of the ONNX transform.
// Note that column "softmaxout_1" is stored in ImageDataPoint.Scores because the added attribute [ColumnName("softmaxout_1")]
// tells that ImageDataPoint.Scores is equivalent to column "softmaxout_1".
var transformedDataPoints = ML.Data.CreateEnumerable<ImageDataPoint>(onnx, false).ToList();
This test needed to be changed because now we're not propagating the input Image column to the output.
protected override IRowToRowMapper GetRowToRowMapperCore(DataViewSchema inputSchema)
{
    Host.CheckValue(inputSchema, nameof(inputSchema));
    return new OnnxDataTransform(Host, new EmptyDataView(Host, inputSchema), new Mapper(this, inputSchema));
}

protected override DataViewSchema GetOutputSchemaCore(DataViewSchema inputSchema)
{
    return new OnnxDataTransform(Host, new EmptyDataView(Host, inputSchema), new Mapper(this, inputSchema)).OutputSchema;
}

private protected override IDataView MakeDataTransformCore(IDataView input)
{
    Host.CheckValue(input, nameof(input));
    return new OnnxDataTransform(Host, input, new Mapper(this, input.Schema));
}
Besides the fact that I don't fully like modifying RowToRowTransformer to make these methods overridable, I don't like the way this looks either. It makes it somewhat tricky to have a different transform set the output of this mapper, but it's the only way to go with this approach; it gets weirder because OnnxDataTransform simply gets the output by accessing members that exist on this mapper.

But it's necessary that OnnxTransformer, OnnxTransformer.Mapper, OnnxDataTransform, and OnnxScoringEstimator all produce the same output schema, since different code paths will require the output schema from any of these classes. So the entanglement here seems to be necessary. The only alternative is to have each of the classes determine the output schema by itself, but then it becomes harder to keep the same logic in sync across different places.
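One way to read the constraint above is: compute the output schema in a single place and have every class delegate to it. A minimal sketch of that pattern (Python used only for illustration; the function and class names here are hypothetical, not the actual ML.NET types):

```python
# Single source of truth for the output schema, shared by all classes.
def compute_output_schema(model_output_columns):
    return tuple(model_output_columns)

class Mapper:
    def __init__(self, model_output_columns):
        self.output_schema = compute_output_schema(model_output_columns)

class DataTransform:
    # Mirrors how OnnxDataTransform reads the schema off the mapper
    # instead of recomputing it, so the two can never disagree.
    def __init__(self, mapper):
        self.output_schema = mapper.output_schema

m = Mapper(["softmaxout_1"])
print(DataTransform(m).output_schema)  # -> ('softmaxout_1',)
```

The design choice is simply that duplicated schema logic in four classes will eventually drift; delegation keeps them consistent by construction, at the cost of the coupling the comment above complains about.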
In the vast majority of cases (including the test that Antonio mentioned needs to change) this will be doing the wrong thing. The ML.NET pipelines (unless they explicitly contain a …). In general, I think that the problem this PR is trying to solve is not a bug. The contract of the …
You're right. In fact, the test that is failing right now on the CI demonstrates this. It is failing because it tries to apply a DNN featurizer .onnx model that has only one input and one output to a feature column before using a classifier; with the current implementation in this PR, applying the featurizer also drops the "Label" column, so the classifier then throws an exception because it can't find that column.

I have discussed this offline with Harish, and I had misunderstood which columns were expected to be propagated, so this was my mistake. In general, the onnx transformer should only drop input columns that are actually mentioned as inputs inside the .onnx model but are not connected to the output of the .onnx model. This way it will actually make the ColumnSelectingTransformer work as expected (i.e., with the onnx transformer actually dropping the columns). I will update the PR and my comments here to fit the expected behavior.
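The rule described above can be sketched in a few lines (Python used purely for illustration; `columns_to_drop` and all column names are invented for this sketch, not ML.NET API):

```python
# Sketch of the intended rule: the onnx transformer drops an input column
# only if the .onnx model consumes it as a graph input AND does not also
# expose it as a graph output. Every other column is propagated.
def columns_to_drop(pipeline_columns, onnx_inputs, onnx_outputs):
    consumed = set(pipeline_columns) & set(onnx_inputs)
    return sorted(consumed - set(onnx_outputs))

# DNN featurizer scenario from the comment above: "Label" is not a graph
# input, so it must survive for the downstream classifier.
print(columns_to_drop(["Features", "Label"], ["Features"], ["FeaturesVector"]))
# -> ['Features']  ("Label" is propagated)
```

A column that the graph both consumes and re-exposes as an output would not be dropped, which matches the "connected to the output" part of the rule.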
* Added Bindings class to OnnxDataTransform. It still propagates the input columns.
I've decided to restart this PR to avoid excessive clutter, so I will close this one. Further work will be done in #5012.
The goal of this PR is to:

1. Fix ColumnSelectingTransformer onnx exported models not working as expected when applied with ML.NET's OnnxTransformer.
2. Satisfy a new requirement that was discussed offline with @harishsk: the output schema of the OnnxTransformer should contain only a column for each output of the onnx model. Until now, since OnnxTransformer is a RowToRowTransformer, the input schema was propagated to the output.

Fixing point number 2 would actually fix point number 1, given the way that the onnx export works today for ColumnSelectingTransformer (where the dropped columns are removed from the onnx model output inside the onnx graph).
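To make the interaction of the two points concrete, here is a small sketch (Python for illustration only; the helper functions and column names are invented): under the old propagate-inputs behaviour, a column dropped inside the exported onnx graph leaks back into the output via input propagation, while the "outputs only" contract removes it.

```python
# Old behaviour: the input schema is propagated, then model outputs appended.
def old_output_schema(input_columns, onnx_outputs):
    return list(input_columns) + list(onnx_outputs)

# Proposed contract: exactly one column per onnx model output.
def new_output_schema(input_columns, onnx_outputs):
    return list(onnx_outputs)

# Exported pipeline where a ColumnSelectingTransformer dropped "B":
# "B" was removed from the graph outputs during onnx export.
inputs = ["A", "B"]
graph_outputs = ["Score"]
print(old_output_schema(inputs, graph_outputs))  # -> ['A', 'B', 'Score']
print(new_output_schema(inputs, graph_outputs))  # -> ['Score']
```

With the old behaviour "B" reappears even though the onnx graph dropped it; with the proposed contract the drop performed inside the graph is respected, which is why fixing point 2 fixes point 1.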