ColumnSelectingTransformer exported onnx model doesn't drop input columns coming from input dataview #4970

Closed
@antoniovs1029

Description

The ColumnSelectingTransformer can drop columns that the user doesn't want. The exported ONNX model removes those columns from the ONNX graph outputs, but that isn't enough to remove them from the output schema: even though they are gone from the ONNX model, they are still present in the output after using the ApplyOnnxModel method. The reason is that the OnnxTransformer (which ApplyOnnxModel uses) is a RowToRowTransformer, and such transformers can't drop columns; they can only add columns to the input DataView.

Code

The existing test for ColumnSelectingTransformer can be used to reproduce this issue:

    public void SelectColumnsOnnxTest()
    {
        var mlContext = new MLContext(seed: 1);

        string dataPath = GetDataPath("breast-cancer.txt");
        var dataView = ML.Data.LoadFromTextFile(dataPath, new[] {
            new TextLoader.Column("Label", DataKind.Boolean, 0),
            new TextLoader.Column("Thickness", DataKind.Double, 1),
            new TextLoader.Column("Size", DataKind.Single, 2),
            new TextLoader.Column("Shape", DataKind.Int32, 3),
            new TextLoader.Column("Adhesion", DataKind.Int32, 4),
            new TextLoader.Column("EpithelialSize", DataKind.Int32, 5),
            new TextLoader.Column("BlandChromatin", DataKind.Int32, 7),
            new TextLoader.Column("NormalNucleoli", DataKind.Int32, 8),
            new TextLoader.Column("Mitoses", DataKind.Int32, 9),
        });

        var pipeline = mlContext.Transforms.ReplaceMissingValues("Size")
            .Append(mlContext.Transforms.SelectColumns(new[] { "Size", "Shape", "Thickness", "Label" }));

        var model = pipeline.Fit(dataView);
        var transformedData = model.Transform(dataView);
        var onnxModel = mlContext.Model.ConvertToOnnxProtobuf(model, dataView);

        var onnxFileName = "selectcolumns.onnx";
        var onnxModelPath = GetOutputPath(onnxFileName);

        SaveOnnxModel(onnxModel, onnxModelPath, null);

        if (IsOnnxRuntimeSupported())
        {
            // Evaluate the saved ONNX model using the data used to train the ML.NET pipeline.
            string[] inputNames = onnxModel.Graph.Input.Select(valueInfoProto => valueInfoProto.Name).ToArray();
            string[] outputNames = onnxModel.Graph.Output.Select(valueInfoProto => valueInfoProto.Name).ToArray();
            var onnxEstimator = mlContext.Transforms.ApplyOnnxModel(outputNames, inputNames, onnxModelPath);
            var onnxTransformer = onnxEstimator.Fit(dataView);
            var onnxResult = onnxTransformer.Transform(dataView);

            // Verify that onnx output has only the four columns we selected from the input
            Assert.Equal(4, outputNames.Length);
            Assert.Equal("Size.output", outputNames[0]);
            Assert.Equal("Shape.output", outputNames[1]);
            Assert.Equal("Thickness.output", outputNames[2]);
            Assert.Equal("Label.output", outputNames[3]);

            CompareSelectedColumns<Single>("Size", "Size", transformedData, onnxResult);
            CompareSelectedColumns<int>("Shape", "Shape", transformedData, onnxResult);
            CompareSelectedColumns<double>("Thickness", "Thickness", transformedData, onnxResult);
            CompareSelectedColumns<bool>("Label", "Label", transformedData, onnxResult);
        }

        onnxFileName = "SelectColumns.txt";
        var subDir = Path.Combine("..", "..", "BaselineOutput", "Common", "Onnx", "Transforms");
        var onnxTextModelPath = GetOutputPath(subDir, onnxFileName);
        SaveOnnxModel(onnxModel, null, onnxTextModelPath);
        CheckEquality(subDir, onnxFileName, digitsOfPrecision: 1);

        Done();
    }

By setting a breakpoint and inspecting the schemas of the outputs, I get the following result:

(image: screenshot of the output schemas as seen in the debugger)

It shows that the undesired columns were correctly dropped by the ColumnSelectingTransformer, but not by the ONNX model. This test should also compare both schemas, to make sure that the ONNX model actually drops the undesired columns.
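
A minimal sketch of what such a schema comparison could look like, assuming the column names of each schema have already been collected into string sequences (in ML.NET a DataViewSchema can be enumerated and each column exposes a Name, so something like `onnxResult.Schema.Select(c => c.Name)` would do). The SchemaCheck class and UnexpectedColumns method are hypothetical names for illustration and are not part of the existing test code. How ONNX output names such as "Size.output" map back to the original column names is left to the caller.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET or its test suite.
public static class SchemaCheck
{
    // Returns the names that appear in `actual` but not in `expected`,
    // in the order in which they first appear in `actual`.
    public static string[] UnexpectedColumns(
        IEnumerable<string> expected,
        IEnumerable<string> actual)
    {
        var expectedSet = new HashSet<string>(expected);
        return actual
            .Where(name => !expectedSet.Contains(name))
            .Distinct()
            .ToArray();
    }
}
```

In the test, an assertion that this method returns an empty array for the ONNX result would fail as long as the dropped input columns (Adhesion, Mitoses, and so on) are still present in the output schema, which is the behavior described above.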

Metadata

Labels: P1 (priority of the issue for triage purpose: needs to be fixed soon), wontfix (this will not be worked on)
