Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added onnx export support for SelectColumns #4590

Merged
merged 5 commits into from
Jan 27, 2020
Merged

Added onnx export support for SelectColumns #4590

merged 5 commits into from
Jan 27, 2020

Conversation

harishsk
Copy link
Contributor

Implemented ability to export ColumnSelectingEstimator to onnx.

@harishsk harishsk requested a review from a team as a code owner December 18, 2019 18:56
@codecov
Copy link

codecov bot commented Dec 19, 2019

Codecov Report

Merging #4590 into master will decrease coverage by 0.64%.
The diff coverage is 98.27%.

@@            Coverage Diff             @@
##           master    #4590      +/-   ##
==========================================
- Coverage   75.78%   75.14%   -0.65%     
==========================================
  Files         944      913      -31     
  Lines      171156   160909   -10247     
  Branches    18468    17307    -1161     
==========================================
- Hits       129715   120918    -8797     
+ Misses      36316    35143    -1173     
+ Partials     5125     4848     -277
Flag Coverage Δ
#Debug 75.14% <98.27%> (-0.65%) ⬇️
#production 70.53% <94.11%> (-0.85%) ⬇️
#test 90.3% <100%> (-0.34%) ⬇️
Impacted Files Coverage Δ
test/Microsoft.ML.Tests/OnnxConversionTest.cs 95.72% <100%> (-1.95%) ⬇️
...rc/Microsoft.ML.Data/Transforms/ColumnSelecting.cs 97.41% <94.11%> (+0.1%) ⬆️
src/Microsoft.ML.Core/Data/ReadOnlyMemoryUtils.cs 66.23% <0%> (-13.64%) ⬇️
src/Microsoft.ML.Data/Transforms/NAFilter.cs 75.53% <0%> (-10.31%) ⬇️
src/Microsoft.ML.Featurizers/Common.cs 19.58% <0%> (-9.28%) ⬇️
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 79.83% <0%> (-7.57%) ⬇️
...icrosoft.ML.AutoML/Experiment/SuggestedPipeline.cs 88.65% <0%> (-4.13%) ⬇️
...classClassification/MulticlassNaiveBayesTrainer.cs 88.88% <0%> (-3.81%) ⬇️
...crosoft.ML.Core/ComponentModel/ComponentCatalog.cs 67.36% <0%> (-3.69%) ⬇️
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs 59.6% <0%> (-2.65%) ⬇️
... and 128 more

CompareSelectedScalarColumns<int>("Shape", "Shape1", transformedData, onnxResult);
CompareSelectedScalarColumns<int>("Thickness", "Thickness1", transformedData, onnxResult);
CompareSelectedScalarColumns<bool>("Label", "Label1", transformedData, onnxResult);
}
Copy link
Member

@codemzs codemzs Dec 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for linux/centos platforms where ORT is not supported can we have a baseline test to compare against? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added baseline comparison


In reply to: 361510683 [](ancestors = 361510683)

var dstVariable = ctx.AddIntermediateVariable(dstCol.Type, dstCol.Name, true);
string opType = "Identity";
ctx.CreateNode(opType, srcVariable, dstVariable, ctx.GetNodeName(opType), "");
}
Copy link
Member

@codemzs codemzs Dec 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are you dropping remaining columns? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, only the columns that have connections are selected. So the columns not in the OutputToInputMap are automatically dropped.

But your comment made me look deeper into the output schema. I have fixed the output schema with an updated to OnnxTransformer.


In reply to: 361511119 [](ancestors = 361511119)

CompareSelectedScalarColumns<int>("Size", "Size1", transformedData, onnxResult);
CompareSelectedScalarColumns<int>("Shape", "Shape1", transformedData, onnxResult);
CompareSelectedScalarColumns<int>("Thickness", "Thickness1", transformedData, onnxResult);
CompareSelectedScalarColumns<bool>("Label", "Label1", transformedData, onnxResult);
Copy link
Member

@codemzs codemzs Dec 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to make sure the columns you have not selected are not present in the output schema anymore, that is the point of select column transform. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to verify that the output schema contains only the selected columns


In reply to: 361511408 [](ancestors = 361511408)

@harishsk harishsk requested a review from codemzs January 22, 2020 21:24
Copy link
Member

@ganik ganik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@harishsk harishsk merged commit cb3060d into dotnet:master Jan 27, 2020
@harishsk harishsk deleted the selectcolumns branch April 21, 2020 23:57
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants