-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Adding KeyToValueTransformers before finishing exporting to ONNX #4841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
18 commits
Select commit
Hold shift + click to select a range
1a826dc
Experimentations
antoniovs1029 009a84f
Removing MyTest method
antoniovs1029 f11c197
Removing unused comment
antoniovs1029 3a972f5
Fixing a change a made by mistake in an unrelated test
antoniovs1029 3e432a1
Added test for debugging purposes
antoniovs1029 11b84d0
Add support to map back keys from columns that where untouched by the…
antoniovs1029 a786d9f
Added security checks and removed redundant steps
antoniovs1029 cfa264d
Added todo comment and formatting
antoniovs1029 86fb14a
Skip Custom tests on CI
antoniovs1029 afec6d6
Merge remote-tracking branch 'upstream/master' into is23onnxnimbus
antoniovs1029 3e5b2bd
Removed my custom tests
antoniovs1029 28e447d
Merge remote-tracking branch 'upstream/master' into is23onnxnimbus
antoniovs1029 8864d8e
Added security checks to confirm that the PredictedLabel column has K…
antoniovs1029 d382373
Merge remote-tracking branch 'upstream/master' into is23onnxnimbus
antoniovs1029 ff9650e
Added HasKeyValues helper method and changed pass-through columns fil…
antoniovs1029 953ad7b
Fix extra space
antoniovs1029 9c63f1a
Merge remote-tracking branch 'upstream/master' into is23onnxnimbus
antoniovs1029 d70843f
Corrected and refactored GetPassThroughKeyDataViewTypeColumnsNames logic
antoniovs1029 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is another edge case: If the input has a categorical column, say
A
, and the pipeline has aCopyTransform
that copiesA
into a column namedB
.The way it is written now, we will not apply key-to-value on
B
. Should we?On one hand, we should, because
A
andB
should be copies of each other, so it does not make sense if the ONNX model has different types and values for them... On the other hand, I don't see an easy rule that would help decide when we do need to apply key-to-value toB
(when it is the output of Copy), versus when we don't (for example, when it is the output of ToKey).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar issue with Concat: say the input has two categorical columns
A
andB
, and the pipeline has a concat transform that concatenates them to produce a columnC
...In reply to: 383580868 [](ancestors = 383580868)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally I think we should apply KeyToValueTransformer to B. But, as you say, I can't think of an easy way to decide when to do this when it's the output of Copy. So I wouldn't know if we should handle this particular edge case.
Thoughts, @ganik , @harishsk ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, that is a bit strange side-effect. Perhaps in next iterations we can get more details on how AutoML prefers this. Maybe we wont need to add KeyToValue for passthrough columns at all and let them surface as keys. For now we can document this.