MurmurHash Onnx Export #5013
// every type uses the same implementation on V2.
// The V1 String Hashing Algorithm had the following properties:
// - Case Conversion: used inside the hashing algorithm in ML.Net.
// - Mock UTF8 encoding: strings in C# are UTF16 and need to be converted to UTF8 before hashing
Please elaborate on what Mock UTF8 encoding is. If I recall correctly, we are omitting certain code pages.
@KsenijaS Didn't you have an implementation that supported the full UTF8 encoding? Can we use that?
Yes, but partial UTF8 support is also sufficient. Mock UTF8 doesn't cover emojis and some special math characters.
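To make the gap concrete, here is a minimal Python sketch of what a partial encoder could look like. It rests on an assumption that is not confirmed in this thread: that the mock encoding converts each UTF16 code unit on its own and never combines surrogate pairs. The name `mock_utf8` is hypothetical and this is not the ML.NET implementation.

```python
def utf16_code_units(text):
    """Split a string into 16-bit UTF16 code units, as a C# string stores it."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def mock_utf8(text):
    """Hypothetical partial encoder: each code unit is encoded independently.

    Assumption: surrogate pairs are NOT combined, so characters outside the
    Basic Multilingual Plane (emojis, some math symbols) do not produce the
    same bytes as full UTF8.
    """
    out = bytearray()
    for unit in utf16_code_units(text):
        if unit < 0x80:
            # 1-byte form (ASCII)
            out.append(unit)
        elif unit < 0x800:
            # 2-byte form
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # 3-byte form, also applied to lone surrogates
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


def matches_full_utf8(text):
    """True when the partial encoding yields the same bytes as real UTF8."""
    return mock_utf8(text) == text.encode("utf-8")
```

Under that assumption, `matches_full_utf8("hash me")` and `matches_full_utf8("€")` hold, while an emoji such as "😀" (two UTF16 code units) yields six bytes instead of the four that full UTF8 produces, so the hashes on the two sides would differ for such inputs.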
Codecov Report
@@           Coverage Diff           @@
##           master    #5013   +/-   ##
=======================================
  Coverage   75.60%   75.60%
=======================================
  Files         993      993
  Lines      178509   178509
  Branches    19197    19197
=======================================
  Hits       134964   134964
  Misses      38309    38309
  Partials     5236     5236
The following input data types are currently not supported by onnxruntime's murmurHash operator: float, double, ulong, and long. Ordered hashing (vectors) is not supported either. Once onnxruntime adds support for them, ML.NET will be able to support them as well.