Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MurmurHash Onnx Export #5013

Merged
merged 33 commits into from
May 5, 2020
Merged

MurmurHash Onnx Export #5013

merged 33 commits into from
May 5, 2020

Conversation

Lynx1820
Copy link
Contributor

@Lynx1820 Lynx1820 commented Apr 10, 2020

The following data input types are currently not supported by onnxruntime's murmurHash operator: float, double, ulong, long and ordered hashing(vectors). Once added, ml.net will be able to support them as well.

@Lynx1820 Lynx1820 marked this pull request as ready for review April 10, 2020 19:45
@Lynx1820 Lynx1820 requested a review from a team as a code owner April 10, 2020 19:45
@Lynx1820 Lynx1820 requested a review from KsenijaS April 10, 2020 19:52
@Lynx1820 Lynx1820 requested a review from yaeldekel April 13, 2020 19:40
@yaeldekel
Copy link

yaeldekel commented Apr 19, 2020

    public void TestHashTransformFloat()

Please add a backwards compatibility test for version 0x00010002 models. #Resolved


Refers to: test/Microsoft.ML.TestFramework/DataPipe/TestDataPipe.cs:1054 in 6e5dac7. [](commit_id = 6e5dac7, deletion_comment = False)

// every type uses the same implementation on V2.
// The V1 String Hashing Algorithm had the following properities:
// - Case Conversion: used inside the hashing algorithm in ML.Net.
// - Mock UTF8 encoding: strings in C# are UTF16 and need to be converted to UTF8 before hashing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please elaborate on what Mock UTF8 encoding is. If I recall correctly, we are omitting certain code pages.
@KsenijaS Didn't you have an implementation that supported the full UTF8 encoding? Can we use that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but also partial UTF8 support is sufficient. Mock UTF8 doesn't cover emojis and some special math characters

@yaeldekel yaeldekel self-requested a review April 26, 2020 07:05
Copy link

@yaeldekel yaeldekel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Lynx1820 Lynx1820 requested a review from yaeldekel April 29, 2020 03:16
@Lynx1820 Lynx1820 requested a review from yaeldekel May 4, 2020 23:12
@codecov
Copy link

codecov bot commented May 5, 2020

Codecov Report

Merging #5013 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5013   +/-   ##
=======================================
  Coverage   75.60%   75.60%           
=======================================
  Files         993      993           
  Lines      178509   178509           
  Branches    19197    19197           
=======================================
  Hits       134964   134964           
  Misses      38309    38309           
  Partials     5236     5236           
Flag Coverage Δ
#Debug 75.60% <0.00%> (ø)
#production 71.56% <0.00%> (ø)
#test 88.63% <0.00%> (ø)

Copy link

@yaeldekel yaeldekel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Lynx1820 Lynx1820 merged commit c83ea54 into dotnet:master May 5, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants