TokenizingByCharacters export to Onnx #4805

Merged (6 commits) on Feb 7, 2020

Conversation

Lynx1820
Contributor

@Lynx1820 Lynx1820 commented Feb 6, 2020

  • Adds ONNX export for the transformer that tokenizes by character and returns the characters (as uint16).
  • Since there is no comparable ONNX operator, a label encoder is used to map each string token to its corresponding character value. Unfortunately, this makes the model much larger, because 65535 values have to be saved as the label encoder's mapping table.
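
To make the size cost concrete, here is a minimal Python sketch of the idea described above. It is not the ML.NET or ONNX code from this PR: `build_char_mapping`, `tokenize_by_characters`, and `encode_tokens` are hypothetical names, and the choice of code units 0 through 65534 for the 65535 entries is an assumption made only for illustration.

```python
# Illustrative sketch only; names and the exact key range are assumptions.

def build_char_mapping(num_entries=65535):
    """Build parallel key/value lists: one-character string -> numeric code.

    Every possible key has to be stored explicitly, which is why an
    exported model that embeds such a table grows so much.
    """
    keys = [chr(code) for code in range(num_entries)]
    values = list(range(num_entries))
    return keys, values


def tokenize_by_characters(text):
    """Split a string into one-character string tokens."""
    return list(text)


def encode_tokens(tokens, keys, values, default=0):
    """Look each token up in the table, as a label-encoder-style map would."""
    lookup = dict(zip(keys, values))
    return [lookup.get(token, default) for token in tokens]


keys, values = build_char_mapping()
encoded = encode_tokens(tokenize_by_characters("cat"), keys, values)
print(encoded)    # [99, 97, 116]
print(len(keys))  # 65535 stored entries
```

The table is a fixed cost: it is the same size whether the input text uses three distinct characters or thousands, which matches the size concern raised in the description.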

@Lynx1820 Lynx1820 requested a review from a team as a code owner February 6, 2020 23:23
@harishsk
Contributor

harishsk commented Feb 6, 2020

/// | Exportable to ONNX | No |

Please update this line whenever you add ONNX export support.


Refers to: src/Microsoft.ML.Transforms/Text/TokenizingByCharacters.cs:610 in 0532330.

@Lynx1820 Lynx1820 merged commit daaea53 into dotnet:master Feb 7, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022