Implement hardcoded ICU transliterators #3910
Comments
Does |
Not in the usual place, so if it did, I wouldn't know where.
All code-based transliterators exist merely for performance and to save human implementation time, as transform rules can implement arbitrary transforms.
(In the specific case of Any-Hex, it should even be fairly simple to generate rule files for it. I'm not sure if this also applies to NFC, etc.)
There are open PRs (#3946, #3965) that add support for many such transliterators:
These make most of CLDR data usable, and can serve as examples for implementing the remainder. Notably still missing for full CLDR support:
ICU supports more than those. See the ICU4J directory for a full list.
There are a few rule-defined |
Is it correct that |
Correct! IIRC there are no dangling implementations; everything should be linked in icu4x/components/experimental/src/transliterate/transliterator/mod.rs (lines 341 to 386 in 6b5a69c).
For feature parity with ICU, we need some transliterators that ICU defines in code rather than through rule sources. A good (maybe even complete) starting point is this directory: https://github.com/unicode-org/icu/tree/main/icu4j/main/classes/translit/src/com/ibm/icu/text
For example, EscapeTransliterator.java is responsible for the many Any-Hex variants that exist.
Some transliterators also have related components in ICU4X, like Any-NFC, so those should be implemented by reusing the ICU4X components and data.
Users can create these transliterators using BCP-47 IDs that are defined in #3909.
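To illustrate what a hardcoded Any-Hex style transliterator could look like, here is a minimal Rust sketch. It is not the ICU4X or ICU implementation: the `EscapeSpec` type, the `escape` function, and the two constants are hypothetical names, and the prefix, suffix, and digit values are assumptions that should be checked against EscapeTransliterator.java. The sketch escapes every code point and ignores filters and the supplementary-plane handling that some variants need.

```rust
/// Hypothetical description of one escape variant: every code point is
/// written as `prefix` + zero-padded uppercase hex + `suffix`.
struct EscapeSpec {
    prefix: &'static str,
    suffix: &'static str,
    /// Minimum number of hex digits; shorter values are zero-padded.
    min_digits: usize,
}

// Assumed parameters for two variants; verify against the ICU source.
const HEX_UNICODE: EscapeSpec = EscapeSpec { prefix: "U+", suffix: "", min_digits: 4 };
const HEX_XML: EscapeSpec = EscapeSpec { prefix: "&#x", suffix: ";", min_digits: 1 };

/// Escapes every code point of `input` according to `spec`.
fn escape(input: &str, spec: &EscapeSpec) -> String {
    let mut out = String::with_capacity(input.len() * 6);
    for c in input.chars() {
        out.push_str(spec.prefix);
        // Uppercase hex, zero-padded to at least `min_digits` digits.
        out.push_str(&format!("{:0width$X}", c as u32, width = spec.min_digits));
        out.push_str(spec.suffix);
    }
    out
}
```

Under these assumptions, `escape("a", &HEX_UNICODE)` yields `U+0061` and `escape("é", &HEX_XML)` yields `&#xE9;`. Keeping the variants as data means a single code path can serve all of them, which is roughly the design the Java class suggests.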