Provide a trie-based alternative to UnicodeSet #2220
Labels:
- A-performance (Area: Performance (CPU, Memory))
- C-unicode (Component: Props, sets, tries)
- help wanted (Issue needs an assignee)
- T-enhancement (Type: Nice-to-have but not required)
The ICU4X composing normalizer uses a `UnicodeSet` for a fast-path pass-through check, while the ICU4C composing normalizer uses a code point trie lookup. ICU4C ends up being faster even after optimizing other aspects on the ICU4X side, including special-casing the lowest range of the set (the Latin range below the combining diacritics block).

For a known-fragmented, compile-time-known set, we should provide an alternative to `UnicodeSet` that uses the structure of `CodePointTrie` but, instead of wasting 7 bits of each value byte, divides the length of the value array by 8 and stores 8 logical bits in each byte.
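To make the proposal concrete, here is a minimal sketch of the bit-packing idea. It is not the actual `CodePointTrie` layout, which has more stages and special fast ranges. It assumes a simplified two-stage structure in which an index array maps each 64-code-point block to a deduplicated 8-byte data block. The names `BitTrieSet`, `from_ranges`, and `contains`, and the block size, are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

// Hypothetical block size for this sketch: 64 code points per data block.
const BLOCK_BITS: u32 = 6;
const BLOCK_LEN: u32 = 1 << BLOCK_BITS;
// 8 logical bits per byte, so a 64-code-point block takes 8 bytes, not 64.
const BLOCK_BYTES: usize = (BLOCK_LEN / 8) as usize;
const MAX_CP: u32 = 0x10FFFF;

/// Hypothetical two-stage, bit-packed code point set (not an ICU4X type).
pub struct BitTrieSet {
    /// Maps `cp >> BLOCK_BITS` to a block number in `data`.
    index: Vec<u16>,
    /// Deduplicated data blocks, one membership bit per code point.
    data: Vec<u8>,
}

impl BitTrieSet {
    /// Builds the set from inclusive code point ranges.
    /// The builder is deliberately naive; a real one would run at build time.
    pub fn from_ranges(ranges: &[(u32, u32)]) -> Self {
        let block_count = (MAX_CP >> BLOCK_BITS) + 1;
        let mut index = Vec::with_capacity(block_count as usize);
        let mut data: Vec<u8> = Vec::new();
        let mut seen: HashMap<[u8; BLOCK_BYTES], u16> = HashMap::new();
        for block in 0..block_count {
            let start = block << BLOCK_BITS;
            let mut bits = [0u8; BLOCK_BYTES];
            for offset in 0..BLOCK_LEN {
                let cp = start + offset;
                if ranges.iter().any(|&(lo, hi)| lo <= cp && cp <= hi) {
                    bits[(offset / 8) as usize] |= 1 << (offset % 8);
                }
            }
            // Reuse an identical block if one was already stored.
            let next = (data.len() / BLOCK_BYTES) as u16;
            let id = *seen.entry(bits).or_insert_with(|| {
                data.extend_from_slice(&bits);
                next
            });
            index.push(id);
        }
        BitTrieSet { index, data }
    }

    /// Membership test: one index lookup, one data byte load, one bit test.
    pub fn contains(&self, cp: u32) -> bool {
        if cp > MAX_CP {
            return false;
        }
        let block = self.index[(cp >> BLOCK_BITS) as usize] as usize;
        let offset = (cp & (BLOCK_LEN - 1)) as usize;
        let byte = self.data[block * BLOCK_BYTES + offset / 8];
        (byte >> (offset % 8)) & 1 != 0
    }
}
```

Unlike the binary search over range boundaries that an inversion-list set performs, the lookup here costs the same regardless of how fragmented the set is, which is the property the issue is after. Whether this beats the existing fast path in practice would need benchmarking against the normalizer's real data.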