Design architecture around low-cost locale parsing and storage #958
Labels
C-locale
Component: Locale identifiers, BCP47
R-obsolete
Resolution: This issue is no longer relevant
S-large
Size: A few weeks (larger feature, major refactoring)
T-core
Type: Required functionality
Comments
sffc added the C-locale (Component: Locale identifiers, BCP47) and discuss (Discuss at a future ICU4X-SC meeting) labels on Aug 17, 2021
sffc added the S-large (Size: A few weeks (larger feature, major refactoring)) and T-core (Type: Required functionality) labels and removed the discuss (Discuss at a future ICU4X-SC meeting) label on Aug 26, 2021
sffc changed the title from "Why not make a zero-copy Locale?" to "Design architecture around low-cost locale parsing and storage" on Aug 26, 2021
Similar to #1034; possible duplicate
This is resolved because we now have a well-documented solution for zero-copy storage of locales: https://unicode-org.github.io/icu4x-docs/doc/icu_locid/zerovec/index.html
Context: The Locale::from_bytes() function contributes a fair bit of bloat to the ICU4X binary. I am brainstorming ways to make it lighter.
In many cases, we encounter valid BCP-47 strings, and we need to parse them into a Locale.
We parse the language, script, and region subtags into TinyStr fields. However, for variants and Unicode extensions, we almost always need to allocate memory. Why? We could instead just point to substrings of the input BCP-47 string.
The Locale struct would be something along the lines of:
Meanwhile, the function to query the variants or extension subtags would essentially perform "on-the-fly" parsing.
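A minimal sketch of how such a struct and its on-the-fly accessors might look. Everything here is hypothetical and not ICU4X API: the names `ZeroCopyLocale`, `parse`, `variants`, and `unicode_keyword` are invented for illustration, plain `&str` slices stand in for the TinyStr fields so the example needs no external crates, and the input is assumed to be an already well-formed BCP-47 string (no validation, no canonicalization).

```rust
/// Hypothetical zero-copy locale: every field borrows from the input string.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ZeroCopyLocale<'a> {
    pub language: &'a str,
    pub script: Option<&'a str>,
    pub region: Option<&'a str>,
    /// Unparsed tail holding variants and extensions, e.g. "posix-u-ca-gregory".
    rest: &'a str,
}

/// Splits off the first '-'-separated subtag; the tail is "" if there is none.
fn take_subtag(s: &str) -> (&str, &str) {
    s.split_once('-').unwrap_or((s, ""))
}

fn is_alpha(s: &str) -> bool {
    s.bytes().all(|b| b.is_ascii_alphabetic())
}

fn is_digits(s: &str) -> bool {
    s.bytes().all(|b| b.is_ascii_digit())
}

impl<'a> ZeroCopyLocale<'a> {
    /// Assumes `input` is already a well-formed BCP-47 tag; nothing is validated.
    pub fn parse(input: &'a str) -> Self {
        let (language, mut rest) = take_subtag(input);
        let mut script = None;
        let mut region = None;
        // A script subtag is exactly four ASCII letters.
        let (head, tail) = take_subtag(rest);
        if head.len() == 4 && is_alpha(head) {
            script = Some(head);
            rest = tail;
        }
        // A region subtag is two ASCII letters or three ASCII digits.
        let (head, tail) = take_subtag(rest);
        if (head.len() == 2 && is_alpha(head)) || (head.len() == 3 && is_digits(head)) {
            region = Some(head);
            rest = tail;
        }
        Self { language, script, region, rest }
    }

    /// Variants are re-scanned on every call: subtags up to the first singleton.
    pub fn variants(&self) -> impl Iterator<Item = &'a str> {
        self.rest.split('-').take_while(|s| s.len() > 1)
    }

    /// Looks up a `-u-` keyword on the fly and returns its value as a
    /// substring of the input ("" for a key that has no value subtags).
    pub fn unicode_keyword(&self, key: &str) -> Option<&'a str> {
        let rest: &'a str = self.rest;
        let mut in_unicode_ext = false;
        let mut value_start: Option<usize> = None;
        let mut offset = 0usize;
        for subtag in rest.split('-') {
            let start = offset;
            offset = start + subtag.len() + 1; // step past the subtag and its '-'
            match value_start {
                // The next key (2 chars) or singleton (1 char) ends the value.
                Some(vs) if subtag.len() <= 2 => {
                    return Some(rest[vs..start].trim_end_matches('-'));
                }
                Some(_) => continue,
                None => {}
            }
            if subtag.len() == 1 {
                in_unicode_ext = subtag.eq_ignore_ascii_case("u");
            } else if in_unicode_ext && subtag.len() == 2 && subtag.eq_ignore_ascii_case(key) {
                value_start = Some(offset.min(rest.len()));
            }
        }
        // The value, if any, runs to the end of the string.
        value_start.map(|vs| &rest[vs..])
    }
}
```

In this sketch the struct stays `Copy` and never allocates; the trade-off is that each call to `variants()` or `unicode_keyword()` walks the tail of the string again.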
Advantages of this model:
Disadvantages:
Should we consider doing something like UnicodeSet/UnicodeSetBuilder here? Perhaps we could use the existing code as a LocaleBuilder of sorts, and ZeroCopyLocale could be the lightweight version for runtime use.
CC @zbraniecki @Manishearth