Skip to content

Fix localization detection#85

Merged
lhecker merged 11 commits intomicrosoft:mainfrom
hikilaka:fix-localization
May 21, 2025
Merged

Fix localization detection#85
lhecker merged 11 commits intomicrosoft:mainfrom
hikilaka:fix-localization

Conversation

@hikilaka
Copy link
Contributor

In localization initialization, we now check if the system-preferred
language starts with a prefix rather than checking for equality. This
should now correctly detect languages.

Instead of having the latest element in enums act as the count for
enums, we can use std::mem::variant_count.

This also fixes #74

hikilaka added 2 commits May 19, 2025 21:59
Instead of having the latest element in enums act as the count for
enums, we can use std::mem::variant_count.
In localization initialization, we now check if the system-preffered
language starts with a prefix rather than checking for equality. This
should now correctly detect languages.

This fixes microsoft#74.

fix typo
@hikilaka hikilaka force-pushed the fix-localization branch from ffda5eb to 3e1c6b9 Compare May 20, 2025 03:09
@hikilaka hikilaka requested a review from lhecker May 21, 2025 03:40
@DHowett
Copy link
Member

DHowett commented May 21, 2025

Thanks! I'll leave this to Leonard to merge, if he has any feedback

let mut lang = LangId::en;

for l in langs {
println!("lang: {}", l);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some leftover debug code.

// Licensed under the MIT License.

#![feature(let_chains, linked_list_cursors, os_string_truncate, string_from_utf8_lossy_owned)]
#![feature(let_chains, linked_list_cursors, os_string_truncate, string_from_utf8_lossy_owned, variant_count)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also received very rightful criticism that we shouldn't depend on too many nightly features. As such, I think we should back out this particular change.

src/helpers.rs Outdated
Comment on lines 296 to 297
// If the bytes don't match, it's not a prefix
if text_byte != prefix_byte {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that we need a case-insensitive starts_with function. Since the region codes are expected to be ASCII-only we could write an ASCII-only starts_with function without relying on ICU to do the comparison (since ICU may be unavailable).

src/helpers.rs Outdated
Comment on lines 291 to 294
for i in 0..prefix.len() {
// Get the ASCII byte from both text and prefix
let text_byte = text_bytes[i];
let prefix_byte = prefix_bytes[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention that you can use text_bytes.iter().zip(prefix_bytes.iter()) for such patterns. See here: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.zip

@lhecker
Copy link
Member

lhecker commented May 21, 2025

I pushed directly in your PR as I unfortunately messed up a bit in the unix code and I felt like it would fit if I used your PR to fix both things at once. I apologize.

Comment on lines +922 to +936
const LANG_MAP: &[(&str, LangId)] = &[
("en", LangId::en),
// ----------------
("de", LangId::de),
("es", LangId::es),
("fr", LangId::fr),
("it", LangId::it),
("ja", LangId::ja),
("ko", LangId::ko),
("pt-br", LangId::pt_br),
("ru", LangId::ru),
("zh-hant", LangId::zh_hant),
("zh-tw", LangId::zh_hant),
("zh", LangId::zh_hans),
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trick is neat IMO because it reduces the binary size quite a bit. It also allows the compiler to inline the starts_with_ignore_ascii_case call since there's only 1 call to it now.

// Since the comparison is ASCII, we don't need to worry about that.
let s = self.as_bytes();
let p = prefix.as_bytes();
p.len() <= s.len() && s[..p.len()].eq_ignore_ascii_case(p)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes the code a lot shorter and easier to read IMO.
I'm not gonna lie here, Copilot told me about eq_ignore_ascii_case. Convenient search engine. 😅

Comment on lines +541 to +543
let mut res = Vec::new_in(arena);
res.extend(s.as_bytes().iter().map(|&b| if b == b'_' { b'-' } else { b }));
unsafe { ArenaString::from_utf8_unchecked(res) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I broke this by giving wrong direction over in #104. This code has the benefit of being more compact (no extra string copies).
(Iterators in Rust have size hints so the .extend() call knows exactly how many items it needs to expect and can preallocate accordingly.)

@lhecker lhecker merged commit e59e70a into microsoft:main May 21, 2025
1 check passed
diabloproject pushed a commit to diabloproject/edit that referenced this pull request May 29, 2025
Closes microsoft#74

Co-authored-by: Leonard Hecker <leonard@hecker.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Localization is broken

3 participants