Fix localization detection by hikilaka · Pull Request #85 · microsoft/edit

hikilaka · 2025-05-20T03:06:21Z

In localization initialization, we now check if the system-preferred
language starts with a prefix rather than checking for equality. This
should now correctly detect languages.

Instead of having the last element in enums act as the count for
enums, we can use std::mem::variant_count.

This also fixes #74.

DHowett · 2025-05-21T04:17:09Z

Thanks! I'll leave this to Leonard to merge, if he has any feedback

lhecker · 2025-05-21T12:14:31Z

src/bin/edit/localization.rs

    let mut lang = LangId::en;

    for l in langs {
+        println!("lang: {}", l);


There's some leftover debug code.

lhecker · 2025-05-21T12:15:16Z

src/bin/edit/main.rs

 // Licensed under the MIT License.

-#![feature(let_chains, linked_list_cursors, os_string_truncate, string_from_utf8_lossy_owned)]
+#![feature(let_chains, linked_list_cursors, os_string_truncate, string_from_utf8_lossy_owned, variant_count)]


We also received very rightful criticism that we shouldn't depend on too many nightly features. As such, I think we should back out this particular change.
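For context, the count-as-last-variant pattern that the PR description set out to replace works on stable Rust, whereas std::mem::variant_count needs the nightly variant_count feature. A minimal sketch of that pattern follows; the enum, its variants, and the names table are hypothetical and not taken from the project:

```rust
// Hypothetical enum for illustration; the project's real enums are not shown in this thread.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Lang {
    En,
    De,
    Fr,
    // Sentinel: not a real language, only used to derive the number of variants.
    Count,
}

// Works on stable Rust because a fieldless enum can be cast to an integer.
const LANG_COUNT: usize = Lang::Count as usize;

fn display_name(lang: Lang) -> &'static str {
    // The array length is tied to the sentinel, so adding a variant
    // without adding a name fails to compile.
    const NAMES: [&str; LANG_COUNT] = ["English", "Deutsch", "Français"];
    NAMES[lang as usize]
}
```

The trade-off is that the sentinel is itself a valid value of the enum, so passing Lang::Count to display_name would panic with an out-of-bounds index.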

lhecker · 2025-05-21T12:16:26Z

src/helpers.rs

+        // If the bytes don't match, it's not a prefix
+        if text_byte != prefix_byte {


What I meant is that we need a case-insensitive starts_with function. Since the region codes are expected to be ASCII-only we could write an ASCII-only starts_with function without relying on ICU to do the comparison (since ICU may be unavailable).

lhecker · 2025-05-21T12:17:52Z

src/helpers.rs

+    for i in 0..prefix.len() {
+        // Get the ASCII byte from both text and prefix
+        let text_byte = text_bytes[i];
+        let prefix_byte = prefix_bytes[i];


I forgot to mention that you can use text_bytes.iter().zip(prefix_bytes.iter()) for such patterns. See here: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.zip
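A minimal sketch of the zip pattern applied to the prefix check discussed in this review. The function name is hypothetical, and the case-insensitive comparison follows the earlier review comment rather than the code under review:

```rust
// Hypothetical helper name; it only illustrates replacing the index loop with zip.
fn starts_with_ascii_zip(text: &str, prefix: &str) -> bool {
    let text_bytes = text.as_bytes();
    let prefix_bytes = prefix.as_bytes();

    // zip stops at the shorter of the two inputs, so a text that is shorter
    // than the prefix has to be rejected explicitly.
    prefix_bytes.len() <= text_bytes.len()
        && text_bytes
            .iter()
            .zip(prefix_bytes.iter())
            // Compare byte pairs, ignoring ASCII case only.
            .all(|(t, p)| t.eq_ignore_ascii_case(p))
}
```

Compared with indexing both slices by i, this form has no explicit index into either slice that could go out of bounds.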

lhecker · 2025-05-21T19:00:35Z

I pushed directly to your PR as I unfortunately messed up a bit in the unix code and I felt like it would fit if I used your PR to fix both things at once. I apologize.

lhecker · 2025-05-21T19:02:52Z

src/bin/edit/localization.rs

+    const LANG_MAP: &[(&str, LangId)] = &[
+        ("en", LangId::en),
+        // ----------------
+        ("de", LangId::de),
+        ("es", LangId::es),
+        ("fr", LangId::fr),
+        ("it", LangId::it),
+        ("ja", LangId::ja),
+        ("ko", LangId::ko),
+        ("pt-br", LangId::pt_br),
+        ("ru", LangId::ru),
+        ("zh-hant", LangId::zh_hant),
+        ("zh-tw", LangId::zh_hant),
+        ("zh", LangId::zh_hans),
+    ];


This trick is neat IMO because it reduces the binary size quite a bit. It also allows the compiler to inline the starts_with_ignore_ascii_case call since there's only 1 call to it now.
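A sketch of how such a table might be consumed. The LangId enum here is a reduced stand-in, the free-function form of the helper and detect_lang are hypothetical, and the first-match-wins loop is an assumption, since the lookup code itself is not shown in this thread:

```rust
// Reduced stand-in for the project's LangId; lowercase variants mirror the diff above.
#[allow(non_camel_case_types, dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LangId {
    en,
    de,
    pt_br,
    zh_hans,
    zh_hant,
}

// Longer prefixes come before shorter ones so that "zh-tw" is not
// swallowed by the generic "zh" entry.
const LANG_MAP: &[(&str, LangId)] = &[
    ("en", LangId::en),
    ("de", LangId::de),
    ("pt-br", LangId::pt_br),
    ("zh-hant", LangId::zh_hant),
    ("zh-tw", LangId::zh_hant),
    ("zh", LangId::zh_hans),
];

// ASCII-only, case-insensitive prefix test (free-function form, assumed).
fn starts_with_ignore_ascii_case(s: &str, prefix: &str) -> bool {
    let s = s.as_bytes();
    let p = prefix.as_bytes();
    p.len() <= s.len() && s[..p.len()].eq_ignore_ascii_case(p)
}

// Hypothetical lookup: the first preferred language matching any table entry wins.
fn detect_lang(preferred: &[&str]) -> LangId {
    for l in preferred {
        for &(prefix, id) in LANG_MAP {
            // The only call site of the helper, inside a single loop.
            if starts_with_ignore_ascii_case(l, prefix) {
                return id;
            }
        }
    }
    LangId::en // fall back to English when nothing matches
}
```

With one loop over a data table, the comparison appears once in the code instead of once per language, which is the effect the comment above describes.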

lhecker · 2025-05-21T19:08:01Z

src/helpers.rs

+        // Since the comparison is ASCII, we don't need to worry about that.
+        let s = self.as_bytes();
+        let p = prefix.as_bytes();
+        p.len() <= s.len() && s[..p.len()].eq_ignore_ascii_case(p)


Makes the code a lot shorter and easier to read IMO.
I'm not gonna lie here, Copilot told me about eq_ignore_ascii_case. Convenient search engine. 😅
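Because the snippet uses self, it presumably lives in a method on str. A self-contained sketch under that assumption, with a hypothetical trait name:

```rust
// Hypothetical extension trait; the name is not taken from the project.
trait AsciiPrefixExt {
    fn starts_with_ignore_ascii_case(&self, prefix: &str) -> bool;
}

impl AsciiPrefixExt for str {
    fn starts_with_ignore_ascii_case(&self, prefix: &str) -> bool {
        // Work on bytes: slicing a byte slice at p.len() cannot panic on a
        // UTF-8 character boundary the way slicing a &str could.
        let s = self.as_bytes();
        let p = prefix.as_bytes();
        // The length check guards the slice; the comparison folds ASCII case only.
        p.len() <= s.len() && s[..p.len()].eq_ignore_ascii_case(p)
    }
}
```

For example, "PT-BR".starts_with_ignore_ascii_case("pt-br") is true, and a non-ASCII input such as "日本語" is simply compared byte by byte against the prefix without panicking.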

lhecker · 2025-05-21T19:08:40Z

src/sys/unix.rs

+                let mut res = Vec::new_in(arena);
+                res.extend(s.as_bytes().iter().map(|&b| if b == b'_' { b'-' } else { b }));
+                unsafe { ArenaString::from_utf8_unchecked(res) }


I broke this by giving the wrong direction over in #104. This code has the benefit of being more compact (no extra string copies).
(Iterators in Rust have size hints so the .extend() call knows exactly how many items it needs to expect and can preallocate accordingly.)
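A stable-Rust sketch of the same idea, swapping the arena-backed Vec and ArenaString for std's Vec and String and using the checked String::from_utf8 instead of the unsafe conversion. The function name is hypothetical:

```rust
// Hypothetical name; std types stand in for the project's arena types.
fn normalize_locale(s: &str) -> String {
    let mut res = Vec::new();
    // The mapped slice iterator reports an exact size hint, which extend
    // can use to reserve capacity before copying the bytes.
    res.extend(s.as_bytes().iter().map(|&b| if b == b'_' { b'-' } else { b }));
    // Swapping one ASCII byte for another keeps the bytes valid UTF-8,
    // so this conversion is not expected to fail.
    String::from_utf8(res).expect("ASCII-for-ASCII replacement keeps UTF-8 valid")
}
```

For a five-byte input such as "pt_BR", the iterator's size_hint() is (5, Some(5)), which is what makes a single up-front allocation possible.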

Closes microsoft#74 Co-authored-by: Leonard Hecker <leonard@hecker.io>

hikilaka added 2 commits May 19, 2025 21:59

Remove C-style count from enums in localization.

a56f43e

Instead of having the latest element in enums act as the count for enums, we can use std::mem::variant_count.

Fix localizations not being detected properly

3e1c6b9

In localization initialization, we now check if the system-preferred language starts with a prefix rather than checking for equality. This should now correctly detect languages. This fixes microsoft#74.

hikilaka force-pushed the fix-localization branch from ffda5eb to 3e1c6b9 May 20, 2025 03:09

lhecker reviewed May 20, 2025

lhecker mentioned this pull request May 20, 2025

Fix language match on Linux (Unix?) #104

Merged

add starts_with_ascii helper

8639972

hikilaka requested a review from lhecker May 21, 2025 03:40

DHowett approved these changes May 21, 2025

Kyza mentioned this pull request May 21, 2025

refactor(l10n): macro gen and verify all strings exist #179

Closed

lhecker reviewed May 21, 2025

lhecker added 5 commits May 21, 2025 20:28

Merge remote-tracking branch 'origin/main' into fix-localization

560a166

Simplify comparison helper

d17076f

Use a mapping table to reduce binary size

e07131e

Fix pt_BR vs pt-br for unix (my bad)

119277c

Remove unnecessary lowercasing on Windows

74f1cfe

lhecker added 2 commits May 21, 2025 21:01

Reduce reliance on nightly Rust

815c70d

Remove convenience test code

1d50270

DHowett approved these changes May 21, 2025

lhecker reviewed May 21, 2025

Merge branch 'main' into fix-localization

de0feaa

DHowett approved these changes May 21, 2025

lhecker merged commit e59e70a into microsoft:main May 21, 2025
1 check passed

diabloproject pushed a commit to diabloproject/edit that referenced this pull request May 29, 2025

Fix localization detection (microsoft#85)

deef1b3

Closes microsoft#74 Co-authored-by: Leonard Hecker <leonard@hecker.io>

lhecker merged 11 commits into microsoft:main from hikilaka:fix-localization