Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to get language prefs for categories #155

Open
wants to merge 1 commit into
base: v2
Choose a base branch
from

Conversation

cjohnson19
Copy link
Contributor

@cjohnson19 cjohnson19 commented Mar 4, 2025

Adds a new lang_category function which accepts the POSIX standard locale category values. Each value is checked in priority order as:

  1. LC_ALL
  2. LC_xxx
  3. LANG

Closes #145

@cjohnson19
Copy link
Contributor Author

Oh, duh. I would have to implement this for the other OS targets as well if I want to have it be exposed to consumers of the library. It seems like Windows has a similar interface for these If I'm reading the documentation correctly. I'll mark this as a draft and work on adding support there.

@cjohnson19 cjohnson19 marked this pull request as draft March 4, 2025 02:37
@AldaronLau
Copy link
Member

AldaronLau commented Mar 4, 2025

@cjohnson19 Thanks for the PR, this is great! Not sure if you're ready for feedback yet, but I think it would be best to change / replace the existing langs() API as part of the 2.0 update (rather than add a new API). I don't think it ever was designed just right from my attempts.

I had in mind something like this:

#[non_exhaustive]
pub struct LanguagePrefs {
    // maybe not `pub`, and have getters returning `impl Iterator<Item = Language>`?
    pub monetary: Vec<Language>,
    pub numeric: Vec<Language>,
    pub time: Vec<Language>,
    pub message: Vec<Language>,
    // ...
}

// Maybe named `lang_prefs` or `localization_info`?
pub fn lang_prefs() -> Result<LanguagePrefs> {
    todo!()
}

For platforms without the lang category support, I think it should set each category to the "main language preferences" (I'm also okay with deferring Windows edit --and other operating systems-- support to a later PR, and using fallback behavior). I'm not fully sure my idea is the design to commit to yet, though - so I'd love to hear if you have other ideas.

I think the lang_category() API you have might not be ideal, because if you wanted to check all the categories, you're now checking the environment variable LC_ALL a bunch of times in a row. It also requires the user to grab all the different kinds of lang categories from the OS manually (going through the new enum), which whoami could just provide. Also, from an API design standpoint, current whoami APIs keep the enums as return values (not used for parameters), which I think it keeps it simple and easy-to-use.

An alternative idea I have is

#[non_exhaustive]
pub enum LocalePreference {
    Any(Language, Country),
    Message(Language, Country),
    Numeric(Language, Country),
    Time(Language, Country),
}

// or
pub struct LocalePreference {
    // would be `Any`, `Message`, `Numeric`, `Time`, etc.
    kind: LocalePreferenceKind,
    language: Language,
    country: Country,
}

pub fn locale() -> Result<impl Iterator<Item = LocalePreference>> {
    todo!()
}

Apologies for not writing all my ideas in the original issue, and keeping it in my head until now. Thoughts?

@cjohnson19
Copy link
Contributor Author

Apologies for not writing all my ideas in the original issue, and keeping it in my head until now.

No, thank you for the guidance! Your approach makes a lot more sense! Your approach of creating a new structure to return looks nice to me.

When it comes to providing these categories in the language response, should we override categories with the precedence that is defined in POSIX or should we return them literally? For example, if a user's locale setup is:

LC_COLLATE="de_DE.UTF-8"
# ...
LC_ALL="en_US.UTF-8"

Should we return the COLLATE as en_US or de_DE?

My gut tells me we handle the precedence, but I can see how some may find that confusing.

@AldaronLau
Copy link
Member

AldaronLau commented Mar 5, 2025

No, thank you for the guidance! Your approach makes a lot more sense! Your approach of
creating a new structure to return looks nice to me.

Yeah, looking on it again, I think that's probably the easiest for consumers.

When it comes to providing these categories in the language response, should we override categories with the precedence that is defined in POSIX or should we return them literally? For example, if a user's locale setup is:

LC_COLLATE="de_DE.UTF-8"
# ...
LC_ALL="en_US.UTF-8"

Should we return the COLLATE as en_US or de_DE?

My gut tells me we handle the precedence, but I can see how some may find that confusing.

I think following the docs, and override returning [en/US]. I don't think it would necessarily be a bad thing though, to merge if they're both present and return [en/US,de/DE], preferring the one that takes precedence, and adding a fallback for more localized behavior than "default available translation" if no matched language is present.

@cjohnson19 cjohnson19 force-pushed the locale-categories branch 3 times, most recently from 92b94ad to 8e43f76 Compare March 5, 2025 17:54
The `langs` function, `lang_prefs` now, tries to get the user's
preferences for each locale category defined within the GNU spec.
@cjohnson19 cjohnson19 marked this pull request as ready for review March 5, 2025 18:06
@cjohnson19
Copy link
Contributor Author

Sorry for all the failed tests! Don't hesitate to let me know where my code isn't good or is confusing. I couldn't find specific things fix, but felt that I had made some mistakes along the way.

@@ -2,8 +2,9 @@ use std::{env, ffi::OsString};

use crate::{
conversions,
language::LanguagePrefs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The language:: prefix can be removed, since LanguagePrefs is re-exported in the root module.

/// Resulting fields are sorted in order of the user's preference.
///
/// POSIX locale values and GNU nonstandard categories defined in
/// <https://man7.org/linux/man-pages/man5/locale.5.html>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think probably link to Windows categories here too?

//! return enums, and [`langs()`] that returns an iterator of [`String`]s). The
//! following example shows how to use all of the functions (except those that
//! return [`OsString`]):
//! return enums, and [`lang_prefs()`] that returns an iterator of [`String`]s).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be changed to LanguagePrefs instead of "iterator of [String]s.

@@ -52,15 +52,17 @@ use std::{
io::{Error, ErrorKind},
};

use crate::{Arch, DesktopEnv, Platform, Result};
use crate::{
language::LanguagePrefs, Arch, DesktopEnv, Language, Platform, Result,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here also, can get rid of language:: in path

Error::new(kind, e)
})
fn unix_lang() -> Result<LanguagePrefs> {
let get_env_var = |var: &str| match env::var(var) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Rust method naming usually 'get' is implicit. So I think, env_var is a more "idiomatic" name for the closure.

/// Determines the interpretation of byte sequences as characters (e.g.,
/// single versus multibyte characters), character classifications (e.g.,
/// alphabetic or digit), and the behavior of character classes.
pub ctype: Vec<Language>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is the name used by the environment variables, but I think something like char_classes is more descriptive / a better name.

/// Determines collation rules used for sorting and regular expressions,
/// including character equivalence classes and multicharacter collating
/// elements.
pub collate: Vec<Language>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe it should be collation instead of collate

pub struct LanguagePrefs {
/// Determines general user language preference, should be used in
/// situations which are not encompassed by other [`LanguagePrefs`].
pub lang: Vec<Language>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but I think "messages" generally could be used as the "catch-all", so I'm not sure this field is needed

Copy link
Member

@AldaronLau AldaronLau Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, actually I'm thinking about this maybe being something that's needed. Since there's two possibilities:

  1. Remove lang - each other field, appends what would have been in lang if the Language wasn't already contained in the Vec
  2. Rename lang to fallback - each other field is an empty Vec if it the environment variable isn't set, add methods for getting each language category:
fn collation_langs(&self) -> impl Iterator<Item = Lang> {
    // not sure if this compiles, but this idea
    self.collate.iter()
        .cloned()
        .zip(self.fallback.iter().filter(|lang| {
            !self.collate.contains(&lang)
        }).cloned())
}

I'm leaning towards the second option, since it's less allocations / duplications on the simple/common case

@@ -57,6 +57,20 @@ impl Language {
}
}

/// Reads an `ll_CC` formatted string into a [`Language`].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a normal comment (not a doc comment). It also took me a bit to understand what ll_CC meant (full format would also be "lang_COUNTRY.Encoding, with lang being a 2-letter language code, and COUNTRY being 2-letter country code").

writeln!(f, "LC_MESSAGES=\"{}\"", format_langs(&self.messages))?;
writeln!(f, "LC_MONETARY=\"{}\"", format_langs(&self.monetary))?;
writeln!(f, "LC_NUMERIC=\"{}\"", format_langs(&self.numeric))?;
write!(f, "LC_TIME=\"{}\"", format_langs(&self.time))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the formatting here is probably not what we want (too specific to environment variable names).

This implementation doesn't need to be in this PR, but the examples should use the Display implementation if it exists.

Perhaps something like:

Collation=en/US,de/DE;CharClass=en/US,Messages=en/US,Monetary=en/US;Numeric=en/US,Time=en/US

Copy link
Member

@AldaronLau AldaronLau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to stop mid-review, but checked everything now. I know I made a lot of comments, but I think it's mostly minor things (like using use language:: when exported in root), this is great!

// Languages that have a higher global precedence than their specific
// counterparts
if let Some(l) = &lang {
if l == "C" || l == "POSIX" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the POSIX "lang" documented anywhere that could be linked here?

@@ -6,14 +6,15 @@ use std::{
};

use crate::{
language::{Language, LanguagePrefs},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to remove language:: path prefix here too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better Unix Locale Support
2 participants