Do not discard locale modifiers (@variants) #946

Closed
madduck opened this issue Jan 17, 2023 · 2 comments · Fixed by #947

@madduck
Contributor

madduck commented Jan 17, 2023

Locale modifiers ("@variants") are described in the GNU gettext documentation like this:

The ‘@variant’ can denote any kind of characteristics that is not already implied by the language ll and the country CC. […] It can also denote a dialect of the language, …

This seems useful for various purposes, and yet, the Babel source code completely discards the modifiers, saying "we don't care about them":

In [1]: import babel

In [2]: babel.parse_locale('de_DE@formal')
Out[2]: ('de', 'DE', None, None)

This is especially problematic because "private use subtags", as defined by RFC 4646, are not properly handled by Babel, and neither are the private use region subtags 'AA', 'ZZ', and those in the ranges 'QM'-'QZ' and 'XA'-'XZ'. This basically leaves an implementer no way to extend translations beyond the strict scope of the provided locale data.

The RFC makes it clear that private use subtags and private region subtags aren't primary choice mechanisms, and should only be used where no other approach suffices.

Python's own gettext handles locale modifiers just like you'd expect, and while it seems that the Locale class would need to be modified to represent modifiers, there appears to be no reason why "we should not care" about these.

Please do not discard them. I am happy to have a shot at providing a PR if replies here are encouraging.

@akx
Member

akx commented Jan 18, 2023

Hi, thanks for chiming in!

One issue I see here is that Babel does not currently support custom locales (#454), and de_DE@formal sounds like one (as do private use tags). That is, any Locale.parse() call for such an identifier will fail with an UnknownLocaleError. (Locale.parse() is an unfortunate name for a function that actually loads a locale, but that's a future Babel 3 issue.)

For the time being, to allow this (very valid!) use, I think

  • parse_locale should be extended to keep the modifier
  • locale loading should be taught to fail if a parsed locale has a modifier
  • the translations mechanism should be extended/loosened to support well-formed locale identifiers that might not have locale data available in Babel.
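
As a rough illustration of the first bullet, here is a minimal, standard-library-only sketch of a parser that keeps the modifier instead of dropping it. It is not Babel's code: `ParsedLocale`, `parse_keeping_modifier`, and the simplified regular expression are hypothetical names invented for this example, and script and variant subtags are deliberately left out.

```python
import re
from typing import NamedTuple, Optional


class ParsedLocale(NamedTuple):
    """Hypothetical result type for this sketch; not part of Babel's API."""
    language: str
    territory: Optional[str]
    modifier: Optional[str]


# Deliberately simplified: language, optional territory, optional @modifier.
# Script and variant subtags are not handled here.
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[_-](?P<territory>[A-Za-z]{2}|\d{3}))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def parse_keeping_modifier(identifier: str) -> ParsedLocale:
    """Split an identifier such as 'de_DE@formal' without losing '@formal'."""
    match = _LOCALE_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a well-formed locale identifier: {identifier!r}")
    territory = match.group("territory")
    return ParsedLocale(
        language=match.group("language").lower(),
        territory=territory.upper() if territory else None,
        modifier=match.group("modifier"),  # None when no '@...' part is present
    )


print(parse_keeping_modifier("de_DE@formal"))
# ParsedLocale(language='de', territory='DE', modifier='formal')
```

With the modifier kept as a separate field, locale-data loading could still reject or ignore it, while the translation lookup could use it to choose a catalog.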

If you're willing to give a PR a shot, please do!

@madduck
Contributor Author

madduck commented Jan 18, 2023

Thanks for your kind reply, @akx.

I already started poking at the code, and sure thing: it's not as trivial as it might have seemed at the start. Lol.

Getting parse_locale to keep the modifier won't be hard.

I don't think that loading a locale with a modifier should fail per se. In general, if e.g. de_DE@formal cannot be loaded, then de_DE should be tried, just like de is loaded if de_DE is unavailable, am I right?

I need this functionality for PO files, where de_DE@formal should be a translation of its own. The benefit of using de_DE@formal rather than introducing a new language or abusing the region code is precisely the fallback, because most strings in de_DE@formal would be unmodified from de_DE, so that a de_DE@formal PO file could just have those strings where the difference matters.
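
The fallback idea can be sketched as a small helper that lists the identifiers to try, most specific first. The function name `fallback_chain` is hypothetical, and the ordering is an assumption: candidates carrying the modifier are tried before those without it, which is consistent with the gettext session in this comment (there, de_DE@formal resolves to the de@formal catalog), but it is not taken from Babel's or gettext's source.

```python
from typing import List, Optional


def fallback_chain(language: str,
                   territory: Optional[str] = None,
                   modifier: Optional[str] = None) -> List[str]:
    """Return locale identifiers to try, most specific first (assumed order)."""
    # Base identifiers without a modifier: 'de_DE' before 'de'.
    bases = [language]
    if territory:
        bases.insert(0, f"{language}_{territory}")

    chain: List[str] = []
    if modifier:
        # Modifier-bearing candidates come first, so a sparse
        # 'de@formal' catalog wins over the plain 'de' one.
        chain.extend(f"{base}@{modifier}" for base in bases)
    chain.extend(bases)
    return chain


print(fallback_chain("de", "DE", "formal"))
# ['de_DE@formal', 'de@formal', 'de_DE', 'de']
```

A translation loader could merge catalogs along this chain, so that a de_DE@formal PO file only needs to carry the strings that differ from de_DE.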

The real challenge will be to maintain API compatibility / provide proper integration into gettext, which — as I said above — does the right thing:

% cat test.py 
#!/usr/bin/python3
# _*_ coding:utf-8 _*_

from gettext import bindtextdomain, textdomain, gettext
import os

bindtextdomain(textdomain(), localedir=os.path.join(os.getcwd(), 'translations'))

def main():
    print(gettext("You"))

if __name__ == "__main__":
    main()

% ./test.py 
You

% LANGUAGE=de ./test.py    
Du

% LANGUAGE=de@formal ./test.py 
Sie

% LANGUAGE=de_DE@formal ./test.py 
Sie

% LANGUAGE=de_DE ./test.py 
Du

% ls translations 
de  de@formal  messages.pot
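
To see which catalog the standard library would pick without setting LANGUAGE, the public `gettext.find()` function can be called directly. The sketch below recreates the directory layout from the listing above in a temporary directory. The empty `messages.mo` files are placeholders, an assumption that relies on `find()` only checking that the file exists; they are not valid catalogs and could not be loaded with `gettext.translation()`.

```python
import gettext
import os
import tempfile

picked_by_request = {}

with tempfile.TemporaryDirectory() as localedir:
    # Recreate the layout from the listing: catalogs for "de" and "de@formal".
    for lang in ("de", "de@formal"):
        catalog_dir = os.path.join(localedir, lang, "LC_MESSAGES")
        os.makedirs(catalog_dir)
        # Placeholder file: enough for find(), not a real compiled catalog.
        open(os.path.join(catalog_dir, "messages.mo"), "wb").close()

    for requested in ("de", "de_DE", "de@formal", "de_DE@formal"):
        found = gettext.find("messages", localedir, languages=[requested])
        # Keep only the locale directory name of the catalog that was found.
        picked = os.path.relpath(found, localedir).split(os.sep)[0] if found else None
        picked_by_request[requested] = picked
        print(f"{requested} -> {picked}")
```

The printed mapping should mirror the session above: the requests carrying @formal resolve to the de@formal catalog, and the others resolve to de.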

madduck added a commit to madduck/babel that referenced this issue Jan 18, 2023
Locale modifiers ("@Variants") are described in the GNU gettext
documentation like this:

> The ‘@variant’ can denote any kind of characteristics that is not
> already implied by the language ll and the country CC. […] It can also
> denote a dialect of the language, …

Wherein Babel previously would discard these, this patch stores the
modifier information in the `Locale` objects, handling string
representation accordingly.

Not implemented is the lookup of a meaningful description of modifiers,
but instead — for now — an identity mapping is provided.

Resolves: python-babel#946
Signed-off-by: martin f. krafft <madduck@madduck.net>
madduck added a commit to madduck/babel that referenced this issue Jan 18, 2023
madduck added a commit to madduck/babel that referenced this issue Jan 20, 2023
madduck added a commit to madduck/babel that referenced this issue Jan 25, 2023
madduck added a commit to madduck/babel that referenced this issue Jan 25, 2023
akx closed this as completed in #947 Jan 26, 2023
akx added a commit that referenced this issue Jan 26, 2023
Locale modifiers ("@Variants") are described in the GNU gettext
documentation like this:

> The ‘@variant’ can denote any kind of characteristics that is not
> already implied by the language ll and the country CC. […] It can also
> denote a dialect of the language, …

Wherein Babel previously would discard these, this patch stores the
modifier information in the `Locale` objects, handling string
representation accordingly.

Resolves: #946
Signed-off-by: martin f. krafft <madduck@madduck.net>
Co-authored-by: Aarni Koskela <akx@iki.fi>