f58 not printing correctly on POSIX locale #3199

Closed
SteVwonder opened this issue Sep 9, 2020 · 30 comments

Comments

@SteVwonder
Member

I'm running flux-core 0.19.0 on NERSC's Cori, and it is printing _ rather than f or ƒ for the f58 prefix:

herbein1@nid00073:/global/common/software/flux> flux mini submit -N2 -n64 sleep 10
_FEj8zWK
herbein1@nid00073:/global/common/software/flux> flux jobs
       JOBID USER     NAME       ST NTASKS NNODES  RUNTIME RANKS
    _FEj8zWK herbein1 sleep       R     64      2   1.553s [0-1]

The locale:

herbein1@nid00073:/global/common/software/flux> printenv | grep LC
herbein1@nid00073:/global/common/software/flux> locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
@SteVwonder
Member Author

It looks like setting LC_ALL to POSIX remedies the issue. As a machine-specific fix, I can set LC_ALL to POSIX in our tcl module for flux, but I wonder if this can be solved more generically.

herbein1@nid00073:/global/common/software/flux> export LC_ALL=POSIX
herbein1@nid00073:/global/common/software/flux> flux jobs -a
       JOBID USER     NAME       ST NTASKS NNODES  RUNTIME RANKS
    fRAioxkf herbein1 sleep       R     64      2   3.155m [0-1]
    fFEj8zWK herbein1 sleep      CD     64      2   10.14s [0-1]
    f9Wtjk5m herbein1 hostname   CD     64      2   0.226s [0-1]
    f6Y5FTnP herbein1 hostname    F    128      -        - -
    f2XYY7BV herbein1 hostname   CD      2      2   0.089s [0-1]
herbein1@nid00073:/global/common/software/flux> locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=POSIX

@grondo
Contributor

grondo commented Sep 9, 2020

We should be falling back to "f" when LC_CTYPE=POSIX or C. The detection uses MB_CUR_MAX(3), what does that function return in the failing environment?

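#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main (int ac, char **av)
{
    /* Initialize the locale from the environment; without this call
     * the program stays in the "C" locale. */
    setlocale (LC_ALL, "");
    /* MB_CUR_MAX has type size_t, so print it with %zu */
    printf ("MB_CUR_MAX=%zu\n", (size_t) MB_CUR_MAX);
    return 0;
}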

(Edit: forgot that programs have to call setlocale() to initialize locale from environment. Added to test program above)

It doesn't make sense to me that there is a difference between LC_CTYPE=POSIX and LC_ALL=POSIX.

@grondo
Contributor

grondo commented Sep 9, 2020

grondo@asp:~/git/flux-core.git$ locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
grondo@asp:~/git/flux-core.git$ ./mb
MB_CUR_MAX=1
grondo@asp:~/git/flux-core.git$ LANG=C.UTF-8 ./mb
MB_CUR_MAX=6

@grondo
Contributor

grondo commented Sep 9, 2020

It looks like setting LC_ALL to POSIX remedies the issue. As a machine-specific fix, I can set LC_ALL to POSIX in our tcl module for flux, but I wonder if this can be solved more generically.

As seen above, the idea was to solve this generally, and it seems to at least nominally work on my system (and on TOSS systems IIRC). So I think this is a machine-specific failure or some unanticipated corner case. If you are able to reproduce on another distro, please let me know. O/w, if we can start with the result of MB_CUR_MAX on Cori that would be super helpful.

Thanks!

@SteVwonder
Member Author

We should be falling back to "f" when LC_CTYPE=POSIX or C. The detection uses MB_CUR_MAX(3),

Good to know. Thanks for that explanation!

O/w, if we can start with the result of MB_CUR_MAX on Cori that would be super helpful.

Here is the output from Cori. It is running SLES15, so I wonder if it would reproduce on openSUSE 15. I might be able to give that a shot later this week.

herbein1@cori02:~/locale-debugging> locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
herbein1@cori02:~/locale-debugging> ./mb
MB_CUR_MAX=1
herbein1@cori02:~/locale-debugging> export LC_ALL=POSIX
herbein1@cori02:~/locale-debugging> locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=POSIX
herbein1@cori02:~/locale-debugging> ./mb
MB_CUR_MAX=1
herbein1@cori02:~/locale-debugging> LC_ALL=C ./mb
MB_CUR_MAX=1
herbein1@cori02:~/locale-debugging> LC_ALL=C.UTF-8 ./mb
MB_CUR_MAX=6

It looks to me like the test program is behaving sanely. Hmm.

@grondo
Contributor

grondo commented Sep 9, 2020

Interesting! I can't get this to reproduce on TOSS3, so there must be some corner case we're hitting on that machine.

I think we may have to do some debugging on Cori to figure out precisely where things are going wrong.

We could start with the following simple patch, just to double check that we're getting the behavior we expect when MB_CUR_MAX == 1:

diff --git a/src/common/libutil/fluid.c b/src/common/libutil/fluid.c
index d9f5fb1..d10897a 100644
--- a/src/common/libutil/fluid.c
+++ b/src/common/libutil/fluid.c
@@ -192,8 +192,12 @@ static int fluid_f58_encode (char *buf, int bufsz, fluid_t id)
     }

     /* Use alternate "f" prefix if locale is not multibyte */
-    if (!is_utf8_locale())
+    if (!is_utf8_locale()) {
         prefix = f58_alt_prefix;
+        fprintf (stderr, "Locale not multibyte capable, using 'f' FLUID prefix\n");
+    }
+    else
+        fprintf (stderr, "Using multibyte FLUID prefix\n");

     if (bufsz <= strlen (prefix) + 1) {
         errno = EOVERFLOW;

@SteVwonder
Member Author

Ah, I think we are getting closer:

herbein1@cori02:~/Repositories/flux-framework/flux-core> flux mini run -N1 -n1 hostname
Using multibyte FLUID prefix
cori02
herbein1@cori02:~/Repositories/flux-framework/flux-core> flux jobs -a
       JOBID USER     NAME       ST NTASKS NNODES  RUNTIME RANKS
Using multibyte FLUID prefix
   _4rb5Wr2T herbein1 hostname   CD      1      1   0.287s 0
herbein1@cori02:~/Repositories/flux-framework/flux-core> LC_ALL=POSIX flux jobs -a
       JOBID USER     NAME       ST NTASKS NNODES  RUNTIME RANKS
Locale not multibyte capable, using f FLUID prefix
   f4rb5Wr2T herbein1 hostname   CD      1      1   0.287s 0

@grondo
Contributor

grondo commented Sep 9, 2020

Thanks!

The is_utf8_locale() function is literally just:

static inline int is_utf8_locale (void)
{
    /* Assume if MB_CUR_MAX > 1, this locale can handle wide characters
     *  and therefore will properly handle UTF-8.
     */
    if (MB_CUR_MAX > 1)
        return 1;
    return 0;
}

So MB_CUR_MAX must be returning > 1 in some scenario, unless there is something I'm missing. Can you try editing that function to see the return value of MB_CUR_MAX in the default scenario on Cori?
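
For comparison, here is a rough Python analogue of the is_utf8_locale() check. It is only a sketch: instead of reading MB_CUR_MAX it asks the C library which codeset a named locale uses, via locale.nl_langinfo (Unix only). The helper names codeset_for and looks_utf8 are made up for illustration and are not part of flux-core.

```python
import locale


def codeset_for(locale_name):
    """Return the codeset the C library reports for locale_name.

    LC_CTYPE is switched temporarily and restored afterwards.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


def looks_utf8(locale_name):
    """True if the locale's codeset is some spelling of UTF-8."""
    return codeset_for(locale_name).upper().replace("-", "") == "UTF8"


# The plain "C" locale should not report a UTF-8 codeset.
print(codeset_for("C"), looks_utf8("C"))
```

Running the same check against whatever locale the interpreter actually started with (locale.setlocale(locale.LC_CTYPE) with no second argument) may help show whether something changed it before Flux code ran.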

@grondo
Contributor

grondo commented Sep 9, 2020

Ah, I had a thought. I wonder if the Python version on Cori is initializing the locale differently when LANG/LC_ALL are not set vs the Python on my test systems. What does flux python -V return?

Looks like I've been testing with 3.6.8 and 3.6.9.

@grondo
Contributor

grondo commented Sep 9, 2020

A test that occurs to me is to use flux job id to avoid Python and see if we get a different answer:

$ flux job id --to=f58 123456789
ƒBukQL
$ LANG= flux job id --to=f58 123456789
fBukQL

@SteVwonder
Member Author

I wonder if the Python version on Cori is initializing the locale differently when LANG/LC_ALL are not set vs the Python on my test systems.

I think you are on to something.

herbein1@cori02:~/Repositories/flux-framework/flux-core> flux job id --to=f58 f4rb5Wr2T
Locale not multibyte capable, using f FLUID prefix
f4rb5Wr2T
herbein1@cori02:~/Repositories/flux-framework/flux-core> LC_ALL=POSIX flux job id --to=f58 f4rb5Wr2T
Locale not multibyte capable, using f FLUID prefix
f4rb5Wr2T
herbein1@cori02:~/Repositories/flux-framework/flux-core> LC_ALL=C.UTF-8 flux job id --to=f58 f4rb5Wr2T
Using multibyte FLUID prefix
_4rb5Wr2T
herbein1@cori02:~/Repositories/flux-framework/flux-core> flux python -V
Python 3.7.4

I'll try with their Python3.6 installation.

@grondo
Contributor

grondo commented Sep 9, 2020

I wonder if PEP 538 has something to do with it? As far as I can gather, the locale coercion was introduced in Python 3.7.

@SteVwonder
Member Author

Looks like it isn't an issue with their Python3.6 installation:

herbein1@cori02:~/Repositories/flux-framework/flux-core> flux mini run -N1 -n1 hostname
Locale not multibyte capable, using f FLUID prefix
cori02
herbein1@cori02:~/Repositories/flux-framework/flux-core> flux jobs -a
       JOBID USER     NAME       ST NTASKS NNODES  RUNTIME RANKS
Locale not multibyte capable, using f FLUID prefix
    f6PwoQAj herbein1 hostname   CD      1      1   0.160s 0
herbein1@cori02:~/Repositories/flux-framework/flux-core> flux python -V
Python 3.6.8 :: Anaconda, Inc.

I wonder if it is just a configure flag, or a version change (e.g., the PEP 538 coercion that you mentioned).

@SteVwonder
Member Author

I wonder if it is just a configure flag

Never mind, it looks like the UCS-2 vs UCS-4 configure option was really only relevant for Python 2.

@grondo
Contributor

grondo commented Sep 9, 2020

Can you try with the Python 3.7 version and set PYTHONCOERCECLOCALE=warn to see if a warning pops out?

@SteVwonder
Member Author

herbein1@cori02:~/Repositories/flux-framework/flux-core> PYTHONCOERCECLOCALE=warn flux jobs -a
Python detected LC_CTYPE=C: LC_CTYPE coerced to C.UTF-8 (set another locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behavior).
       JOBID USER     NAME       ST NTASKS NNODES  RUNTIME RANKS
Using multibyte FLUID prefix
   _23ZfKMXu herbein1 hostname   CD      1      1   0.219s 0
herbein1@cori02:~/Repositories/flux-framework/flux-core> LC_ALL=POSIX PYTHONCOERCECLOCALE=warn flux jobs -a
Python runtime initialized with LC_CTYPE=C (a locale with default ASCII encoding), which may cause Unicode compatibility problems. Using C.UTF-8, C.utf8, or UTF-8 (if available) as alternative Unicode-compatible locales is recommended.
       JOBID USER     NAME       ST NTASKS NNODES  RUNTIME RANKS
Locale not multibyte capable, using f FLUID prefix
   f23ZfKMXu herbein1 hostname   CD      1      1   0.219s 0

I don't understand why Python is detecting LC_CTYPE to be C when it is POSIX. I confirmed that they aren't symlinked in /usr/share/i18n/locales/ to be the same.
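
One way to check whether coercion is what changes the environment is to start a fresh interpreter with every locale variable removed and look at the LC_CTYPE it ends up with. The sketch below does that with PYTHONCOERCECLOCALE=0 and with the variable unset. The helper name child_lc_ctype is hypothetical, and the result with coercion enabled depends on the Python version and on which locales are installed, so no output is claimed for that case.

```python
import os
import subprocess
import sys

# Program run in the child: report LC_CTYPE as the child sees it.
CHILD = 'import os; print(os.environ.get("LC_CTYPE", "<unset>"))'


def child_lc_ctype(coerce):
    """Start a new interpreter with no locale variables set and return
    the LC_CTYPE value found in its environment."""
    env = {
        key: value
        for key, value in os.environ.items()
        if key not in ("LANG", "LANGUAGE") and not key.startswith("LC_")
    }
    if coerce:
        env.pop("PYTHONCOERCECLOCALE", None)  # default behaviour
    else:
        env["PYTHONCOERCECLOCALE"] = "0"      # explicitly disable coercion
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print("coercion disabled:", child_lc_ctype(coerce=False))
print("coercion enabled: ", child_lc_ctype(coerce=True))
```

If the second line shows a UTF-8 locale while the first shows `<unset>`, the interpreter itself is rewriting LC_CTYPE before any Flux code runs.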

@SteVwonder
Member Author

SteVwonder commented Sep 9, 2020

It appears that the shell I have on Cori cannot handle unicode properly at all:

herbein1@cori02:~/Repositories/flux-framework/flux-core> echo -e "\xE2\x98\xA0"
_

Versus in the same terminal emulator but running locally on my mac:

❯ echo -e "\xE2\x98\xA0"
☠

So at the risk of stating the obvious, I'm guessing the issue is that Python3.7 is coercing the locale to a UTF-8 compatible one, so that Flux thinks it is safe to emit ƒ, but the underlying terminal truly cannot handle Unicode (it should stay POSIX/C). Anyone know what might be preventing my shell on Cori from properly handling Unicode? Maybe I just need to forward my LC_* variables through with SendEnv?

EDIT: looks like I already had locale forwarding on, but NERSC blocks the forwarding of environment variables. That's just a convenience thing, I guess. Still not sure where the Unicode breakage is coming from.

EDIT2: It appears that it was a tmux-related issue. I was setting the locale within tmux to be UTF-8, but the shell that the tmux server was spawned in was POSIX. After I added:

export LANG=en_US.utf8
export LC_ALL=en_US.utf8

to my ~/.bash_profile and then relaunched tmux, Unicode works as expected now.
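
For reference, the three bytes used in the echo test are the UTF-8 encoding of a single character, and the ƒ prefix is likewise a multi-byte sequence. The sketch below shows why an ASCII-only terminal cannot display either one. The helper names are illustrative only, and the output is kept ASCII-safe so it can run in the broken environment too.

```python
SKULL_BYTES = b"\xE2\x98\xA0"  # the bytes sent by the echo -e test
F58_PREFIX = "\u0192"          # the f-with-hook character used as the f58 prefix


def utf8_bytes(text):
    """Return the UTF-8 byte sequence for text."""
    return text.encode("utf-8")


def fits_ascii(text):
    """True if text can be represented in plain ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Print code points and bytes rather than the characters themselves.
print(hex(ord(SKULL_BYTES.decode("utf-8"))))
print(utf8_bytes(F58_PREFIX), fits_ascii(F58_PREFIX), fits_ascii("f"))
```

A terminal that is not decoding UTF-8 receives those raw bytes and substitutes a placeholder, which is consistent with the `_` seen in the output above.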

@SteVwonder
Member Author

Would it make sense for the flux wrapper to set PYTHONCOERCECLOCALE=0 to prevent Python 3.7+ from interfering with our Unicode detection? I wonder if that breaks anything on the TOSS-side of things.

@grondo
Contributor

grondo commented Sep 9, 2020

That is a good idea.

However, remember that this will change the environment for all flux jobs as well, which is why we backed off of setting environment variables like this before in the flux command driver.

I'm honestly not sure the best way to proceed.

@grondo
Contributor

grondo commented Sep 9, 2020

From coffee call we are going to try out the following solution:

  • Allow an environment variable to override current value of MB_CUR_MAX (e.g. FLUX_F58_FORCE_ASCII)
  • In Python front-end utilities, read LC_CTYPE and LC_ALL; if neither is set, or the value is C or POSIX, set FLUX_F58_FORCE_ASCII
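
The two steps above can be sketched as a pure function over an environment mapping. This is a minimal illustration, not the eventual implementation. The helper names effective_ctype and should_force_ascii are hypothetical. The lookup order LC_ALL, LC_CTYPE, LANG follows the usual POSIX precedence, with empty values treated as unset. One caveat: if the interpreter has already coerced LC_CTYPE, the environment may read C.UTF-8 by the time this runs.

```python
def effective_ctype(environ):
    """Return the locale name that governs LC_CTYPE, or None if nothing is set.

    Lookup order is LC_ALL, then LC_CTYPE, then LANG; empty strings
    count as unset.
    """
    for key in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(key)
        if value:
            return value
    return None


def should_force_ascii(environ):
    """Decide whether the ASCII 'f' prefix should be forced."""
    # An explicit override wins, whatever its value.
    if "FLUX_F58_FORCE_ASCII" in environ:
        return True
    return effective_ctype(environ) in (None, "C", "POSIX")


print(should_force_ascii({}))                         # nothing set
print(should_force_ascii({"LANG": "en_US.UTF-8"}))    # UTF-8 locale via LANG
print(should_force_ascii({"LANG": "en_US.UTF-8", "LC_ALL": "POSIX"}))
```

Passing a plain dict rather than os.environ keeps the decision easy to test; a caller would pass os.environ and then set FLUX_F58_FORCE_ASCII when the function returns True.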

@grondo
Contributor

grondo commented Sep 9, 2020

@SteVwonder, want to see if the following patch works? Seems to at least make flux job id and flux jobs behave consistently on my test system.

diff --git a/src/bindings/python/flux/util.py b/src/bindings/python/flux/util.py
index 6cb1ef5..d467799 100644
--- a/src/bindings/python/flux/util.py
+++ b/src/bindings/python/flux/util.py
@@ -17,6 +17,7 @@ import os
 import math
 import argparse
 import traceback
+import locale
 from datetime import timedelta
 from string import Formatter

@@ -127,6 +128,10 @@ class CLIMain(object):
             self.logger = logger

     def __call__(self, main_func):
+        _, encoding = locale.getdefaultlocale()
+        if encoding in [ None, "C", "POSIX" ]:
+            os.environ["FLUX_F58_FORCE_ASCII"] = "1"
+
         loglevel = int(os.environ.get("FLUX_PYCLI_LOGLEVEL", logging.INFO))
         logging.basicConfig(
             level=loglevel, format="%(name)s: %(levelname)s: %(message)s"
diff --git a/src/common/libutil/fluid.c b/src/common/libutil/fluid.c
index d9f5fb1..fb3b6c2 100644
--- a/src/common/libutil/fluid.c
+++ b/src/common/libutil/fluid.c
@@ -174,7 +174,7 @@ static inline int is_utf8_locale (void)
     /* Assume if MB_CUR_MAX > 1, this locale can handle wide characters
      *  and therefore will properly handle UTF-8.
      */
-    if (MB_CUR_MAX > 1)
+    if (MB_CUR_MAX > 1 && !getenv ("FLUX_F58_FORCE_ASCII"))
         return 1;
     return 0;
 }

@SteVwonder
Member Author

SteVwonder commented Sep 10, 2020

Unfortunately, if LC_ALL is not set, then getdefaultlocale seems to default to UTF-8 😢. Maybe it is returning the value post-coercion? Looking at the CPython interpreter source code for getdefaultlocale, that is the only conclusion that I can come to (i.e., that the Python coercion sets LC_ALL to en_US.UTF-8). Also related, TIL that (from LOCALE(1)):

Values for variables set in the environment are printed without double quotes, implied values are printed with double quotes.

So for all of the previous outputs of locale, the "POSIX" values were just implied 🤦 . Which is probably why python is taking it upon itself to so liberally coerce things to UTF-8. Nothing locale-wise is actually set on Cori.

herbein1@cori02:~/Repositories/flux-framework/flux-core> locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
herbein1@cori02:~/Repositories/flux-framework/flux-core> flux mini submit hostname
flux-mini: WARNING: Detected default locale: UTF-8
flux-mini: WARNING: Using autodetection for unicode support for F58
Using multibyte FLUID prefix
_76NrY9aX
herbein1@cori02:~/Repositories/flux-framework/flux-core> LANG="POSIX" flux mini submit hostname
flux-mini: WARNING: Detected default locale: UTF-8
flux-mini: WARNING: Using autodetection for unicode support for F58
Using multibyte FLUID prefix
_7A6h9bAF
herbein1@cori02:~/Repositories/flux-framework/flux-core> LANG="C" flux mini submit hostname
flux-mini: WARNING: Detected default locale: UTF-8
flux-mini: WARNING: Using autodetection for unicode support for F58
Using multibyte FLUID prefix
_7Bkn33UT
herbein1@cori02:~/Repositories/flux-framework/flux-core> LC_ALL="POSIX" flux mini submit hostname
flux-mini: WARNING: Detected default locale: None
flux-mini: WARNING: Forcing the use of ASCII for F58
Locale not multibyte capable, using f FLUID prefix
f7GEi38B1
herbein1@cori02:~/Repositories/flux-framework/flux-core> LC_ALL="C" flux mini submit hostname
flux-mini: WARNING: Detected default locale: None
flux-mini: WARNING: Forcing the use of ASCII for F58
Locale not multibyte capable, using f FLUID prefix
f7Hm6gJfR
herbein1@cori02:~/Repositories/flux-framework/flux-core> LC_ALL="en_US.UTF-8" flux mini submit hostname
flux-mini: WARNING: Detected default locale: UTF-8
flux-mini: WARNING: Using autodetection for unicode support for F58
Using multibyte FLUID prefix
_7LhSMndM

I don't understand why LANG doesn't change anything 🤔. BUT the override does seem to work correctly, which by itself is still a nice thing:

herbein1@cori02:~/Repositories/flux-framework/flux-core> FLUX_F58_FORCE_ASCII=1 LC_ALL="en_US.UTF-8" flux mini submit hostname
Detected default locale: UTF-8
Using autodetection for unicode support for F58
Locale not multibyte capable, using f FLUID prefix
f8ZGkArhM

Note: I added some extra logging to the util.py main decorator for the above code snippets:

diff --git a/src/bindings/python/flux/util.py b/src/bindings/python/flux/util.py
index 6cb1ef5cd..a2602d83d 100644
--- a/src/bindings/python/flux/util.py
+++ b/src/bindings/python/flux/util.py
@@ -17,6 +17,7 @@ import os
 import math
 import argparse
 import traceback
+import locale
 from datetime import timedelta
 from string import Formatter

@@ -127,10 +128,18 @@ class CLIMain(object):
             self.logger = logger

     def __call__(self, main_func):
+        _, encoding = locale.getdefaultlocale()
+        print("Detected default locale: {}".format(encoding))
+        if encoding in [ None, "C", "POSIX" ]:
+            print("Forcing the use of ASCII for F58")
+            os.environ["FLUX_F58_FORCE_ASCII"] = "1"
+        else:
+            print("Using autodetection for unicode support for F58")
         loglevel = int(os.environ.get("FLUX_PYCLI_LOGLEVEL", logging.INFO))
         logging.basicConfig(
             level=loglevel, format="%(name)s: %(levelname)s: %(message)s"
         )
+
         exit_code = 0
         try:
             main_func()

@SteVwonder
Member Author

SteVwonder commented Sep 10, 2020

Maybe it is returning the value post-coercion?

Some more evidence in support of that theory:

herbein1@cori02:~/Repositories/flux-framework/flux-core> PYTHONCOERCECLOCALE=warn LANG=C flux mini submit hostname
Python detected LC_CTYPE=C: LC_CTYPE coerced to C.UTF-8 (set another locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behavior).
Detected default locale: en_US, UTF-8
Using autodetection for unicode support for F58
Using multibyte FLUID prefix
_KYA6UFEo
herbein1@cori02:~/Repositories/flux-framework/flux-core> PYTHONCOERCECLOCALE=warn flux mini submit hostname
Python detected LC_CTYPE=C: LC_CTYPE coerced to C.UTF-8 (set another locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behavior).
Detected default locale: en_US, UTF-8
Using autodetection for unicode support for F58
Using multibyte FLUID prefix
_Kg1PZgns
herbein1@cori02:~/Repositories/flux-framework/flux-core> PYTHONCOERCECLOCALE=warn LC_ALL=C flux mini submit hostname
Python runtime initialized with LC_CTYPE=C (a locale with default ASCII encoding), which may cause Unicode compatibility problems. Using C.UTF-8, C.utf8, or UTF-8 (if available) as alternative Unicode-compatible locales is recommended.
Detected default locale: None, None
Forcing the use of ASCII for F58
Locale not multibyte capable, using f FLUID prefix
fLJQzcku5
herbein1@cori02:~/Repositories/flux-framework/flux-core> PYTHONCOERCECLOCALE=0 flux mini submit hostname
Detected default locale: None, None
Forcing the use of ASCII for F58
Locale not multibyte capable, using f FLUID prefix
fLTkmAJoh

@SteVwonder
Member Author

SteVwonder commented Sep 10, 2020

Given that the issue appears to be on systems with absolutely no locale information set whatsoever (which is arguably a user/system configuration issue, not a Flux issue), how does this sound:

We document this problem and potential solutions in our FAQ over on flux-docs. In particular, the preferred solution would be for the user to set LC_ALL to something (either C, POSIX, or *.UTF-8). Really anything that isn't empty should work. In scenarios where they cannot change their locale (i.e., it must be en_US.UTF-8) but they want the ASCII f, the user should set the FLUX_F58_FORCE_ASCII env var to force the use of ASCII when printing f58 from Flux utilities.

Thoughts?

@grondo
Contributor

grondo commented Sep 10, 2020

Thanks @SteVwonder! Your observations above were not what I observed on TOSS3 with Python 3.7.2 loaded. In that case getdefaultlocale() returned (None, None) when LANG or LC_ALL was not set.

 grondo@fluke108:~$ python3 -V
Python 3.7.2
 grondo@fluke108:~$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
 grondo@fluke108:~$ python3 -c 'import locale; print(locale.getdefaultlocale())'
(None, None)

I tried on a Mac with 3.7.3, but unfortunately calling getdefaultlocale() throws an error (!?)

╰─ ❯ locale
LANG=""
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
╭─   ~                                                                         08:04:12
╰─ ❯ python3 -c 'import locale; print(locale.getdefaultlocale())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/locale.py", line 568, in getdefaultlocale
    return _parse_localename(localename)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/locale.py", line 495, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: UTF-8

It seems like you should be able to detect the underlying, default locale from Python. I don't understand why this isn't working. At the very least I would like to get the output of Python commands (flux mini and flux jobs) to match the output of C commands (flux job and flux shell for example) in environments where the locale is supposed to default to POSIX.

I wonder if we should just test the environment variables ourselves, or write our own getlocale() to be used by Python which will work like locale(1) and more importantly won't change beneath us for minor Python interpreter revisions.
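
As a rough idea of what "work like locale(1)" could mean, the sketch below rebuilds the kind of report shown throughout this thread from an environment mapping. Explicitly set values print without quotes, and implied values print with quotes. It is an approximation written for illustration: locale_report is a hypothetical name, and only a subset of categories is covered.

```python
CATEGORIES = (
    "LC_CTYPE", "LC_NUMERIC", "LC_TIME",
    "LC_COLLATE", "LC_MONETARY", "LC_MESSAGES",
)


def locale_report(environ):
    """Return locale(1)-style lines for the given environment mapping.

    Quoted values are implied (from LC_ALL, LANG, or the POSIX default);
    unquoted values were set explicitly for that category.
    """
    lang = environ.get("LANG", "")
    lc_all = environ.get("LC_ALL", "")
    lines = ["LANG=" + lang]
    for name in CATEGORIES:
        explicit = environ.get(name, "")
        if lc_all:
            # LC_ALL overrides every category, so the value is implied.
            lines.append('{}="{}"'.format(name, lc_all))
        elif explicit:
            lines.append("{}={}".format(name, explicit))
        else:
            # Fall back to LANG, then to the POSIX default.
            lines.append('{}="{}"'.format(name, lang or "POSIX"))
    lines.append("LC_ALL=" + lc_all)
    return lines


print("\n".join(locale_report({})))
```

With an empty mapping every category comes out quoted, which matches the point made earlier in the thread that the "POSIX" values in the pasted output were implied, not set.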

@grondo
Contributor

grondo commented Sep 10, 2020

Maybe this simpler approach will work to detect the specific troublesome circumstance described here:

diff --git a/src/bindings/python/flux/util.py b/src/bindings/python/flux/util.py
index 6cb1ef5..298d06a 100644
--- a/src/bindings/python/flux/util.py
+++ b/src/bindings/python/flux/util.py
@@ -119,6 +119,13 @@ def help_formatter(argwidth=40):
     return lambda prog: FluxHelpFormatter(prog, max_help_position=argwidth)


+def defaultlocale():
+    for key in [ "LANG", "LC_ALL", "LC_CTYPE" ]:
+        if key in os.environ:
+            return os.environ[key]
+    return None
+
+
 class CLIMain(object):
     def __init__(self, logger=None):
         if logger is None:
@@ -127,6 +134,10 @@ class CLIMain(object):
             self.logger = logger

     def __call__(self, main_func):
+        locale = defaultlocale()
+        if locale in [ None, "C", "POSIX" ]:
+            os.environ["FLUX_F58_FORCE_ASCII"] = "1"
+
         loglevel = int(os.environ.get("FLUX_PYCLI_LOGLEVEL", logging.INFO))
         logging.basicConfig(
             level=loglevel, format="%(name)s: %(levelname)s: %(message)s"
diff --git a/src/common/libutil/fluid.c b/src/common/libutil/fluid.c
index d9f5fb1..fb3b6c2 100644
--- a/src/common/libutil/fluid.c
+++ b/src/common/libutil/fluid.c
@@ -174,7 +174,7 @@ static inline int is_utf8_locale (void)
     /* Assume if MB_CUR_MAX > 1, this locale can handle wide characters
      *  and therefore will properly handle UTF-8.
      */
-    if (MB_CUR_MAX > 1)
+    if (MB_CUR_MAX > 1 && !getenv ("FLUX_F58_FORCE_ASCII"))
         return 1;
     return 0;
 }

@grondo
Contributor

grondo commented Sep 10, 2020

For future reference, on Python 3.7.2 I don't see any evidence that coercion of C locale to UTF-8 changes the environment of the Python process to set LC_CTYPE, whereas @SteVwonder reported that for 3.7.4 the current process environment is modified.

E.g. for 3.7.2:

grondo@fluke108:~$ PYTHONCOERCECLOCALE=warn flux mini run --env="-*" --env="LC*" --dry-run hostname  | jq '.attributes.system.environment'
Python runtime initialized with LC_CTYPE=C (a locale with default ASCII encoding), which may cause Unicode compatibility problems. Using C.UTF-8, C.utf8, or UTF-8 (if available) as alternative Unicode-compatible locales is recommended.
{
  "LCSCHEDCLUSTER": "fluke"
}

@SteVwonder
Member Author

Documented the FAQ workaround in flux-framework/flux-docs#61

@SteVwonder
Member Author

whereas @SteVwonder reported that for 3.7.4 the current process environment is modified.

For Python 3.7.4 from Anaconda on Cori:

herbein1@cori02:/global/u2/h/herbein1/Repositories/flux-framework/flux-core> PYTHONCOERCECLOCALE=warn flux mini run --env="-*" --env="LC*" --dry-run hostname  | jq '.attributes.system.environment'
Python detected LC_CTYPE=C: LC_CTYPE coerced to C.UTF-8 (set another locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behavior).
{
  "LC_CTYPE": "C.UTF-8"
}

@grondo
Contributor

grondo commented Sep 10, 2020

Thanks!
