-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-34589: C locale coercion off by default #9073
Conversation
This is still doing the coercion far too late, after command line options (et al) have already been decoded. Please just take the fields out of the core config struct entirely, and change it back to doing the coercion before PyDecode_Locale even gets called to read the command line arguments the first time. |
Ugh, the broken implementation shipped in 3.7.0: 9454060#diff-8c018c3ada66d06c8e101e47a313c2c7 :( |
pymain_read_conf() detects if the encoding changed after the configuration has been read. In that case, it forgets everything about the configuration and reads again the configuration. This is how I implemented properly the UTF-8 Mode which is configured by 3 different options:
Py_DecodeLocale() is also affected by Py_UTF8Mode. The issue is not specific to the C locale coercion. I understand that you are saying that the current implementation doesn't work. Maybe I made a mistake, that's totally possible, but would you mind to explain how to see this bug? Because I made many manual tests and I'm not aware of any issue with the C locale coercion. -- This change only makes sure that the C locale coercion cannot be not enabled when Python is embedded. Is that what you wanted to do? |
The intent of the locale coercion design is to make it work for any embedding application, not just one that has been carefully crafted to decode everything incorrectly, realise it made a mistake, and then decode it back again. The way you're currently doing it instead tightly couples the locale coercion to CPython specifically, such that it won't work for arbitrary applications that just happen to be embedding CPython. That's not the way locale coercion is supposed to work - it's supposed to be a behaviour of the Python CLI application that any other application embedding Python can emulate, not a property of the CPython runtime itself the way UTF-8 mode is. Anyway, I've realised that the easiest way to make myself clear is to just submit my own PR that puts the PEP 538 implementation back the way it was intended to be, while still retaining your other improvements (like deferring emitting the warning until after the C level stdio streams have been initialised properly). |
Note that "broken" above isn't the right word, since you've carefully crafted it to not be broken. "Fragile" would be a better word - just like UTF-8 mode, implementing locale coercion the way you currently have requires paying carefully attention to every single thing that CPython might read from the operating system, so it can be re-encoded and decoded differently if the locale encoding changes later on. |
Please move the discussion to https://bugs.python.org/issue34589 I'm unable to follow the discussion on different (private) email threads, GitHub PR, bug tracker, etc. |
3 tests failed: test_c_locale_coercion test_cmd_line test_sys |
Py_Initialize() and Py_Main() cannot enable the C locale coercion (PEP 538) anymore: it is always disabled. It can now only be enabled by the Python program ("python3). test_embed: get_filesystem_encoding() doesn't have to set PYTHONUTF8 nor PYTHONCOERCECLOCALE, these variables are already set in the parent.
I merged PR #9371, so this PR is now simplified to its limited scope: Py_Initialize() and Py_Main() don't enable C locale coercion anymore. |
_coerce_c_locale cannot be tested using Py_Initialize()/Py_Main() anymore.
… by default (GH-9379) * bpo-34589: Make _PyCoreConfig.coerce_c_locale private (GH-9371) _PyCoreConfig: * Rename coerce_c_locale to _coerce_c_locale * Rename coerce_c_locale_warn to _coerce_c_locale_warn These fields are now private (name prefixed by "_"). (cherry picked from commit 188ebfa) * bpo-34589: C locale coercion off by default (GH-9073) Py_Initialize() and Py_Main() cannot enable the C locale coercion (PEP 538) anymore: it is always disabled. It can now only be enabled by the Python program ("python3). test_embed: get_filesystem_encoding() doesn't have to set PYTHONUTF8 nor PYTHONCOERCECLOCALE, these variables are already set in the parent. (cherry picked from commit 7a0791b) * bpo-34589: Add -X coerce_c_locale command line option (GH-9378) Add a new -X coerce_c_locale command line option to control C locale coercion (PEP 538). (cherry picked from commit dbdee00)
Victor, why did you merge this? We were a long way from having agreement that this is the right thing to do, and now I'm going to have to update #9257 to account for your cavalier changes. |
…" (GH-9430) * Revert "bpo-34589: Add -X coerce_c_locale command line option (GH-9378)" This reverts commit dbdee00. * Revert "bpo-34589: C locale coercion off by default (GH-9073)" This reverts commit 7a0791b. * Revert "bpo-34589: Make _PyCoreConfig.coerce_c_locale private (GH-9371)" This reverts commit 188ebfa.
The C locale coercion is always disabled by default. It can now only
be enabled in main() and wmain() functions: by the Python program
("python3").
Rename coerce_c_locale and coerce_c_locale_warn attributes of
_PyCoreConfig to _coerce_c_locale and _coerce_c_locale_warn: fields
are now private.
https://bugs.python.org/issue34589