Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-34589: C locale coercion off by default #9073

Merged
merged 3 commits into from
Sep 17, 2018
Merged

bpo-34589: C locale coercion off by default #9073

merged 3 commits into from
Sep 17, 2018

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Sep 5, 2018

The C locale coercion is always disabled by default. It can now only
be enabled in main() and wmain() functions: by the Python program
("python3").

Rename coerce_c_locale and coerce_c_locale_warn attributes of
_PyCoreConfig to _coerce_c_locale and _coerce_c_locale_warn: fields
are now private.

https://bugs.python.org/issue34589

@ncoghlan
Copy link
Contributor

This is still doing the coercion far too late, after command line options (et al) have already been decoded.

Please just take the fields out of the core config struct entirely, and change it back to doing the coercion before PyDecode_Locale even gets called to read the command line arguments the first time.

@ncoghlan
Copy link
Contributor

Ugh, the broken implementation shipped in 3.7.0: 9454060#diff-8c018c3ada66d06c8e101e47a313c2c7 :(

@vstinner
Copy link
Member Author

This is still doing the coercion far too late, after command line options (et al) have already been decoded.

pymain_read_conf() detects if the encoding changed after the configuration has been read. In that case, it forgets everything about the configuration and reads again the configuration. This is how I implemented properly the UTF-8 Mode which is configured by 3 different options:

  • -X utf8 command line option
  • LC_CTYPE locale
  • PYTHONUTF8 env var

Please just take the fields out of the core config struct entirely, and change it back to doing the coercion before PyDecode_Locale even gets called to read the command line arguments the first time.

Py_DecodeLocale() is also affected by Py_UTF8Mode. The issue is not specific to the C locale coercion.

I understand that you are saying that the current implementation doesn't work. Maybe I made a mistake, that's totally possible, but would you mind to explain how to see this bug? Because I made many manual tests and I'm not aware of any issue with the C locale coercion.

--

This change only makes sure that the C locale coercion cannot be not enabled when Python is embedded. Is that what you wanted to do?

@ncoghlan
Copy link
Contributor

The intent of the locale coercion design is to make it work for any embedding application, not just one that has been carefully crafted to decode everything incorrectly, realise it made a mistake, and then decode it back again.

The way you're currently doing it instead tightly couples the locale coercion to CPython specifically, such that it won't work for arbitrary applications that just happen to be embedding CPython. That's not the way locale coercion is supposed to work - it's supposed to be a behaviour of the Python CLI application that any other application embedding Python can emulate, not a property of the CPython runtime itself the way UTF-8 mode is.

Anyway, I've realised that the easiest way to make myself clear is to just submit my own PR that puts the PEP 538 implementation back the way it was intended to be, while still retaining your other improvements (like deferring emitting the warning until after the C level stdio streams have been initialised properly).

@ncoghlan
Copy link
Contributor

Note that "broken" above isn't the right word, since you've carefully crafted it to not be broken. "Fragile" would be a better word - just like UTF-8 mode, implementing locale coercion the way you currently have requires paying carefully attention to every single thing that CPython might read from the operating system, so it can be re-encoded and decoded differently if the locale encoding changes later on.

@vstinner
Copy link
Member Author

Please move the discussion to https://bugs.python.org/issue34589 I'm unable to follow the discussion on different (private) email threads, GitHub PR, bug tracker, etc.

@vstinner
Copy link
Member Author

3 tests failed: test_c_locale_coercion test_cmd_line test_sys

Py_Initialize() and Py_Main() cannot enable the C locale coercion
(PEP 538) anymore: it is always disabled. It can now only be enabled
by the Python program ("python3).

test_embed: get_filesystem_encoding() doesn't have to set PYTHONUTF8
nor PYTHONCOERCECLOCALE, these variables are already set in the
parent.
@vstinner
Copy link
Member Author

Rename coerce_c_locale and coerce_c_locale_warn attributes of
_PyCoreConfig to _coerce_c_locale and _coerce_c_locale_warn: fields
are now private.

I merged PR #9371, so this PR is now simplified to its limited scope: Py_Initialize() and Py_Main() don't enable C locale coercion anymore.

_coerce_c_locale cannot be tested using Py_Initialize()/Py_Main()
anymore.
@vstinner vstinner merged commit 7a0791b into python:master Sep 17, 2018
@vstinner vstinner deleted the coerce_c_locale branch September 17, 2018 23:22
vstinner added a commit that referenced this pull request Sep 18, 2018
… by default (GH-9379)

* bpo-34589: Make _PyCoreConfig.coerce_c_locale private (GH-9371)

_PyCoreConfig:

* Rename coerce_c_locale to _coerce_c_locale
* Rename coerce_c_locale_warn to _coerce_c_locale_warn

These fields are now private (name prefixed by "_").

(cherry picked from commit 188ebfa)

* bpo-34589: C locale coercion off by default (GH-9073)

Py_Initialize() and Py_Main() cannot enable the C locale coercion
(PEP 538) anymore: it is always disabled. It can now only be enabled
by the Python program ("python3).

test_embed: get_filesystem_encoding() doesn't have to set PYTHONUTF8
nor PYTHONCOERCECLOCALE, these variables are already set in the
parent.

(cherry picked from commit 7a0791b)

* bpo-34589: Add -X coerce_c_locale command line option (GH-9378)

Add a new -X coerce_c_locale command line option to control C locale
coercion (PEP 538).

(cherry picked from commit dbdee00)
@ncoghlan
Copy link
Contributor

Victor, why did you merge this? We were a long way from having agreement that this is the right thing to do, and now I'm going to have to update #9257 to account for your cavalier changes.

vstinner added a commit that referenced this pull request Sep 19, 2018
…" (GH-9430)

* Revert "bpo-34589: Add -X coerce_c_locale command line option (GH-9378)"

This reverts commit dbdee00.

* Revert "bpo-34589: C locale coercion off by default (GH-9073)"

This reverts commit 7a0791b.

* Revert "bpo-34589: Make _PyCoreConfig.coerce_c_locale private (GH-9371)"

This reverts commit 188ebfa.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants