bpo-28180: Implementation for PEP 538 (python#659)
- new PYTHONCOERCECLOCALE config setting
- coerces legacy C locale to C.UTF-8, C.utf8 or UTF-8 by default
- always uses C.UTF-8 on Android
- uses `surrogateescape` on stdin and stdout in the coercion
  target locales
- configure option to disable locale coercion at build time
- configure option to disable C locale warning at build time
ncoghlan authored Jun 11, 2017
1 parent 0afbabe commit 6ea4186
Showing 14 changed files with 699 additions and 55 deletions.
36 changes: 36 additions & 0 deletions Doc/using/cmdline.rst
@@ -713,6 +713,42 @@ conflict.

.. versionadded:: 3.6


.. envvar:: PYTHONCOERCECLOCALE

If set to the value ``0``, causes the main Python command line application
to skip coercing the legacy ASCII-based C locale to a more capable UTF-8
based alternative. Note that this setting is checked even when the
:option:`-E` or :option:`-I` options are used, as it is handled prior to
the processing of command line options.
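
As a rough way to observe the effect of this setting, the sketch below launches a child interpreter under the legacy C locale, with and without ``PYTHONCOERCECLOCALE=0``, and reports the ``LC_CTYPE`` locale the child sees. The helper name ``child_lc_ctype`` is hypothetical, and the result depends on the Python version and on which locales the platform provides, so no particular output is assumed.

```python
import os
import subprocess
import sys


def child_lc_ctype(coerce):
    """Return the LC_CTYPE locale reported by a child interpreter.

    Hypothetical helper for illustration; not part of this commit.
    """
    # Start from the current environment, but drop anything that would
    # override the plain C locale we want the child to start in.
    env = {k: v for k, v in os.environ.items()
           if k not in ("LC_ALL", "LC_CTYPE", "LANG", "PYTHONCOERCECLOCALE")}
    env["LANG"] = "C"
    if not coerce:
        env["PYTHONCOERCECLOCALE"] = "0"
    code = "import locale; print(locale.setlocale(locale.LC_CTYPE, None))"
    proc = subprocess.run([sys.executable, "-c", code], env=env,
                          stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                          universal_newlines=True, check=True)
    return proc.stdout.strip()


if __name__ == "__main__":
    print("coercion allowed: ", child_lc_ctype(coerce=True))
    print("coercion disabled:", child_lc_ctype(coerce=False))
```

On an interpreter that implements the coercion and a platform that offers one of the target locales, the two lines would be expected to differ; elsewhere both may report the same locale.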

If this variable is *not* set, or is set to a value other than ``0``, and
the current locale reported for the ``LC_CTYPE`` category is the default
``C`` locale, then the Python CLI will attempt to configure the following
locales for the ``LC_CTYPE`` category in the order listed before loading the
interpreter runtime:

* ``C.UTF-8``
* ``C.utf8``
* ``UTF-8``

If setting one of these locale categories succeeds, then the ``LC_CTYPE``
environment variable will also be set accordingly in the current process
environment before the Python runtime is initialized. This ensures the
updated setting is seen in subprocesses, as well as in operations that
query the environment rather than the current C locale (such as Python's
own :func:`locale.getdefaultlocale`).
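
The lookup order and the environment update described above can be approximated in pure Python. This is only a simplified sketch: the actual coercion happens in C before the runtime starts, and the function name ``coerce_c_locale`` and its injectable ``environ`` and ``setlocale`` parameters are assumptions made here so the logic can be exercised without touching the real process locale.

```python
import locale
import os

# Coercion targets, in the order listed above.
_TARGET_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def coerce_c_locale(environ=None, setlocale=locale.setlocale):
    """Return the coercion target that was applied, or None.

    Hypothetical illustration of the documented behavior; the real
    implementation has additional checks that are omitted here.
    """
    if environ is None:
        environ = os.environ
    # An explicit "0" opts out of coercion entirely.
    if environ.get("PYTHONCOERCECLOCALE") == "0":
        return None
    # Passing None queries the current LC_CTYPE setting without changing it.
    if setlocale(locale.LC_CTYPE, None) != "C":
        return None
    for target in _TARGET_LOCALES:
        try:
            setlocale(locale.LC_CTYPE, target)
        except locale.Error:
            continue  # Not available on this platform; try the next one.
        # Export the result so subprocesses inherit the same setting.
        environ["LC_CTYPE"] = target
        return target
    return None
```

Passing a plain ``dict`` as ``environ`` and a stub as ``setlocale`` lets the ordering be checked in a test without modifying the running interpreter.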

Configuring one of these locales (either explicitly or via the above
implicit locale coercion) will automatically set the error handler for
:data:`sys.stdin` and :data:`sys.stdout` to ``surrogateescape``. This
behavior can be overridden using :envvar:`PYTHONIOENCODING` as usual.

Availability: \*nix

.. versionadded:: 3.7
See :pep:`538` for more details.

Debug-mode variables
~~~~~~~~~~~~~~~~~~~~

45 changes: 45 additions & 0 deletions Doc/whatsnew/3.7.rst
@@ -70,6 +70,51 @@ Summary -- Release highlights
New Features
============

.. _whatsnew37-pep538:

PEP 538: Legacy C Locale Coercion
---------------------------------

An ongoing challenge within the Python 3 series has been determining a sensible
default strategy for handling the "7-bit ASCII" text encoding assumption
currently implied by the use of the default C locale on non-Windows platforms.

:pep:`538` updates the default interpreter command line interface to
automatically coerce that locale to an available UTF-8 based locale as
described in the documentation of the new :envvar:`PYTHONCOERCECLOCALE`
environment variable. Automatically setting ``LC_CTYPE`` this way means that
both the core interpreter and locale-aware C extensions (such as
:mod:`readline`) will assume the use of UTF-8 as the default text encoding,
rather than ASCII.

The platform support definition in :pep:`11` has also been updated to limit
full text handling support to suitably configured non-ASCII based locales.

As part of this change, the default error handler for ``stdin`` and ``stdout``
is now ``surrogateescape`` (rather than ``strict``) when using any of the
defined coercion target locales (currently ``C.UTF-8``, ``C.utf8``, and
``UTF-8``). The default error handler for ``stderr`` continues to be
``backslashreplace``, regardless of locale.
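
The difference between the three handlers mentioned above can be seen with in-memory text streams. This is a sketch using :class:`io.TextIOWrapper` over :class:`io.BytesIO` as a stand-in for the standard streams; the helper names ``read_with`` and ``write_with`` are hypothetical.

```python
import io


def read_with(errors, data=b"ok \xff\n"):
    """Read bytes through a UTF-8 text stream using the given error handler."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    return stream.read()


def write_with(errors, text="ok \udcff\n"):
    """Write text through a UTF-8 text stream and return the bytes produced."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="utf-8", errors=errors,
                              newline="")
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


if __name__ == "__main__":
    # stdin/stdout style: the invalid byte survives a read/write round trip.
    print(ascii(read_with("surrogateescape")))
    print(write_with("surrogateescape"))
    # stderr style: the escaped surrogate is rendered as visible text.
    print(write_with("backslashreplace"))
    # Previous default: the invalid byte is rejected outright.
    try:
        read_with("strict")
    except UnicodeDecodeError as exc:
        print("strict:", exc)
```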

.. note::

In the current implementation, a warning message is printed directly to
``stderr`` even for successful implicit locale coercion. This gives
redistributors and system integrators the opportunity to determine if they
should be making an environmental change to avoid the need for implicit
coercion at the Python interpreter level.

However, it's not clear that this is going to be the best approach for
the final 3.7.0 release, and we may end up deciding to disable the warning
by default and provide some way of opting into it at runtime or build time.

Concrete examples of use cases where it would be preferable to disable the
warning by default can be noted on :issue:`30565`.

.. seealso::

:pep:`538` -- Coercing the legacy C locale to a UTF-8 based locale
PEP written and implemented by Nick Coghlan.


Other Language Changes
56 changes: 30 additions & 26 deletions Lib/test/support/script_helper.py
@@ -48,8 +48,35 @@ def interpreter_requires_environment():
     return __cached_interp_requires_environment


-_PythonRunResult = collections.namedtuple("_PythonRunResult",
-                                          ("rc", "out", "err"))
+class _PythonRunResult(collections.namedtuple("_PythonRunResult",
+                                          ("rc", "out", "err"))):
+    """Helper for reporting Python subprocess run results"""
+    def fail(self, cmd_line):
+        """Provide helpful details about failed subcommand runs"""
+        # Limit to 80 lines to ASCII characters
+        maxlen = 80 * 100
+        out, err = self.out, self.err
+        if len(out) > maxlen:
+            out = b'(... truncated stdout ...)' + out[-maxlen:]
+        if len(err) > maxlen:
+            err = b'(... truncated stderr ...)' + err[-maxlen:]
+        out = out.decode('ascii', 'replace').rstrip()
+        err = err.decode('ascii', 'replace').rstrip()
+        raise AssertionError("Process return code is %d\n"
+                             "command line: %r\n"
+                             "\n"
+                             "stdout:\n"
+                             "---\n"
+                             "%s\n"
+                             "---\n"
+                             "\n"
+                             "stderr:\n"
+                             "---\n"
+                             "%s\n"
+                             "---"
+                             % (self.rc, cmd_line,
+                                out,
+                                err))


# Executing the interpreter in a subprocess
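
The pattern in this hunk, subclassing a named tuple so the result object can report its own failure, can be tried in isolation. The sketch below uses a hypothetical ``RunResult`` class that mirrors the truncation and decoding steps of ``_PythonRunResult.fail``; it is an illustration, not the code from ``script_helper.py``.

```python
import collections


class RunResult(collections.namedtuple("RunResult", ("rc", "out", "err"))):
    """Hypothetical stand-in for _PythonRunResult, for illustration only."""
    __slots__ = ()

    def fail(self, cmd_line, maxlen=80 * 100):
        """Raise AssertionError describing the failed run."""
        def tail(data, label):
            # Keep only the last maxlen bytes so huge outputs stay readable.
            if len(data) > maxlen:
                data = b"(... truncated " + label + b" ...)" + data[-maxlen:]
            return data.decode("ascii", "replace").rstrip()

        raise AssertionError(
            "Process return code is %d\n"
            "command line: %r\n"
            "\n"
            "stdout:\n---\n%s\n---\n"
            "\n"
            "stderr:\n---\n%s\n---"
            % (self.rc, cmd_line,
               tail(self.out, b"stdout"), tail(self.err, b"stderr")))


def demo():
    """Return the failure message produced for an oversized stdout."""
    result = RunResult(rc=1, out=b"x" * 9000, err=b"boom\n")
    try:
        result.fail(["python", "-c", "pass"])
    except AssertionError as exc:
        return str(exc)


if __name__ == "__main__":
    print(demo()[:200])
```

Keeping the formatting on the result object means any caller that holds a result and a command line can produce the same report, which is what lets ``_assert_python`` shrink to a single ``res.fail(cmd_line)`` call.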
@@ -107,30 +134,7 @@ def run_python_until_end(*args, **env_vars):
 def _assert_python(expected_success, *args, **env_vars):
     res, cmd_line = run_python_until_end(*args, **env_vars)
     if (res.rc and expected_success) or (not res.rc and not expected_success):
-        # Limit to 80 lines to ASCII characters
-        maxlen = 80 * 100
-        out, err = res.out, res.err
-        if len(out) > maxlen:
-            out = b'(... truncated stdout ...)' + out[-maxlen:]
-        if len(err) > maxlen:
-            err = b'(... truncated stderr ...)' + err[-maxlen:]
-        out = out.decode('ascii', 'replace').rstrip()
-        err = err.decode('ascii', 'replace').rstrip()
-        raise AssertionError("Process return code is %d\n"
-                             "command line: %r\n"
-                             "\n"
-                             "stdout:\n"
-                             "---\n"
-                             "%s\n"
-                             "---\n"
-                             "\n"
-                             "stderr:\n"
-                             "---\n"
-                             "%s\n"
-                             "---"
-                             % (res.rc, cmd_line,
-                                out,
-                                err))
+        res.fail(cmd_line)
     return res

def assert_python_ok(*args, **env_vars):