gh-91156: Fix encoding="locale" in UTF-8 mode #70056

Merged
merged 5 commits into from
Apr 14, 2022
5 changes: 2 additions & 3 deletions Doc/library/io.rst
@@ -112,7 +112,7 @@ Text Encoding
-------------

The default encoding of :class:`TextIOWrapper` and :func:`open` is
-locale-specific (:func:`locale.getpreferredencoding(False) <locale.getpreferredencoding>`).
+locale-specific (:func:`locale.getencoding`).

However, many developers forget to specify the encoding when opening text files
encoded in UTF-8 (e.g. JSON, TOML, Markdown, etc...) since most Unix
@@ -948,8 +948,7 @@ Text I/O
:class:`TextIOBase`.

*encoding* gives the name of the encoding that the stream will be decoded or
-encoded with. It defaults to
-:func:`locale.getpreferredencoding(False) <locale.getpreferredencoding>`.
+encoded with. It defaults to :func:`locale.getencoding()`.
``encoding="locale"`` can be used to specify the current locale's encoding
explicitly. See :ref:`io-text-encoding` for more information.
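
A minimal sketch of what the paragraph above describes, assuming Python 3.10 or later (where ``encoding="locale"`` is accepted). The helper name `resolved_encoding` is hypothetical; `locale.getencoding()` only exists on Python 3.11+, so the comparison is guarded.

```python
import codecs
import io
import locale
import sys


def resolved_encoding(requested):
    """Return the encoding name a TextIOWrapper reports for `requested`.

    Hypothetical helper for illustration. BytesIO has no file descriptor,
    so no console/device encoding lookup can influence the result.
    """
    wrapper = io.TextIOWrapper(io.BytesIO(), encoding=requested)
    try:
        return wrapper.encoding
    finally:
        wrapper.close()


print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))

if sys.version_info >= (3, 10):
    name = resolved_encoding("locale")
    # Normalize through the codec registry so aliases compare equal.
    print('encoding="locale" resolves to:', name, "->", codecs.lookup(name).name)

# locale.getencoding() is new in Python 3.11; skip the comparison on older versions.
if hasattr(locale, "getencoding"):
    print("locale.getencoding():", locale.getencoding())
```

With this change applied, the two printed encodings are expected to name the same codec whether or not UTF-8 mode is enabled.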

2 changes: 1 addition & 1 deletion Doc/using/windows.rst
@@ -618,7 +618,7 @@ UTF-8 mode

Windows still uses legacy encodings for the system encoding (the ANSI Code
Page). Python uses it for the default encoding of text files (e.g.
-:func:`locale.getpreferredencoding`).
+:func:`locale.getencoding`).

This may cause issues because UTF-8 is widely used on the internet
and most Unix systems, including WSL (Windows Subsystem for Linux).
8 changes: 5 additions & 3 deletions Lib/_pyio.py
@@ -1988,7 +1988,7 @@ class TextIOWrapper(TextIOBase):
r"""Character and line based layer over a BufferedIOBase object, buffer.

encoding gives the name of the encoding that the stream will be
-decoded or encoded with. It defaults to locale.getpreferredencoding(False).
+decoded or encoded with. It defaults to locale.getencoding().

errors determines the strictness of encoding and decoding (see the
codecs.register) and defaults to "strict".
@@ -2021,7 +2021,9 @@ def __init__(self, buffer, encoding=None, errors=None, newline=None,
self._check_newline(newline)
encoding = text_encoding(encoding)

-if encoding == "locale":
+if encoding == "locale" and sys.platform == "win32":
+# On Unix, os.device_encoding() returns "utf-8" instead of locale encoding
+# in the UTF-8 mode. So we use os.device_encoding() only on Windows.
try:
encoding = os.device_encoding(buffer.fileno()) or "locale"
except (AttributeError, UnsupportedOperation):
@@ -2034,7 +2036,7 @@ def __init__(self, buffer, encoding=None, errors=None, newline=None,
# Importing locale may fail if Python is being built
encoding = "utf-8"
else:
-encoding = locale.getpreferredencoding(False)
+encoding = locale.getencoding()

if not isinstance(encoding, str):
raise ValueError("invalid encoding: %r" % encoding)
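
The branch order in the two `_pyio.py` hunks above can be sketched as a pure function. This is an illustrative model, not the library code: the function name and parameters are hypothetical, `device_encoding` stands in for the result of `os.device_encoding(buffer.fileno())`, and `locale_encoding` stands in for `locale.getencoding()` (with `None` modelling a failed `import locale`). The handling of `encoding=None` by `text_encoding()` lies outside the shown hunks and is not modelled.

```python
def resolve_locale_request(encoding, *, platform, device_encoding=None,
                           locale_encoding=None):
    """Model the post-change resolution of an already-normalized encoding.

    All parameters are stand-ins for illustration (assumptions, not real API):
      encoding        -- "locale" or an explicit codec name
      platform        -- value of sys.platform
      device_encoding -- what os.device_encoding(fd) would return, or None
      locale_encoding -- what locale.getencoding() would return, or None
                         if the locale module could not be imported
    """
    # The console/device encoding is consulted only on Windows.
    if encoding == "locale" and platform == "win32":
        encoding = device_encoding or "locale"

    # Still unresolved: fall back to the locale encoding. UTF-8 mode is
    # never consulted on this path.
    if encoding == "locale":
        encoding = "utf-8" if locale_encoding is None else locale_encoding

    return encoding


# On Unix the device encoding is ignored, even if one is available.
print(resolve_locale_request("locale", platform="linux",
                             device_encoding="utf-8",
                             locale_encoding="iso8859-1"))
```

Taking the platform and both encodings as arguments keeps the sketch testable without a console or a particular locale.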
2 changes: 1 addition & 1 deletion Lib/locale.py
@@ -557,7 +557,7 @@ def getdefaultlocale(envvars=('LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE')):

import warnings
warnings.warn(
-"Use setlocale(), getpreferredencoding(False) and getlocale() instead",
+"Use setlocale(), getencoding() and getlocale() instead",
DeprecationWarning, stacklevel=2
)

1 change: 1 addition & 0 deletions Lib/test/test_io.py
@@ -2737,6 +2737,7 @@ def test_default_encoding(self):
os.environ.update(old_environ)

@support.cpython_only
+@unittest.skipIf(sys.platform != "win32", "Windows-only test")
@unittest.skipIf(sys.flags.utf8_mode, "utf-8 mode is enabled")
def test_device_encoding(self):
# Issue 15989
@@ -0,0 +1,2 @@
+Make :class:`TextIOWrapper` use the locale encoding when ``encoding="locale"``
+is specified, even in UTF-8 mode.
8 changes: 4 additions & 4 deletions Modules/_io/_iomodule.c
@@ -92,9 +92,9 @@ it already exists), 'x' for creating and writing to a new file, and
'a' for appending (which on some Unix systems, means that all writes
append to the end of the file regardless of the current seek position).
In text mode, if encoding is not specified the encoding used is platform
-dependent: locale.getpreferredencoding(False) is called to get the
-current locale encoding. (For reading and writing raw bytes use binary
-mode and leave encoding unspecified.) The available modes are:
+dependent: locale.getencoding() is called to get the current locale encoding.
+(For reading and writing raw bytes use binary mode and leave encoding
+unspecified.) The available modes are:

========= ===============================================================
Character Meaning
@@ -196,7 +196,7 @@ static PyObject *
_io_open_impl(PyObject *module, PyObject *file, const char *mode,
int buffering, const char *encoding, const char *errors,
const char *newline, int closefd, PyObject *opener)
-/*[clinic end generated code: output=aefafc4ce2b46dc0 input=1543f4511d2356a5]*/
+/*[clinic end generated code: output=aefafc4ce2b46dc0 input=5bb37f174cb2fb11]*/
{
unsigned i;

8 changes: 4 additions & 4 deletions Modules/_io/clinic/_iomodule.c.h

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions Modules/_io/clinic/textio.c.h

Some generated files are not rendered by default.

18 changes: 13 additions & 5 deletions Modules/_io/textio.c
@@ -1023,7 +1023,7 @@ _io.TextIOWrapper.__init__
Character and line based layer over a BufferedIOBase object, buffer.

encoding gives the name of the encoding that the stream will be
-decoded or encoded with. It defaults to locale.getpreferredencoding(False).
+decoded or encoded with. It defaults to locale.getencoding().

errors determines the strictness of encoding and decoding (see
help(codecs.Codec) or the documentation for codecs.register) and
@@ -1055,12 +1055,12 @@ _io_TextIOWrapper___init___impl(textio *self, PyObject *buffer,
const char *encoding, PyObject *errors,
const char *newline, int line_buffering,
int write_through)
-/*[clinic end generated code: output=72267c0c01032ed2 input=77d8696d1a1f460b]*/
+/*[clinic end generated code: output=72267c0c01032ed2 input=72590963698f289b]*/
{
PyObject *raw, *codec_info = NULL;
-_PyIO_State *state = NULL;
PyObject *res;
int r;
+int use_locale_encoding = 0; // Use locale encoding even in UTF-8 mode.

self->ok = 0;
self->detached = 0;
@@ -1076,6 +1076,7 @@ _io_TextIOWrapper___init___impl(textio *self, PyObject *buffer,
}
else if (strcmp(encoding, "locale") == 0) {
encoding = NULL;
+use_locale_encoding = 1;
}

if (errors == Py_None) {
@@ -1113,10 +1114,15 @@ _io_TextIOWrapper___init___impl(textio *self, PyObject *buffer,
self->encodefunc = NULL;
self->b2cratio = 0.0;

+#ifdef MS_WINDOWS
+// os.device_encoding() on Unix is the locale encoding or UTF-8
+// according to UTF-8 Mode.
+// Since UTF-8 mode shouldn't affect `encoding="locale"`, we call
+// os.device_encoding() only on Windows.
if (encoding == NULL) {
/* Try os.device_encoding(fileno) */
PyObject *fileno;
-state = IO_STATE();
+_PyIO_State *state = IO_STATE();
if (state == NULL)
goto error;
fileno = PyObject_CallMethodNoArgs(buffer, &_Py_ID(fileno));
@@ -1144,8 +1150,10 @@ _io_TextIOWrapper___init___impl(textio *self, PyObject *buffer,
Py_CLEAR(self->encoding);
}
}
+#endif

if (encoding == NULL && self->encoding == NULL) {
-if (_PyRuntime.preconfig.utf8_mode) {
+if (_PyRuntime.preconfig.utf8_mode && !use_locale_encoding) {
_Py_DECLARE_STR(utf_8, "utf-8");
self->encoding = Py_NewRef(&_Py_STR(utf_8));
}
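
The decision made by the C hunks above can be restated in Python as a rough model. This is a sketch under stated assumptions rather than a translation of `textio.c`: the function name and all parameters are hypothetical, `device_encoding` stands in for the result of the Windows-only `os.device_encoding()` call, and the code elided between the hunks is assumed only to store that result when it is not `None`.

```python
def resolve_like_textio(encoding, *, utf8_mode, is_windows, locale_encoding,
                        device_encoding=None):
    """Rough model of how the patched C code picks an encoding.

    Every parameter is an illustrative stand-in (an assumption):
      encoding        -- None, "locale", or an explicit codec name
      utf8_mode       -- whether Python UTF-8 mode is enabled
      is_windows      -- whether the MS_WINDOWS branch is compiled in
      locale_encoding -- the current locale encoding
      device_encoding -- console encoding for the fd, or None
    """
    # encoding="locale" is treated like "unspecified", but remembered.
    use_locale_encoding = False
    if encoding == "locale":
        encoding = None
        use_locale_encoding = True

    resolved = encoding  # explicit codec names pass through unchanged

    # The device (console) encoding is tried only on Windows.
    if is_windows and encoding is None:
        resolved = device_encoding  # stays None when there is no console

    if encoding is None and resolved is None:
        # UTF-8 mode applies to the implicit default only, not to an
        # explicit encoding="locale" request.
        if utf8_mode and not use_locale_encoding:
            resolved = "utf-8"
        else:
            resolved = locale_encoding

    return resolved


for requested in (None, "locale"):
    print(repr(requested), "->",
          resolve_like_textio(requested, utf8_mode=True, is_windows=False,
                              locale_encoding="iso8859-1"))
```

In this model, an unspecified encoding under UTF-8 mode yields "utf-8", while an explicit "locale" request yields the locale encoding, which is the behavior the `use_locale_encoding` flag was introduced to guarantee.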
1 change: 0 additions & 1 deletion Tools/c-analyzer/TODO
@@ -251,7 +251,6 @@ Modules/_io/textio.c:PyId_close _Py_IDENTIFIER(
Modules/_io/textio.c:PyId_decode _Py_IDENTIFIER(decode)
Modules/_io/textio.c:PyId_fileno _Py_IDENTIFIER(fileno)
Modules/_io/textio.c:PyId_flush _Py_IDENTIFIER(flush)
-Modules/_io/textio.c:PyId_getpreferredencoding _Py_IDENTIFIER(getpreferredencoding)
Modules/_io/textio.c:PyId_isatty _Py_IDENTIFIER(isatty)
Modules/_io/textio.c:PyId_mode _Py_IDENTIFIER(mode)
Modules/_io/textio.c:PyId_name _Py_IDENTIFIER(name)