Commit 1653e14

PEP 597: Update (#1799)
1 parent 7d7965b commit 1653e14

1 file changed

+135
-34
lines changed

pep-0597.rst

Lines changed: 135 additions & 34 deletions
@@ -21,6 +21,9 @@ The warning is disabled by default. New ``-X warn_encoding``
2121
command-line option and ``PYTHONWARNENCODING`` environment variable
2222
are used to enable the warnings.
2323

24+
An ``encoding="locale"`` option is added too. It is used to specify
25+
locale encoding explicitly.
26+
2427

2528
Motivation
2629
==========
@@ -39,34 +42,57 @@ in the ``README.md`` file which is encoded in UTF-8.
3942
For example, 489 packages of the 4000 most downloaded packages from
4043
PyPI used non-ASCII characters in README. And 82 packages of them
4144
can not be installed from source package when locale encoding is
42-
ASCII. [1_] They used the default encoding to read README or TOML
45+
ASCII. [1]_ They used the default encoding to read README or TOML
4346
file.
4447

4548
Another example is ``logging.basicConfig(filename="log.txt")``.
4649
Some users expect UTF-8 is used by default, but locale encoding is
47-
used actually. [2_]
50+
used actually. [2]_
4851

4952
Even Python experts assume that default encoding is UTF-8.
50-
It creates bugs that happen only on Windows. See [3_] and [4_].
53+
It creates bugs that happen only on Windows. See [3]_, [4]_, [5]_,
54+
and [6]_ for example.
5155

5256
Emitting a warning when the ``encoding`` option is omitted will help
5357
to find such mistakes.
5458

5559

60+
Explicit way to use locale-specific encoding
61+
--------------------------------------------
62+
63+
``open(filename)`` isn't explicit about which encoding is expected:
64+
65+
* Expects ASCII (not a bug, but inefficient on Windows)
66+
* Expects UTF-8 (bug or platform specific script)
67+
* Expects the locale encoding.
68+
69+
From this point of view, ``open(filename)`` is not readable.
70+
71+
``encoding=locale.getpreferredencoding(False)`` can be used to
72+
specify the locale encoding explicitly. But it is too long and easy
73+
to misuse (e.g. by forgetting to pass ``False`` as its argument).
74+
75+
This PEP provides an explicit way to specify the locale encoding.
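
For comparison, a minimal sketch of the verbose spelling discussed above.
The helper name ``read_with_locale_encoding`` is hypothetical and only
illustrates the call; it is not part of this PEP:

```python
import locale


def read_with_locale_encoding(path):
    """Read a text file, naming the locale encoding explicitly."""
    # Passing False skips the temporary setlocale() call, which is the
    # argument that is easy to forget.
    encoding = locale.getpreferredencoding(False)
    with open(path, encoding=encoding) as f:
        return f.read()
```

The intent is visible to a reader, but every call site has to repeat the
``getpreferredencoding(False)`` expression.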
76+
77+
5678
Prepare to change the default encoding to UTF-8
5779
-----------------------------------------------
5880

59-
We had chosen to use locale encoding for the default text encoding in
60-
Python 3.0. But UTF-8 has been adopted very widely since then.
81+
Since UTF-8 has become the de facto standard text encoding, we might change
82+
the default text encoding to UTF-8 in the future.
6183

62-
We might change the default text encoding to UTF-8 in the future.
63-
But this change will affect many applications and libraries.
64-
Many ``DeprecationWarning`` will be emitted if we start emitting the
65-
warning by default. It will be too noisy.
84+
But this change will affect many applications and libraries. If we
85+
start emitting ``DeprecationWarning`` by default everywhere the
86+
``encoding`` option is omitted, it will be too noisy and painful.
6687

6788
Although this PEP doesn't propose to change the default encoding,
68-
this PEP will help to reduce the warning in the future if we decide
69-
to change the default encoding.
89+
this PEP will help with that change:
90+
91+
* Reduce the number of omitted ``encoding`` options in many libraries
92+
before emitting the warning by default.
93+
94+
* Users will be able to use ``encoding="locale"`` option to suppress
95+
the warning without dropping Python 3.10 support.
7096

7197

7298
Specification
@@ -75,7 +101,7 @@ Specification
75101
``EncodingWarning``
76102
--------------------
77103

78-
Add new ``EncodingWarning`` warning class which is a subclass of
104+
Add a new ``EncodingWarning`` warning class which is a subclass of
79105
``Warning``. It is used to warn when the ``encoding`` option is
80106
omitted and the default encoding is locale-specific.
81107

@@ -94,6 +120,9 @@ When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
94120
other modules using them will emit ``EncodingWarning`` when
95121
``encoding`` is omitted.
96122

123+
Since ``EncodingWarning`` is a subclass of ``Warning``, it is
124+
shown by default once the option is enabled, unlike ``DeprecationWarning``.
125+
97126

98127
``encoding="locale"`` option
99128
----------------------------
@@ -102,21 +131,6 @@ other modules using them will emit ``EncodingWarning`` when
102131
same to current ``encoding=None``. But ``io.TextIOWrapper`` doesn't
103132
emit ``EncodingWarning`` when ``encoding="locale"`` is specified.
104133

105-
Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
106-
be used to avoid confusing ``LookupError: unknown encoding: locale``
107-
error when the code is run in old Python accidentally.
108-
109-
The constant can be used to test that ``encoding="locale"`` option is
110-
supported too. For example,
111-
112-
.. code-block::
113-
114-
# Want to suppress an EncodingWarning but still need support
115-
# old Python versions.
116-
locale_encoding = getattr(io, "LOCALE_ENCODING", None)
117-
with open(filename, encoding=locale_encoding) as f:
118-
...
119-
120134

121135
``io.text_encoding()``
122136
-----------------------
@@ -145,7 +159,7 @@ Pure Python implementation will be like this::
145159
import warnings
146160
warnings.warn("'encoding' option is omitted",
147161
EncodingWarning, stacklevel + 2)
148-
encoding = LOCALE_ENCODING
162+
encoding = "locale"
149163
return encoding
150164

151165
For example, ``pathlib.Path.read_text()`` can use the function like:
@@ -158,20 +172,20 @@ For example, ``pathlib.Path.read_text()`` can use the function like:
158172
return f.read()
159173
160174
By using ``io.text_encoding()``, ``EncodingWarning`` is emitted for
161-
the caller of ``read_text()`` instead of ``read_text()``.
175+
the caller of ``read_text()`` instead of ``read_text()`` itself.
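
A minimal sketch of the same pattern for a third-party wrapper follows.
The function name ``read_text`` and the ``getattr`` fallback for
interpreters without ``io.text_encoding()`` are assumptions made for
illustration, not part of this PEP:

```python
import io


def read_text(path, encoding=None):
    """Hypothetical wrapper around open() that forwards the caller's choice."""
    # io.text_encoding() is only available where this PEP is implemented;
    # on older interpreters the value is passed through unchanged.
    text_encoding = getattr(io, "text_encoding", None)
    if text_encoding is not None:
        # Any EncodingWarning is attributed to the caller of read_text(),
        # not to this wrapper.
        encoding = text_encoding(encoding)
    with open(path, encoding=encoding) as f:
        return f.read()
```

A caller that passes ``encoding="utf-8"`` never triggers the warning; a
caller that omits it is the one reported.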
162176

163177

164178
Affected stdlibs
165-
-------------------
179+
-----------------
166180

167181
Many stdlibs will be affected by this change.
168182

169183
Most APIs accepting ``encoding=None`` will use ``io.text_encoding()``
170184
as written in the previous section.
171185

172186
Where using locale encoding as the default encoding is reasonable,
173-
``encoding=io.LOCALE_ENCODING`` will be used instead. For example,
174-
``subprocess`` module will use locale encoding for the default
187+
``encoding="locale"`` will be used instead. For example,
188+
the ``subprocess`` module will use locale encoding for the default
175189
encoding of the pipes.
176190

177191
Many tests use ``open()`` without ``encoding`` specified to read
@@ -185,7 +199,7 @@ Opt-in warning
185199
---------------
186200

187201
Although ``DeprecationWarning`` is suppressed by default, emitting
188-
``DeprecationWarning`` always when ``encoding`` option is omitted
202+
``DeprecationWarning`` always when the ``encoding`` option is omitted
189203
would be too noisy.
190204

191205
Noisy warnings may lead developers to dismiss the
@@ -203,12 +217,82 @@ when ``encoding=None``. This behavior can not be implemented in
203217
the codec.
204218

205219

220+
Backward Compatibility
221+
======================
222+
223+
The new warning is not emitted by default. So this PEP is 100%
224+
backward compatible.
225+
226+
227+
Forward Compatibility
228+
=====================
229+
230+
The ``encoding="locale"`` option is not forward compatible. Code
231+
using the option will not work on Python older than 3.10. It will
232+
raise ``LookupError: unknown encoding: locale``.
233+
234+
Until developers can drop Python 3.9 support, ``EncodingWarning``
235+
can be used only for finding missing ``encoding="utf-8"`` options.
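
Code that must keep running on older versions could guard the option by
version. The sketch below is one possible workaround and not something
this PEP specifies; the helper name ``locale_encoding_arg`` is
hypothetical:

```python
import sys


def locale_encoding_arg():
    """Return an ``encoding`` value meaning "use the locale encoding"."""
    # "locale" is only understood by Python 3.10 and later; older versions
    # raise LookupError, so fall back to None (the implicit default) there.
    return "locale" if sys.version_info >= (3, 10) else None
```

It would be used as ``open(filename, encoding=locale_encoding_arg())``.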
236+
237+
238+
How to teach this
239+
=================
240+
241+
For new users
242+
-------------
243+
244+
Since ``EncodingWarning`` is used to write cross-platform code,
245+
there is no need to teach it to new users.
246+
247+
We can just recommend using UTF-8 for text files and using
248+
``encoding="utf-8"`` when opening them.
249+
250+
251+
For experienced users
252+
---------------------
253+
254+
Using ``open(filename)`` to read text files encoded in UTF-8 is a
255+
common mistake. It may not work on Windows because UTF-8 is not the
256+
default encoding.
257+
258+
You can use ``-X warn_encoding`` or ``PYTHONWARNENCODING=1`` to find
259+
this type of mistake.
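
As a rough sketch, the check can be scripted by running a child
interpreter with the warning enabled. Note the assumption: this draft
names the option ``-X warn_encoding``, while released CPython 3.10 spells
it ``-X warn_default_encoding``, which is what the sketch uses. The
helper name and the one-line child script are hypothetical:

```python
import subprocess
import sys

# Child script that opens a file without an explicit encoding.
CHILD = "import sys; open(sys.argv[1]).close()"


def find_encoding_warnings(path):
    """Run CHILD with the opt-in warning enabled and return its stderr."""
    result = subprocess.run(
        # Option name as released in CPython 3.10 (an assumption relative
        # to the spelling used in this draft).
        [sys.executable, "-X", "warn_default_encoding", "-c", CHILD, path],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    return result.stderr
```

On an interpreter that implements the option, the returned text should
name the ``open()`` call that omitted ``encoding``.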
260+
261+
Omitting the ``encoding`` option is not a bug when opening text files
262+
encoded in the locale encoding. But ``encoding="locale"`` is recommended
263+
in Python 3.10 and later because it is more explicit.
264+
265+
206266
Reference Implementation
207267
========================
208268

209269
https://github.com/python/cpython/pull/19481
210270

211271

272+
Discussions
273+
===========
274+
275+
* Why not implement this in linters?
276+
277+
* ``encoding="locale"`` and ``io.text_encoding()`` must be in
278+
Python.
279+
280+
* It is difficult to find all callers of functions wrapping
281+
``open()`` or ``TextIOWrapper()``. (See ``io.text_encoding()``
282+
section.)
283+
284+
* Many developers will not use the option.
285+
286+
* Some developers use the option and report the warnings to
287+
libraries they use. So the option is worthwhile even though
288+
many developers won't use it.
289+
290+
* For example, I found [7]_ and [8]_ by running
291+
``pip install -U pip`` and found [9]_ by running ``tox``
292+
with the reference implementation. It demonstrates how this
293+
option finds potential issues.
294+
295+
212296
References
213297
==========
214298

@@ -225,11 +309,28 @@ References
225309
.. [4] ``json.tool`` had used locale encoding to read JSON files.
226310
(https://bugs.python.org/issue33684)
227311
312+
.. [5] site: Potential UnicodeDecodeError when handling pth file
313+
(https://bugs.python.org/issue33684)
314+
315+
.. [6] pypa/pip: "Installing packages fails if Python 3 installed
316+
into path with non-ASCII characters"
317+
(https://github.com/pypa/pip/issues/9054)
318+
319+
.. [7] "site: Potential UnicodeDecodeError when handling pth file"
320+
(https://bugs.python.org/issue43214)
321+
322+
.. [8] "[pypa/pip] Use ``encoding`` option or binary mode for open()"
323+
(https://github.com/pypa/pip/pull/9608)
324+
325+
.. [9] "Possible UnicodeError caused by missing encoding="utf-8""
326+
(https://github.com/tox-dev/tox/issues/1908)
327+
228328
229329
Copyright
230330
=========
231331

232-
This document has been placed in the public domain.
332+
This document is placed in the public domain or under the
333+
CC0-1.0-Universal license, whichever is more permissive.
233334

234335

235336
..
