@@ -21,6 +21,9 @@ The warning is disabled by default. New ``-X warn_encoding``
2121command-line option and ``PYTHONWARNENCODING `` environment variable
2222are used to enable the warnings.
2323
24+ An ``encoding="locale"`` option is added too. It is used to specify
25+ the locale encoding explicitly.
26+
2427
2528Motivation
2629==========
@@ -39,34 +42,57 @@ in the ``README.md`` file which is encoded in UTF-8.
3942For example, 489 packages of the 4000 most downloaded packages from
4043PyPI used non-ASCII characters in README. And 82 packages of them
4144can not be installed from source package when locale encoding is
42- ASCII. [1 _] They used the default encoding to read README or TOML
45+ ASCII. [1 ]_ They used the default encoding to read README or TOML
4346file.
4447
4548Another example is ``logging.basicConfig(filename="log.txt") ``.
4649Some users expect UTF-8 is used by default, but locale encoding is
47- used actually. [2 _]
50+ used actually. [2 ]_
4851
4952Even Python experts assume that default encoding is UTF-8.
50- It creates bugs that happen only on Windows. See [3 _] and [4 _].
53+ It creates bugs that happen only on Windows. See [3 ]_, [4 ]_, [5 ]_,
54+ and [6 ]_ for example.
5155
5256Emitting a warning when the ``encoding `` option is omitted will help
5357to find such mistakes.
5458
5559
60+ Explicit way to use locale-specific encoding
61+ --------------------------------------------
62+
63+ ``open(filename) `` isn't explicit about which encoding is expected:
64+
65+ * Expects ASCII (not a bug, but inefficient on Windows)
66+ * Expects UTF-8 (bug or platform specific script)
67+ * Expects the locale encoding.
68+
69+ From this point of view, ``open(filename)`` is not readable.
70+
71+ ``encoding=locale.getpreferredencoding(False)`` can be used to
72+ specify the locale encoding explicitly. But it is too long and easy
73+ to misuse (e.g. forgetting to pass ``False`` as its argument).
74+
75+ This PEP provides an explicit way to specify the locale encoding.
76+
77+
5678Prepare to change the default encoding to UTF-8
5779-----------------------------------------------
5880
59- We had chosen to use locale encoding for the default text encoding in
60- Python 3.0. But UTF-8 has been adopted very widely since then .
81+ Since UTF-8 has become the de facto standard text encoding, we might
82+ change the default text encoding to UTF-8 in the future.
6183
62- We might change the default text encoding to UTF-8 in the future.
63- But this change will affect many applications and libraries.
64- Many ``DeprecationWarning `` will be emitted if we start emitting the
65- warning by default. It will be too noisy.
84+ But this change will affect many applications and libraries. If we
85+ start emitting ``DeprecationWarning`` by default everywhere the
86+ ``encoding`` option is omitted, it will be too noisy and painful.
6687
6788Although this PEP doesn't propose to change the default encoding,
68- this PEP will help to reduce the warning in the future if we decide
69- to change the default encoding.
89+ this PEP will help with the change:
90+
91+ * It will reduce the number of omitted ``encoding`` options in
92+ libraries before the warning is emitted by default.
93+
94+ * Users will be able to use the ``encoding="locale"`` option to
95+ suppress the warning without dropping Python 3.10 support.
7096
7197
7298Specification
@@ -75,7 +101,7 @@ Specification
75101``EncodingWarning ``
76102--------------------
77103
78- Add new ``EncodingWarning `` warning class which is a subclass of
104+ Add a new ``EncodingWarning `` warning class which is a subclass of
79105``Warning ``. It is used to warn when the ``encoding `` option is
80106omitted and the default encoding is locale-specific.
81107
@@ -94,6 +120,9 @@ When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
94120other modules using them will emit ``EncodingWarning `` when
95121``encoding `` is omitted.
96122
123+ Since ``EncodingWarning`` is a subclass of ``Warning``, it is
124+ shown by default once emitted, unlike ``DeprecationWarning``.
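
As a rough illustration of the opt-in behaviour, the sketch below runs a
child interpreter with and without the option and collects its stderr.
It assumes a released CPython 3.10 or later, where the option is spelled
``-X warn_default_encoding`` rather than the ``-X warn_encoding`` name
used in this draft. ``CHILD`` and ``run_child`` are hypothetical names
introduced only for this example.

```python
import os
import subprocess
import sys

# Child program: opens a file without passing ``encoding``.
CHILD = "import os; open(os.devnull).close()"

# Environment variables that could change warning behaviour in the child.
_WARNING_VARS = ("PYTHONWARNINGS", "PYTHONWARNDEFAULTENCODING")


def run_child(*interpreter_args):
    """Run CHILD in a fresh interpreter and return what it wrote to stderr."""
    # Drop inherited warning settings so only interpreter_args matter.
    env = {k: v for k, v in os.environ.items() if k not in _WARNING_VARS}
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", CHILD],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        env=env,
    )
    return result.stderr
```

Calling ``run_child()`` should produce no ``EncodingWarning``, while
``run_child("-X", "warn_default_encoding")`` is expected to report one on
Python 3.10 and later, without any extra ``-W`` filter.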
125+
97126
98127``encoding="locale" `` option
99128----------------------------
@@ -102,21 +131,6 @@ other modules using them will emit ``EncodingWarning`` when
102131same to current ``encoding=None ``. But ``io.TextIOWrapper `` doesn't
103132emit ``EncodingWarning `` when ``encoding="locale" `` is specified.
104133
105- Add ``io.LOCALE_ENCODING = "locale" `` constant too. This constant can
106- be used to avoid confusing ``LookupError: unknown encoding: locale ``
107- error when the code is run in old Python accidentally.
108-
109- The constant can be used to test that ``encoding="locale" `` option is
110- supported too. For example,
111-
112- .. code-block ::
113-
114- # Want to suppress an EncodingWarning but still need support
115- # old Python versions.
116- locale_encoding = getattr(io, "LOCALE_ENCODING", None)
117- with open(filename, encoding=locale_encoding) as f:
118- ...
119-
120134
121135``io.text_encoding() ``
122136-----------------------
@@ -145,7 +159,7 @@ Pure Python implementation will be like this::
145159 import warnings
146160 warnings.warn("'encoding' option is omitted",
147161 EncodingWarning, stacklevel + 2)
148- encoding = LOCALE_ENCODING
162+ encoding = "locale"
149163 return encoding
150164
151165For example, ``pathlib.Path.read_text() `` can use the function like:
@@ -158,20 +172,20 @@ For example, ``pathlib.Path.read_text()`` can use the function like:
158172 return f.read()
159173
160174 By using ``io.text_encoding() ``, ``EncodingWarning `` is emitted for
161- the caller of ``read_text() `` instead of ``read_text() ``.
175+ the caller of ``read_text() `` instead of ``read_text() `` itself .
162176
163177
164178Affected stdlibs
165- -------------------
179+ -----------------
166180
167181Many stdlibs will be affected by this change.
168182
169183Most APIs accepting ``encoding=None `` will use ``io.text_encoding() ``
170184as written in the previous section.
171185
172186Where using locale encoding as the default encoding is reasonable,
173- ``encoding=io.LOCALE_ENCODING `` will be used instead. For example,
174- ``subprocess `` module will use locale encoding for the default
187+ ``encoding="locale" `` will be used instead. For example,
188+ the ``subprocess `` module will use locale encoding for the default
175189encoding of the pipes.
176190
177191Many tests use ``open() `` without ``encoding `` specified to read
@@ -185,7 +199,7 @@ Opt-in warning
185199---------------
186200
187201Although ``DeprecationWarning `` is suppressed by default, emitting
188- ``DeprecationWarning `` always when ``encoding `` option is omitted
202+ ``DeprecationWarning `` always when the ``encoding `` option is omitted
189203would be too noisy.
190204
191205Noisy warnings may lead developers to dismiss the
@@ -203,12 +217,82 @@ when ``encoding=None``. This behavior can not be implemented in
203217the codec.
204218
205219
220+ Backward Compatibility
221+ ======================
222+
223+ The new warning is not emitted by default. So this PEP is 100%
224+ backward compatible.
225+
226+
227+ Forward Compatibility
228+ =====================
229+
230+ The ``encoding="locale"`` option is not forward compatible. Code
231+ using the option will not work on Python older than 3.10. It will
232+ raise ``LookupError: unknown encoding: locale``.
233+
234+ Until developers can drop Python 3.9 support, ``EncodingWarning ``
235+ can be used only for finding missing ``encoding="utf-8" `` options.
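
Code that must run on both sides of that boundary could select the
encoding name at runtime. The sketch below is one possible workaround
under that assumption, not something this PEP specifies;
``locale_encoding_name`` and ``round_trip`` are hypothetical names.

```python
import locale
import os
import sys
import tempfile


def locale_encoding_name():
    """Return an encoding name that means "the locale encoding"."""
    if sys.version_info >= (3, 10):
        # Understood by open() and io.TextIOWrapper() since Python 3.10.
        return "locale"
    # Older versions: look the locale encoding up explicitly.
    return locale.getpreferredencoding(False)


def round_trip(text):
    """Write text to a temporary file and read it back, both explicitly
    in the locale encoding."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=locale_encoding_name()) as f:
            f.write(text)
        with open(path, encoding=locale_encoding_name()) as f:
            return f.read()
    finally:
        os.remove(path)
```

The version check keeps ``"locale"`` away from interpreters that would
raise ``LookupError``, at the cost of one extra helper.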
236+
237+
238+ How to teach this
239+ =================
240+
241+ For new users
242+ -------------
243+
244+ Since ``EncodingWarning`` is used to write cross-platform code,
245+ there is no need to teach it to new users.
246+
247+ We can just recommend using UTF-8 for text files and using
248+ ``encoding="utf-8"`` when opening text files.
249+
250+
251+ For experienced users
252+ ---------------------
253+
254+ Using ``open(filename) `` to read text files encoded in UTF-8 is a
255+ common mistake. It may not work on Windows because UTF-8 is not the
256+ default encoding.
257+
258+ You can use ``-X warn_encoding `` or ``PYTHONWARNENCODING=1 `` to find
259+ this type of mistake.
260+
261+ Omitting the ``encoding`` option is not a bug when opening text files
262+ encoded in the locale encoding. But ``encoding="locale"`` is recommended
263+ on Python 3.10 and later because it is more explicit.
264+
265+
206266Reference Implementation
207267========================
208268
209269https://github.com/python/cpython/pull/19481
210270
211271
272+ Discussions
273+ ===========
274+
275+ * Why not implement this in linters?
276+
277+ * ``encoding="locale" `` and ``io.text_encoding() `` must be in
278+ Python.
279+
280+ * It is difficult to find all callers of functions wrapping
281+ ``open()`` or ``TextIOWrapper()``. (See the ``io.text_encoding()``
282+ section.)
283+
284+ * Many developers will not use the option.
285+
286+ * Some developers use the option and report the warnings to
287+ libraries they use. So the option is worthwhile even though
288+ many developers won't use it.
289+
290+ * For example, I found [7]_ and [8]_ by running
291+ ``pip install -U pip``, and found [9]_ by running ``tox``
292+ with the reference implementation. This demonstrates how the
293+ option helps find potential issues.
294+
295+
212296References
213297==========
214298
@@ -225,11 +309,28 @@ References
225309 .. [4 ] ``json.tool `` had used locale encoding to read JSON files.
226310 (https://bugs.python.org/issue33684)
227311
312+ .. [5 ] site: Potential UnicodeDecodeError when handling pth file
313+ (https://bugs.python.org/issue33684)
314+
315+ .. [6 ] pypa/pip: "Installing packages fails if Python 3 installed
316+ into path with non-ASCII characters"
317+ (https://github.com/pypa/pip/issues/9054)
318+
319+ .. [7 ] "site: Potential UnicodeDecodeError when handling pth file"
320+ (https://bugs.python.org/issue43214)
321+
322+ .. [8 ] "[pypa/pip] Use ``encoding `` option or binary mode for open()"
323+ (https://github.com/pypa/pip/pull/9608)
324+
325+ .. [9 ] "Possible UnicodeError caused by missing encoding="utf-8""
326+ (https://github.com/tox-dev/tox/issues/1908)
327+
228328
229329 Copyright
230330=========
231331
232- This document has been placed in the public domain.
332+ This document is placed in the public domain or under the
333+ CC0-1.0-Universal license, whichever is more permissive.
233334
234335
235336..