gh-105156: Deprecate the old Py_UNICODE type in C API #105157

vstinner · 2023-05-31T16:43:51Z

Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API: use wchar_t instead.

Replace Py_UNICODE with wchar_t in multiple C files.

Issue: C API: Deprecate Py_UNICODE type #105156

📚 Documentation preview 📚: https://cpython-previews--105157.org.readthedocs.build/

vstinner · 2023-05-31T16:44:03Z

cc @methane

methane · 2023-05-31T17:12:33Z

Sourcegraph results:

It seems two releases are not enough to remove Py_UNICODE. But let's look at it again in two years.

Include/cpython/unicodeobject.h

vstinner · 2023-05-31T17:14:15Z

It seems two releases are not enough to remove Py_UNICODE. But let's look at it again in two years.

This PR is mostly about deprecation. I prefer to announce the Python release in which these types will be removed: Python 3.15. But we will have to repeat this usage study when the types are actually removed.

The warning should help users find old code that still uses Py_UNICODE, whether by mistake or not.
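As a rough illustration of how such a warning can work, here is a minimal sketch that assumes a GCC- or Clang-style deprecated attribute. The names SKETCH_DEPRECATED, sketch_old_unicode and sketch_wide_length are hypothetical and are not CPython's; the sketch only shows that a thin typedef over wchar_t can be flagged at each use while code written against wchar_t stays warning-free.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical macro and type names, for illustration only. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_DEPRECATED __attribute__((deprecated))
#else
#  define SKETCH_DEPRECATED
#endif

/* A thin alias for wchar_t that the compiler flags wherever it is used. */
SKETCH_DEPRECATED typedef wchar_t sketch_old_unicode;

/* New code spells the type as wchar_t directly, so it compiles cleanly. */
static size_t sketch_wide_length(const wchar_t *text)
{
    assert(text != NULL);
    return wcslen(text);
}

/* Uncommenting the next line would produce a deprecation warning
 * on compilers that support the attribute:
 * static sketch_old_unicode sketch_buffer[16];
 */
```

Because the alias and wchar_t are the same type, migrating old code should only require changing the spelling of the type, not the data it holds.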

methane · 2023-05-31T17:14:55Z

Fix here too.
https://github.com/python/cpython/pull/105157/files#file-modules-posixmodule-c-L5653

vstinner · 2023-05-31T17:15:39Z

Sourcegraph results: Py_UNICODE

The first result is Py_UNICODE *inp = PyUnicode_AS_UNICODE(in); and this code is already broken by Python 3.12: the function was removed.

vstinner · 2023-05-31T17:19:16Z

Fix here too. https://github.com/python/cpython/pull/105157/files#file-modules-posixmodule-c-L5653

I planned to write a separate PR for code generated by Argument Clinic. It's now done with: PR #105161.

vstinner · 2023-05-31T17:24:26Z

I will wait until the two other PRs for this issue are merged, to avoid emitting new compiler warnings.

arhadthedev · 2023-05-31T18:32:19Z

use wchar_t instead.

Can we use char16_t from C11? Docs: https://en.cppreference.com/w/c/string/multibyte/char16_t.

It would avoid the 2-vs-4-byte size discrepancy.

methane · 2023-05-31T18:39:05Z

Can we use char16_t from C11? Docs: https://en.cppreference.com/w/c/string/multibyte/char16_t.

It would avoid the 2-vs-4-byte size discrepancy.

Where?

Py_UNICODE has been wchar_t since Python 3.3.
So users should use wchar_t wherever Py_UNICODE was required before.

Where Py_UNICODE was not required, my recommendation is "use UTF-8 always".

arhadthedev · 2023-05-31T18:49:18Z

Ah, I see now: the parent issue is about removing a thin, and thus unnecessary, typedef, not about changing the multibyte machinery for the next major version of CPython.

arhadthedev · 2023-05-31T19:08:32Z

Initially I got the impression that the PEP-393 removal of Py_UNICODE leaves the C API without a wide character type at all (so we would need to fill the gap with some other wide char type).

Now I see that this would require a PEP before the removal.

vstinner · 2023-06-01T06:55:49Z

Can we use char16_t from C11?

That would be wrong. Python has many C functions, such as PyUnicode_FromWideChar(), which really expect wchar_t, whether it is 16-bit or 32-bit.
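To make the 16-bit versus 32-bit point concrete, here is a minimal sketch of how one code point maps onto wchar_t units on each kind of platform. The helper sketch_encode_wide is hypothetical and is not how CPython implements anything; it assumes wchar_t is either 2 or 4 bytes wide and that a 2-byte wchar_t carries UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: store one code point as wchar_t units.
 * Returns the number of units written to out (1 or 2). */
static size_t sketch_encode_wide(uint32_t code_point, wchar_t out[2])
{
    assert(code_point <= 0x10FFFF);
    if (sizeof(wchar_t) == 2 && code_point > 0xFFFF) {
        /* 16-bit wchar_t: split into a UTF-16 surrogate pair. */
        uint32_t offset = code_point - 0x10000;
        out[0] = (wchar_t)(0xD800 + (offset >> 10));
        out[1] = (wchar_t)(0xDC00 + (offset & 0x3FF));
        return 2;
    }
    /* 32-bit wchar_t, or a BMP code point: one unit is enough. */
    out[0] = (wchar_t)code_point;
    return 1;
}
```

A fixed 16-bit type such as char16_t would always take the surrogate-pair branch for code points above U+FFFF, which is not what an API built around the platform's wchar_t expects where wchar_t is 32-bit.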

Initially I got the impression that the PEP-393 removal of Py_UNICODE leaves the C API without a wide character type at all

There is Py_UCS4 which should be 32-bit and is able to store all Unicode characters.

Where Py_UNICODE was not required, my recommendation is "use UTF-8 always".

Right. The PEP 393 implementation first added many functions using Py_UCS4 arrays. That was inefficient since, most of the time, all code points could be stored in Py_UCS1 arrays (4x smaller). Many strings are just ASCII. There are now more memory-efficient structures. I also wrote the private _PyUnicodeWriter API, which changes the internal storage depending on the maximum code point.
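The idea of choosing storage from the maximum code point can be sketched as follows. This is an illustration only, not CPython's code: sketch_unit_size is a hypothetical helper, and the thresholds assume the 1-, 2- and 4-byte units that PEP 393 names Py_UCS1, Py_UCS2 and Py_UCS4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes per code point needed by the narrowest
 * storage that can hold every code point in the input. */
static int sketch_unit_size(const uint32_t *code_points, size_t count)
{
    assert(code_points != NULL || count == 0);
    uint32_t max_code_point = 0;
    for (size_t i = 0; i < count; i++) {
        if (code_points[i] > max_code_point) {
            max_code_point = code_points[i];
        }
    }
    if (max_code_point < 0x100) {
        return 1;   /* fits a 1-byte unit, as with Py_UCS1 */
    }
    if (max_code_point < 0x10000) {
        return 2;   /* fits a 2-byte unit, as with Py_UCS2 */
    }
    return 4;       /* needs a 4-byte unit, as with Py_UCS4 */
}
```

For an ASCII-only string this picks 1 byte per code point, which is where the 4x saving over a Py_UCS4 array comes from.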

gh-105156: Deprecate the old Py_UNICODE type in C API

3165ff7

Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API: use wchar_t instead. Replace Py_UNICODE with wchar_t in multiple C files.

bedevere-bot added the awaiting core review label May 31, 2023

bedevere-bot mentioned this pull request May 31, 2023

C API: Deprecate Py_UNICODE type #105156

Closed

methane approved these changes May 31, 2023

bedevere-bot added awaiting merge and removed awaiting core review labels May 31, 2023

methane reviewed May 31, 2023

Include/cpython/unicodeobject.h (outdated)

Update Include/cpython/unicodeobject.h

93f06f7

Co-authored-by: Inada Naoki <songofacandy@gmail.com>

vstinner merged commit 8ed705c into python:main Jun 1, 2023

bedevere-bot removed the awaiting merge label Jun 1, 2023

vstinner deleted the deprecate_py_unicode branch June 1, 2023 06:56
