gh-105156: Deprecate the old Py_UNICODE type in C API #105157
Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API: use wchar_t instead. Replace Py_UNICODE with wchar_t in multiple C files.
cc @methane |
Sourcegraph results: It seems two releases is not enough for removing |
This PR is mostly about deprecation. I prefer to announce the Python release in which these types will be removed: Python 3.15. But we will have to redo this usage study when the types are actually removed. The warning should help users find old code that still uses Py_UNICODE, whether by mistake or not. |
The first result is |
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
I planned to write a separate PR for code generated by Argument Clinic. It's now done with: PR #105161. |
I will wait until the 2 other PRs for this issue are merged, to avoid emitting new compiler warnings. |
Can we use It would avoid 2-vs-4-byte size discrepancy. |
Where? Py_UNICODE has been wchar_t since Python 3.3. Where Py_UNICODE is not required, my recommendation is "use UTF-8 always". |
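To make the size discrepancy concrete: since Py_UNICODE is just wchar_t, its width follows the platform's wchar_t (commonly 2 bytes on Windows, 4 bytes on Linux and macOS). A minimal sketch in plain C; the helper names `wide_char_size` and `wide_buffer_bytes` are made up for illustration and are not part of the C API:

```c
#include <stddef.h>
#include <wchar.h>

/* Width in bytes of one wchar_t on this platform.
 * Commonly 2 on Windows and 4 on Linux/macOS; the C standard fixes neither. */
static size_t wide_char_size(void)
{
    return sizeof(wchar_t);
}

/* Bytes needed for a buffer holding `len` wchar_t units plus the
 * terminating L'\0'. Hypothetical helper, for illustration only. */
static size_t wide_buffer_bytes(size_t len)
{
    return (len + 1) * sizeof(wchar_t);
}
```

Code that hard-codes either 2 or 4 bytes per unit is therefore not portable, whichever name it uses for the type.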
Ah, I get it now: the parent issue is about removing a thin, and thus unnecessary, typedef, not about changing the multibyte machinery for the next major version of CPython. |
Initially I got the impression that the PEP-393 removal of Now I see that this would require a PEP before the removal. |
That would be wrong. Python has many C functions which really expect 16-bit or 32-bit wchar_t like PyUnicode_FromWideChar().
There is Py_UCS4 which should be 32-bit and is able to store all Unicode characters.
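As a rough illustration of why a 32-bit type can hold every code point while a 16-bit wchar_t cannot, here is a sketch of standard UTF-16 encoding in plain C. `ucs4_t` is a stand-in for Py_UCS4 and `utf16_encode` is a hypothetical helper, not a CPython function:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucs4_t;  /* stand-in for Py_UCS4 (assumption for this sketch) */

/* Encode one code point as UTF-16 code units.
 * Writes 1 or 2 units into `out` and returns the count,
 * or returns 0 if `cp` is not a valid Unicode scalar value. */
static size_t utf16_encode(ucs4_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;  /* out of range, or a lone surrogate */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;  /* fits in a single 16-bit unit */
        return 1;
    }
    cp -= 0x10000;  /* 20 bits remain, split into two 10-bit halves */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Any code point above U+FFFF needs two units in the 16-bit case, so one 16-bit wchar_t is not always one character, whereas a single 32-bit value always is.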
Right. The PEP 393 implementation first added many functions using Py_UCS4 arrays. That was inefficient since, most of the time, all code points could be stored in Py_UCS1 arrays (4x smaller). Many strings are just ASCII. There are now more memory-efficient structures. I also wrote the private _PyUnicodeWriter API to change the internal storage depending on the maximum code point. |
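The idea of choosing the storage width from the maximum code point can be sketched as follows. This is a simplified illustration in plain C, not the _PyUnicodeWriter implementation; `min_char_width` is a hypothetical name, and the thresholds follow the 1-, 2- and 4-byte representations described in PEP 393:

```c
#include <stddef.h>
#include <stdint.h>

/* Return the narrowest per-character width in bytes (1, 2, or 4)
 * that can store every code point in `cps`.
 * An empty input needs only the 1-byte representation. */
static int min_char_width(const uint32_t *cps, size_t n)
{
    uint32_t maxcp = 0;
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > maxcp) {
            maxcp = cps[i];
        }
    }
    if (maxcp <= 0xFF) {
        return 1;  /* fits a Py_UCS1-style array */
    }
    if (maxcp <= 0xFFFF) {
        return 2;  /* fits a Py_UCS2-style array */
    }
    return 4;      /* needs a Py_UCS4-style array */
}
```

For an ASCII-only string this yields 1 byte per character, which is where the 4x saving over an always-32-bit array comes from.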
📚 Documentation preview 📚: https://cpython-previews--105157.org.readthedocs.build/