gh-111089: Add PyUnicode_AsUTF8Unsafe() function #111672
|
@@ -971,6 +971,12 @@ These are the UTF-8 codec APIs: | |
returned buffer always has an extra null byte appended (not included in | ||
*size*), regardless of whether there are any other null code points. | ||
|
||
If *size* is NULL and the *unicode* string contains embedded null | ||
characters, raise an exception. To accept embedded null characters and | ||
truncate on purpose at the first null byte, :c:func:`PyUnicode_AsUTF8Unsafe` | ||
and :c:func:`PyUnicode_AsUTF8AndSize(unicode, &size) | ||
Review comment: This is a reference to self. Unlikely it will be useful.
||
<PyUnicode_AsUTF8AndSize>` can be used instead. | ||
|
||
On error, set an exception, set *size* to ``-1`` (if it's not NULL) and | ||
return ``NULL``. | ||
|
||
|
@@ -987,15 +993,21 @@ These are the UTF-8 codec APIs: | |
.. versionchanged:: 3.10 | ||
This function is a part of the :ref:`limited API <limited-c-api>`. | ||
|
||
.. versionchanged:: 3.13 | ||
Raise an exception if *size* is NULL and the string contains embedded | ||
null characters. | ||
|
||
|
||
.. c:function:: const char* PyUnicode_AsUTF8(PyObject *unicode) | ||
|
||
As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size. | ||
Similar to :c:func:`PyUnicode_AsUTF8AndSize(unicode, NULL) | ||
<PyUnicode_AsUTF8AndSize>`, but does not store the size. | ||
Review comment on lines +1003 to +1004: Maybe just say that it is equivalent to
||
|
||
Raise an exception if the *unicode* string contains embedded null | ||
characters. To accept embedded null characters and truncate on purpose | ||
at the first null byte, ``PyUnicode_AsUTF8AndSize(unicode, NULL)`` can be | ||
used instead. | ||
characters. To accept embedded null characters and truncate on purpose at | ||
the first null byte, :c:func:`PyUnicode_AsUTF8Unsafe` and | ||
:c:func:`PyUnicode_AsUTF8AndSize(unicode, &size) <PyUnicode_AsUTF8AndSize>` | ||
can be used instead. | ||
|
||
.. versionadded:: 3.3 | ||
|
||
|
@@ -1005,6 +1017,16 @@ These are the UTF-8 codec APIs: | |
.. versionchanged:: 3.13 | ||
Raise an exception if the string contains embedded null characters. | ||
|
||
.. c:function:: const char* PyUnicode_AsUTF8Unsafe(PyObject *unicode) | ||
|
||
Similar to :c:func:`PyUnicode_AsUTF8`, but does not raise an exception if the | ||
string contains embedded null characters. | ||
|
||
This function can be used to truncate a string on purpose at the first null | ||
character. | ||
|
||
.. versionadded:: 3.13 | ||
|
||
|
||
UTF-32 Codecs | ||
""""""""""""" | ||
|
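The embedded-null check and the deliberate truncation described in the documentation changes above can be sketched in plain C, without any CPython dependency. This is only an illustration under stated assumptions: both helper names are hypothetical and are not part of CPython's API, and the buffer is assumed to carry a terminating null byte at ``buf[size]``, as the documentation states for the buffer returned by ``PyUnicode_AsUTF8AndSize()``.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a CPython function): return 1 if the UTF-8
 * buffer of `size` bytes contains a null byte before its end, else 0.
 * Assumes buf[size] is a terminating null byte. */
int
utf8_has_embedded_null(const char *buf, size_t size)
{
    assert(buf != NULL);
    /* strlen() stops at the first null byte, so a result shorter than
     * `size` means the data contains an embedded null. */
    return strlen(buf) != size;
}

/* Hypothetical helper (not a CPython function): return the length that a
 * C-string consumer would see, i.e. the data truncated at the first
 * null byte. */
size_t
utf8_truncated_length(const char *buf, size_t size)
{
    assert(buf != NULL);
    const char *nul = memchr(buf, '\0', size);
    return nul != NULL ? (size_t)(nul - buf) : size;
}
```

A caller that obtains both the pointer and the size can use a check like the first helper to decide whether to reject the string, or rely on the second to make the truncation explicit rather than accidental.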
|
@@ -451,7 +451,13 @@ PyAPI_FUNC(PyObject*) PyUnicode_AsUTF8String( | |
// This function caches the UTF-8 encoded string in the Unicode object | ||
// and subsequent calls will return the same string. The memory is released | ||
// when the Unicode object is deallocated. | ||
PyAPI_FUNC(const char *) PyUnicode_AsUTF8(PyObject *unicode); | ||
PyAPI_FUNC(const char*) PyUnicode_AsUTF8(PyObject *unicode); | ||
Review comment: BTW, this function should only be available in the Limited C API 3.13.
||
|
||
// Similar to PyUnicode_AsUTF8(), but do not raise an exception if the string | ||
// contains embedded null characters. | ||
#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x030D0000 | ||
PyAPI_FUNC(const char*) PyUnicode_AsUTF8Unsafe(PyObject *unicode); | ||
#endif | ||
Review comment on lines +458 to +460: Maybe not add it to the Limited C API?
||
|
||
// Returns a pointer to the UTF-8 encoding of the | ||
// Unicode object unicode and the size of the encoded representation | ||
|
@@ -1,2 +1,4 @@ | ||
The :c:func:`PyUnicode_AsUTF8` function now raises an exception if the | ||
string contains embedded null characters. Patch by Victor Stinner. | ||
The :c:func:`PyUnicode_AsUTF8` and | ||
:c:func:`PyUnicode_AsUTF8AndSize(unicode, NULL) <PyUnicode_AsUTF8AndSize>` | ||
functions now raise an exception if the string contains embedded null | ||
characters. Patch by Victor Stinner. |
|
@@ -2480,3 +2480,5 @@ | |
added = '3.13' | ||
[function.PyUnicode_AsUTF8] | ||
added = '3.13' | ||
[function.PyUnicode_AsUTF8Unsafe] | ||
added = '3.13' |
Review comment: The wording differs from the one for PyUnicode_AsWideCharString(). It would be better to have the same wording for the same behavior, so that users do not need to search for nonexistent differences.