-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-39087: Optimize PyUnicode_AsUTF8AndSize(). #18327
Conversation
Avoid using temporary bytes object.
Objects/stringlib/codecs.h
Outdated
|
||
#if STRINGLIB_SIZEOF_CHAR > 1 | ||
error: | ||
Py_XDECREF(rep); | ||
Py_XDECREF(error_handler_obj); | ||
Py_XDECREF(exc); | ||
_PyBytesWriter_Dealloc(&writer); | ||
_PyBytesWriter_Dealloc(writer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you pass a writer to a function, I prefer that the caller is responsible to call _PyBytesWriter_Finish() and _PyBytesWriter_Dealloc(). Here it doesn't if it's success (non-NULL) or an error (NULL) :-(
} | ||
|
||
char *start = writer.use_small_buffer ? writer.small_buffer : | ||
PyBytes_AS_STRING(writer.buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use _PyBytesWriter_AsString() function here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is static function: we can not call it from here.
|
||
char *start = writer.use_small_buffer ? writer.small_buffer : | ||
PyBytes_AS_STRING(writer.buffer); | ||
Py_ssize_t len = end - start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: _PyBytesWriter_GetSize() exists, but using it would be slower and your code is correct ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is static function too.
} | ||
_PyUnicode_UTF8(unicode) = cache; | ||
_PyUnicode_UTF8_LENGTH(unicode) = len; | ||
memcpy(cache, start, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read somewhere that memcpy(dst, src, 0) is an undefined behavior. Maybe:
memcpy(cache, start, len); | |
if (len != 0) { memcpy(cache, start, len); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, memcpy(dst, src, 0)
is undefined when dst
is invalid address. When dest is valid address, it is safe.
https://stackoverflow.com/questions/29844298/is-it-legal-to-call-memcpy-with-zero-length-on-a-pointer-just-past-the-end-of-an?rq=1
Codecov Report
@@ Coverage Diff @@
## master #18327 +/- ##
===========================================
+ Coverage 79.45% 83.13% +3.68%
===========================================
Files 385 1571 +1186
Lines 174490 414817 +240327
Branches 0 44458 +44458
===========================================
+ Hits 138638 344854 +206216
- Misses 35852 60338 +24486
- Partials 0 9625 +9625 Continue to review full report at Codecov.
|
Thanks, that's a nice optimization ;-) |
Avoid using temporary bytes object.
ref: bpo-39087