Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-39087: Optimize PyUnicode_AsUTF8AndSize(). #18327

Merged
merged 3 commits into from
Feb 27, 2020

Conversation

methane
Copy link
Member

@methane methane commented Feb 3, 2020

Avoid using temporary bytes object.

ref: bpo-39087

Avoid using temporary bytes object.
@methane methane added performance Performance or resource usage and removed awaiting core review labels Feb 3, 2020
@methane methane changed the title bpo-17659: Optimize PyUnicode_AsUTF8AndSize(). bpo-39087: Optimize PyUnicode_AsUTF8AndSize(). Feb 3, 2020
@methane methane added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Feb 12, 2020
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @methane for commit 33a3a8a 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Feb 12, 2020
@methane methane requested a review from vstinner February 14, 2020 08:56

#if STRINGLIB_SIZEOF_CHAR > 1
error:
Py_XDECREF(rep);
Py_XDECREF(error_handler_obj);
Py_XDECREF(exc);
_PyBytesWriter_Dealloc(&writer);
_PyBytesWriter_Dealloc(writer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you pass a writer to a function, I prefer that the caller is responsible to call _PyBytesWriter_Finish() and _PyBytesWriter_Dealloc(). Here it doesn't if it's success (non-NULL) or an error (NULL) :-(

}

char *start = writer.use_small_buffer ? writer.small_buffer :
PyBytes_AS_STRING(writer.buffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use _PyBytesWriter_AsString() function here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is static function: we can not call it from here.


char *start = writer.use_small_buffer ? writer.small_buffer :
PyBytes_AS_STRING(writer.buffer);
Py_ssize_t len = end - start;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: _PyBytesWriter_GetSize() exists, but using it would be slower and your code is correct ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is static function too.

}
_PyUnicode_UTF8(unicode) = cache;
_PyUnicode_UTF8_LENGTH(unicode) = len;
memcpy(cache, start, len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read somewhere that memcpy(dst, src, 0) is an undefined behavior. Maybe:

Suggested change
memcpy(cache, start, len);
if (len != 0) { memcpy(cache, start, len); }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, memcpy(dst, src, 0) is undefined when dst is invalid address. When dest is valid address, it is safe.
https://stackoverflow.com/questions/29844298/is-it-legal-to-call-memcpy-with-zero-length-on-a-pointer-just-past-the-end-of-an?rq=1

@codecov
Copy link

codecov bot commented Feb 21, 2020

Codecov Report

Merging #18327 into master will increase coverage by 3.68%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #18327       +/-   ##
===========================================
+ Coverage   79.45%   83.13%    +3.68%     
===========================================
  Files         385     1571     +1186     
  Lines      174490   414817   +240327     
  Branches        0    44458    +44458     
===========================================
+ Hits       138638   344854   +206216     
- Misses      35852    60338    +24486     
- Partials        0     9625     +9625     
Impacted Files Coverage Δ
Modules/_codecsmodule.c
Objects/methodobject.c
Modules/clinic/_codecsmodule.c.h
Modules/clinic/_heapqmodule.c.h
Modules/_sha3/sha3module.c
Python/Python-ast.c
Modules/_sqlite/module.c
Objects/clinic/moduleobject.c.h
Programs/python.c
Modules/_json.c
... and 1944 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f44e77...ad88c38. Read the comment docs.

@methane methane merged commit 02a4d57 into python:master Feb 27, 2020
@methane methane deleted the fast-asutf8 branch February 27, 2020 04:49
@vstinner
Copy link
Member

vstinner commented Mar 2, 2020

Thanks, that's a nice optimization ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants