Skip to content

bpo-46323: Use PyObject_Vectorcall while calling ctypes callback function #31138

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 8, 2022

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Feb 5, 2022

https://bugs.python.org/issue46323


With v3 benchmark

Mean +- std dev: [base] 1.32 us +- 0.02 us -> [opt] 1.18 us +- 0.02 us: 1.13x faster

@corona10
Copy link
Member Author

corona10 commented Feb 5, 2022

@erlend-aasland Can you please take a look?

@corona10
Copy link
Member Author

corona10 commented Feb 5, 2022


Raised RLIMIT_NOFILE: 256 -> 1024
0:00:00 load avg: 4.50 Run tests sequentially
0:00:00 load avg: 4.50 [1/1] test_ctypes
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 1.9 sec
Tests result: SUCCES

@corona10
Copy link
Member Author

corona10 commented Feb 5, 2022

Tests / Address sanitizer (pull_request)

             ^^^^^^^^^^^
  File "/home/runner/work/cpython/cpython/Lib/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib.error.HTTPError: HTTP Error 404: Not Found

Unrelated

@corona10 corona10 added the performance Performance or resource usage label Feb 5, 2022
@corona10 corona10 changed the title bpo-46323: Use PyObject_Vectorcall while calling ctypes callback funtion bpo-46323: Use PyObject_Vectorcall while calling ctypes callback function Feb 5, 2022
@corona10 corona10 closed this Feb 5, 2022
@corona10 corona10 reopened this Feb 5, 2022
@vstinner
Copy link
Member

vstinner commented Feb 5, 2022

Mean +- std dev: [base] 552 ns +- 6 ns -> [vectorcall] 541 ns +- 5 ns: 1.02x faster

-11 ns is not really impressive :-(

@corona10 corona10 requested a review from vstinner February 5, 2022 13:13
@corona10
Copy link
Member Author

corona10 commented Feb 5, 2022

@vstinner Updated!

@erlend-aasland
Copy link
Contributor

@erlend-aasland Can you please take a look?

Sorry, I don't have time until earliest next weekend :(

(I remember doing similar stuff with some of the sqlite3 callbacks some months ago, but I discarded those changes because the added complexity did not outweigh the performance gain.)

@corona10
Copy link
Member Author

corona10 commented Feb 6, 2022

@vstinner cc @erlend-aasland

Yeah the performance improvement itself is not impressive,
but considering _ctypes callback feature is widely used, doing something is better than doing nothing since it shows consistently 1-2 % improvement.
And we are fighting for 1-2 % improvement for these days :)

@vstinner
Copy link
Member

vstinner commented Feb 6, 2022

Yeah the performance improvement itself is not impressive, but considering _ctypes callback feature is widely used, doing something is better than doing nothing since it shows consistently 1-2 % improvement. And we are fighting for 1-2 % improvement for these days :)

My rule is that an micro-optimization is worth it if it's at least 10% faster on a micro-benchmark. On a macro-benchmark like pyperformance, smaller speedup are worth it, but I have no global rule for that.

Here I would expect to save 20 ns by avoiding the creating a tuple for positional arguments, but it's about -11 ns. Maybe "20 ns" is what I had in mind for my laptop CPU, but on your CPU, it's closer to 11 ns?

@vstinner
Copy link
Member

vstinner commented Feb 6, 2022

My rule is that an micro-optimization is worth it if it's at least 10% faster on a micro-benchmark.

It's not a strict rule and you're free to not follow it ;-)

@corona10

This comment was marked as resolved.

@corona10 corona10 closed this Feb 6, 2022
@corona10 corona10 reopened this Feb 7, 2022
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@corona10 corona10 merged commit b552768 into python:main Feb 8, 2022
@corona10 corona10 deleted the bpo-46323 branch February 8, 2022 13:09
Copy link
Contributor

@erlend-aasland erlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (post merge 😆), good job! (I'll revisit my sqlite3 vectorcall branches now.)

Comment on lines 158 to 160
/* Hm. What to return in case of error?
For COM, 0xFFFFFFFF seems better than 0.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is now misleading; it should be removed. Previously, PySequence_Length could return -1 on error, but PyTuple_GET_SIZE always succeeds (or, more correct: it does no error checking).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants