-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-38644: Add _PyObject_VectorcallTstate() #17052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jdemeyer: Would it be possible to change _PyObject_Vectorcall() and _PyObject_MakeTpCall() to add a tstate parameter? Or is it better to add new functions? The functions are private, so we don't provide any backward compatibility warranty. |
Please benchmark this very carefully on Windows. One note from Mark was that there's a performance loss on Windows when passing more than 4 arguments in C. That the main reason why Vectorcall combines the "number of args" and "flags" into a single argument. |
tl; dr I don't see any significant performance difference when running microbenchmarks on Linux. Linux x86-64 ABI allows to pass up to six function parameters as registers: https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI I ran a benchmark on Linux (Fedora 30) with CPU isolation ( EDIT: I compiled Python with PGO+LTO: I used all benchmarks called "bench*call.py" in my https://github.com/vstinner/pymicrobench project: collection of Python microbenchmarks.
If I ignore differences smaller than 5%, all results are shown as "Not significant". They are micro benchmarks. IMHO on a microbenchmark, a significant difference should be at least 10% (-10% or +10%).
Raw results:
Script to run the benchmark:
I moved all .json to ref/ (reference Python) or patch/ (patched Python). Script to compare results:
|
That's not surprising. Are you planning to run Windows benchmarks as well? |
I'm now trying to redo the same benchmark on Windows. |
It's hard to me to understand microbenchmark results on Windows because I don't know how to minimize the std dev. I installed psutil, so pyperf calls proc.nice(psutil.REALTIME_PRIORITY_CLASS) to set the benchmark process to the highest priority. But all differences are smaller than 10%. Some microbenchmarks are faster, some are slower. But it may come from the benchmark "noise". Benchmarks on Windows. I used the following commands to build Python:
I cloned https://github.com/vstinner/pymicrobench To create the venv, I used:
I downloaded https://files.pythonhosted.org/packages/03/9a/95c4b3d0424426e5fd94b5302ff74cea44d5d4f53466e1228ac8e73e14b4/psutil-5.6.5.tar.gz and extracted using "python -m tarfile -e psutil-5.6.5.tar.gz". I hacked its setup.py to add
Then I ran benchmarks using:
Comparison ignoring differences smaller than 5%:
Comparison:
The problem is that I don't know how to run benchmar ks on Windows. For example, on \bench_fastcall_partial.json, the std dev is between +- 0.8 ns and +- 9 ns:
When I read |
* Add _PyObject_VectorcallTstate() function: similar to _PyObject_Vectorcall(), but with tstate parameter * Add tstate parameter to _PyObject_MakeTpCall()
I removed _PyObject_MakeTpCallTstate(): I modified _PyObject_MakeTpCall() to add a tstate parameter instead. You should not call _PyObject_MakeTpCall() directly. But I chose to leave _PyObject_Vectorcall() unchanged, and add _PyObject_VectorcallTstate(). |
I didn't notice any significant performance overhead of this change, nor speedup. The balance seems to be null. This change is more about correctness rather than performance. See https://bugs.python.org/issue36710 for the rationale. |
I'm not really keen on this change, especially the change to What would be helpful from a performance point of view is to fix up |
@markshannon: "I'm not really keen on this change, (...)" I started a thread on python-dev: https://mail.python.org/archives/list/python-dev@python.org/thread/PQBGECVGVYFTVDLBYURLCXA3T7IPEHHO/ I invite you to participate to the discussion there ;-) |
* Add _PyObject_VectorcallTstate() function: similar to _PyObject_Vectorcall(), but with tstate parameter * Add tstate parameter to _PyObject_MakeTpCall()
* Add _PyObject_VectorcallTstate() function: similar to _PyObject_Vectorcall(), but with tstate parameter * Add tstate parameter to _PyObject_MakeTpCall()
_PyObject_Vectorcall() with tstate parameter
https://bugs.python.org/issue38644