-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-44850: Improve the performance of methodcaller. #27782
Conversation
0429366
to
c074967
Compare
Modules/_operator.c
Outdated
mc->vectorcall_args[1 + nargs + i] = value; | ||
++i; | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP 7 says to put else
on the next line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
Modules/_operator.c
Outdated
return NULL; | ||
result = PyObject_Call(method, mc->args, mc->kwds); | ||
} else if (meth_found) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else if (...) {
on its own line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Modules/_operator.c
Outdated
mc->vectorcall_args[0] = obj; | ||
result = PyObject_Vectorcall( | ||
method, mc->vectorcall_args, PyTuple_GET_SIZE(mc->args) + 1, mc->vectorcall_kwnames); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else {
on its own line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Modules/_operator.c
Outdated
result = PyObject_Vectorcall( | ||
method, mc->vectorcall_args, PyTuple_GET_SIZE(mc->args) + 1, mc->vectorcall_kwnames); | ||
} else { | ||
result = PyObject_Call(method, mc->args, mc->kwds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just brainstorming, but couldn't PyObject_VectorcallMethod
be used to replace a bunch of this logic and remove this branch? Or is there a reason not to default to vectorcall in this case? Maybe mc->args
and mc->kwds
could even be removed entirely?
It might also be nice to add more tests to make sure this case is covered for various C- and Python-implemented objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that's a great idea, and makes everything much simpler (Hopefully the removal of the branch also means I can get away with no additional tests...). I squashed the commits together as a result, as the distinction didn't make much sense anymore. Performance doesn't change.
For now I kept args
and kwds
as 1) they avoid having to fiddle with the refcounts of the individual args, and 2) I didn't want to rewrite all of methodcaller_repr and methodcaller_reduce.
@@ -0,0 +1,3 @@ | |||
The performance of `operator.methodcaller` has been improved by using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs ``double backticks``
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
Modules/_operator.c
Outdated
return NULL; | ||
result = PyObject_Call(method, mc->args, mc->kwds); | ||
} else if (meth_found) { | ||
mc->vectorcall_args[0] = obj; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this become a problem if mc() calls itself? I don't think it's a problem, but I just wanted to make sure it was thought through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes, also the very existence of https://docs.python.org/3/c-api/call.html#c.PY_VECTORCALL_ARGUMENTS_OFFSET suggests that this pattern can work in "normal" conditions.
It seems tricky to come up with a meaningful test, but
from operator import methodcaller
barcall = methodcaller("bar")
class A:
def bar(self):
print(self)
return barcall(self)
a = A()
a.bar()
correctly results in an infinite recursion.
c074967
to
563988d
Compare
A thought: there could be another performance boost in adding a tp_vectorcall_offset to the |
This is not possible because methodcaller is a heap type, for which vectorcall is not recommended (https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_vectorcall_offset). |
Is that restriction still applicable since |
I think #27001 have made this possible. |
OK, I gave it a try, but I wasn't able to make it work :( If you want to give it a try, the patch is below; the printf() shows that methodcaller_vectorcall is never reached. diff --git i/Modules/_operator.c w/Modules/_operator.c
index de4613a6a8..cee6df24fe 100644
--- i/Modules/_operator.c
+++ w/Modules/_operator.c
@@ -1,5 +1,6 @@
#include "Python.h"
#include "pycore_moduleobject.h" // _PyModule_GetState()
+#include "structmember.h"
#include "clinic/_operator.c.h"
typedef struct {
@@ -1480,8 +1481,44 @@ typedef struct {
PyObject *kwds;
PyObject **vectorcall_args; /* Borrowed references */
PyObject *vectorcall_kwnames;
+ vectorcallfunc vectorcall;
} methodcallerobject;
+static PyObject *
+methodcaller_vectorcall(
+ methodcallerobject *mc, PyObject *const *args, size_t nargsf, PyObject* kwnames)
+{
+ printf("vectorcall\n");
+ if (!_PyArg_NoKwnames("methodcaller", kwnames))
+ return NULL;
+ if (PyVectorcall_NARGS(nargsf) != 1) {
+ PyErr_Format(
+ PyExc_TypeError,
+ "methodcaller expected 1 argument, got %zd",
+ PyVectorcall_NARGS(nargsf));
+ return NULL;
+ }
+ mc->vectorcall_args[0] = args[0];
+ return PyObject_VectorcallMethod(
+ mc->name, mc->vectorcall_args,
+ (1 + PyTuple_GET_SIZE(mc->args)) | PY_VECTORCALL_ARGUMENTS_OFFSET,
+ mc->vectorcall_kwnames);
+}
+
+static PyObject *
+methodcaller_call(methodcallerobject *mc, PyObject *args, PyObject *kw)
+{
+ if (!_PyArg_NoKeywords("methodcaller", kw))
+ return NULL;
+ if (!_PyArg_CheckPositional("methodcaller", PyTuple_GET_SIZE(args), 1, 1))
+ return NULL;
+ mc->vectorcall_args[0] = PyTuple_GET_ITEM(args, 0);
+ return PyObject_VectorcallMethod(
+ mc->name, mc->vectorcall_args,
+ (1 + PyTuple_GET_SIZE(mc->args)) | PY_VECTORCALL_ARGUMENTS_OFFSET,
+ mc->vectorcall_kwnames);
+}
+
/* AC 3.5: variable number of arguments, not currently support by AC */
static PyObject *
methodcaller_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
@@ -1548,6 +1585,7 @@ methodcaller_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
else {
mc->vectorcall_kwnames = NULL;
}
+ mc->vectorcall = (vectorcallfunc)methodcaller_vectorcall;
PyObject_GC_Track(mc);
return (PyObject *)mc;
@@ -1584,20 +1622,6 @@ methodcaller_traverse(methodcallerobject *mc, visitproc visit, void *arg)
return 0;
}
-static PyObject *
-methodcaller_call(methodcallerobject *mc, PyObject *args, PyObject *kw)
-{
- if (!_PyArg_NoKeywords("methodcaller", kw))
- return NULL;
- if (!_PyArg_CheckPositional("methodcaller", PyTuple_GET_SIZE(args), 1, 1))
- return NULL;
- mc->vectorcall_args[0] = PyTuple_GET_ITEM(args, 0);
- return PyObject_VectorcallMethod(
- mc->name, mc->vectorcall_args,
- (1 + PyTuple_GET_SIZE(mc->args)) | PY_VECTORCALL_ARGUMENTS_OFFSET,
- mc->vectorcall_kwnames);
-}
-
static PyObject *
methodcaller_repr(methodcallerobject *mc)
{
@@ -1722,6 +1746,12 @@ static PyMethodDef methodcaller_methods[] = {
reduce_doc},
{NULL}
};
+
+static PyMemberDef methodcaller_members[] = {
+ {"__vectorcalloffset__", T_PYSSIZET, offsetof(methodcallerobject, vectorcall), READONLY},
+ {NULL}
+};
+
PyDoc_STRVAR(methodcaller_doc,
"methodcaller(name, ...) --> methodcaller object\n\
\n\
@@ -1737,6 +1767,7 @@ static PyType_Slot methodcaller_type_slots[] = {
{Py_tp_traverse, methodcaller_traverse},
{Py_tp_clear, methodcaller_clear},
{Py_tp_methods, methodcaller_methods},
+ {Py_tp_members, methodcaller_members},
{Py_tp_new, methodcaller_new},
{Py_tp_getattro, PyObject_GenericGetAttr},
{Py_tp_repr, methodcaller_repr}, |
The patch should work adding |
563988d
to
a4a04bf
Compare
Ah yes, thanks, it now works, and this provided another ~25-33% speedup on the same benchmarks. |
@@ -1696,6 +1745,12 @@ static PyMethodDef methodcaller_methods[] = { | |||
reduce_doc}, | |||
{NULL} | |||
}; | |||
|
|||
static PyMemberDef methodcaller_members[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the logic has been introduced by #20026 instead of introducing a type slot for tp_vectorcall_offset. (Documentation is here)
Without benchmarks showing This PR also makes |
Here's a still toyish example, but actually representative of the original motivation (which was in fact to speed up from operator import methodcaller
import secrets
from timeit import Timer
# Construct many rows of pseudo-random data...
randdata = [secrets.token_hex(4) for _ in range(1_000_000)]
# ... on which we want to apply some simple operation. We'll also sum() these
# (or perform some other kind of reduction).
strcount0 = methodcaller("count", "0")
tglobals = {"randdata": randdata, "strcount0": strcount0}
# With a genexp.
timer = Timer("sum(s.count('0') for s in randdata)",
globals=tglobals)
n, t = timer.autorange()
print(t/n)
# With map + methodcaller.
timer = Timer("sum(map(strcount0, randdata))",
globals=tglobals)
n, t = timer.autorange()
print(t/n)
# Check that we got our implementation right.
assert sum(s.count('0') for s in randdata) == sum(map(strcount0, randdata)) Before this patch, the timings where
After, they go to
i.e. speeding up methodcaller significantly improved the performance. Note that the docs of the itertools module expressly suggest using the "high-speed functions in the operator module", which (admittedly implicitly) hints that the per-iteration performance of operators is perhaps more worthy of consideration that the construction time. |
Edit: I missed the comment about lambdas at https://bugs.python.org/issue44850#msg399903, so here's the same benchmark as above including the lambda approach (unsuprisingly the slowest): from operator import methodcaller
import secrets
from timeit import Timer
# Construct many rows of pseudo-random data...
randdata = [secrets.token_hex(4) for _ in range(1_000_000)]
# ... on which we want to apply some simple operation. We'll also sum() these
# (or perform some other kind of reduction).
func = lambda x: x.count("0")
mc = methodcaller("count", "0")
tglobals = {"randdata": randdata, "func": func, "mc": mc}
# With a genexp.
timer = Timer("sum(s.count('0') for s in randdata)", globals=tglobals)
n, t = timer.autorange()
print("genexp ", t/n)
# With map + lambda.
timer = Timer("sum(map(func, randdata))", globals=tglobals)
n, t = timer.autorange()
print("lambda ", t/n)
# With map + methodcaller.
timer = Timer("sum(map(mc, randdata))", globals=tglobals)
n, t = timer.autorange()
print("methcall", t/n)
# Check that we got our implementation right.
assert sum(s.count('0') for s in randdata) == sum(map(func, randdata)) == sum(map(mc, randdata)) Results:
|
Modules/_operator.c
Outdated
static PyObject * | ||
methodcaller_call(methodcallerobject *mc, PyObject *args, PyObject *kw) | ||
{ | ||
PyObject *method, *obj, *result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
methodcaller_call()
will be almost unused ?
If so, I prefer to keep the original unchanged for maintenability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not changed at all, just moved next to methodcaller_vectorcall as I think that makes things easier to track (and then both have to come before methodcaller_new, or else we'd need a forward declaration; I can switch to that if you prefer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible for methodcaller_call
to rollback to the previous version implemented with PyObject_Call
? Any position seems fine.
It is over 5 years old (well tested), and sufficient for a fallback function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you mean, sorry I forgot the change. I just put the whole thing back where it was and with the old version, to decrease the diff.
@@ -0,0 +1,3 @@ | |||
The performance of ``operator.methodcaller`` has been improved by using | |||
``_PyObject_GetMethod`` (avoiding the creation of a bound method in many |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NEWS seems outdated (_PyObject_GetMethod
is not in the diff).
And it might be better to rewrite this PR's introduction (which now looks historically complicated), including benchmark results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, edited.
22b89cf
to
5897c07
Compare
5897c07
to
12a80bd
Compare
This PR is stale because it has been open for 30 days with no activity. |
This is still ready from my side, AFAICT. |
This all looks reasonable to me. |
@anntzer The PR has merge conflicts, but they are trivial to resolve (feel free to merge https://github.com/eendebakpt/cpython/tree/fastmethodcaller where I merged with current main). Compared to current main this PR is still faster. With:
I get
|
@eendebakpt If you are interested, do you mind taking over this patch? I've kind of moved on since 2y ago. |
This pr can be closed, since the alternate has been merged. |
See discussion at https://bugs.python.org/issue44850.
This is currently split into two separate commits, so that the separate speedups from _PyObject_GetMethod and vectorcall can be evaluated separately (plus a third doc-only commit). Edit: Now a single commit, per #27782 (comment), so only consider the first and last benchmarks.
The benchmark script tests builtin and python methods, with and without kwargs:
Before:
Using _PyObject_GetMethod:
Using _PyObject_GetMethod and vectorcall:
https://bugs.python.org/issue44850