Faster builtin functions by performing more work at specialization time. #296
Replies: 5 comments
-
Isn't there a near-infinite number of builtin functions and methods that we'd have to convert this way before it pays off? Even if we could collect the set of most-used builtins from PyPerformance (or the Bloomberg demo for that matter), that wouldn't necessarily translate to other apps.
-
It's up to the extension authors. If they want their extensions to be fast, they can use this. I don't expect any immediate results.
-
One possible idea in this general space would be to make cython-generated extensions use the new mechanism automatically, which would bring the benefits to a whole bunch of extensions at once.
-
It looks like we might get results for this sooner rather than later. The latest stats show large slowdowns for the regex benchmarks relative to 3.10, which is (from a fairly superficial inspection) due to non-specialization (and thus repeatedly cycling through the specializer) of Replacing all these special cases with
-
@cfbolz Would this be useful for PyPy?
-
A lot of the work done by builtin functions, especially simpler ones, is argument parsing, type checking, unboxing and boxing.
If you take a look at `PyArg_ParseTupleAndKeywords` you will see the amount of work that needs to be done to handle the general case. Many builtin functions have five phases:

1. Parse the arguments into parameter slots.
2. Check the argument types.
3. Unbox the values.
4. Do the actual work.
5. Box the result.
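These phases can be modelled in Python. The sketch below is purely illustrative, a toy `my_round` builtin whose names do not exist in CPython; it only shows where the per-call overhead lives:

```python
def my_round(args, kwargs):
    # Toy model of a builtin with signature my_round(number, ndigits=0).
    # 1. Parse: match positional and keyword arguments to parameter slots.
    params = {"number": None, "ndigits": 0}
    for name, value in zip(params, args):
        params[name] = value
    params.update(kwargs)
    # 2. Type check.
    if not isinstance(params["number"], float):
        raise TypeError("expected a float")
    # 3. Unbox: in C this would extract the raw double from the PyObject.
    raw = params["number"]
    # 4. Do the actual work.
    result = round(raw, params["ndigits"])
    # 5. Box: in C the raw result would be wrapped back into a PyObject.
    return result
```

Only phase 4 is the useful work; phases 1, 2, 3 and 5 are overhead repeated on every call.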
Argument clinic already generates wrappers that break the above down into 3 phases, with the parsing handled by the `PyArg_Parse...` family of functions. We can change this so that the argument clinic generated code skips the parse phase:
The interpreter can do the "0th" phase of parsing the arguments.
This would work as follows:
- Add a new `METH_N` calling convention that takes exactly `N` arguments, as defined in the `MethodDef` struct. So, if `N` were 1, the function pointer would have the signature `f(PyObject *callable, PyObject *args[1])`.
- There would need to be an upper limit on `N`, probably about 6.
- The vectorcall implementation of this would need to do the parsing, but that's no less efficient than what we do at the moment.
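The parsing the vectorcall wrapper would have to do can be sketched in Python. Everything here (names, error messages) is an illustrative assumption, not CPython API:

```python
def make_methn_wrapper(func, param_names, defaults):
    """Model of a vectorcall wrapper for a hypothetical METH_N function:
    it parses any positional/keyword call into exactly N slots."""
    n = len(param_names)
    index = {name: i for i, name in enumerate(param_names)}

    def wrapper(*args, **kwargs):
        if len(args) > n:
            raise TypeError("too many positional arguments")
        slots = list(defaults)          # start from the declared defaults
        slots[:len(args)] = args        # fill positional arguments in order
        for name, value in kwargs.items():
            i = index.get(name)
            if i is None:
                raise TypeError(f"unexpected keyword argument {name!r}")
            if i < len(args):
                raise TypeError(f"duplicate value for {name!r}")
            slots[i] = value
        return func(*slots)             # the METH_N function sees exactly n args

    return wrapper
```

The returned wrapper always calls `func` with exactly `N` arguments, which is what the `METH_N` convention requires.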
For example, consider the builtin method `str.encode` with the signature `encode(self, /, encoding='utf-8', errors='strict')`. If we call `"hi".encode(errors="ignore")` there is a lot of parsing of arguments that has to be done for every call. With `METH_N` we can have argument clinic define this function, which is reasonably slick.
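As a hedged sketch (the real code would be C generated by argument clinic; this Python model and its names are assumptions), the `METH_N` form of `str.encode` might look like:

```python
def str_encode(args):
    # The caller guarantees exactly 3 correctly ordered slots; None stands
    # in for a C-level NULL meaning "argument not supplied".
    self, encoding, errors = args
    if encoding is None:
        encoding = "utf-8"
    if errors is None:
        errors = "strict"
    return self.encode(encoding, errors)
```

Note that the function body contains no general-purpose parsing; it only fills in defaults and does the work.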
Note that we must call `str_encode` with exactly 3 arguments, correctly parsed. This is fiddly in the vectorcall wrapper, but can be specialized nicely. Going back to our example of `"hi".encode(errors="ignore")`, the three arguments we should be passing are `"hi", NULL, "ignore"`.
We can parse the arguments at specializing time, creating a permutation array that can be evaluated quickly for each call.
One way that could work is:
Most calls don't do anything fancy, so we would probably special-case the "already in the right order" case.
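Here is a minimal Python sketch of the permutation-array idea; the function names and the use of `-1` as a "fill with NULL" marker are assumptions:

```python
def specialize_call(param_names, nargs, kwnames):
    # Specialization time: for each parameter slot, record where in the
    # caller's argument array its value comes from (-1 means "not supplied").
    index = {name: i for i, name in enumerate(param_names)}
    perm = [-1] * len(param_names)
    for i in range(nargs):              # positional arguments keep their slot
        perm[i] = i
    for j, name in enumerate(kwnames):  # keyword arguments go to their slot
        perm[index[name]] = nargs + j
    return perm

def run_call(perm, stack_args):
    # Call time: a cheap gather through the precomputed permutation.
    return [stack_args[i] if i >= 0 else None for i in perm]
```

For `"hi".encode(errors="ignore")` specialization sees one positional argument plus the keyword name `"errors"`, producing the permutation `[0, -1, 1]`; the identity permutation `[0, 1, ..., n-1]` is the "already in the right order" case that could be special-cased.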
Implementing this
This only really works if we can move most existing builtin functions to the new form.
To do that we need to make it:
Some open questions
There are a few details I've glossed over in the above discussion:

- The `_PyArg_Parser` struct.
- We could `NULL` terminate the array, at least for debug builds, but would that be sufficient?