Description
Hi Faster CPython team,
TL;DR: The argument clinic has information that we can mine to better
type-specialize code. This is probably most helpful in a JIT setting.
Overview
We can make calls to native and extension code faster. Right now, extension
code is an opaque blob guarded by interpreter-style boundary checks. This
works, and works well. We can make it do more with not much more code!
The clinic has information about types, both primitive and managed, for both
arguments and return values. It uses this information to generate stubs that
check arity, check argument types, unbox arguments into primitives, and box
return values.
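To make that concrete, a clinic-generated wrapper is roughly shaped like the
following (simplified, not verbatim clinic output; `mymod_square` is a made-up
function that takes and returns a C `long`):

```c
#include <Python.h>

// The "real" implementation works on unboxed C values.
static long
mymod_square_impl(PyObject *module, long x)
{
    return x * x;
}

// The clinic-style wrapper checks and unboxes the argument, calls the
// implementation, and boxes the result back into a PyObject*.
static PyObject *
mymod_square(PyObject *module, PyObject *arg)
{
    long x = PyLong_AsLong(arg);  // type check + unbox
    if (x == -1 && PyErr_Occurred()) {
        return NULL;
    }
    return PyLong_FromLong(mymod_square_impl(module, x));  // box the result
}
```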
In the future, if/when you add a JIT, it will be helpful to have type
information about what C extension code you call[^1]. This can help avoid
unnecessary checks and boxing at the boundaries (call and return). It might even
help today with interpreter inline caches.
Implementation
I prototyped this for Cinder using our existing `TypedMethodDef` information.
But where would we store that information in a normal `PyMethodDef`?
The simple approach for this would be to modify the `PyMethodDef` struct itself
to carry type information, but any new field changes the struct's size and
layout, which breaks the ABI and is a non-starter.
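For contrast, the naive version would look roughly like this; the extra field
is hypothetical and is only here to show why the layout change is a problem:

```c
#include <Python.h>  // PyCFunction

// NOT the proposal: adding a field changes the struct's size, so method tables
// compiled against the old PyMethodDef layout would be read with the wrong stride.
typedef struct {
    const char  *ml_name;
    PyCFunction  ml_meth;
    int          ml_flags;
    const char  *ml_doc;
    int          ml_type;  // hypothetical extra field
} PyMethodDefWithTypes;
```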
My current approach is to generate a new struct:
```c
struct TypedMethodMetadata {
  // This `type' field is a placeholder and could be any metadata we deem
  // useful. In this small demo I made it an `int' to be self-contained, but it
  // could be something similar to Cinder's TypedMethodDef, or more, or less.
  int type;
  const char ml_name[100];
};
```
and then embed the name pointer inside a `PyMethodDef`:
```c
#include <Python.h>  // for PyMethodDef

// Can be generated by the clinic
TypedMethodMetadata foo = {12345, "method.foo"};
TypedMethodMetadata bar = {67890, "method.bar"};

PyMethodDef defs[] = {
    // Can be generated by the clinic
    {(const char*)foo.ml_name, NULL, 0, "foo doc"},
    {(const char*)bar.ml_name, NULL, 0, "bar doc"},
    {NULL, NULL},
};
```
This preserves the existing behavior of the `PyMethodDef` struct but also allows
someone who knows what they are looking for to go find the additional metadata:
```c
#include <stddef.h>  // offsetof
#include <stdio.h>

int main() {
  for (int i = 0; defs[i].ml_name != NULL; i++) {
    // ml_name points into a TypedMethodMetadata; back up to the struct's start.
    TypedMethodMetadata* def =
        (TypedMethodMetadata*)(defs[i].ml_name - offsetof(TypedMethodMetadata, ml_name));
    fprintf(stderr, "type is %10d\tname is %s\tdoc is %s\n",
            def->type, defs[i].ml_name, defs[i].ml_doc);
  }
}
```
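For reference, this demo should print something along the lines of:

```
type is      12345	name is method.foo	doc is foo doc
type is      67890	name is method.bar	doc is bar doc
```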
But how does someone know to look for more metadata? I propose adding a new
flag, `METH_TYPED`, that can be OR-ed with the existing `ml_flags`. The clinic
can generate this fairly easily.
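As a rough sketch of the definition side (the flag value is illustrative and
would have to be chosen to avoid the existing `METH_*` bits, and `foo_impl` is
a stand-in for the clinic's generated implementation):

```c
// Illustrative value only: it must not collide with existing METH_* bits.
#define METH_TYPED 0x0400

// Stand-in implementation; in practice this would be the clinic's _impl function.
static PyObject *foo_impl(PyObject *self, PyObject *arg) {
  Py_INCREF(arg);
  return arg;
}

PyMethodDef typed_defs[] = {
    // Can be generated by the clinic: METH_TYPED tells a reader that ml_name
    // points into a TypedMethodMetadata with more information about foo.
    {(const char*)foo.ml_name, foo_impl, METH_O | METH_TYPED, "foo doc"},
    {NULL, NULL},
};
```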
What this looks like in a hypothetical JIT
Consider a JIT that works on an IR at a level of abstraction very similar to
CPython bytecode. An admittedly contrived Python function like:
```python
def test():
    x = "hello,world"
    return x.upper()
```
Produces the following (as of 3.10) bytecode:
```
 0: LOAD_CONST    1: 'hello,world'
 2: STORE_FAST    0: x
 4: LOAD_FAST     0: x
 6: LOAD_METHOD   0: upper
 8: CALL_METHOD   0
10: RETURN_VALUE  0
```
From which the optimizer (before these changes) can generate the following
register-based JIT IR:
```
v6:UnicodeExact["hello,world"] = LoadConst<UnicodeExact["hello,world"]>
v9:ObjectUser[method_descriptor:0x7ff8f4b94950] = LoadConst<ObjectUser[method_descriptor:0x7ff8f4b94950]>
v11:Object = VectorCallStatic<1> v9 v6
Return v11
```
Note that the JIT has done some flow typing, so that it knows the constant is a
`str`. Because of the flow typing, it has also been able to elide the run-time
method lookup, since the `self` type is known at JIT-time. But it can't optimize
through the method[^2] and it does not know anything about its properties:
`upper` is an opaque C extension function.
At run-time, this vectorcall will:

- Check the number of arguments provided
- Check that `self` is a `str`
- Always return a `str`
We can avoid doing all of this work based on information that the clinic
already knows. The hypothetical JIT could reach into the method descriptor and
see if it is typed. If it is typed, it could fetch the signature and check at
JIT-time that it matches the types inferred at the call site. If it matches,
it need not call the existing clinic wrapper function: it can call the
underlying function directly instead.
This might look something like:
```
v6:UnicodeExact["hello,world"] = LoadConst<UnicodeExact["hello,world"]>
v12:ObjectUser[PyCFunction:0xdeadbeef] = LoadConst<ObjectUser[PyCFunction:0xdeadbeef]>
v11:UnicodeExact = CallCFunction<1> v12 v6
Return v11
```
Note that the return type is now known, that we are skipping vectorcall
overhead (including arity checks, type checks, recursive call checks), and that
we are now doing a direct C call to the underlying (non-clinic) function.
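Concretely, the JIT-time check could look something like the sketch below,
assuming the `METH_TYPED` flag and `TypedMethodMetadata` layout from above;
everything around it (how the JIT finds the `PyMethodDef` and what it does with
the result) is hypothetical:

```c
#include <stddef.h>  // offsetof

// Returns the typed metadata for a method def, or NULL if there is none.
static const TypedMethodMetadata *
typed_metadata(const PyMethodDef *def) {
  if (!(def->ml_flags & METH_TYPED)) {
    return NULL;  // untyped method: keep the generic vectorcall path
  }
  // ml_name points into a TypedMethodMetadata, so back up to the struct.
  return (const TypedMethodMetadata *)(def->ml_name -
                                       offsetof(TypedMethodMetadata, ml_name));
}
```

If this returns non-NULL and the recorded signature matches the types the JIT
has already inferred at the call site, it can emit the direct `CallCFunction`
above instead of the generic `VectorCallStatic`.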
Wrap-up
All in all, my (shoddy prototype) patch to Cinder is ~200 lines of hand-written
source, plus some generated clinic changes.
Please let me know what you think.
Thanks,
Max
Future work
- Writing an upstreamable clinic patch: my current patch is really ugly since I
  did not invest enough time understanding the right place to make edits to the
  clinic
- Adding support for default arguments
- Writing a standard library in (a typed dialect of) Python so that it can be
  easily understood by an optimizer. We largely did this in Skybison, but did
  not get to write a JIT before the project got wound down.
cc @carljm
Footnotes

[^1]: Other runtimes like Cinder, PyPy, and GraalPython would like it as well.

[^2]: You will see a slightly different result on the Cinder Explorer because
  Cinder 3.10 has some manually added metadata tables inside the JIT. This
  allows the optimizer to learn that the result of `str.upper` is always a
  `str`, for example. I have omitted this for the demo here since the clinic
  changes would remove the need for such a table.