Skip to content

Reference counting bug with manually allocated heap types #128923

Closed
@colesbury

Description

@colesbury

Bug report

Found by @vfdev-5.

This is specific to the free threading build and 3.14.

XLA/Jax uses the following code to create a heap type:

  // We need to use heap-allocated type objects because we want to add
  // additional methods dynamically.
...
    nb::str name = nb::str("PmapFunction");
    nb::str qualname = nb::str("PmapFunction");
    PyHeapTypeObject* heap_type = reinterpret_cast<PyHeapTypeObject*>(
        PyType_Type.tp_alloc(&PyType_Type, 0));
    // Caution: we must not call any functions that might invoke the GC until
    // PyType_Ready() is called. Otherwise the GC might see a half-constructed
    // type object.
    CHECK(heap_type) << "Unable to create heap type object";
    heap_type->ht_name = name.release().ptr();
    heap_type->ht_qualname = qualname.release().ptr();
   ...

https://github.com/openxla/xla/blob/19a8e8e05fb34c5c4b8c38c9a8225e89f008c8c1/xla/python/pmap_lib.cc#L1027-L1058

In other words, the heap type is created by by calling PyType_Type.tp_alloc and filling in the fields, instead of the more common use of PyType_FromSpec. This leaves unique_id zero initialized. The problem is that unique_id=0 currently looks like a valid unique id for per-thread reference counting, which leads to reference counting errors and use-after-frees.

I think we should change the per-thread reference counting so that unique_id=0 is the sentinel value indicating that it's not assigned instead of the current unique_id=-1 convention.

Full repro

Linked PRs

Metadata

Metadata

Assignees

Labels

3.14new features, bugs and security fixestopic-free-threadingtype-bugAn unexpected behavior, bug, or errortype-crashA hard crash of the interpreter, possibly with a core dump

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions