make marshal output not dependent on reference count #98819

Open
@carljm

Bug report

Currently, when the marshal module writes an object it has not seen before, it sets FLAG_REF on that object (marking it as a potential target of references from later objects) unless the object's reference count is exactly 1. See

if (Py_REFCNT(v) == 1 &&

This is an overly conservative heuristic -- it's easy to construct cases where an object has a reference count greater than 1 but is not actually referenced by any other object about to be marshaled, so FLAG_REF is set when it does not need to be.

This makes marshal output unstable: the bytes produced depend on accidents of reference-counting behavior in the code calling marshal.dumps.
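A minimal sketch of how this can be observed from Python, assuming a CPython build where FLAG_REF is the high bit (0x80) of the type byte, as in Python/marshal.c. The helper names `has_flag_ref` and `strip_top_level_flag` are hypothetical, introduced only for illustration. Whether the two dumps actually differ depends on the interpreter version and on how the caller holds its references, which is exactly the instability described above, so the script prints the result rather than assuming it.

```python
import marshal

FLAG_REF = 0x80  # assumed: high bit of the type byte, per Python/marshal.c


def has_flag_ref(data: bytes) -> bool:
    """Return True if the top-level object was written with FLAG_REF set."""
    return bool(data[0] & FLAG_REF)


def strip_top_level_flag(data: bytes) -> bytes:
    """Clear FLAG_REF on the first type byte only (illustrative helper)."""
    return bytes([data[0] & 0x7F]) + data[1:]


# A temporary passed straight to dumps() typically has very few references.
temp_dump = marshal.dumps(float("1.5"))

# The same value, but deliberately held by extra references before dumping.
value = float("1.5")
holders = [value, value]
held_dump = marshal.dumps(value)

print("temporary has FLAG_REF:", has_flag_ref(temp_dump))
print("held value has FLAG_REF:", has_flag_ref(held_dump))
print("byte-for-byte identical:", temp_dump == held_dump)
```

Both dumps load back to the same value; only the flag bit on the type byte can differ, so clearing it makes the two outputs compare equal.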

I ran into this because the Cinder JIT is able to reduce unnecessary increfs, and that resulted in some importlib tests failing on comparison of marshal output at

data.extend(marshal.dumps(code_object))
self.assertEqual(self.loader.written[self.cached], bytes(data))
because under Cinder JIT the reference count of code_object in that method is 1.

This previously caused issues in distutils reproducibility, resulting in a partial fix that applies only to interned strings: #8226

It would be better if marshal actually determined which objects have multiple parents in the DAG and deterministically set FLAG_REF or not based on that.
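A rough sketch of that idea in Python, not a proposal for the actual C implementation: a pre-pass walks the object graph by identity and records which objects are reached more than once, and only those would need FLAG_REF. The function name `shared_objects` is hypothetical, and the walk covers only the built-in containers (tuple, list, set, frozenset, dict); a real pass would also have to descend into code objects and handle whatever else marshal supports.

```python
from collections import Counter


def shared_objects(root):
    """Return the objects reached more than once (by identity) from root.

    Simplified illustration: only built-in containers are traversed.
    """
    visits = Counter()
    alive = {}  # id -> object, keeps objects alive so ids are not reused
    stack = [root]
    while stack:
        obj = stack.pop()
        key = id(obj)
        visits[key] += 1
        alive[key] = obj
        if visits[key] > 1:
            # Already expanded once; do not walk its children again.
            # This also makes the walk terminate on cyclic structures.
            continue
        if isinstance(obj, (tuple, list, set, frozenset)):
            stack.extend(obj)
        elif isinstance(obj, dict):
            for k, v in obj.items():
                stack.append(k)
                stack.append(v)
    return [alive[key] for key, count in visits.items() if count > 1]


shared = ["payload"]
root = [shared, shared, ["other"]]
result = shared_objects(root)
print(len(result), result[0] is shared)
```

Because the decision depends only on the structure being marshaled, the result is the same no matter how many references the caller happens to hold.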

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (An unexpected behavior, bug, or error)
