Skip to content

Code objects have incorrect hash/equality when code is instrumented for sys.monitoring #111984

Closed
@nedbat

Description

@nedbat

Bug report

Bug description:

A thread stress test in the coverage.py test suite was failing with seemingly impossible behavior: a code object is used as a key in a dict, and then cannot be found. I eventually tracked it down to a mismatch between how bytecodes are examined in code_hash and in code_richcompare.

I have instructions for demonstrating the problem if you like, though it's a thread race condition, so it's a bit involved: https://gist.github.com/nedbat/2fb43527d102b8bb6e7fb7f3400fa3a9

Hash computation

When computing the hash, opcodes are checked if they are instrumented, and if so, use the original opcodes:

static Py_hash_t
code_hash(PyCodeObject *co)
{
    //...
    for (int i = 0; i < Py_SIZE(co); i++) {
        int deop = _Py_GetBaseOpcode(co, i);
        SCRAMBLE_IN(deop);
        SCRAMBLE_IN(_PyCode_CODE(co)[i].op.arg);
        i += _PyOpcode_Caches[deop];
    }
/* Get the underlying opcode, stripping instrumentation */
int _Py_GetBaseOpcode(PyCodeObject *code, int i)
{
    int opcode = _PyCode_CODE(code)[i].op.code;
    if (opcode == INSTRUMENTED_LINE) {
        opcode = code->_co_monitoring->lines[i].original_opcode;
    }
    if (opcode == INSTRUMENTED_INSTRUCTION) {
        opcode = code->_co_monitoring->per_instruction_opcodes[i];
    }
    CHECK(opcode != INSTRUMENTED_INSTRUCTION);
    CHECK(opcode != INSTRUMENTED_LINE);
    int deinstrumented = DE_INSTRUMENT[opcode];
    if (deinstrumented) {
        return deinstrumented;
    }
    return _PyOpcode_Deopt[opcode];
}
#define _PyCode_CODE(CO) _Py_RVALUE((_Py_CODEUNIT *)(CO)->co_code_adaptive)

Equality

When checking equality, opcodes are not checked if they are instrumented:

static PyObject *
code_richcompare(PyObject *self, PyObject *other, int op)
{
    // ...
    for (int i = 0; i < Py_SIZE(co); i++) {
        _Py_CODEUNIT co_instr = _PyCode_CODE(co)[i];
        _Py_CODEUNIT cp_instr = _PyCode_CODE(cp)[i];
        co_instr.op.code = _PyOpcode_Deopt[co_instr.op.code];
        cp_instr.op.code = _PyOpcode_Deopt[cp_instr.op.code];
        eq = co_instr.cache == cp_instr.cache;
        if (!eq) {
            goto unequal;
        }
        i += _PyOpcode_Caches[co_instr.op.code];
    }

It seems like the hash and the equality will be well matched if the code hasn't been instrumented. But if it has been, then they don't match.

CPython versions tested on:

3.12

Operating systems tested on:

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.12only security fixestype-bugAn unexpected behavior, bug, or error

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions