Skip to content

segmentation fault in PyContextVar_set under py3.9 ARM64 #122529

Closed as not planned
@emmettbutler

Description

@emmettbutler

Bug report

Bug description:

A segmentation fault sometimes occurs in an application using Datadog's ddtrace-run 2.9.0, CPython 3.9, and sqlalchemy on ARM64. Our (@sanchda and my) analysis suggests that this is the result of a bug in CPython 3.9, which may have been fixed in CPython 3.12. We are unable to replicate this issue directly because it happens sporadically on infrastructure operated by a Datadog customer.

One instance of the segfault for which we managed to capture a core dump looks like this:

Current thread 0x0000ffff9a4915a0 (most recent call first):,1719935856047
"File ""/usr/local/lib/python3.9/site-packages/ddtrace/_trace/provider.py"", line 126 in activate",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/ddtrace/_trace/tracer.py"", line 807 in _start_span",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/ddtrace/_trace/tracer.py"", line 898 in trace",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/ddtrace/contrib/dbapi/__init__.py"", line 311 in _trace_method",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/ddtrace/contrib/dbapi/__init__.py"", line 335 in rollback",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py"", line 683 in do_rollback",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py"", line 1038 in _reset",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py"", line 763 in _finalize_fairy",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py"", line 1008 in _checkin",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py"", line 1166 in close",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py"", line 1251 in close",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py"", line 928 in close",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py"", line 846 in commit",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py"", line 1454 in commit",1719935856047
# application stack frames....

The leaf node of this Python stacktrace is this call to contextvars.ContextVar().set(), which operates on a contextvar instance internal to Datadog code that's initialized here.

The core dump also includes this native backtrace:

*** Segmentation fault,1719935856047
Backtrace:,1719935856047
/lib/aarch64-linux-gnu/libc.so.6(+0x80a10)[0xffff99fb0a10],1719935856048
/lib/aarch64-linux-gnu/libc.so.6(gsignal+0x1c)[0xffff99f6a76c],1719935856048
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffff9a49d850],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd2538)[0xffff9a1b2538],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd2ce0)[0xffff9a1b2ce0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd3414)[0xffff9a1b3414],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xcd82c)[0xffff9a1ad82c],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(PyContextVar_Set+0x10c)[0xffff9a1add0c],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x13a4c8)[0xffff9a21a4c8],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x1540)[0xffff9a24ae20],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x1540)[0xffff9a24ae20],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137eec)[0xffff9a217eec],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x1be8)[0xffff9a24b4c8],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x3134)[0xffff9a24ca14],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x3134)[0xffff9a24ca14],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x1540)[0xffff9a24ae20],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x3134)[0xffff9a24ca14],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x3134)[0xffff9a24ca14],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x1540)[0xffff9a24ae20],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x6d4)[0xffff9a249fb4],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x3134)[0xffff9a24ca14],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalCodeWithName+0x64)[0xffff9a2aeb74],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(PyEval_EvalCodeEx+0x40)[0xffff9a2aeb00],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(PyEval_EvalCode+0x2c)[0xffff9a2aeaac],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x1e08fc)[0xffff9a2c08fc],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x1e0868)[0xffff9a2c0868],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x1e0798)[0xffff9a2c0798],1719935856050
/usr/local/bin/../lib/libpython3.9.so.1.0(PyRun_SimpleFileExFlags+0x190)[0xffff9a2c03e4],1719935856050
/usr/local/bin/../lib/libpython3.9.so.1.0(Py_RunMain+0x3c4)[0xffff9a2c9624],1719935856050
/usr/local/bin/../lib/libpython3.9.so.1.0(Py_BytesMain+0x38)[0xffff9a2c9078],1719935856050
/lib/aarch64-linux-gnu/libc.so.6(+0x27780)[0xffff99f57780],1719935856050
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffff99f57858],1719935856050
/usr/local/bin/python3(_start+0x30)[0xaaaaad2308b0],1719935856050

These lines in particular warrant scrutiny:

/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd2538)[0xffff9a1b2538],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd2ce0)[0xffff9a1b2ce0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd3414)[0xffff9a1b3414],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xcd82c)[0xffff9a1ad82c],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(PyContextVar_Set+0x10c)[0xffff9a1add0c],1719935856048

We grabbed the libpython3.9.so.1.0 from Dockerhub's python:3.9.19-bookworm, (arm64) image, version sha256:37823bd8e7266bac0399e3b55e67cfe00297686f09fe5cf73411ffe1d75fd93e. This .so file is bitwise-identical to the artifact used in the production environment that the core dump came from.

We used objdump to symbolize the containing functions within libpython3.9.so.1.0 and found the following:

0xd2538 -> hamt_node_bitmap_clone
0xd2ce0 -> hamt_node_bitmap_assoc
0xd3414 -> _PyHamt_Assoc
0xcd82c -> contextvar_set
0x10c   -> PyContextVar_set

The leaf of this call stack is defined in Py3.9 here.

ghidra gives us the following decompilation of the hamt_node_bitmap_clone function:

void hamt_node_bitmap_clone(long param_1)

{
  long lVar1;
  long lVar2;
  long *plVar3;
  long lVar4;

  lVar1 = hamt_node_bitmap_new(*(undefined8 *)(param_1 + 0x10));
  if (lVar1 != 0) {
    lVar4 = *(long *)(param_1 + 0x10);
    for (lVar2 = 0; lVar2 < lVar4; lVar2 = lVar2 + 1) {
      plVar3 = *(long **)(param_1 + 0x20 + lVar2 * 8);                     // 0xd2530
      if (plVar3 != (long *)0x0) {
        *plVar3 = *plVar3 + 1;
      }
      *(long **)(lVar1 + 0x20 + lVar2 * 8) = plVar3;
    }
    *(undefined4 *)(lVar1 + 0x18) = *(undefined4 *)(param_1 + 0x18);
  }
  return;
}

objdump tells us that the specific fault site is 0xd2538, which ghidra tells us is on the line

plVar3 = *(long **)(param_1 + 0x20 + lVar2 * 8);

The source code counterpart of this decompiled line is here, a call to Py_XINCREF.

ghidra gives the failing instructions from this line as

001d2530 a2 78 61 f8     ldr        x2,[x5, x1, LSL #0x3]
001d2534 82 00 00 b4     cbz        x2,LAB_001d2544
001d2538 43 00 40 f9     ldr        x3,[x2]               # this is the site of the fault
001d253c 63 04 00 91     add        x3,x3,#0x1
  • ldr: load the given value into the register x2.
  • cbz: conditional jump if zero. This is looking at x2 as a value, not as a pointer.
    It’s probably part of Py_XINCREF where it checks non-null. Since the next instruction is executed, x2 is NOT null.
  • ldr: de-references x2 and puts it in x3. x2 is not NULL (we know that because the computation on the ldr line is an iteration)
    So, x2 is an invalid address
  • add: this is just the increment operation from the Py_XINCREF

In terms of the source code, this suggests that node->b_array[i] is an invalid nonzero pointer here.

Some speculative reasons why this could happen:

  1. The given item was cleaned up and is no longer valid, although sometimes it is valid because it is freed but not unmapped
    This might explain why the problem occurs on arm but not x86, since the mapping behavior at the level of the allocator may just so happen to be different for system reasons
  2. The size of the array is incorrect
  3. The item is uninitialized

In this case, the "item" is a HAMT node underlying a ContextVar instance being set() during a sqlalchemy rollback operation.

Aside from this analysis, another reason we're opening this issue on CPython is that the Py_XINCREF line in question was changed since Python 3.9. #99317 replaced it with a call to Py_XNewRef here. While we haven't tested this against Python 3.12 (the first to include the change), the presence of this change suggests that a bug may have been fixed in newer versions that could be worth backporting. Even without a backport, confirmation that this is or was a real issue in CPython itself would be helpful.

CPython versions tested on:

3.9

Operating systems tested on:

Linux

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.9only security fixestype-crashA hard crash of the interpreter, possibly with a core dump

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions