Description
Bug report
Bug description:
A segmentation fault sometimes occurs in an application using Datadog's ddtrace-run 2.9.0, CPython 3.9, and sqlalchemy on ARM64. Analysis by @sanchda and me suggests that this is the result of a bug in CPython 3.9, which may have been fixed in CPython 3.12. We are unable to reproduce the issue directly because it happens sporadically on infrastructure operated by a Datadog customer.
One instance of the segfault for which we managed to capture a core dump looks like this:
Current thread 0x0000ffff9a4915a0 (most recent call first):,1719935856047
"File ""/usr/local/lib/python3.9/site-packages/ddtrace/_trace/provider.py"", line 126 in activate",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/ddtrace/_trace/tracer.py"", line 807 in _start_span",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/ddtrace/_trace/tracer.py"", line 898 in trace",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/ddtrace/contrib/dbapi/__init__.py"", line 311 in _trace_method",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/ddtrace/contrib/dbapi/__init__.py"", line 335 in rollback",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py"", line 683 in do_rollback",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py"", line 1038 in _reset",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py"", line 763 in _finalize_fairy",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py"", line 1008 in _checkin",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py"", line 1166 in close",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py"", line 1251 in close",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py"", line 928 in close",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py"", line 846 in commit",1719935856047
"File ""/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py"", line 1454 in commit",1719935856047
# application stack frames....
The leaf frame of this Python stack trace (ddtrace/_trace/provider.py, line 126) is a call to contextvars.ContextVar().set(), which operates on a ContextVar instance that is internal to, and initialized by, Datadog code.
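For readers less familiar with contextvars, here is a minimal sketch of the kind of call involved. The variable name `_active_span` and the `activate` helper are hypothetical stand-ins, not the actual ddtrace code. Per the native backtrace in this report, a `set()` call in CPython reaches `_PyHamt_Assoc`, the code path that crashes.

```python
import contextvars

# Hypothetical stand-in for the tracer's internal variable; the name is made up.
_active_span = contextvars.ContextVar("active_span", default=None)


def activate(span):
    # ContextVar.set() stores the value in the current Context and returns a
    # Token that can later restore the previous state.
    return _active_span.set(span)


token = activate("span-1")
value_after_first_set = _active_span.get()

# copy_context() takes a snapshot; later set() calls do not modify it.
snapshot = contextvars.copy_context()
activate("span-2")
snapshot_value = snapshot[_active_span]
value_after_second_set = _active_span.get()

# Resetting with the first token restores the state from before the first set().
_active_span.reset(token)
value_after_reset = _active_span.get()
```

The snapshot keeps seeing the old value because each `set()` produces a new mapping rather than mutating the existing one, which is why setting a variable involves cloning internal nodes.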
The core dump also includes this native backtrace:
*** Segmentation fault,1719935856047
Backtrace:,1719935856047
/lib/aarch64-linux-gnu/libc.so.6(+0x80a10)[0xffff99fb0a10],1719935856048
/lib/aarch64-linux-gnu/libc.so.6(gsignal+0x1c)[0xffff99f6a76c],1719935856048
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffff9a49d850],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd2538)[0xffff9a1b2538],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd2ce0)[0xffff9a1b2ce0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd3414)[0xffff9a1b3414],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xcd82c)[0xffff9a1ad82c],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(PyContextVar_Set+0x10c)[0xffff9a1add0c],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x13a4c8)[0xffff9a21a4c8],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x1540)[0xffff9a24ae20],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x1540)[0xffff9a24ae20],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137eec)[0xffff9a217eec],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x1be8)[0xffff9a24b4c8],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x3134)[0xffff9a24ca14],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x3134)[0xffff9a24ca14],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x1540)[0xffff9a24ae20],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x3134)[0xffff9a24ca14],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x3134)[0xffff9a24ca14],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x137df0)[0xffff9a217df0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x1540)[0xffff9a24ae20],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x6d4)[0xffff9a249fb4],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x3134)[0xffff9a24ca14],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb0)[0xffff9a215bd0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x135da0)[0xffff9a215da0],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x548)[0xffff9a249e28],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x168c24)[0xffff9a248c24],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(_PyEval_EvalCodeWithName+0x64)[0xffff9a2aeb74],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(PyEval_EvalCodeEx+0x40)[0xffff9a2aeb00],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(PyEval_EvalCode+0x2c)[0xffff9a2aeaac],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x1e08fc)[0xffff9a2c08fc],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x1e0868)[0xffff9a2c0868],1719935856049
/usr/local/bin/../lib/libpython3.9.so.1.0(+0x1e0798)[0xffff9a2c0798],1719935856050
/usr/local/bin/../lib/libpython3.9.so.1.0(PyRun_SimpleFileExFlags+0x190)[0xffff9a2c03e4],1719935856050
/usr/local/bin/../lib/libpython3.9.so.1.0(Py_RunMain+0x3c4)[0xffff9a2c9624],1719935856050
/usr/local/bin/../lib/libpython3.9.so.1.0(Py_BytesMain+0x38)[0xffff9a2c9078],1719935856050
/lib/aarch64-linux-gnu/libc.so.6(+0x27780)[0xffff99f57780],1719935856050
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffff99f57858],1719935856050
/usr/local/bin/python3(_start+0x30)[0xaaaaad2308b0],1719935856050
These lines in particular warrant scrutiny:
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd2538)[0xffff9a1b2538],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd2ce0)[0xffff9a1b2ce0],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xd3414)[0xffff9a1b3414],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(+0xcd82c)[0xffff9a1ad82c],1719935856048
/usr/local/bin/../lib/libpython3.9.so.1.0(PyContextVar_Set+0x10c)[0xffff9a1add0c],1719935856048
We grabbed libpython3.9.so.1.0 from Docker Hub's python:3.9.19-bookworm (arm64) image, digest sha256:37823bd8e7266bac0399e3b55e67cfe00297686f09fe5cf73411ffe1d75fd93e. This .so file is bitwise-identical to the artifact used in the production environment that the core dump came from.
We used objdump to identify the functions within libpython3.9.so.1.0 that contain these offsets, and found the following:
0xd2538 -> hamt_node_bitmap_clone
0xd2ce0 -> hamt_node_bitmap_assoc
0xd3414 -> _PyHamt_Assoc
0xcd82c -> contextvar_set
PyContextVar_Set+0x10c -> PyContextVar_Set (already symbolized in the backtrace)
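As a sanity check on these offsets, the sketch below parses the five backtrace lines quoted above and recovers the load base of libpython. It assumes the `(+0x...)` form is an offset from the module's load base and that `(symbol+0x...)` is an offset from the named symbol. The regular expression and helper names are illustrative only.

```python
import re

# Matches "module(symbol+0xOFF)[0xADDR]"; symbol is empty for unsymbolized frames.
FRAME_RE = re.compile(
    r"^(?P<module>[^(]+)\((?P<symbol>[^+)]*)\+0x(?P<offset>[0-9a-f]+)\)"
    r"\[0x(?P<addr>[0-9a-f]+)\]"
)


def parse_frame(line):
    """Return module, symbol, offset and absolute address for one frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return {
        "module": m.group("module"),
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "addr": int(m.group("addr"), 16),
    }


LIB = "/usr/local/bin/../lib/libpython3.9.so.1.0"
lines = [
    LIB + "(+0xd2538)[0xffff9a1b2538],1719935856048",
    LIB + "(+0xd2ce0)[0xffff9a1b2ce0],1719935856048",
    LIB + "(+0xd3414)[0xffff9a1b3414],1719935856048",
    LIB + "(+0xcd82c)[0xffff9a1ad82c],1719935856048",
    LIB + "(PyContextVar_Set+0x10c)[0xffff9a1add0c],1719935856048",
]
frames = [parse_frame(line) for line in lines]

# For unsymbolized frames, absolute address minus offset gives the load base.
bases = {f["addr"] - f["offset"] for f in frames if not f["symbol"]}
load_base = next(iter(bases))

# Module-relative offset of the symbolized PyContextVar_Set frame.
contextvar_set_frame_offset = frames[-1]["addr"] - load_base

print(hex(load_base), hex(contextvar_set_frame_offset))
```

All four unsymbolized frames agree on a single load base, which supports reading the `+0x...` values as file offsets that can be looked up directly in the objdump output.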
The leaf of this call stack, hamt_node_bitmap_clone, is defined in CPython 3.9's Python/hamt.c. Ghidra gives us the following decompilation of the function:
void hamt_node_bitmap_clone(long param_1)
{
    long lVar1;
    long lVar2;
    long *plVar3;
    long lVar4;

    lVar1 = hamt_node_bitmap_new(*(undefined8 *)(param_1 + 0x10));
    if (lVar1 != 0) {
        lVar4 = *(long *)(param_1 + 0x10);
        for (lVar2 = 0; lVar2 < lVar4; lVar2 = lVar2 + 1) {
            plVar3 = *(long **)(param_1 + 0x20 + lVar2 * 8); // 0xd2530
            if (plVar3 != (long *)0x0) {
                *plVar3 = *plVar3 + 1;
            }
            *(long **)(lVar1 + 0x20 + lVar2 * 8) = plVar3;
        }
        *(undefined4 *)(lVar1 + 0x18) = *(undefined4 *)(param_1 + 0x18);
    }
    return;
}
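To relate the raw offsets in the decompilation to source-level fields, the following sketch models the node layout with ctypes. It assumes a non-debug 64-bit build and that the node struct is `PyObject_VAR_HEAD` followed by a 32-bit `b_bitmap` and a `b_array` of object pointers, which is our reading of hamt.c. The class name is made up for illustration.

```python
import ctypes


class HamtNodeBitmapModel(ctypes.Structure):
    """Assumed LP64 layout of a HAMT bitmap node (illustrative model only)."""

    _fields_ = [
        ("ob_refcnt", ctypes.c_int64),    # PyObject_VAR_HEAD: reference count
        ("ob_type", ctypes.c_uint64),     # PyObject_VAR_HEAD: type pointer
        ("ob_size", ctypes.c_int64),      # PyObject_VAR_HEAD: number of slots
        ("b_bitmap", ctypes.c_uint32),    # 32-bit occupancy bitmap
        ("b_array", ctypes.c_uint64 * 1), # first slot of the pointer array
    ]


offsets = {
    name: getattr(HamtNodeBitmapModel, name).offset
    for name, _ in HamtNodeBitmapModel._fields_
}

for name, offset in offsets.items():
    print(f"{name:10s} +{offset:#04x}")
```

Under these assumptions, `param_1 + 0x10` is the slot count, `param_1 + 0x18` is the bitmap, and `param_1 + 0x20 + lVar2 * 8` is `b_array[i]`. The increment `*plVar3 = *plVar3 + 1` then writes to offset 0 of the child object, its reference count.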
objdump tells us that the specific fault site is 0xd2538, which Ghidra places in the loop body, immediately after the line
plVar3 = *(long **)(param_1 + 0x20 + lVar2 * 8);
at the point where plVar3 is dereferenced for the increment. The source code counterpart of this decompiled code is a call to Py_XINCREF.
Ghidra gives the instructions around the fault as
001d2530 a2 78 61 f8 ldr x2,[x5, x1, LSL #0x3]
001d2534 82 00 00 b4 cbz x2,LAB_001d2544
001d2538 43 00 40 f9 ldr x3,[x2] # this is the site of the fault
001d253c 63 04 00 91 add x3,x3,#0x1
- ldr: loads the array element into the register x2.
- cbz: conditional branch if zero. This looks at x2 as a value, not as a pointer. It's probably the part of Py_XINCREF that checks for non-null. Since the next instruction is executed, x2 is NOT null.
- ldr: dereferences x2 and puts the result in x3. x2 is not NULL (the cbz above did not branch), so the fault means x2 holds an invalid address.
- add: this is just the increment operation from Py_XINCREF.
In terms of the source code, this suggests that node->b_array[i] is an invalid nonzero pointer here.
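The reading of the listing above can be cross-checked by decoding two of the instruction words by hand. This is a rough sketch that handles only the two encodings needed here (64-bit CBZ, and 64-bit LDR with an unsigned immediate offset). It also assumes Ghidra loaded the library at an image base of 0x100000, which would account for the difference between Ghidra's 001d2538 and objdump's 0xd2538.

```python
import struct

GHIDRA_IMAGE_BASE = 0x100000  # assumption: image base Ghidra used for this PIE library


def word(hex_bytes):
    """AArch64 instructions are 32-bit little-endian words."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]


def decode_cbz(insn, pc):
    """Decode 64-bit CBZ: returns (register tested, branch target)."""
    assert insn >> 24 == 0xB4, "not a 64-bit CBZ"
    imm19 = (insn >> 5) & 0x7FFFF
    if imm19 & 0x40000:  # sign-extend the 19-bit word offset
        imm19 -= 0x80000
    return insn & 0x1F, pc + imm19 * 4


def decode_ldr_uimm(insn):
    """Decode 64-bit LDR (unsigned offset): returns (dest, base, byte offset)."""
    assert insn >> 22 == 0x3E5, "not a 64-bit LDR (unsigned offset)"
    imm12 = (insn >> 10) & 0xFFF
    return insn & 0x1F, (insn >> 5) & 0x1F, imm12 * 8


cbz_reg, cbz_target = decode_cbz(word("82 00 00 b4"), 0x1D2534)
ldr_dest, ldr_base, ldr_offset = decode_ldr_uimm(word("43 00 40 f9"))
fault_file_offset = 0x1D2538 - GHIDRA_IMAGE_BASE

print(f"cbz x{cbz_reg} -> {cbz_target:#x}")
print(f"ldr x{ldr_dest}, [x{ldr_base}, #{ldr_offset}]")
print(f"fault offset in file: {fault_file_offset:#x}")
```

The decoded cbz tests x2 and skips to 0x1d2544 when it is zero, so reaching the ldr at 0x1d2538 implies x2 was nonzero, and that ldr reads from offset 0 of whatever x2 points to.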
Some speculative reasons why this could happen:
- The given item was cleaned up and is no longer valid, although sometimes it still appears valid because it was freed but not unmapped. This might explain why the problem occurs on ARM but not x86, since the allocator's mapping behavior may happen to differ between the two systems.
- The size of the array is incorrect.
- The item is uninitialized.
In this case, the "item" is a HAMT node underlying a ContextVar instance being set() during a sqlalchemy rollback operation.
Aside from this analysis, another reason we're opening this issue on CPython is that the Py_XINCREF line in question has changed since Python 3.9: #99317 replaced it with a call to Py_XNewRef. While we haven't tested this against Python 3.12 (the first to include the change), the presence of this change suggests that a bug may have been fixed in newer versions, and that the fix could be worth backporting. Even without a backport, confirmation that this is or was a real issue in CPython itself would be helpful.
CPython versions tested on:
3.9
Operating systems tested on:
Linux