Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
We currently have no automated way to detect memory leaks in our library.
Feature Description
ASAN (AddressSanitizer) can detect memory leaks at a relatively low cost. The following commands show how to build pandas with ASAN and find leaks. Note that fast_unwind_on_malloc=0 severely slows down execution in exchange for a more informative traceback. The output also includes leaks that pandas likely cannot control.
CC=clang CXX=clang++ CFLAGS="-fsanitize=address -fno-omit-frame-pointer -shared-libasan" LDSHARED="clang -shared" python setup.py build_ext --inplace -j8 --with-debugging-symbols
ASAN_OPTIONS="fast_unwind_on_malloc=0" LD_PRELOAD=$(clang -print-file-name=libclang_rt.asan-x86_64.so) python -m pytest pandas/tests/io/json/ 2> leaks.txt
This writes the leak report to leaks.txt, which contains entries like:
...
Indirect leak of 24 byte(s) in 6 object(s) allocated from:
#0 0x7fa3374d2c7e in __interceptor_malloc build-llvm/tools/clang/stage2-bins/runtimes/runtimes-bins/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
#1 0x7fa2c96590e4 in traced_malloc /home/willayd/clones/pandas/pandas/_libs/src/klib/khash_python.h:27:18
#2 0x7fa2c96c793f in kh_resize_uint64 /home/willayd/clones/pandas/pandas/_libs/src/klib/khash.h:712:1
#3 0x7fa2c96e80d3 in __pyx_pf_6pandas_5_libs_9hashtable_15UInt64HashTable___cinit__ /home/willayd/clones/pandas/pandas/_libs/hashtable.c:33343:3
#4 0x7fa2c96e7861 in __pyx_pw_6pandas_5_libs_9hashtable_15UInt64HashTable_1__cinit__ /home/willayd/clones/pandas/pandas/_libs/hashtable.c:33275:13
#5 0x7fa2c96c92f0 in __pyx_tp_new_6pandas_5_libs_9hashtable_UInt64HashTable /home/willayd/clones/pandas/pandas/_libs/hashtable.c:158300:7
#6 0x55f9503681f5 in type_call /usr/local/src/conda/python-3.11.3/Objects/typeobject.c:1100:11
#7 0x7fa2c5703998 in __Pyx_PyObject_Call /home/willayd/clones/pandas/pandas/_libs/index.c:82754:14
#8 0x7fa2c5739d3b in __Pyx__PyObject_CallOneArg /home/willayd/clones/pandas/pandas/_libs/index.c:83070:14
#9 0x7fa2c57370e9 in __Pyx_PyObject_CallOneArg /home/willayd/clones/pandas/pandas/_libs/index.c:83089:12
#10 0x7fa2c571f6ab in __pyx_f_6pandas_5_libs_5index_12UInt64Engine__make_hash_table /home/willayd/clones/pandas/pandas/_libs/index.c:26319:89
#11 0x7fa2c570fa8f in __pyx_f_6pandas_5_libs_5index_11IndexEngine__ensure_mapping_populated /home/willayd/clones/pandas/pandas/_libs/index.c:9532:17
#12 0x7fa2c570cd7d in __pyx_f_6pandas_5_libs_5index_11IndexEngine__do_unique_check /home/willayd/clones/pandas/pandas/_libs/index.c:8653:15
#13 0x7fa2c575e77a in __pyx_pf_6pandas_5_libs_5index_11IndexEngine_9is_unique___get__ /home/willayd/clones/pandas/pandas/_libs/index.c:8583:17
#14 0x7fa2c575e67c in __pyx_pw_6pandas_5_libs_5index_11IndexEngine_9is_unique_1__get__ /home/willayd/clones/pandas/pandas/_libs/index.c:8549:13
#15 0x7fa2c575e578 in __pyx_getprop_6pandas_5_libs_5index_11IndexEngine_is_unique /home/willayd/clones/pandas/pandas/_libs/index.c:75082:10
#16 0x55f95038854d in _PyObject_GenericGetAttrWithDict /usr/local/src/conda/python-3.11.3/Objects/object.c:1278:19
#17 0x55f95036adcf in PyObject_GenericGetAttr /usr/local/src/conda/python-3.11.3/Objects/object.c:1368:12
#18 0x55f95036adcf in PyObject_GetAttr /usr/local/src/conda/python-3.11.3/Objects/object.c:916:19
#19 0x55f950374b61 in _PyEval_EvalFrameDefault /usr/local/src/conda/python-3.11.3/Python/ceval.c:3465:29
#20 0x55f95039acd2 in _PyEval_EvalFrame /usr/local/src/conda/python-3.11.3/Include/internal/pycore_ceval.h:73:16
#21 0x55f95039acd2 in _PyEval_Vector /usr/local/src/conda/python-3.11.3/Python/ceval.c:6438:24
#22 0x55f95039acd2 in _PyFunction_Vectorcall /usr/local/src/conda/python-3.11.3/Objects/call.c:393:16
#23 0x7fa2c8db64d8 in __Pyx_PyObject_Call /home/willayd/clones/pandas/pandas/_libs/properties.c:5508:14
#24 0x7fa2c8db684b in __Pyx__PyObject_CallOneArg /home/willayd/clones/pandas/pandas/_libs/properties.c:5576:14
#25 0x7fa2c8db6309 in __Pyx_PyObject_CallOneArg /home/willayd/clones/pandas/pandas/_libs/properties.c:5595:12
#26 0x7fa2c8db7f67 in __pyx_pf_6pandas_5_libs_10properties_14CachedProperty_2__get__ /home/willayd/clones/pandas/pandas/_libs/properties.c:1972:93
#27 0x7fa2c8db6d2c in __pyx_pw_6pandas_5_libs_10properties_14CachedProperty_3__get__ /home/willayd/clones/pandas/pandas/_libs/properties.c:1723:13
#28 0x7fa2c8db3b28 in __pyx_tp_descr_get_6pandas_5_libs_10properties_CachedProperty /home/willayd/clones/pandas/pandas/_libs/properties.c:4149:7
#29 0x55f95038854d in _PyObject_GenericGetAttrWithDict /usr/local/src/conda/python-3.11.3/Objects/object.c:1278:19
#30 0x55f95036adcf in PyObject_GenericGetAttr /usr/local/src/conda/python-3.11.3/Objects/object.c:1368:12
#31 0x55f95036adcf in PyObject_GetAttr /usr/local/src/conda/python-3.11.3/Objects/object.c:916:19
SUMMARY: AddressSanitizer: 9878368 byte(s) leaked in 8924 allocation(s).
So from running the JSON tests, ASAN reports 9878368 leaked bytes in total. The 24 bytes in the entry shown here leak through the call chain properties.pyx -> index.pyx -> hashtable.pyx -> khash.
Many of the build and runtime arguments in the commands above were pieced together from:
google/sanitizers/issues/918
https://stackoverflow.com/questions/48833176/get-location-of-libasan-from-gcc-clang
https://clang.llvm.org/docs/AddressSanitizer.html
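Because the report also contains leaks that pandas likely cannot control, it may help to filter leaks.txt down to the entries whose stack passes through pandas' own extension code. The following is a minimal Python sketch, not part of pandas: the names LeakReport, parse_leak_reports and touches_pandas are hypothetical, the "pandas/_libs" path marker is an assumption based on the paths in the traceback shown earlier, and the sample text is abbreviated from that traceback with a second, made-up entry added for contrast.

```python
import re
from dataclasses import dataclass, field

# Header and frame patterns follow the report format shown in the traceback.
HEADER = re.compile(
    r"^(Direct|Indirect) leak of (\d+) byte\(s\) in (\d+) object\(s\) allocated from:"
)
FRAME = re.compile(r"^#\d+ 0x[0-9a-fA-F]+ (.*)$")


@dataclass
class LeakReport:
    kind: str       # "Direct" or "Indirect"
    nbytes: int     # bytes leaked by this entry
    nobjects: int   # number of leaked objects
    frames: list[str] = field(default_factory=list)  # innermost frame first


def parse_leak_reports(lines):
    """Yield one LeakReport per 'Direct/Indirect leak of ...' entry."""
    current = None
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            if current is not None:
                yield current
            current = LeakReport(
                header.group(1), int(header.group(2)), int(header.group(3))
            )
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            current.frames.append(frame.group(1))
        elif current is not None:
            # Any non-frame line (e.g. a blank line) ends the current entry.
            yield current
            current = None
    if current is not None:
        yield current


def touches_pandas(report, marker="pandas/_libs"):
    """True if any frame mentions the marker (an assumed path fragment)."""
    return any(marker in frame for frame in report.frames)


# Abbreviated sample; the second entry is invented for illustration only.
SAMPLE = """\
Indirect leak of 24 byte(s) in 6 object(s) allocated from:
    #0 0x7fa3374d2c7e in __interceptor_malloc asan_malloc_linux.cpp:69:3
    #1 0x7fa2c96590e4 in traced_malloc /home/willayd/clones/pandas/pandas/_libs/src/klib/khash_python.h:27:18
    #2 0x7fa2c96c793f in kh_resize_uint64 /home/willayd/clones/pandas/pandas/_libs/src/klib/khash.h:712:1

Direct leak of 64 byte(s) in 1 object(s) allocated from:
    #0 0x7fa3374d2c7e in __interceptor_malloc asan_malloc_linux.cpp:69:3
    #1 0x55f9503681f5 in type_call /usr/local/src/conda/python-3.11.3/Objects/typeobject.c:1100:11
"""

reports = list(parse_leak_reports(SAMPLE.splitlines()))
pandas_reports = [r for r in reports if touches_pandas(r)]
print(len(reports), "entries,", len(pandas_reports), "through pandas,",
      sum(r.nbytes for r in pandas_reports), "bytes")
```

On a real run, `SAMPLE.splitlines()` could be replaced with an open file handle on leaks.txt, since the parser only needs an iterable of lines.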
Alternative Solutions
Valgrind is another great tool for detecting memory leaks (amongst other things), but it has larger overhead than the sanitizers, which may make it unsuitable for CI.
https://github.com/google/sanitizers/wiki/AddressSanitizerComparisonOfMemoryTools
Additional Context
cc @lithomas1