Description
Repro:
- In a clean environment, run pip install apache-beam==2.51.0.
- Download and unpack pipeline.pb (see the attached https://github.com/protocolbuffers/upb/files/12898151/pipeline.pb.zip).
- Create a memleak_repro.py:
    import sys
    from apache_beam.portability.api import beam_runner_api_pb2
    from apache_beam.io.gcp import gcsio
    from apache_beam.io.gcp import gcsfilesystem

    if len(sys.argv) <= 1:
        print("Specify proto path!")
        sys.exit(1)

    path = sys.argv[1]
    if path.startswith('gs'):
        open = gcsio.GcsIO().open  # read gs:// paths through Beam's GCS client

    with open(path, 'rb') as f:
        pipeline = beam_runner_api_pb2.Pipeline()
        pipeline.ParseFromString(f.read())

    for transform in pipeline.components.transforms:
        for _ in range(100000):
            reference = pipeline.components.transforms[transform].outputs  # repeated map-field access
- Run python memleak_repro.py pipeline.pb and observe RAM usage in top/htop, etc., increasing the number of iterations as necessary to keep the process running longer.
- For profiling, run:
    pip install memray
    memray run -o output.bin --force memleak_repro.py pipeline.pb ; memray table --leak output.bin -o table.html --force ; memray flamegraph --leak output.bin -o flamegraph.html -f
- Open table.html, double-click on the Size column, and note the ~36MB of leaked memory reported in memleak_repro.py:20.
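As an alternative to watching top/htop by hand, memory growth can be sampled from inside the process. The sketch below is not part of the original repro: `peak_rss_mib` and `sample_peak_rss` are hypothetical helper names, and the workload is a stand-in for the inner loop of memleak_repro.py. It relies on the standard-library `resource` module, so it is Unix-only. Because `ru_maxrss` is a high-water mark, a leak would show up as samples that keep climbing, while a plateau suggests memory is being reused.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def sample_peak_rss(workload, rounds=5):
    """Run workload() `rounds` times, recording peak RSS after each round."""
    samples = []
    for _ in range(rounds):
        workload()
        samples.append(peak_rss_mib())
    return samples


if __name__ == "__main__":
    # Stand-in workload that deliberately retains 1 MiB per round; in the
    # repro this would be the loop reading
    # pipeline.components.transforms[transform].outputs.
    retained = []
    samples = sample_peak_rss(lambda: retained.append(bytearray(1024 * 1024)))
    print([f"{s:.1f} MiB" for s in samples])
```

In the repro script, the stand-in lambda could be swapped for a function wrapping one pass of the inner loop, with the samples printed once per transform.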
The leak does not occur when PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python is set.
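To confirm which protobuf backend a given run actually used, something like the following sketch may help. It assumes the `google.protobuf.internal.api_implementation` module, which is internal to protobuf and could change between releases; `active_protobuf_backend` is a hypothetical helper name. The environment variable has to be set before protobuf is first imported, since the backend is chosen at import time.

```python
import os

# Assumption: this runs before anything imports google.protobuf, because
# the backend is selected once, at first import. setdefault keeps any
# value already exported in the shell.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")


def active_protobuf_backend():
    """Return the protobuf backend name, or None if protobuf is missing."""
    try:
        # Internal protobuf module; not a stable public API.
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return api_implementation.Type()


if __name__ == "__main__":
    print("protobuf backend:", active_protobuf_backend())
```

Running the repro once with the variable unset and once with it set to python, and printing the backend each time, would make it clear that the two memory profiles come from different implementations.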
The leak can also be observed with C-stack tools, such as tcmalloc+pprof:
apt install google-pprof
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4:$LD_PRELOAD HEAPPROFILE=`pwd`/ python memleak_repro.py pipeline.pb
google-pprof /home/valentyn/.pyenv/versions/py310/bin/python --inuse_space _219720.0005.heap --base=_219720.0004.heap
Using local file /home/valentyn/.pyenv/versions/py310/bin/python.
Using local file _219720.0005.heap.
Welcome to pprof! For help, type 'help'.
(pprof) top
Total: 1152.0 MB
1152.0 100.0% 100.0% 1152.0 100.0% _upb_Arena_SlowMalloc
(pprof)
valgrind+massif:
apt install valgrind
PYTHONMALLOC=malloc valgrind --tool=massif --verbose /home/valentyn/.pyenv/versions/py310/bin/python memleak_repro.py pipeline.pb
ms_print massif.out.341396
...
->18.09% (817,040B) 0x59A17F1: _upb_Arena_SlowMalloc (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| ->07.00% (316,176B) 0x59A0051: upb_strtable_insert (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | ->06.05% (273,072B) 0x599FDDF: upb_strtable_resize (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | ->06.05% (273,072B) 0x599FED6: upb_strtable_insert (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | ->04.17% (188,096B) 0x598F0D7: _upb_DefPool_InsertSym (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | | ->03.89% (175,568B) 0x5990B19: _upb_EnumValueDefs_New (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | | | ->03.89% (175,568B) 0x5990332: _upb_EnumDefs_New (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | | | ->03.89% (175,568B) 0x5995356: _upb_MessageDefs_New (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | | | ->03.89% (175,568B) 0x5992E88: _upb_FileDef_Create (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | | | ->03.89% (175,568B) 0x598FC4B: upb_DefBuilder_AddFileToPool (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | | | ->03.89% (175,568B) 0x598F699: _upb_DefPool_AddFile (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | | | ->03.89% (175,568B) 0x598165C: PyUpb_DescriptorPool_DoAddSerializedFile (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | | | ->03.89% (175,568B) 0x31A7B1: method_vectorcall_O (descrobject.c:460)
| | | | | ->03.89% (175,568B) 0x16687A: _PyObject_VectorcallTstate (abstract.h:114)
| | | | | ->03.89% (175,568B) 0x16687A: PyObject_Vectorcall (abstract.h:123)
| | | | | ->03.89% (175,568B) 0x16687A: call_function (ceval.c:5867)
| | | | | ->03.89% (175,568B) 0x16687A: _PyEval_EvalFrameDefault (ceval.c:4198)
| | | | | ->03.89% (175,568B) 0x22F7B4: _PyEval_EvalFrame (pycore_ceval.h:46)
| | | | | ->03.89% (175,568B) 0x22F7B4: _PyEval_Vector (ceval.c:5065)
| | | | | ->03.89% (175,568B) 0x22F7B4: PyEval_EvalCode (ceval.c:1134)
| | | | | ->03.89% (175,568B) 0x3593F8: builtin_exec_impl (bltinmodule.c:1056)
| | | | | ->03.89% (175,568B) 0x3593F8: builtin_exec (bltinmodule.c.h:371)
...
Using memray in --native mode also points to _upb_Arena_SlowMalloc.
Command line: memray run --native -o output.bin --force memleak_repro.py pipeline.pb ; memray table --leak output.bin -o table.html --force ; memray flamegraph --leak output.bin -o flamegraph.html -f