Python references that go out of scope might not release the memory allocated by _upb_Arena_SlowMalloc  #14571

@tvalentyn

Description

Repro:

  1. In a clean environment, run pip install apache-beam==2.51.0.
  2. Download + unpack pipeline.pb (see attached https://github.com/protocolbuffers/upb/files/12898151/pipeline.pb.zip).
  3. Create a memleak_repro.py:
import sys
from apache_beam.portability.api import beam_runner_api_pb2
from apache_beam.io.gcp import gcsio
from apache_beam.io.gcp import gcsfilesystem

if len(sys.argv) <= 1:
  print("Specify proto path!")
  sys.exit(1)

path = sys.argv[1]
if path.startswith('gs'):
  open = gcsio.GcsIO().open  # read GCS paths through Beam's GCS client

with open(path, 'rb') as f:
  pipeline = beam_runner_api_pb2.Pipeline()
  pipeline.ParseFromString(f.read())

for transform in pipeline.components.transforms:
  for _ in range(100000):
    reference = pipeline.components.transforms[transform].outputs  # rebound on every iteration
  4. Run python memleak_repro.py pipeline.pb and observe RAM usage in top/htop, etc., increasing the number of iterations as necessary to keep the process running longer.

  5. For profiling, run:

pip install memray

memray run -o output.bin --force memleak_repro.py pipeline.pb ; memray table --leak output.bin -o table.html --force ; memray flamegraph --leak output.bin -o flamegraph.html -f

  6. Open table.html, double-click on the Size column, and note the ~36MB of leaked memory reported at memleak_repro.py:20.
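
As an alternative to watching top/htop in the step that observes RAM usage, the growth can be sampled from inside the process. The following is a minimal sketch, not part of the original repro: `peak_rss_mib` and `measure_growth` are hypothetical helper names, and it relies on the Unix-only `resource` module from the standard library.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_growth(work, rounds=5):
    """Call work() `rounds` times and record the peak RSS after each call."""
    samples = []
    for _ in range(rounds):
        work()
        samples.append(peak_rss_mib())
    return samples


if __name__ == "__main__":
    # Placeholder workload; in the repro this would be the inner loop
    # over pipeline.components.transforms wrapped in a function.
    for i, mib in enumerate(measure_growth(lambda: None)):
        print(f"round {i}: peak RSS {mib:.1f} MiB")
```

If the peak keeps climbing from round to round while the workload only rebinds the same reference, that would be consistent with the leak described here.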

The leak doesn't happen when the environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python is set.
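
When comparing runs with and without PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python, it can help to confirm which protobuf backend the process actually loaded. This is a small sketch under the assumption that the internal module `google.protobuf.internal.api_implementation` is available in the installed protobuf version; `active_protobuf_backend` is a hypothetical helper name, not something from the repro.

```python
def active_protobuf_backend():
    """Return the protobuf backend name, or None if protobuf is not installed.

    Expected values are "upb", "cpp" or "python". The environment variable
    is only honoured if it is set before google.protobuf is first imported.
    """
    try:
        # Internal module: assumed present, may change between releases.
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return api_implementation.Type()


if __name__ == "__main__":
    print("protobuf backend:", active_protobuf_backend())
```

Running it once with the variable exported and once without should show whether the run being profiled goes through upb.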

Leak can also be observed with C-stack tools, such as tcmalloc+pprof:

apt install google-pprof
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4:$LD_PRELOAD  HEAPPROFILE=`pwd`/  python memleak_repro.py pipeline.pb
google-pprof /home/valentyn/.pyenv/versions/py310/bin/python --inuse_space  _219720.0005.heap  --base=_219720.0004.heap 
Using local file /home/valentyn/.pyenv/versions/py310/bin/python.
Using local file _219720.0005.heap.
Welcome to pprof!  For help, type 'help'.
(pprof) top
Total: 1152.0 MB
  1152.0 100.0% 100.0%   1152.0 100.0% _upb_Arena_SlowMalloc
(pprof) 

valgrind+massif:

apt install valgrind
PYTHONMALLOC=malloc  valgrind  --tool=massif   --verbose  /home/valentyn/.pyenv/versions/py310/bin/python memleak_repro.py pipeline.pb

ms_print massif.out.341396 

...
->18.09% (817,040B) 0x59A17F1: _upb_Arena_SlowMalloc (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| ->07.00% (316,176B) 0x59A0051: upb_strtable_insert (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | ->06.05% (273,072B) 0x599FDDF: upb_strtable_resize (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | | ->06.05% (273,072B) 0x599FED6: upb_strtable_insert (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | |   ->04.17% (188,096B) 0x598F0D7: _upb_DefPool_InsertSym (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | |   | ->03.89% (175,568B) 0x5990B19: _upb_EnumValueDefs_New (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | |   | | ->03.89% (175,568B) 0x5990332: _upb_EnumDefs_New (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | |   | |   ->03.89% (175,568B) 0x5995356: _upb_MessageDefs_New (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | |   | |     ->03.89% (175,568B) 0x5992E88: _upb_FileDef_Create (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | |   | |       ->03.89% (175,568B) 0x598FC4B: upb_DefBuilder_AddFileToPool (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | |   | |         ->03.89% (175,568B) 0x598F699: _upb_DefPool_AddFile (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | |   | |           ->03.89% (175,568B) 0x598165C: PyUpb_DescriptorPool_DoAddSerializedFile (in /home/valentyn/.pyenv/versions/3.10.4/envs/py310/lib/python3.10/site-packages/google/_upb/_message.abi3.so)
| | |   | |             ->03.89% (175,568B) 0x31A7B1: method_vectorcall_O (descrobject.c:460)
| | |   | |               ->03.89% (175,568B) 0x16687A: _PyObject_VectorcallTstate (abstract.h:114)
| | |   | |                 ->03.89% (175,568B) 0x16687A: PyObject_Vectorcall (abstract.h:123)
| | |   | |                   ->03.89% (175,568B) 0x16687A: call_function (ceval.c:5867)
| | |   | |                     ->03.89% (175,568B) 0x16687A: _PyEval_EvalFrameDefault (ceval.c:4198)
| | |   | |                       ->03.89% (175,568B) 0x22F7B4: _PyEval_EvalFrame (pycore_ceval.h:46)
| | |   | |                         ->03.89% (175,568B) 0x22F7B4: _PyEval_Vector (ceval.c:5065)
| | |   | |                           ->03.89% (175,568B) 0x22F7B4: PyEval_EvalCode (ceval.c:1134)
| | |   | |                             ->03.89% (175,568B) 0x3593F8: builtin_exec_impl (bltinmodule.c:1056)
| | |   | |                               ->03.89% (175,568B) 0x3593F8: builtin_exec (bltinmodule.c.h:371)

...
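
The ms_print tree above is long, so a small filter can make it easier to see which frames carry the bytes. The sketch below is an illustration only: `FRAME_RE` and `parse_frames` are hypothetical names, the regular expression assumes the line layout shown in the output above, and the sample text is abbreviated from that output (library paths shortened).

```python
import re

# Matches ms_print tree lines such as:
#   ->18.09% (817,040B) 0x59A17F1: _upb_Arena_SlowMalloc (in ...)
FRAME_RE = re.compile(
    r"->(?P<pct>\d+\.\d+)% \((?P<bytes>[\d,]+)B\) 0x[0-9A-Fa-f]+: (?P<func>\S+)"
)


def parse_frames(text):
    """Return (function, bytes, percent) for every frame line in the text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            size = int(match["bytes"].replace(",", ""))
            frames.append((match["func"], size, float(match["pct"])))
    return frames


# Abbreviated from the massif output above.
sample = """\
->18.09% (817,040B) 0x59A17F1: _upb_Arena_SlowMalloc (in _message.abi3.so)
| ->07.00% (316,176B) 0x59A0051: upb_strtable_insert (in _message.abi3.so)
"""

if __name__ == "__main__":
    for func, size, pct in parse_frames(sample):
        print(f"{pct:6.2f}%  {size:>10,d} B  {func}")
```

Feeding it the full `ms_print massif.out.<pid>` output and keeping only frames whose name contains `upb` is one way to narrow the tree to the allocator paths.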

Using memray in --native mode also points to _upb_Arena_SlowMalloc.

Command line: memray run --native -o output.bin --force memleak_repro.py pipeline.pb ; memray table --leak output.bin -o table.html --force ; memray flamegraph --leak output.bin -o flamegraph.html -f
