Describe the bug (mandatory)
Memory leak when calling Document.insert_pdf()
To Reproduce (mandatory)
import os
import sys

import fitz
import psutil

with open(sys.argv[1], "rb") as f:
    doc = fitz.open(stream=f.read())

proc = psutil.Process(os.getpid())
first_ram = proc.memory_info().rss
calls = 0
while True:
    for page_idx in range(doc.page_count):
        copied_page = doc[page_idx]
        page_pdf = fitz.Document()
        page_pdf.insert_pdf(
            doc,
            from_page=page_idx,
            to_page=page_idx,
            start_at=0,
            rotate=copied_page.rotation,
        )
        calls += 1
        # I tried these after seeing suggestions in other issues but no difference
        copied_page = None
        page_pdf.close()
        page_pdf = None
        fitz.TOOLS.store_shrink(100)
    ram = proc.memory_info().rss
    ram_diff = (ram - first_ram) // 1024
    print(f"{calls} calls\t{ram // 1024}KB total\t+{ram_diff}KB since start ({ram_diff / calls:0.02f}/call)")
Expected behavior (optional)
Not leaking memory :)
Screenshots (optional)
Reproduction script output:
$ python leak.py 500pages-A4.pdf
500 calls 26252KB total +3364KB since start (6.73/call)
1000 calls 28364KB total +5476KB since start (5.48/call)
1500 calls 30476KB total +7588KB since start (5.06/call)
2000 calls 32324KB total +9436KB since start (4.72/call)
2500 calls 34436KB total +11548KB since start (4.62/call)
3000 calls 36548KB total +13660KB since start (4.55/call)
3500 calls 38396KB total +15508KB since start (4.43/call)
4000 calls 40508KB total +17620KB since start (4.41/call)
4500 calls 42356KB total +19468KB since start (4.33/call)
5000 calls 44468KB total +21580KB since start (4.32/call)
5500 calls 46580KB total +23692KB since start (4.31/call)
6000 calls 48428KB total +25540KB since start (4.26/call)
6500 calls 50540KB total +27652KB since start (4.25/call)
7000 calls 52388KB total +29500KB since start (4.21/call)
7500 calls 54500KB total +31612KB since start (4.21/call)
8000 calls 56612KB total +33724KB since start (4.22/call)
8500 calls 58460KB total +35572KB since start (4.18/call)
9000 calls 60572KB total +37684KB since start (4.19/call)
9500 calls 62420KB total +39532KB since start (4.16/call)
10000 calls 64532KB total +41644KB since start (4.16/call)
10500 calls 66644KB total +43756KB since start (4.17/call)
11000 calls 68492KB total +45604KB since start (4.15/call)
11500 calls 70604KB total +47716KB since start (4.15/call)
12000 calls 72452KB total +49564KB since start (4.13/call)
12500 calls 74564KB total +51676KB since start (4.13/call)
13000 calls 76676KB total +53788KB since start (4.14/call)
13500 calls 78524KB total +55636KB since start (4.12/call)
14000 calls 80636KB total +57748KB since start (4.12/call)
14500 calls 82484KB total +59596KB since start (4.11/call)
15000 calls 84596KB total +61708KB since start (4.11/call)
15500 calls 86708KB total +63820KB since start (4.12/call)
16000 calls 88556KB total +65668KB since start (4.10/call)
16500 calls 90668KB total +67780KB since start (4.11/call)
17000 calls 92516KB total +69628KB since start (4.10/call)
The growth stabilizes at around 4 KB per call on that PDF, and it seems to leak more per call on PDFs with more pages.
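The per-call figure printed by the script is a cumulative average, so it still includes whatever was allocated during the first pass over the document. As a rough cross-check, here is a small sketch that computes the marginal rate between the first and last output lines instead. The helper name `marginal_kb_per_call` is made up for this illustration; the numbers are copied from the output above.

```python
# (calls, KB gained since start) copied from the first and last output lines above
first_calls, first_gain_kb = 500, 3364
last_calls, last_gain_kb = 17000, 69628


def marginal_kb_per_call(c0, kb0, c1, kb1):
    """Average RSS growth per call between two samples (hypothetical helper)."""
    return (kb1 - kb0) / (c1 - c0)


rate = marginal_kb_per_call(first_calls, first_gain_kb, last_calls, last_gain_kb)
print(f"{rate:.2f} KB/call")  # prints "4.02 KB/call"
```

That is (69628 - 3364) / (17000 - 500) = 66264 / 16500, so the steady-state growth is almost exactly 4 KB per call, which is the value the cumulative average is slowly converging to.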
Valgrind summary on a debug build (tag 1.19.6, MuPDF 1.19.0, suppressing PyMem_RawMalloc calls):
==5508== HEAP SUMMARY:
==5508== in use at exit: 1,507,032 bytes in 615 blocks
==5508== total heap usage: 4,721 allocs, 4,106 frees, 8,493,027 bytes allocated
==5508==
==5508== 4,072 (32 direct, 4,040 indirect) bytes in 1 blocks are definitely lost in loss record 326 of 357
==5508== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5508== by 0x692F78B: do_scavenging_malloc (memory.c:51)
==5508== by 0x692F85F: fz_calloc (memory.c:111)
==5508== by 0x6981238: pdf_new_graft_map (pdf-graft.c:42)
==5508== by 0x689DFF5: new_Graftmap (fitz_wrap.c:17104)
==5508== by 0x68B37A3: _wrap_new_Graftmap (fitz_wrap.c:29204)
==5508== by 0x5ECFF7: cfunction_call (methodobject.c:552)
==5508== by 0x50F338: _PyObject_MakeTpCall (call.c:191)
==5508== by 0x574E12: _PyObject_VectorcallTstate (abstract.h:116)
==5508== by 0x574E12: PyObject_Vectorcall (abstract.h:127)
==5508== by 0x574E12: call_function (ceval.c:5077)
==5508== by 0x574E12: _PyEval_EvalFrameDefault (ceval.c:3489)
==5508== by 0x5100A1: _PyEval_EvalFrame (pycore_ceval.h:40)
==5508== by 0x5100A1: function_code_fastcall (call.c:330)
==5508== by 0x5100A1: _PyFunction_Vectorcall (call.c:367)
==5508== by 0x5100A1: _PyObject_FastCallDictTstate (call.c:118)
==5508== by 0x5100A1: _PyObject_Call_Prepend (call.c:489)
==5508== by 0x54B2A2: slot_tp_init (typeobject.c:6969)
==5508== by 0x54880A: type_call (typeobject.c:1026)
==5508==
==5508== LEAK SUMMARY:
==5508== definitely lost: 32 bytes in 1 blocks
==5508== indirectly lost: 4,040 bytes in 1 blocks
==5508== possibly lost: 0 bytes in 0 blocks
==5508== still reachable: 606,358 bytes in 115 blocks
==5508== suppressed: 896,602 bytes in 498 blocks
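The size of the lost block looks consistent with the growth measured by the script. A quick arithmetic sketch, assuming that each insert_pdf() call creates one graft map and that this Valgrind run made a single call (neither is confirmed by the trace itself):

```python
# Byte counts copied from the "definitely lost" record in the Valgrind output above
direct_bytes = 32
indirect_bytes = 4040

lost_bytes = direct_bytes + indirect_bytes  # 4,072 bytes, matching the loss record
lost_kb = lost_bytes / 1024

print(f"{lost_bytes} bytes = {lost_kb:.2f} KB per leaked graft map")
# prints "4072 bytes = 3.98 KB per leaked graft map"
```

If that assumption holds, one graft map allocated in pdf_new_graft_map and never freed would account for the roughly 4 KB/call seen in the reproduction output.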
Your configuration (mandatory)
- OS: macOS Catalina 10.15.7 and Ubuntu 18.04.5 LTS
- Python version: 3.9.4 and 3.9.12
- PyMuPDF version: 1.19.6 from wheel
>>> print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
3.9.4 (v3.9.4:1f2e3088f3, Apr 4 2021, 12:32:44)
[Clang 6.0 (clang-600.0.57)]
darwin
PyMuPDF 1.19.6: Python bindings for the MuPDF 1.19.0 library.
Version date: 2022-03-03 00:00:01.
Built for Python 3.9 on darwin (64-bit).
>>> print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
3.9.12 (main, Mar 24 2022, 16:21:12)
[GCC 7.5.0]
linux
PyMuPDF 1.19.6: Python bindings for the MuPDF 1.19.0 library.
Version date: 2022-03-03 00:00:01.
Built for Python 3.9 on linux (64-bit).