
Memory leak in Document.insert_pdf() #1738

Closed
@Zeletochoy

Describe the bug (mandatory)

Memory keeps growing when Document.insert_pdf() is called repeatedly, even though every destination document is closed after use.

To Reproduce (mandatory)

import os
import sys

import fitz
import psutil


with open(sys.argv[1], "rb") as f:
    doc = fitz.open(stream=f.read())


proc = psutil.Process(os.getpid())
first_ram = proc.memory_info().rss


calls = 0
while True:
    for page_idx in range(doc.page_count):
        copied_page = doc[page_idx]
        page_pdf = fitz.Document()
        page_pdf.insert_pdf(
            doc,
            from_page=page_idx,
            to_page=page_idx,
            start_at=0,
            rotate=copied_page.rotation,
        )
        calls += 1
        # Tried these after seeing suggestions in other issues; they made no difference
        copied_page = None
        page_pdf.close()
        page_pdf = None
    # Empty MuPDF's object store (shrink by 100%) so cached objects are not counted as leaked
    fitz.TOOLS.store_shrink(100)

    ram = proc.memory_info().rss
    ram_diff = (ram - first_ram) // 1024
    print(f"{calls} calls\t{ram // 1024}KB total\t+{ram_diff}KB since start ({ram_diff / calls:0.02f}/call)")

Expected behavior (optional)

Not leaking memory :)

Screenshots (optional)

Reproduction script output:

$ python leak.py 500pages-A4.pdf
500 calls	26252KB total	+3364KB since start (6.73/call)
1000 calls	28364KB total	+5476KB since start (5.48/call)
1500 calls	30476KB total	+7588KB since start (5.06/call)
2000 calls	32324KB total	+9436KB since start (4.72/call)
2500 calls	34436KB total	+11548KB since start (4.62/call)
3000 calls	36548KB total	+13660KB since start (4.55/call)
3500 calls	38396KB total	+15508KB since start (4.43/call)
4000 calls	40508KB total	+17620KB since start (4.41/call)
4500 calls	42356KB total	+19468KB since start (4.33/call)
5000 calls	44468KB total	+21580KB since start (4.32/call)
5500 calls	46580KB total	+23692KB since start (4.31/call)
6000 calls	48428KB total	+25540KB since start (4.26/call)
6500 calls	50540KB total	+27652KB since start (4.25/call)
7000 calls	52388KB total	+29500KB since start (4.21/call)
7500 calls	54500KB total	+31612KB since start (4.21/call)
8000 calls	56612KB total	+33724KB since start (4.22/call)
8500 calls	58460KB total	+35572KB since start (4.18/call)
9000 calls	60572KB total	+37684KB since start (4.19/call)
9500 calls	62420KB total	+39532KB since start (4.16/call)
10000 calls	64532KB total	+41644KB since start (4.16/call)
10500 calls	66644KB total	+43756KB since start (4.17/call)
11000 calls	68492KB total	+45604KB since start (4.15/call)
11500 calls	70604KB total	+47716KB since start (4.15/call)
12000 calls	72452KB total	+49564KB since start (4.13/call)
12500 calls	74564KB total	+51676KB since start (4.13/call)
13000 calls	76676KB total	+53788KB since start (4.14/call)
13500 calls	78524KB total	+55636KB since start (4.12/call)
14000 calls	80636KB total	+57748KB since start (4.12/call)
14500 calls	82484KB total	+59596KB since start (4.11/call)
15000 calls	84596KB total	+61708KB since start (4.11/call)
15500 calls	86708KB total	+63820KB since start (4.12/call)
16000 calls	88556KB total	+65668KB since start (4.10/call)
16500 calls	90668KB total	+67780KB since start (4.11/call)
17000 calls	92516KB total	+69628KB since start (4.10/call)

Growth settles at roughly 4KB per call on this PDF, and the leak seems to be larger for PDFs with more pages.
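The "/call" column is a cumulative average, so it still includes whatever one-off growth happened before the first sample. A slope between two logged samples isolates the per-call cost. This is a minimal sketch that uses only numbers copied from the output above; the helper names `parse` and `marginal_rate` are made up for illustration.

```python
import re

# Three lines copied verbatim from the output above (first, middle, last).
LOG = (
    "500 calls\t26252KB total\t+3364KB since start (6.73/call)\n"
    "9000 calls\t60572KB total\t+37684KB since start (4.19/call)\n"
    "17000 calls\t92516KB total\t+69628KB since start (4.10/call)\n"
)

# Captures the call count and the "+NKB since start" figure of each line.
LINE_RE = re.compile(r"(\d+) calls\t\d+KB total\t\+(\d+)KB since start")


def parse(log):
    """Return (calls, kb_since_start) pairs in log order."""
    return [(int(m[1]), int(m[2])) for m in LINE_RE.finditer(log)]


def marginal_rate(samples):
    """Slope in KB per call between the first and last sample."""
    (c0, kb0), (c1, kb1) = samples[0], samples[-1]
    return (kb1 - kb0) / (c1 - c0)


samples = parse(LOG)
rate = marginal_rate(samples)
# Extrapolating the line back to 0 calls gives the fixed, non-per-call growth.
fixed_kb = samples[0][1] - rate * samples[0][0]
print(f"marginal: {rate:.3f} KB/call, fixed offset: {fixed_kb:.0f} KB")
```

Because the cumulative average folds in that fixed offset, it keeps creeping down toward the marginal rate (about 4.0 KB/call here) instead of settling exactly.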

Valgrind summary from a debug build of PyMuPDF (tag 1.19.6, MuPDF 1.19.0), with PyMem_RawMalloc allocations suppressed:

==5508== HEAP SUMMARY:
==5508==     in use at exit: 1,507,032 bytes in 615 blocks
==5508==   total heap usage: 4,721 allocs, 4,106 frees, 8,493,027 bytes allocated
==5508==
==5508== 4,072 (32 direct, 4,040 indirect) bytes in 1 blocks are definitely lost in loss record 326 of 357
==5508==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5508==    by 0x692F78B: do_scavenging_malloc (memory.c:51)
==5508==    by 0x692F85F: fz_calloc (memory.c:111)
==5508==    by 0x6981238: pdf_new_graft_map (pdf-graft.c:42)
==5508==    by 0x689DFF5: new_Graftmap (fitz_wrap.c:17104)
==5508==    by 0x68B37A3: _wrap_new_Graftmap (fitz_wrap.c:29204)
==5508==    by 0x5ECFF7: cfunction_call (methodobject.c:552)
==5508==    by 0x50F338: _PyObject_MakeTpCall (call.c:191)
==5508==    by 0x574E12: _PyObject_VectorcallTstate (abstract.h:116)
==5508==    by 0x574E12: PyObject_Vectorcall (abstract.h:127)
==5508==    by 0x574E12: call_function (ceval.c:5077)
==5508==    by 0x574E12: _PyEval_EvalFrameDefault (ceval.c:3489)
==5508==    by 0x5100A1: _PyEval_EvalFrame (pycore_ceval.h:40)
==5508==    by 0x5100A1: function_code_fastcall (call.c:330)
==5508==    by 0x5100A1: _PyFunction_Vectorcall (call.c:367)
==5508==    by 0x5100A1: _PyObject_FastCallDictTstate (call.c:118)
==5508==    by 0x5100A1: _PyObject_Call_Prepend (call.c:489)
==5508==    by 0x54B2A2: slot_tp_init (typeobject.c:6969)
==5508==    by 0x54880A: type_call (typeobject.c:1026)
==5508==
==5508== LEAK SUMMARY:
==5508==    definitely lost: 32 bytes in 1 blocks
==5508==    indirectly lost: 4,040 bytes in 1 blocks
==5508==      possibly lost: 0 bytes in 0 blocks
==5508==    still reachable: 606,358 bytes in 115 blocks
==5508==         suppressed: 896,602 bytes in 498 blocks
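The lost block is allocated in pdf_new_graft_map, and together with what hangs off it comes to 32 + 4,040 = 4,072 bytes. The trace reports only one such block, so whether every insert_pdf() call leaks one graft map is an assumption. The sketch below checks whether that assumption roughly accounts for the RSS growth logged between 500 and 17,000 calls. RSS is counted in whole pages and includes allocator overhead, so an exact match is not expected. The function name `predicted_growth_kb` is hypothetical.

```python
# Figures from the valgrind LEAK SUMMARY above.
DIRECT_BYTES = 32      # block allocated in pdf_new_graft_map
INDIRECT_BYTES = 4040  # memory reachable only through that block

# "+KB since start" figures from the script output at 500 and 17000 calls.
CALLS_0, KB_0 = 500, 3364
CALLS_1, KB_1 = 17000, 69628


def predicted_growth_kb(calls, bytes_per_call):
    """Growth in KiB (the script's "KB" is rss // 1024) if each call leaked bytes_per_call."""
    return calls * bytes_per_call / 1024


# Assumption: one graft map of this size leaks per insert_pdf() call.
bytes_per_gmap = DIRECT_BYTES + INDIRECT_BYTES
predicted = predicted_growth_kb(CALLS_1 - CALLS_0, bytes_per_gmap)
observed = KB_1 - KB_0
print(
    f"{bytes_per_gmap} B/call -> predicted {predicted:.0f} KB, "
    f"observed {observed} KB ({observed / predicted - 1:.1%} above prediction)"
)
```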

Your configuration (mandatory)

  • OS: MacOS Catalina 10.15.7 and Ubuntu 18.04.5 LTS
  • Python version: 3.9.4 and 3.9.12
  • PyMuPDF version: 1.19.6 from wheel
>>> print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
3.9.4 (v3.9.4:1f2e3088f3, Apr  4 2021, 12:32:44)
[Clang 6.0 (clang-600.0.57)]
 darwin

PyMuPDF 1.19.6: Python bindings for the MuPDF 1.19.0 library.
Version date: 2022-03-03 00:00:01.
Built for Python 3.9 on darwin (64-bit).
>>> print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
3.9.12 (main, Mar 24 2022, 16:21:12)
[GCC 7.5.0]
 linux

PyMuPDF 1.19.6: Python bindings for the MuPDF 1.19.0 library.
Version date: 2022-03-03 00:00:01.
Built for Python 3.9 on linux (64-bit).
