Description of the bug
Hi Team. I have written a script that, based on certain criteria, removes text from a PDF together with any overlapping images and vector graphics.
Two days ago, we upgraded PyMuPDF on our server from 1.25.3 to 1.25.4. Today, we received the following exception for one PDF file:
[ERROR] 2025-04-01 14:58:43 Example.pdf - sample Traceback (most recent call last):
  File "/Users/user/Documents/sample.py", line 45, in sample_func
    doc.ez_save(dst_pdf)
  File "/Users/user/miniconda3/envs/test4_env/lib/python3.10/site-packages/pymupdf/__init__.py", line 4223, in ez_save
    return self.save(
  File "/Users/user/miniconda3/envs/test4_env/lib/python3.10/site-packages/pymupdf/__init__.py", line 5584, in save
    mupdf.pdf_write_document(pdf, out, opts)
  File "/Users/user/miniconda3/envs/test4_env/lib/python3.10/site-packages/pymupdf/mupdf.py", line 53942, in pdf_write_document
    return _mupdf.pdf_write_document(doc, out, opts)
pymupdf.mupdf.FzErrorFormat: code=7: cannot find object in xref (21 0 R)
Today, I noticed that PyMuPDF 1.25.5 has been released. I upgraded the server to that version to see whether the error would go away, but it persisted. I also experimented with different save parameters and values (example below), but the same error persisted.
# doc.ez_save(dst_pdf)
doc.save(dst_pdf, garbage=4, clean=True, deflate=True, use_objstms=1)
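To compare save options systematically rather than one at a time, a small harness like the following could record which option sets fail. This is only a sketch: `try_save_variants` and the variant names are hypothetical, and it accepts any `save_fn(buffer, **options)` callable, for example `lambda buf, **kw: doc.save(buf, **kw)` with an open document. It catches `Exception` broadly, which should also cover `pymupdf.mupdf.FzErrorFormat`. Since saving with `garbage` may alter the in-memory document, reopening the source PDF for each attempt would give cleaner comparisons.

```python
from io import BytesIO
from typing import Callable, Dict, Optional

# Hypothetical option sets to compare; the keyword names are the save() parameters used above.
SAVE_VARIANTS: Dict[str, dict] = {
    "defaults": {},
    "garbage4_clean_deflate": {"garbage": 4, "clean": True, "deflate": True},
    "with_objstms": {"garbage": 4, "clean": True, "deflate": True, "use_objstms": 1},
}


def try_save_variants(
    save_fn: Callable[..., None], variants: Dict[str, dict]
) -> Dict[str, Optional[str]]:
    """Call save_fn(buffer, **options) once per variant.

    Returns a mapping of variant name -> "ExceptionType: message",
    or None when that save completed without raising.
    """
    results: Dict[str, Optional[str]] = {}
    for name, options in variants.items():
        buffer = BytesIO()  # fresh output buffer for every attempt
        try:
            save_fn(buffer, **options)
            results[name] = None
        except Exception as exc:  # broad on purpose: MuPDF errors are Exception subclasses
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

With a real document this would be called as `try_save_variants(lambda buf, **kw: doc.save(buf, **kw), SAVE_VARIANTS)`.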
I then downgraded to 1.25.3. With that version, the script printed the following error message, but no exception was raised and the file was saved successfully:
MuPDF error: format error: cannot find object in xref (21 0 R)
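Both versions point at the same missing object, `21 0 R`: 1.25.3 only prints it as a MuPDF error message, while 1.25.4 and 1.25.5 raise it as `FzErrorFormat`. When triaging several files, a small helper could extract the object reference from either form of the message. The helper name `missing_xref` is hypothetical, and the regex is an assumption inferred only from the two messages quoted in this report.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "cannot find object in xref (21 0 R)" -> object number 21, generation 0.
# Inferred from the messages in this report; other MuPDF messages may be worded differently.
_MISSING_XREF = re.compile(r"cannot find object in xref \((\d+) (\d+) R\)")


def missing_xref(message: str) -> Optional[Tuple[int, int]]:
    """Return (object number, generation) from a 'cannot find object' message, else None."""
    match = _MISSING_XREF.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For example, `missing_xref(str(exc))` could be logged inside an `except` block around `doc.ez_save(...)`.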
How to reproduce the bug
Below is an example script for reference. I have simplified the original while keeping its overall logic the same, and it uses every PyMuPDF method that the original script uses.
import logging
from io import BytesIO

import fitz

logger = logging.getLogger(__file__)

TARGET_TEXT = "xyz"


def sample_func(src_pdf):
    if isinstance(src_pdf, BytesIO):
        # src_pdf is an in-memory PDF (BytesIO object)
        src_pdf.seek(0)
        doc = fitz.open(stream=src_pdf, filetype="pdf")
    elif isinstance(src_pdf, str):
        # src_pdf is a file path
        doc = fitz.open(src_pdf)
    else:
        raise TypeError(f"unsupported input type: {type(src_pdf).__name__}")

    for page_num in range(len(doc)):
        # Load the page
        page = doc.load_page(page_num)
        logger.info(f"page_num: {page_num + 1}")
        text_blocks = page.get_text("dict")["blocks"]
        for block in text_blocks:
            if block["type"] == 0:  # text block
                for line in block["lines"]:
                    for span in line["spans"]:
                        text_rect = fitz.Rect(span["bbox"])
                        logger.debug(f"span: {span}")
                        # Extract text within the specified rectangle
                        text = page.get_text("text", clip=text_rect).strip()
                        if text == TARGET_TEXT:
                            # Create redaction annotation
                            redact_annot = page.add_redact_annot(text_rect)
                            # images=2 blanks out overlapping pixels
                            # graphics=2 removes any overlapping vector graphics
                            # text=0 removes all characters whose bounding box overlaps any redaction rectangle
                            page.apply_redactions(images=2, graphics=2, text=0)

    # Save the modified document
    dst_pdf = BytesIO()
    doc.ez_save(dst_pdf)
    doc.close()
    dst_pdf.seek(0)
    return dst_pdf.read()
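For comparison, the matching step can be separated from the redaction step, so that all redaction annotations on a page are added first and `apply_redactions` is called once per page rather than once per match. The sketch below works on the `blocks` list returned by `page.get_text("dict")`. Unlike the script above, it compares `span["text"]` directly instead of re-extracting text with `clip`, which is a deliberate simplification and may match slightly differently. `find_target_rects` is a hypothetical helper, and this restructuring is not known to avoid the xref error.

```python
from typing import Iterable, List, Tuple

BBox = Tuple[float, float, float, float]


def find_target_rects(blocks: Iterable[dict], target: str) -> List[BBox]:
    """Collect the bbox of every span whose stripped text equals `target`.

    `blocks` is assumed to follow the layout of page.get_text("dict")["blocks"]:
    text blocks have type 0 and contain lines -> spans, each with "text" and "bbox".
    """
    rects: List[BBox] = []
    for block in blocks:
        if block.get("type") != 0:  # skip non-text (image) blocks
            continue
        for line in block["lines"]:
            for span in line["spans"]:
                if span["text"].strip() == target:
                    rects.append(tuple(span["bbox"]))
    return rects


# Intended use inside the page loop of sample_func (requires an open fitz page):
#     rects = find_target_rects(page.get_text("dict")["blocks"], TARGET_TEXT)
#     for rect in rects:
#         page.add_redact_annot(fitz.Rect(rect))
#     if rects:
#         page.apply_redactions(images=2, graphics=2, text=0)
```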
Running the script above raises the following exception:
pymupdf.mupdf.FzErrorFormat: code=7: cannot find object in xref (21 0 R)
PyMuPDF version
1.25.5
Operating system
Linux
Python version
3.10