Closed
Description of the bug
We've encountered a number of PDFs recently (all from the same source, so I suspect this is specific to a quirk of their format) where calling clean_contents() removes all visible page content. I found that setting sanitize=False causes the content to be retained.
Our process requires adding text to the PDFs, and we call clean_contents() first because we've found that, without it, the added text sometimes doesn't show up. I'm happy to add sanitize=False to our code if this isn't a bug. Thanks for taking a look!
How to reproduce the bug
import pymupdf

single_page = pymupdf.open("./single_page.pdf")
for page in single_page.pages():
    # Default is sanitize=True; this call is what removes the visible content.
    page.clean_contents()
single_page.save("./single_page_cleaned.pdf", garbage=2, deflate=True)
PyMuPDF version
1.25.0
Operating system
macOS
Python version
3.11