segfault trying to call clean_contents on certain pdfs with python 3.12 #2907

Closed
Luux opened this issue Dec 18, 2023 · 3 comments

Comments

@Luux
Contributor

Luux commented Dec 18, 2023

Description of the bug

Calling clean_contents on certain PDF files causes a segmentation fault under Python 3.12. Interestingly, the same setup works fine on Python 3.11.
I stripped one of the PDFs down as far as possible with mutool trim while still reproducing the bug. Trimming it any further makes the segfault disappear.

How to reproduce the bug

import pathlib
import fitz


def test_segfault():
    pdf_file = pathlib.Path("test11.pdf").read_bytes()
    fitz_document = fitz.open(stream=pdf_file, filetype="application/pdf")

    pdf_pages = list(fitz_document.pages())
    (page,) = pdf_pages
    page.clean_contents()

test_segfault()

test11.pdf
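
Because a segfault kills the whole interpreter, it can help to run a reproducer like the one above in a child process and inspect the exit status. The following is a minimal sketch, assuming a POSIX system such as Linux; the helper name crashed_with_segfault is hypothetical and not part of PyMuPDF.

```python
import signal
import subprocess
import sys


def crashed_with_segfault(code: str, timeout: float = 60.0) -> bool:
    """Run `code` in a fresh interpreter and report whether it died from SIGSEGV.

    Hypothetical helper for illustration; relies on POSIX exit-status semantics.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    return result.returncode == -signal.SIGSEGV
```

Passing the reproducer script as the code string keeps the parent process alive even if the child crashes.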

PyMuPDF version

1.23.7

Operating system

Linux

Python version

3.12

@julian-smith-artifex-com
Collaborator

Thanks for the clear and simple reproducer. I've reproduced the problem.

Interestingly, the new "rebased" implementation of PyMuPDF appears to work fine here. The rebased implementation is included in current releases and is a drop-in replacement for the classic implementation, so I recommend you use it here: just change import fitz to import fitz_new as fitz. See #2680 for more information.

The rebased implementation will become the default (with import fitz) quite soon after this week's release of 1.23.8.

Seeing as rebased works here, we do not intend to fix this issue in the classic implementation. Of course, please let us know if you have any problems with import fitz_new as fitz.
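
For code that has to run against several PyMuPDF releases, the import swap can be written as a fallback. This is only a sketch: the helper name load_first_available is hypothetical, and it assumes fitz_new is importable only in releases that ship it under that name, so plain fitz is tried second.

```python
import importlib


def load_first_available(candidates=("fitz_new", "fitz")):
    """Return the first importable module among `candidates`, or None.

    Hypothetical helper: prefers the rebased module name and falls back
    to the plain `fitz` name when `fitz_new` is not present.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None
```

Usage would look like fitz = load_first_available(), followed by a check that the result is not None before calling fitz.open(...).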

@Luux
Contributor Author

Luux commented Dec 19, 2023

I can confirm that it works with fitz_new.

julian-smith-artifex-com added a commit to ArtifexSoftware/PyMuPDF-julian that referenced this issue Dec 20, 2023
This is for pymupdf#2907, 'segfault trying to call clean_contents on certain pdfs with
python 3.12'.

We are not intending to fix this bug, so actually this test only runs on
rebased.
julian-smith-artifex-com added a commit that referenced this issue Dec 22, 2023
This is for #2907, 'segfault trying to call clean_contents on certain pdfs with
python 3.12'.

We are not intending to fix this bug, so actually this test only runs on
rebased.
@julian-smith-artifex-com
Collaborator

Fixed in 1.23.9 where import fitz gets the rebased implementation.
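
To decide at runtime whether plain import fitz already provides the rebased implementation, one option is to compare the installed version against 1.23.9. A minimal sketch follows; the function name rebased_is_default is hypothetical, and the version string is assumed to look like "1.23.7".

```python
import re


def rebased_is_default(version: str) -> bool:
    """Return True if `version` is 1.23.9 or later.

    Hypothetical helper: compares only the first three numeric components
    and ignores any suffix that follows them.
    """
    parts = tuple(int(p) for p in re.findall(r"\d+", version)[:3])
    return parts >= (1, 23, 9)
```

The version string could, for example, be taken from fitz.VersionBind, assuming that attribute is available in the installed release.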
