Fatal crash calling page.getPixmap() with certain pdf file pages #605

Closed
derwyddon opened this issue Aug 20, 2020 · 5 comments

@derwyddon

Please provide all mandatory information!
On some pages of certain PDF files, getPixmap crashes without showing any information.

You could use these files from the IMF:
https://www.imf.org/~/media/Websites/IMF/imported-flagship-issues/external/pubs/ft/weo/2013/01/pdf/_c1pdf.ashx
https://www.imf.org/~/media/Websites/IMF/imported-flagship-issues/external/pubs/ft/weo/2013/01/pdf/_c3pdf.ashx

to reproduce the crash

Describe the bug (mandatory)

The _c1pdf.pdf file crashes when processing page 1, and the _c3pdf.pdf file crashes when processing page 2 (page numbers are 0-based).

To Reproduce (mandatory)

Step 1) Open the file with fitz.open
Step 2) Use a loop to call getPixmap for every page of the document
Step 2.a) The specific parameters used were mat = fitz.Matrix(2.0, 2.0) and alpha=False, but getPixmap crashes even with the default parameters: getPixmap(), not only with getPixmap(mat, alpha=False)
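
The steps above can be sketched as a short script. This is a minimal reproduction attempt, not the reporter's actual code: the helper name `render_all_pages` and the `zoom` parameter are hypothetical, and the `get_pixmap` fallback is only there because newer PyMuPDF releases renamed `getPixmap`.

```python
def render_all_pages(path, zoom=2.0):
    """Render every page of the PDF at `path`; return the number of pages rendered."""
    import fitz  # PyMuPDF; imported lazily so the helper can be defined without it

    doc = fitz.open(path)           # Step 1: open the file
    mat = fitz.Matrix(zoom, zoom)   # the matrix from Step 2.a
    rendered = 0
    for pno in range(len(doc)):     # Step 2: loop over all pages (0-based)
        page = doc[pno]
        # Print before rendering so the last line shown names the crashing page.
        print("rendering page", pno, flush=True)
        # getPixmap is the 1.17.x name; newer releases call it get_pixmap.
        render = getattr(page, "get_pixmap", None) or page.getPixmap
        render(matrix=mat, alpha=False)
        rendered += 1
    return rendered
```

Calling `render_all_pages("_c1pdf.pdf")` on a locally saved copy of the first file should, on an affected version, stop right after printing the number of the offending page.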

Expected behavior (optional)

Getting the pixmap or an exception

Your configuration (mandatory)

  • Windows 10 64bit v1809
  • Python v3.6.3
  • PyMuPDF v1.17.5

The output of print(sys.version, "\n", sys.platform, "\n", fitz.__doc__) :
3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)]
win32

PyMuPDF 1.17.5: Python bindings for the MuPDF 1.17.0 library.
Version date: 2020-08-06 06:31:06.
Built for Python 3.6 on win32 (64-bit).

@JorjMcKie
Collaborator

I don't know why you see no information, but those PDFs do contain errors, which are also revealed when subjecting them to e.g. mutool draw -o _c1pdf%d.png _c1pdf.pdf.
When I do page.getPixmap() under IDLE I see a lot of errors, 2 or more per page, about "recursive colorspace" definitions. When switching off ICC color management with fitz.TOOLS.set_icc(False), the situation is better and _c3pdf.pdf at least processes every page, although still not without some errors.
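
A rough sketch of that workaround, assuming the file is available locally: ICC color management is switched off once via `fitz.TOOLS.set_icc(False)`, then each page is rendered and any Python-level exception is recorded. The helper name `render_with_icc_off` is hypothetical, and whether a given page raises an exception at all (rather than only emitting MuPDF messages) is an assumption.

```python
def render_with_icc_off(path):
    """Render all pages with ICC color management disabled.

    Returns a list of (page_number, error_message) for pages that raised.
    """
    import fitz  # PyMuPDF; imported lazily

    fitz.TOOLS.set_icc(False)  # switch off ICC color management
    doc = fitz.open(path)
    failures = []
    for pno in range(len(doc)):
        page = doc[pno]
        # getPixmap is the 1.17.x name; newer releases call it get_pixmap.
        render = getattr(page, "get_pixmap", None) or page.getPixmap
        try:
            render()
        except Exception as exc:  # the exception type is not known from the thread
            failures.append((pno, str(exc)))
    return failures
```

An empty return value only means no exception reached Python; MuPDF may still have reported errors on its own output channel.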
Investigating further ...

@JorjMcKie
Collaborator

Found the point where an issue is generated.
Is this urgent for you? If so, I can make a pre-version and upload a wheel here.

@JorjMcKie
Collaborator

what the hell - here is a pre-version:
PyMuPDF-1.17.6-cp36-cp36m-win_amd64.zip
Change the extension to "whl" and execute python -m pip install -U PyMuPDF-1.17.6-cp36-cp36m-win_amd64.whl

@derwyddon
Author

Thank you.
In order to continue working, I have moved the getPixmap call to another thread so that I can contain the instability. I am processing a set of PDF files (looking for the most heterogeneous PDF files you can find in the "jungle") and I have found some additional files with the same behaviour (the "crash"), but I do not know whether the problem has exactly the same origin. There are a lot of PDF files that are really badly built and contain a lot of different errors (as you mention).
So it is not urgent for me now, but I will try the wheel and let you know if any of the files I have found still has problems with the patch.
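
For anyone wanting to contain this kind of crash while batch-processing, here is a minimal sketch that swaps in a different technique from the one described above: it renders each page in a separate child process rather than a thread, since a hard crash in native code usually takes the whole interpreter process down with it. The names `CHILD_CODE` and `page_renders_ok`, and the 60-second timeout, are illustrative assumptions.

```python
import subprocess
import sys

# Code run in the child interpreter: render one page and exit.
CHILD_CODE = """
import sys, fitz
doc = fitz.open(sys.argv[1])
page = doc[int(sys.argv[2])]
render = getattr(page, "get_pixmap", None) or page.getPixmap
render()
"""

def page_renders_ok(path, pno, timeout=60, child_code=CHILD_CODE):
    """Return True if rendering page `pno` of `path` in a child process exits cleanly."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", child_code, path, str(pno)],
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a hung render like a failed one
    # A crash or an uncaught exception in the child gives a non-zero return code.
    return proc.returncode == 0
```

Starting one interpreter per page is slow, so in practice it may be enough to use this only to probe files that have already brought down a normal run.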

@JorjMcKie
Collaborator

resolved in 1.17.6
