Question: Jupyter Kernel Dies after reading some pages in PyMuPDF #651
Comments
Thanks for submitting this - and especially with usable reproduction data! |
Yes, I agree to send this file. Thanks. |
Thanks! Just saw, they have even more recent updates to the C file in question. I'll re-build my local MuPDF with it and try again before bothering them with stuff already fixed in their development version. |
This was a worthwhile try! The new C file does fix the bug. |
It's not that urgent; I have 2 days to complete this task. It would be great if you could provide me with an updated patch ASAP. Thanks once again :) |
Well, that is somewhat urgent then, isn't it? I need your config: please show me the output of |
Here is my config: |
PyMuPDF-1.17.7-cp37-cp37m-win_amd64.zip |
@JorjMcKie Thanks a lot for providing a quick solution along with the new patched .whl. I would also like to know: if I want to install this in another Python environment (for example, I have Python 3.6.8 installed on a server), will PyMuPDF 1.17.7 be generally available for compatible Python versions such as 3.6.8 and 3.7? |
I will publish v1.17.7 some time this week. I have never worked with a Linux server, but the generated Linux wheels should work. |
Just had another idea:
import fitz
doc = fitz.open("...")
for page in doc:
    page.cleanContents()  # clean page description syntax
    xref = page.getContents()[0]  # the remaining /Contents object
    cont = bytearray(doc.xrefStream(xref))  # read as modifiable
    i1 = 0  # all text objects are wrapped in string pairs b"BT" ... b"ET"
    while i1 < len(cont):
        i1 = cont.find(b"BT")
        if i1 < 0:
            break
        i2 = cont.find(b"ET", i1)
        if i2 < 0:
            break
        cont[i1 : i2 + 2] = b""  # remove text object
    doc.updateStream(xref, cont)  # replace the /Contents
    page.cleanContents()  # remove fonts no longer used
doc.save("no-text.pdf", garbage=3, deflate=True) |
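The search-and-cut step in the loop above can be tried on a plain byte string without opening a PDF. What follows is only a minimal sketch: `strip_text_objects` and the sample stream are hypothetical, made up for illustration, and not part of PyMuPDF. It treats any `BT` ... `ET` byte pair as a text object, which is a simplification, because the same bytes could also occur inside string literals or inline image data.

```python
def strip_text_objects(stream: bytes) -> bytes:
    """Return a copy of `stream` with every BT ... ET span removed.

    Hypothetical helper for illustration; it does a naive byte search
    and does not parse PDF syntax.
    """
    out = bytearray()
    pos = 0  # start of the part of the stream not yet copied
    while True:
        start = stream.find(b"BT", pos)
        if start < 0:
            break  # no further text object
        end = stream.find(b"ET", start + 2)
        if end < 0:
            break  # unterminated text object: leave the rest untouched
        out += stream[pos:start]  # keep everything before the text object
        pos = end + 2  # skip past the closing "ET"
    out += stream[pos:]  # keep the tail
    return bytes(out)


# Made-up content stream: two text objects around one filled rectangle.
sample = (
    b"q 1 0 0 1 0 0 cm\n"
    b"BT /F1 12 Tf (Hello) Tj ET\n"
    b"0 0 10 10 re f\n"
    b"BT (World) Tj ET\n"
    b"Q\n"
)
cleaned = strip_text_objects(sample)
print(cleaned)
```

Unlike the in-place deletion above, this variant builds a new buffer and scans the stream only once, which may matter for large content streams.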
New version 1.17.7 is being uploaded right now. |
I am trying to compare some PDFs with many pages. Some PDFs with 80 pages pass successfully with my current logic, but others (even with fewer than 50 pages) get stuck at a particular page and the kernel dies without producing any error message at all.
In my code I read the PDF page by page and send each page to remove_txts(). In remove_txts() I want to remove all text bboxes from a page; it stops at some page while doing page.apply_redactions() and the kernel dies.
My partial code is like this:
import fitz  # PyMuPDF

def remove_txts(page):
    try:
        blocks = page.getTextBlocks()  # get blocks of text
        for block in blocks:
            bbox = list(block[0:4])  # first four items are the block's rectangle
            rect = fitz.Rect(bbox)
            page.addRedactAnnot(rect, text=" ")  # mark the block for redaction
        page.apply_redactions()  # remove everything marked above
    except Exception as e:
        print("Exception occurred in remove texts method: " + str(e))
I have attached one of the pages here, which produces the same issue.
new-doc-linear-32-33edited.pdf
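Since the fix turned out to be in a MuPDF C file, the crash most likely happens below the Python level, where `except Exception` cannot catch it; that would explain the missing error message. One way to find out which page kills the process is to run each page's work in a separate Python process and look at the exit code. This is a minimal sketch under stated assumptions: `run_isolated` is a hypothetical helper, and `os._exit(3)` merely stands in for a native crash.

```python
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 60.0) -> int:
    """Run `snippet` in a fresh Python process and return its exit code.

    Hypothetical helper for illustration. A non-zero code means the child
    ended abnormally, while the calling process (e.g. the Jupyter kernel)
    stays alive.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode


# A well-behaved child process exits with code 0.
ok = run_isolated("print('page processed')")

# os._exit() ends the process immediately, so the except clause never runs.
crashing_snippet = (
    "import os\n"
    "try:\n"
    "    os._exit(3)\n"
    "except Exception:\n"
    "    print('caught')\n"
)
crashed = run_isolated(crashing_snippet)
print(ok, crashed)
```

In practice the snippet would open the PDF and process a single page number, so the failing page can be identified from the exit codes.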