Add ghostscript backup when validating a PDF #16
What about switching to gs and removing the use of pdftk? It isn't used for anything else now. We also need to wrap all system calls in timeouts so that, if a bug in the other software stops it returning promptly, we keep working. Need to check whether the terminator code kills gs or whether you also need to use the terminate script.
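A minimal sketch of the "wrap every system call in a timeout" idea, assuming a pure-Ruby approach rather than the project's own terminate/timeout script (whose interface is not shown in this thread). The helper name `run_with_timeout` and its parameters are hypothetical.

```ruby
require "timeout"

# Hypothetical helper (not from the project): run an external command and
# give up if it has not finished within `seconds`.
# Returns true only when the command exits with status 0 in time.
def run_with_timeout(cmd, seconds)
  # Spawn the child directly (no shell) and discard its output.
  pid = Process.spawn(*cmd, out: File::NULL, err: File::NULL)
  begin
    Timeout.timeout(seconds) do
      _, status = Process.wait2(pid)
      status.success?
    end
  rescue Timeout::Error
    begin
      Process.kill("KILL", pid) # the child overran its deadline; force it down
      Process.wait(pid)         # reap it so no zombie is left behind
    rescue Errno::ESRCH, Errno::ECHILD
      # The child had already exited or been reaped; nothing to clean up.
    end
    false
  end
end
```

Routing every external call (gs, pdftk, and so on) through one helper like this would keep the timeout policy in a single place; whether it can replace the existing terminate script would still need checking.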
@macite Thought about removing pdftk from the validation altogether but thought I'd check with you first. We still use pdftk to aggregate PDFs together. I can try swapping this with ghostscript as well. I could add a system call helper that wraps all system calls with the terminate script? What do you think?
Not any more. Task and portfolio PDFs are created using LaTeX. The old code is still there, but needs to be removed.
Ah right. Well, all updated there... not sure how much is actually used now after the LaTeX updates, but we can always remove it.
Just did a quick time check and gs is about 3x slower than pdftk at this task 😞 We should think about this further... We also had issues with gs not terminating on some PDFs during compression. Terminator didn't kill it either, so we added the timeout script. So if we do use gs, we need to include the timeout in all locations. Also thinking about including pdf-reader: https://github.com/yob/pdf-reader/tree/master/lib/pdf/reader We could test that as an alternative to system calls for checking PDF integrity. We also need this to check page count etc.
I added an extension to the I think there are some other options that we can add that reduces the time. Quality reduction etc. I'll look into it as soon as I can. I'll look into PDF-reader too 😊
I tested using unix |
Thanks I'll try with Do you remember what the benchmark was for pdftk?
It was about 100ms-150ms... gs was 300-400ms but that depends on the machine... |
@macite Looking into resolving this PR. Noticed that I'm going to replace the
@macite Sorry! Ignore the above... realised I hadn't been up-to-date with |
Getting some issues with |
The latest changes do this using a simple file read and a scan for %%EOF in the last 1024 characters. See http://stackoverflow.com/questions/28156467/fastest-way-to-check-that-a-pdf-is-corrupted-or-just-missing-eof-in-ruby Unless we need the PDF reader for other purposes, I think the file-scanning option should be sufficient.
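For illustration, the file-scan check described above could look roughly like this. It is a sketch rather than the code that was merged: the method name `pdf_has_eof_marker?` and the `tail_bytes` parameter are hypothetical, and it looks for the standard PDF end-of-file marker `%%EOF`.

```ruby
# Hypothetical helper (not the project's actual code): report whether the
# PDF end-of-file marker appears in the last `tail_bytes` bytes of the file.
def pdf_has_eof_marker?(path, tail_bytes = 1024)
  File.open(path, "rb") do |f|
    # Jump to the tail; files shorter than `tail_bytes` are read in full.
    f.seek([f.size - tail_bytes, 0].max)
    f.read.to_s.include?("%%EOF")
  end
end
```

This only catches truncated or incomplete uploads. It does not prove the rest of the file is well formed, which is the trade-off accepted in this thread in exchange for speed.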
@macite Yeah I bumped into this after updating... cool I agree. If that suffices, then we don't need it :) |
@alexcu Ok, let's close this pull request.
@macite I think something went wrong with GitHub: it isn't showing the other additions made in this PR that haven't been merged in (e.g., removing PDFtk from the documentation). See develop...final-year-project:enhance/validate-pdfs-using-ghostscript. I think these should still be merged in? Shall I create another PR with only these changes (removal of references to PDFtk in the documentation etc.)?
We killed Ghostscript, right? I'll remove it from the documentation if so... |
The others look like changes I made and are in develop already. Ghostscript is used to compress the PDFs... or did we remove that as well? |
Ah my bad... yes Ghostscript is still used to compress PDFs. Thought we had switched to something else. Not quite... the documentation is still out of date on Let me double check. I'll make sure |
Ah yes, I think the |
Ahh... makes sense. Was trying to find it in |
Problem
Whenever PDFs are validated using pdftk under OS X El Capitan, pdftk stalls and the validation fails (refer to this SO post).
E.g., under OS X 10.11 try:
It stalls for every El Capitan install using pdftk 2.02. Instead, the following SO post shows how we can do this speedily using ghostscript:
TL;DR
This PR adds ghostscript validation when pdftk fails. As an alternative to pdftk for validation, we could use ghostscript instead for development purposes, but depends on what you think RE this @macite...
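A rough sketch of the fallback idea (try pdftk first, then ghostscript). The PR's actual commands are not reproduced in this thread, so the method names and both command lines below are assumptions: `pdftk <file> output <null device>` and `gs -q -dNOPAUSE -dBATCH -sDEVICE=nullpage <file>` are commonly used ways to make each tool parse a whole PDF without producing output.

```ruby
# Hypothetical helper: run each validator command in order and treat the PDF
# as valid as soon as one of them exits with status 0. A command that is not
# installed makes `system` return nil, which counts as a failure here.
def any_validator_passes?(commands)
  commands.any? do |cmd|
    system(*cmd, out: File::NULL, err: File::NULL) == true
  end
end

# Assumed command lines (not taken from the PR): pdftk first, gs as backup.
def pdf_validators(path)
  [
    ["pdftk", path, "output", File::NULL],
    ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=nullpage", path]
  ]
end
```

Usage would be `any_validator_passes?(pdf_validators("submission.pdf"))`, where the file name is only an example. Because neither command here is time-limited, a stalled pdftk would still block, which is the concern raised in the discussion.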