-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance #26
Comments
Hi @sirupsen, thanks for the bug reports! Yes, performance is a bit of an issue. The majority of the time is spent on cropping the pages of the pdf to remove excess white space. After some profiling it seems that the expensive part of that operation is creating an image of a pdf page using the pdfplumber package and I don't see a way to avoid that (the reason this is done in Python at all is that the bounding box returned by pdfcrop isn't very accurate (4cb3af8)). I did just push a commit (a2e833a) that speeds up using the Alternatively, I could add a |
Might it be possible to remove whitespace without converting to an image first? 👂 |
I'm not sure, it might be necessary to render the pdf to figure out what the page looks like. That said, there is a pdfparser package that links directly to libpoppler, which seems a lot faster but is not a drop-in replacement for the current method with pdfplumber. I'll look into this a bit more and see what I can do. |
@sirupsen Thanks again for reporting this issue! I've just pushed some changes that give about an 8x speedup. I'll prepare a new release of the package soon. |
Excellent!!! I’ll upgrade as soon as it’s released. I do think a no crop flag would be useful. Some may prefer to have tools (change pen, highlight, etc) visible at all times — or perhaps the crop could leave just enough space to always have the tools available on the left side? |
Try version 0.5.4, both a |
😍 |
This is not overly a nuisance for me because I run the script async, but it does make it a little less pleasant to use. This 41 page PDF takes over one minute!
The text was updated successfully, but these errors were encountered: