Hi, I use pypdfocr almost every day. Thank you for sharing it!
How about adding the following options:
Adding a command-line option that disables the preprocessing step. Sometimes PDF files are already of good quality and preprocessing is only a waste of time (for example, try this file: http://comjnl.oxfordjournals.org/content/24/2/167.full.pdf).
Making it possible to pause the execution after each step (maybe in an interactive mode?). This could enable features such as:
Editing some of the jpg/tiff files externally
Fixing the mistakes made by tesseract by editing the hocr files externally
Thank you!
I'll add the skip-preprocess option in the next release. I need to think about the second request, but I was thinking about creating a flow option where you specify the pieces of the flow that you want to run (preprocess, image extract, tesseract, merge, etc.). So you could first run only the tesseract part of the flow, then edit the hocr files, for example, and then rerun just the merge/output flow. But right now I'm working on putting in a parallel queue to speed up multi-page OCR, so I might need to finish that before this.
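To make the flow idea concrete, here is a rough sketch of how step selection could look. It is not pypdfocr's actual code or command line: the `--steps` and `--skip-preprocess` flag names, the step names, their order, and the `parse_steps`/`run_flow` helpers are all assumptions made up for illustration.

```python
import argparse

# Hypothetical step names and order, based on the stages named in this thread.
# These are assumptions, not pypdfocr's real internals.
ALL_STEPS = ["extract", "preprocess", "tesseract", "merge"]


def parse_steps(argv):
    """Return the list of steps to run, in canonical order."""
    parser = argparse.ArgumentParser(description="Sketch of a step-selectable OCR flow")
    parser.add_argument(
        "--steps",
        default=",".join(ALL_STEPS),
        help="comma-separated subset of: " + ", ".join(ALL_STEPS),
    )
    parser.add_argument(
        "--skip-preprocess",
        action="store_true",
        help="leave out the preprocessing step",
    )
    args = parser.parse_args(argv)

    requested = [s.strip() for s in args.steps.split(",") if s.strip()]
    unknown = [s for s in requested if s not in ALL_STEPS]
    if unknown:
        parser.error("unknown step(s): " + ", ".join(unknown))

    # Keep the canonical order no matter how the user listed the steps.
    steps = [s for s in ALL_STEPS if s in requested]
    if args.skip_preprocess:
        steps = [s for s in steps if s != "preprocess"]
    return steps


def run_flow(steps, handlers):
    """Call the handler for each selected step, in order."""
    for name in steps:
        handlers[name]()
```

With something like this, a first run with `--steps extract,preprocess,tesseract` would stop before the merge, the hocr files could be edited by hand, and a second run with `--steps merge` would produce the output. It assumes each step leaves its intermediate files on disk for the next one to pick up.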