More options maybe... #22

Closed
ChristosT opened this issue Nov 13, 2014 · 3 comments

@ChristosT

Hi, I use pypdfocr almost every day. Thank you for sharing it!
How about adding the following options:

  • A command-line option that disables the preprocessing step. Sometimes PDF files are already of good quality, and preprocessing is only a waste of time (for example, try this file: http://comjnl.oxfordjournals.org/content/24/2/167.full.pdf).
  • Some way to pause execution after each step (maybe in an interactive mode?). This could enable features such as:
    • Editing some of the jpg/tiff files externally
    • Fixing the mistakes made by tesseract by editing the hocr files externally

Thank you!

@virantha
Owner

virantha commented Dec 4, 2014

I'll add the skip-preprocess option in the next release. I need to think about the pause idea, but I was considering a flow option where you specify which pieces of the flow you want to run (preprocess, image extract, tesseract, merge, etc.). So you could first run only the tesseract part of the flow, then edit the hocr files, for example, and then rerun just the merge/output part. Right now, though, I'm working on putting in a parallel queue to speed up multi-page OCR, so I might need to finish that first.
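
To make the flow idea concrete, here is a minimal sketch of a step-selectable pipeline. It is not pypdfocr's actual code: the step names are borrowed from the comment above, while `run_flow`, `STEPS`, the placeholder step functions, and the `state` dictionary are all hypothetical and only record which steps ran. The step order is also an assumption, taken from the order in which the steps are listed above.

```python
from typing import Callable, Dict, List, Optional, Sequence

State = Dict[str, List[str]]


def _make_step(name: str) -> Callable[[State], State]:
    """Build a placeholder step that only records that it ran (hypothetical)."""
    def step(state: State) -> State:
        state["log"].append(name)
        return state
    return step


# Hypothetical registry. Dict insertion order defines the flow order
# (assumed here to match the order listed in the comment).
STEPS: Dict[str, Callable[[State], State]] = {
    name: _make_step(name)
    for name in ("preprocess", "image_extract", "tesseract", "merge")
}


def run_flow(selected: Sequence[str], state: Optional[State] = None) -> State:
    """Run only the selected steps, always in the registry's flow order."""
    unknown = [name for name in selected if name not in STEPS]
    if unknown:
        raise ValueError(f"unknown flow step(s): {unknown}")
    if state is None:
        state = {"log": []}
    wanted = set(selected)
    for name, step in STEPS.items():
        if name in wanted:
            state = step(state)
    return state


# Run OCR only, edit the hocr files by hand, then rerun just the merge.
state = run_flow(["tesseract"])
# ... external edits would happen here ...
state = run_flow(["merge"], state)
```

Passing the same `state` back in is what lets a later invocation pick up where an earlier one stopped; a real tool would more likely persist intermediate files on disk between runs.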

@ChristosT
Author

Yeah, the flow option sounds great! I agree that parallel processing should be the priority right now. Keep up the good work!

@virantha virantha self-assigned this Dec 12, 2014
@virantha
Owner

Parallel processing is in as of 0.8.2, and skip-preprocess is in as of 0.8.1. The next release will probably have these flow steps...
