More options maybe... #22

ChristosT · 2014-11-13T17:40:41Z

Hi, I use pypdfocr almost every day. Thank you for sharing it!
How about adding the following options:

- Add a command-line option that disables the preprocessing step. Sometimes PDF files are already of good quality and preprocessing is only a waste of time (for example, try this file: http://comjnl.oxfordjournals.org/content/24/2/167.full.pdf).
- Make it possible to pause the execution after each step (maybe in an interactive mode?). This could enable features such as:
  - Editing some of the jpg/tiff files externally
  - Fixing the mistakes made by tesseract by editing the hocr files externally

Thank you!

virantha · 2014-12-04T22:23:44Z

I'll add the skip-preprocess option in the next release. Need to think about the latter, but I was thinking about creating a flow option where you specify the pieces of the flow that you want to run (preprocess, image extract, tesseract, merge, etc). So you could first run only the tesseract part of the flow, then edit the hocr files, for example, and then rerun just the merge/output flow. But right now I'm working on putting in a parallel queue to speed up multi-page ocr, so I might need to finish that before this.
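
To make the flow idea concrete, here is a rough sketch of how selectable steps could work. It is purely hypothetical and not pypdfocr code: the step names, their order, and the `select_steps` / `run_flow` helpers are all assumptions made for illustration.

```python
# Hypothetical sketch only: none of these names come from pypdfocr itself.
# The step order below is an assumption for illustration.
FLOW_STEPS = ["image_extract", "preprocess", "tesseract", "merge"]


def select_steps(requested, all_steps=FLOW_STEPS):
    """Return the requested steps in pipeline order, rejecting unknown names."""
    unknown = [s for s in requested if s not in all_steps]
    if unknown:
        raise ValueError("unknown flow step(s): " + ", ".join(unknown))
    return [s for s in all_steps if s in requested]


def run_flow(requested, handlers, state=None):
    """Run only the requested steps; each handler takes and returns a state dict."""
    state = dict(state or {})
    for step in select_steps(requested):
        state = handlers[step](state)
    return state


def make_handler(name):
    """Build a stand-in handler that just records that its step ran."""
    def handler(state):
        return dict(state, done=state.get("done", []) + [name])
    return handler


handlers = {name: make_handler(name) for name in FLOW_STEPS}

# First run only the OCR step, then (after editing the hocr files by hand)
# rerun just the merge step, carrying the earlier state forward.
first = run_flow(["tesseract"], handlers)
second = run_flow(["merge"], handlers, first)
```

Under this sketch, the state passed between runs stands in for the intermediate files (images, hocr) that a real implementation would leave on disk between invocations.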

ChristosT · 2014-12-06T16:51:11Z

Yeah, the flow option sounds great! I agree that parallel processing should be the priority right now. Keep up the good work!

virantha · 2014-12-12T15:26:42Z

Parallel processing is now in as of 0.8.2. Skip-preprocess is in as of 0.8.1. The next release will probably have these flow steps...

virantha added the enhancement label Dec 12, 2014

virantha self-assigned this Dec 12, 2014

ChristosT closed this as completed Jan 1, 2015
