On some page images full of text, Tesseract does not detect any text when using the default settings. Typically it prints "Empty page!!" twice for such pages. See issue #3021 for details and examples.
In some rare cases Tesseract prints "Empty page!!" only once and finds text in a second pass. That text is written to the ALTO and text output, but the hOCR output does not contain it.
Example:
tesseract https://digi.bib.uni-mannheim.de/periodika/fileadmin/data/DeutReunP_856399094_19140210/max/856399094_1910_035_03.jpg 856399094_1910_035_03 alto hocr txt
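One way to confirm the mismatch is to count the words in both output files. The following is a minimal sketch, not part of Tesseract: the helper names `count_alto_words` and `count_hocr_words` are hypothetical, and it assumes that ALTO stores each word in a `String` element and that hOCR marks each word with the class `ocrx_word`.

```python
import re
import xml.etree.ElementTree as ET


def count_alto_words(alto_xml: str) -> int:
    """Count <String> elements in ALTO XML, ignoring the XML namespace."""
    root = ET.fromstring(alto_xml)
    # Tags look like "{namespace}String"; compare only the local name.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "String")


def count_hocr_words(hocr: str) -> int:
    """Count elements marked with class ocrx_word in hOCR markup."""
    # A regex avoids depending on how strictly the hOCR file parses as XML.
    return len(re.findall(r"class=['\"]ocrx_word['\"]", hocr))
```

Assuming the command above wrote `856399094_1910_035_03.xml` (ALTO) and `856399094_1910_035_03.hocr`, reading both files and passing their contents to these helpers should give a non-zero ALTO count and an hOCR count of zero if the problem described here is present.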