Tesseract creates hOCR output without text results

On some page images that are full of text, Tesseract does not detect any text with the default settings. It typically prints `Empty page!!` twice for such pages. See issue #3021 for details and examples.

In some rare cases Tesseract prints `Empty page!!` only once and finds text in a second pass. That text is written to the ALTO and plain-text output, but it is missing from the hOCR output.

Example:

```
tesseract https://digi.bib.uni-mannheim.de/periodika/fileadmin/data/DeutReunP_856399094_19140210/max/856399094_1910_035_03.jpg 856399094_1910_035_03 alto hocr txt
```
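One way to confirm the mismatch after running the command is to compare the hOCR word count with the plain-text output. The following is only a minimal sketch using the Python standard library. The helper names `count_hocr_words` and `hocr_is_missing_text` are hypothetical, and the sketch assumes that recognized words appear in hOCR as `span` elements with class `ocrx_word`.

```python
from html.parser import HTMLParser


class HocrWordCounter(HTMLParser):
    """Counts hOCR word spans (class "ocrx_word") that contain visible text."""

    def __init__(self):
        super().__init__()
        self.in_word = False
        self.has_text = False
        self.word_count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self.in_word = True
            self.has_text = False

    def handle_data(self, data):
        # Text may be wrapped in <strong>/<em> inside the word span.
        if self.in_word and data.strip():
            self.has_text = True

    def handle_endtag(self, tag):
        if tag == "span" and self.in_word:
            if self.has_text:
                self.word_count += 1
            self.in_word = False


def count_hocr_words(hocr: str) -> int:
    """Return the number of non-empty word spans in an hOCR document."""
    parser = HocrWordCounter()
    parser.feed(hocr)
    parser.close()
    return parser.word_count


def hocr_is_missing_text(hocr: str, plain_text: str) -> bool:
    """True if the text output has words but the hOCR output has none."""
    return bool(plain_text.split()) and count_hocr_words(hocr) == 0
```

To use it, read `856399094_1910_035_03.hocr` and `856399094_1910_035_03.txt` as UTF-8 strings and pass them to `hocr_is_missing_text`. A result of `True` would match the behavior described here.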
