The diff between the actual text (left) and the OCR result (json) of this image (right)...
https://github.com/ngawangtrinley/ocr-tests/compare/f0035c4...3baffd5
...highlights several types of issues:
- '༥' (U+0F25) at the end of the header wasn't detected, but somehow an extra '།' (U+0F0D) appeared at the end of the text
- '࿒' (U+0FD2) is replaced by ':' (U+003A) at the start of lines, and by '་' (U+0F0B) in the middle of lines
- 'ཿ' (U+0F7F) is ignored
- Tibetan enclosed alphanumerics (replaced by ①...) aren't detected at all, most probably because they aren't part of the Tibetan Unicode table
- a '་' (U+0F0B) has been added between two sentences in line 18, most probably from the text on the back of the page
- the remaining issues are letter combinations used to transliterate Sanskrit (very common in Buddhist literature) that might not have featured in the training data
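
Substitutions like these are easy to miss in a rendered diff, since several of the marks look alike. A minimal sketch of a code-point-level comparison, using only the Python standard library, could look like the following. The function names and the sample strings are hypothetical and are not taken from the test repository; the sample only mimics the ':' substitution and the dropped 'ཿ' described above.

```python
import difflib
import unicodedata

# Tibetan Unicode block: U+0F00..U+0FFF
TIBETAN_BLOCK = range(0x0F00, 0x1000)


def describe(text: str) -> str:
    """Render each character as 'U+XXXX NAME' so look-alike marks can be told apart."""
    return ", ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in text
    )


def char_diff(expected: str, ocr: str):
    """Return (operation, expected_span, ocr_span) for every mismatching span."""
    matcher = difflib.SequenceMatcher(None, expected, ocr, autojunk=False)
    return [
        (tag, expected[i1:i2], ocr[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def outside_tibetan_block(text: str):
    """List the non-space characters that fall outside the Tibetan block."""
    return sorted(
        {ch for ch in text if not ch.isspace() and ord(ch) not in TIBETAN_BLOCK}
    )


if __name__ == "__main__":
    # Hypothetical sample, not real OCR output:
    # U+0FD2, KA, tsheg, KHA, U+0F7F in the reference text
    expected = "\u0FD2\u0F40\u0F0B\u0F41\u0F7F"
    # ':' instead of U+0FD2, and U+0F7F dropped
    ocr = ":\u0F40\u0F0B\u0F41"

    for tag, exp_span, ocr_span in char_diff(expected, ocr):
        print(f"{tag}: [{describe(exp_span)}] -> [{describe(ocr_span)}]")

    # U+2460 (①) sits in the Enclosed Alphanumerics block, not the Tibetan one
    print(outside_tibetan_block("\u2460\u0F40"))
```

Running the same comparison line by line over the reference text and the text extracted from the OCR JSON would give a tally of which code points get dropped or replaced most often.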