ocr-tests

Test 1:

The diff between the actual text (left) and the OCR result (json) of this image (right)...

...highlights several types of issues:

'༥' (U+0F25) at the end of the header wasn't detected, but an extra '།' (U+0F0D) appeared at the end of the text
'࿒' (U+0FD2) is replaced by ':' (U+003A) at the start of lines, and by '་' (U+0F0B) in the middle of lines
'ཿ' (U+0F7F) is ignored
Tibetan enclosed alphanumerics (replaced by ①...) aren't detected at all, most probably because they aren't part of the Tibetan Unicode block
a '་' (U+0F0B) has been added between two sentences in line 18, most probably picked up from text showing through from the reverse side of the page
the remaining issues are letter combinations used in transliterating Sanskrit (very common in Buddhist literature), which may not have featured in the training data
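
The categories above were found by reading the diff by eye. As a rough sketch of how such a comparison could be tallied automatically, the Python below aligns a reference string with an OCR string at character level and reports each difference with its code point. The function names (`describe`, `char_diffs`, `outside_tibetan_block`) and the sample strings are made up for illustration; they are not part of this repository or of any OCR service.

```python
import difflib
import unicodedata


def describe(text):
    """Render each character as 'glyph U+XXXX NAME' for readable reports."""
    return ", ".join(
        f"{ch!r} U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text
    )


def char_diffs(expected, actual):
    """Return (operation, expected_chunk, actual_chunk) for every mismatch.

    operation is 'replace', 'delete' (missing from the OCR output)
    or 'insert' (extra in the OCR output).
    """
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    return [
        (tag, expected[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def outside_tibetan_block(text):
    """List non-whitespace characters outside the Tibetan block (U+0F00..U+0FFF)."""
    return [
        ch for ch in text
        if not ch.isspace() and not (0x0F00 <= ord(ch) <= 0x0FFF)
    ]


if __name__ == "__main__":
    # Hypothetical sample: the reference starts with U+0FD2 and ends with U+0F7F.
    expected = "\u0fd2\u0f40\u0f0b\u0f41\u0f7f"
    # Hypothetical OCR output: U+0FD2 became ':' and U+0F7F was dropped.
    actual = ":\u0f40\u0f0b\u0f41"

    for op, want, got in char_diffs(expected, actual):
        print(f"{op}: expected [{describe(want)}] got [{describe(got)}]")

    # Characters such as U+2460 are not in the Tibetan block.
    print(outside_tibetan_block("\u0f40\u2460 \u0f41"))
```

Character-level alignment is used here because most of the issues listed are single code point substitutions or omissions; a line-level diff would flag the whole line instead.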
