14,980 Images PPT OCR Data of 8 Languages. This dataset contains 14,980 PPT images in 8 languages, captured in multiple scenes and at different photographic angles, photographic distances, and light conditions. Each line of text is annotated with a quadrilateral bounding box and a transcription. The dataset can be used for tasks such as multi-language OCR.
For more details, please refer to the link: https://www.nexdata.ai/datasets/ocr/979?source=Github
Data size: 14,980 images, 8 languages
Collecting environment: including meeting rooms and conference rooms
Languages: French, Korean, Japanese, Spanish, German, Italian, Portuguese and Russian
Data diversity: multiple scenes, multiple languages, different photographic angles, different photographic distances, different light conditions
Device: cellphone
Photographic angle: front, left, right, looking-up angle
Data format: images are .jpg; annotation files are .json
Annotation content: line-level quadrilateral bounding box annotation and transcription for the texts
Accuracy: a quadrilateral bounding box is a qualified annotation when the error of each of its vertices is within 5 pixels; the accuracy of bounding boxes is not less than 95%; the text transcription accuracy is not less than 95%
Commercial License
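
The vertex tolerance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual .json schema is not documented here, so the field names "image", "lines", "points" and "text", the file name, and the helper functions max_vertex_error and is_qualified are all hypothetical. Measuring the vertex error as a Euclidean distance is also an assumption.

```python
import json
import math

# Hypothetical annotation record: the dataset's real .json schema may differ.
SAMPLE_ANNOTATION = """
{
  "image": "slide_0001.jpg",
  "lines": [
    {"points": [[10, 20], [200, 22], [198, 60], [9, 58]], "text": "Bonjour"}
  ]
}
"""


def max_vertex_error(annotated, reference):
    """Return the largest distance between corresponding vertices of two quadrilaterals."""
    if len(annotated) != 4 or len(reference) != 4:
        raise ValueError("each quadrilateral needs exactly 4 vertices")
    # Assumption: the error of a vertex is its Euclidean distance in pixels.
    return max(math.dist(a, r) for a, r in zip(annotated, reference))


def is_qualified(annotated, reference, tolerance_px=5.0):
    """A box is treated as qualified when every vertex is within the tolerance."""
    return max_vertex_error(annotated, reference) <= tolerance_px


record = json.loads(SAMPLE_ANNOTATION)
quad = record["lines"][0]["points"]

# Made-up reference box, used only to exercise the check.
reference_quad = [[12, 21], [203, 26], [198, 60], [9, 58]]

print(record["lines"][0]["text"], is_qualified(quad, reference_quad))
```

The check compares vertices in order, so both quadrilaterals are assumed to list their corners starting from the same corner and in the same direction.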