14980-Images-PPT-OCR-Data-of-8-Languages

Description

This dataset contains 14,980 PPT images for OCR, covering 8 languages. The images span multiple scenes, photographic angles, photographic distances, and lighting conditions. Each line of text is annotated with a quadrilateral bounding box and a transcription. The dataset can be used for tasks such as multi-language OCR.

For more details, please refer to the link: https://www.nexdata.ai/datasets/ocr/979?source=Github

Data size

14,980 images, 8 languages

Data environment

including meeting rooms and conference rooms

Language types

French, Korean, Japanese, Spanish, German, Italian, Portuguese and Russian

Data diversity

multiple scenes, multiple languages, different photographic angles, different photographic distances, different lighting conditions

Device

cellphone

Collecting angles

front, left, right, and upward-looking angles

Data format

images are in .jpg format; annotation files are in .json format

Annotation content

line-level quadrilateral bounding box annotation and transcription for the texts
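
The JSON schema of the annotation files is not documented here, so the following is only a minimal sketch of how line-level annotations might be read. The key names `image`, `lines`, `points`, and `text`, the sample values, and the helper `parse_lines` are all hypothetical; adjust them to the actual files.

```python
import json

# Hypothetical annotation record: the key names and values below are
# assumptions for illustration, not the dataset's documented schema.
SAMPLE = """
{
  "image": "example.jpg",
  "lines": [
    {"points": [[10, 10], [110, 10], [110, 40], [10, 40]], "text": "Bonjour"},
    {"points": [[12, 60], [200, 62], [199, 95], [11, 93]], "text": "Guten Tag"}
  ]
}
"""


def parse_lines(raw):
    """Return a list of (quadrilateral, transcription) pairs from one record."""
    record = json.loads(raw)
    parsed = []
    for entry in record["lines"]:
        # Each text line is expected to carry exactly four (x, y) vertices.
        points = [tuple(p) for p in entry["points"]]
        if len(points) != 4:
            raise ValueError("a quadrilateral needs exactly 4 vertices")
        parsed.append((points, entry["text"]))
    return parsed


lines = parse_lines(SAMPLE)
for quad, text in lines:
    print(text, quad)
```

Keeping each quadrilateral as four separate vertices, rather than collapsing it to an axis-aligned rectangle, preserves the perspective distortion introduced by the different photographic angles.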

Accuracy

a bounding box is a qualified annotation when the error of each vertex of the quadrilateral is within 5 pixels; the accuracy of bounding boxes is not less than 95%; the text transcription accuracy is not less than 95%
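
The 5-pixel vertex criterion can be sketched as a small check against a reference quadrilateral. This is an illustration under assumptions, not the vendor's quality-control procedure: the error is taken to be the Euclidean distance between corresponding vertices, a distance of exactly 5 pixels is treated as passing, and the function names and sample coordinates are hypothetical.

```python
import math


def vertex_errors(annotated, reference):
    """Euclidean distance between each pair of corresponding vertices."""
    if len(annotated) != 4 or len(reference) != 4:
        raise ValueError("each quadrilateral needs exactly 4 vertices")
    return [
        math.hypot(ax - rx, ay - ry)
        for (ax, ay), (rx, ry) in zip(annotated, reference)
    ]


def is_qualified(annotated, reference, tolerance=5.0):
    """True when every vertex lies within `tolerance` pixels of its reference."""
    return all(e <= tolerance for e in vertex_errors(annotated, reference))


def box_accuracy(pairs, tolerance=5.0):
    """Fraction of (annotated, reference) pairs that are qualified."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no boxes to evaluate")
    qualified = sum(is_qualified(a, r, tolerance) for a, r in pairs)
    return qualified / len(pairs)


# Made-up coordinates for illustration only.
reference = [(10, 10), (110, 10), (110, 40), (10, 40)]
good = [(12, 11), (109, 10), (110, 43), (10, 40)]  # largest vertex error: 3 px
bad = [(10, 10), (110, 10), (118, 40), (10, 40)]   # one vertex is 8 px off

print(is_qualified(good, reference))
print(is_qualified(bad, reference))
print(box_accuracy([(good, reference), (bad, reference)]))
```

Under this reading, the stated 95% bounding box accuracy would correspond to `box_accuracy` returning at least 0.95 over a checked sample.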

Licensing Information

Commercial License
