Skip to content

Multilingual OCR Development Plan #1048

Closed
@D-DanielYang

Description

model name description model size download Update Date
ch Chinese and English 3.71M inference model / trained model 2020.9.22
ch_tra chinese traditional 5.63M inference model / trained model 2021.1.21
en English 2.56M inference model / trained model 2020.9.22
fr French 2.65M inference model / trained model 2021.9.22
ar Arabic 2.53M inference model / trained model 2021.1.21
es Spanish 2.53M inference model / trained model 2021.1.21
pt Portuguese 2.63M inference model / trained model 2021.1.21
ru Russia 2.63M inference model / trained model 2021.1.21
ge german 2.65M inference model / trained model 2020.9.22
kr Korean 3.9M inference model / trained model 2020.9.22
jp Japanese 4.23M inference model / trained model 2020.9.22
it Italian 2.53M inference model / trained model 2021.1.21
hi Hindi 2.63M inference model / trained model 2021.1.21
ug Uyghur 2.63M inference model / trained model 2021.1.21
fa Persian 2.63M inference model / trained model 2021.1.21
ur Urdu 2.63M inference model / trained model 2021.1.21
oc Occitan 2.53M inference model / trained model 2021.1.21
mr Marathi 2.63M inference model / trained model 2021.1.21
ne Nepali 2.63M inference model / trained model 2021.1.21
rs_cyrillic Serbian(cyrillic) 2.63M inference model / trained model 2021.1.21
rs_latin Serbian(latin) 2.53M inference model / trained model 2021.1.21
bg Bulgarian 2.63M inference model / trained model 2021.1.21
uk Ukranian 2.63M inference model / trained model 2021.1.21
be Belarusian 2.63M inference model / trained model 2021.1.21
te Telugu 2.63M inference model / trained model 2021.1.21
kn Kannada 2.63M inference model / trained model 2021.1.21
ta Tamil 2.63M inference model / trained model 2021.1.21
mg Mongolian -- Ongoing
bg Bangla -- Need dict and corpus
bm Burmese -- Need dict and corpus call for contribution
ku_cent kurdish central -- PR8347 call for contribution
od Odia -- PR6348 call for contribution
th thai -- PR6719 issue chat call for contribution
More TBC

Guideline for new language requests

If you want to request a new language support, a PR with 2 following files are needed:

  1. In folder ppocr/utils/dict,
    it is necessary to submit the dict text to this path and name it with {language}_dict.txt that contains a list of all characters. Please see the format example from other files in that folder.

  2. In folder ppocr/utils/corpus,
    it is necessary to submit the corpus to this path and name it with {language}_corpus.txt that contains a list of words in your language.
    Maybe, 50000 words per language is necessary at least.
    Of course, the more, the better.

  3. call for contributions to add new language support for PaddleOCR.
    For anyone might be insterested in traing the new language model, Guidance to train the model is provided. We are calling contributions to add new language support for PaddleOCR.

If your language has unique elements, please tell me in advance within any way, such as useful links, wikipedia and so on.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions