Description
I will update this issue to track the development progress of new languages. The current list is:
Group 1 (Arabic script)
- Arabic (DONE, August 5, 2020)
- Uyghur (DONE, August 5, 2020)
- Persian (DONE, August 5, 2020)
- Urdu (DONE, August 5, 2020)
Group 2 (Latin script)
- Serbian-latin (DONE, July 12, 2020)
- Occitan (DONE, July 12, 2020)
Group 3 (Devanagari)
- Hindi (DONE, July 24, 2020)
- Marathi (DONE, July 24, 2020)
- Nepali (DONE, July 24, 2020)
- Rajasthani (NEED HELP)
- Awadhi, Haryanvi, Sanskrit (if possible)
Group 4 (Cyrillic script)
- Russian (DONE, July 29, 2020)
- Serbian-cyrillic (DONE, July 29, 2020)
- Bulgarian (DONE, July 29, 2020)
- Ukrainian (DONE, July 29, 2020)
- Mongolian (DONE, July 29, 2020)
- Belarusian (DONE, July 29, 2020)
- Tajik (DONE, April 20, 2021)
- Kyrgyz (NEED HELP)
Group 5
- Telugu (DONE, November 17, 2020)
- Kannada (DONE, November 17, 2020)
Group 6 (Languages that do not share characters with others)
- Tamil (DONE, August 10, 2020)
- Hebrew (ready to train)
- Malayalam (ready to train)
- Bengali + Assamese (DONE, August 23, 2020)
- Punjabi (ready to train)
- Abkhaz (ready to train)
Group 7 (Improvement and possible extra models)
- Japanese version 2 (DONE, March 21, 2021) + vertical text
- Chinese version 2 (DONE, March 21, 2021) + vertical text
- Korean version 2 (DONE, March 21, 2021)
- Latin version 2 (DONE, March 21, 2021)
- Math + Greek?
- Numbers + symbols only
Guidelines for new language requests
To request support for a new language, please send a PR with the following 2 files:
- In the folder easyocr/character, we need 'yourlanguagecode_char.txt', which contains a list of all characters. Please see the format/examples in the other files in that folder.
- In the folder easyocr/dict, we need 'yourlanguagecode.txt', which contains a list of words in your language. On average we have ~30000 words per language, with more than 50000 words for popular ones. More is better in this file.
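If you already have a plain-text corpus in your language, the two files can be drafted from it automatically and then cleaned up by hand. The following is only a rough sketch: the function names `build_language_files` and `write_language_files` are made up for illustration, and the one-entry-per-line output layout is an assumption, so please check it against the existing files in easyocr/character and easyocr/dict before sending a PR. The sketch also applies NFC normalization and drops tokens containing digits or symbols, which may or may not match what the existing files do.

```python
import unicodedata
from collections import Counter
from pathlib import Path


def _strip_punctuation(token):
    """Trim leading/trailing punctuation (Unicode category P*) from a token."""
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def build_language_files(corpus_text, max_words=None):
    """Return (characters, words) derived from a raw text corpus.

    Words are ranked by frequency (ties broken alphabetically).
    Only tokens made entirely of letters and combining marks are kept.
    """
    text = unicodedata.normalize("NFC", corpus_text)
    counts = Counter()
    for token in text.split():
        word = _strip_punctuation(token)
        if word and all(unicodedata.category(ch)[0] in "LM" for ch in word):
            counts[word] += 1
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    words = ranked[:max_words] if max_words else ranked
    # Characters come from every accepted word, not only the top max_words.
    characters = sorted({ch for word in counts for ch in word})
    return characters, words


def write_language_files(lang_code, characters, words, out_dir="."):
    """Write '<lang_code>_char.txt' and '<lang_code>.txt', one entry per line.

    The one-entry-per-line layout is an assumption; compare with existing files.
    """
    out = Path(out_dir)
    (out / f"{lang_code}_char.txt").write_text(
        "\n".join(characters) + "\n", encoding="utf-8"
    )
    (out / f"{lang_code}.txt").write_text(
        "\n".join(words) + "\n", encoding="utf-8"
    )
```

For example, `build_language_files(open("corpus.txt", encoding="utf-8").read(), max_words=30000)` would give a starting point close to the average dictionary size mentioned above. Punctuation, digits, and symbols are left out of the character list here, so add them manually if your language needs them.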
If your language has unique elements (such as 1. Arabic: characters change form when attached to each other and are written from right to left 2. Thai: some characters need to be above the line and some below), please educate me to the best of your ability and/or give useful links. It is important to take care of the details to achieve a system that really works.
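As a starting point for describing such properties, Python's standard `unicodedata` module can flag right-to-left characters and marks that attach above or below a base character. This is just an illustrative sketch (the helper name `describe_script` is hypothetical and is not part of EasyOCR), and it does not replace a human explanation of how the script behaves, for example how Arabic letters change shape when joined.

```python
import unicodedata


def describe_script(characters):
    """Summarize a few Unicode properties of the given characters."""
    chars = [ch for ch in characters if not ch.isspace()]
    # Bidirectional classes 'R' and 'AL' mark right-to-left characters
    # (e.g. Hebrew and Arabic letters respectively).
    right_to_left = any(
        unicodedata.bidirectional(ch) in ("R", "AL") for ch in chars
    )
    # Category 'Mn' (nonspacing mark) covers marks drawn above or below
    # a base character, such as many Thai vowel signs and tone marks.
    nonspacing_marks = [ch for ch in chars if unicodedata.category(ch) == "Mn"]
    return {
        "right_to_left": right_to_left,
        "nonspacing_marks": nonspacing_marks,
    }


if __name__ == "__main__":
    # U+0E01 is a Thai consonant, U+0E34 a vowel sign written above it.
    report = describe_script("\u0E01\u0E34")
    print(report["right_to_left"], [hex(ord(m)) for m in report["nonspacing_marks"]])
```

Running it on the contents of your character file and pasting the result into the PR description would give a quick overview of which characters need special handling.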
Lastly, please understand that my priority will have to go to popular languages or sets of languages that share most of their characters (also tell me if your language shares a lot of characters with others). It takes me at least a week to work on a new model, so you may have to wait a while for a new model to be released.