List of languages in development #91

Closed
@rkcosmos

I will update this issue to track the development of new languages. The current list is:

Group 1 (Arabic script)

  • Arabic (DONE, August 5, 2020)
  • Uyghur (DONE, August 5, 2020)
  • Persian (DONE, August 5, 2020)
  • Urdu (DONE, August 5, 2020)

Group 2 (Latin script)

  • Serbian-latin (DONE, July 12, 2020)
  • Occitan (DONE, July 12, 2020)

Group 3 (Devanagari)

  • Hindi (DONE, July 24, 2020)
  • Marathi (DONE, July 24, 2020)
  • Nepali (DONE, July 24, 2020)
  • Rajasthani (NEED HELP)
  • Awadhi, Haryanvi, Sanskrit (if possible)

Group 4 (Cyrillic script)

  • Russian (DONE, July 29, 2020)
  • Serbian-cyrillic (DONE, July 29, 2020)
  • Bulgarian (DONE, July 29, 2020)
  • Ukrainian (DONE, July 29, 2020)
  • Mongolian (DONE, July 29, 2020)
  • Belarusian (DONE, July 29, 2020)
  • Tajik (DONE, April 20, 2021)
  • Kyrgyz (NEED HELP)

Group 5

  • Telugu (DONE, November 17, 2020)
  • Kannada (DONE, November 17, 2020)

Group 6 (Languages that don't share characters with others)

  • Tamil (DONE, August 10, 2020)
  • Hebrew (ready to train)
  • Malayalam (ready to train)
  • Bengali + Assamese (DONE, August 23, 2020)
  • Punjabi (ready to train)
  • Abkhaz (ready to train)

Group 7 (Improvement and possible extra models)

  • Japanese version 2 (DONE, March 21, 2021) + vertical text
  • Chinese version 2 (DONE, March 21, 2021) + vertical text
  • Korean version 2 (DONE, March 21, 2021)
  • Latin version 2 (DONE, March 21, 2021)
  • Math + Greek?
  • Number+symbol only

Guideline for new language request

To request support for a new language, please send a PR with the following 2 files:

  1. In the folder easyocr/character, we need 'yourlanguagecode_char.txt', which contains a list of all characters. Please see the format/examples in the other files in that folder.
  2. In the folder easyocr/dict, we need 'yourlanguagecode.txt', which contains a list of words in your language. On average we have ~30000 words per language, with more than 50000 words for popular ones. More is better in this file.
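As a starting point for preparing the two files, here is a minimal sketch that derives both from a raw text corpus. It assumes one entry per line in each file (please check the existing files in those folders before relying on that), and the helper names `is_word_char`, `extract_words` and `build_language_files` are hypothetical, not part of EasyOCR.

```python
import unicodedata
from collections import Counter
from pathlib import Path


def is_word_char(ch):
    # Letters (L*) and combining marks (M*), so vowel signs in scripts
    # such as Devanagari are kept as part of the word.
    return unicodedata.category(ch)[0] in ("L", "M")


def extract_words(text):
    """Split text on whitespace and keep tokens made only of letters/marks."""
    words = []
    for token in text.split():
        # Trim punctuation, digits and symbols from both ends of the token.
        start, end = 0, len(token)
        while start < end and not is_word_char(token[start]):
            start += 1
        while end > start and not is_word_char(token[end - 1]):
            end -= 1
        core = token[start:end]
        # Drop tokens with non-letter characters inside (e.g. "foo-bar").
        if core and all(is_word_char(c) for c in core):
            words.append(core)
    return words


def build_language_files(text, lang_code, out_dir="."):
    """Write <lang_code>_char.txt and <lang_code>.txt (assumed: one entry per line)."""
    counts = Counter(extract_words(text))
    # Most frequent words first; ties broken alphabetically for stable output.
    vocab = sorted(counts, key=lambda w: (-counts[w], w))
    # Every distinct character that occurs in the collected words.
    chars = sorted(set("".join(vocab)))
    out = Path(out_dir)
    char_path = out / f"{lang_code}_char.txt"
    dict_path = out / f"{lang_code}.txt"
    char_path.write_text("\n".join(chars) + "\n", encoding="utf-8")
    dict_path.write_text("\n".join(vocab) + "\n", encoding="utf-8")
    return char_path, dict_path
```

Calling `build_language_files(corpus, "xx", "some/output/dir")` with your own corpus string and language code would produce the two files. The character list only covers letters and marks seen in the corpus, so digits, punctuation and rare characters would still need to be added by hand.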

If your language has unique elements (such as 1. Arabic: characters change form when attached to each other, and text is written from right to left; 2. Thai: some characters need to be above the line and some below), please educate me to the best of your ability and/or share useful links. It is important to take care of the details to achieve a system that really works.

Lastly, please understand that my priority has to go to popular languages or to sets of languages that share most of their characters (also tell me if your language shares a lot of characters with others). It takes me at least a week to work on a new model, so you may have to wait a while for a new model to be released.
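To report how much your language shares with an existing one, a rough check like the following may help. It is only a sketch: it assumes character files with one character per line, and `load_chars` and `char_overlap` are hypothetical helper names, not EasyOCR functions.

```python
from pathlib import Path


def load_chars(path):
    """Read a character file, assuming one character per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line for line in lines if line}


def char_overlap(chars_a, chars_b):
    """Return (shared characters, Jaccard similarity) for two character sets."""
    shared = chars_a & chars_b
    union = chars_a | chars_b
    # Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets.
    ratio = len(shared) / len(union) if union else 0.0
    return shared, ratio
```

For example, `char_overlap(load_chars("a_char.txt"), load_chars("b_char.txt"))` (both file names are placeholders) returns the shared characters and a ratio between 0 and 1. A ratio close to 1 suggests the two languages could be grouped together.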

Labels: Language Request (request for new language support), help wanted (extra attention is needed)