Simplified & Traditional Chinese #192

Open
reececomo opened this issue Dec 18, 2023 · 2 comments

reececomo commented Dec 18, 2023

We should treat Simplified Chinese and Traditional Chinese as two completely separate languages. There's obviously more nuance to it than that, but echoing #6 and #80, it's super important to distinguish between the two very different overarching types of Chinese.

It's close to saying "well both English and German look the same to me so they must be interoperable" 😆

A code of zh conventionally refers to Simplified Chinese.

# Common Simplified Chinese Codes
zh
zh-Hans
zh-CN (Mainland China variant - Historically used for all Simplified Chinese)

# Common Traditional Chinese Codes
zh-Hant
zh-TW (Taiwan variant - Historically used for all Traditional Chinese)
zh-HK (Hong Kong variant)

If it helps from a training-data point of view, they have two different ISO 15924 script codes (Hans vs Hant).
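
For illustration, here is a minimal sketch of how the codes listed above might be normalized to a script code. The function name `chinese_script` and the region sets are hypothetical and are not part of this library's API. The region-to-script mapping only covers the conventions listed above, and treating a bare `zh` as Simplified is the assumption stated earlier in this comment.

```python
from typing import Optional

# Hypothetical lookup sets, taken from the code lists in this comment.
_SIMPLIFIED_REGIONS = {"CN"}
_TRADITIONAL_REGIONS = {"TW", "HK"}


def chinese_script(tag: str) -> Optional[str]:
    """Return "Hans" or "Hant" for a Chinese language tag, or None otherwise."""
    # Accept both "zh-TW" and "zh_TW" spellings.
    parts = tag.replace("_", "-").split("-")
    if parts[0].lower() != "zh":
        return None

    subtags = parts[1:]

    # An explicit script subtag always wins over a region subtag.
    for sub in subtags:
        if sub.lower() == "hans":
            return "Hans"
        if sub.lower() == "hant":
            return "Hant"

    # Otherwise fall back to the conventional script for the region.
    for sub in subtags:
        if sub.upper() in _TRADITIONAL_REGIONS:
            return "Hant"
        if sub.upper() in _SIMPLIFIED_REGIONS:
            return "Hans"

    # Assumption: a bare "zh" conventionally means Simplified Chinese.
    return "Hans"
```

With this sketch, `chinese_script("zh-HK")` gives `"Hant"`, while `chinese_script("zh")` and `chinese_script("zh-CN")` both give `"Hans"`.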

pemistahl (Owner) commented

Hello Reece,

Thank you for this clarification. I wasn't aware that Simplified and Traditional Chinese are as different as, for instance, English and German. I will try to find better training data for each variant and let the library handle the variants separately.

jibaro commented Apr 12, 2024

Hello @pemistahl, the difference between simplified and traditional characters is very important for Chinese. When can you support it? -_-!
