XLMR results on POS tagging (ja, zh and yo) #84

Open
amineabdaoui opened this issue Jan 28, 2022 · 2 comments

Comments

@amineabdaoui
Contributor

Hi,
Any idea why XLMR results on UDPOS are so bad for Japanese, Chinese and Yoruba?
Thanks

@sebastianruder
Collaborator

Hi Amine,
Part of the reason may be the non-Latin scripts of Japanese and Chinese. For Yoruba, it may be that relatively little pre-training data is available.

@amineabdaoui
Contributor Author

Hi Sebastian,

You are right, this is probably part of the reason.

Regarding Yoruba, it seems that this language was not included in the pre-training data of XLMR at all (it does not appear in the list of languages of the CC-100 corpus), whereas it was included in the mBERT pre-training data.

However, cross-lingual transfer from English to Japanese and Chinese is much better on the remaining tasks. The drop seems to be significant only on the token-level tasks (NER and POS).
