XLMR results on POS tagging (ja, zh and yo) #84

amineabdaoui · 2022-01-28T18:22:47Z

Hi,
Any idea why XLMR results on UDPOS are so bad for Japanese, Chinese and Yoruba?
Thanks

sebastianruder · 2022-02-05T15:35:39Z

Hi Amine,
Part of the reason may be the non-Latin script for Japanese and Chinese and, for Yoruba, the relatively small amount of pre-training data available.

amineabdaoui · 2022-02-06T09:03:13Z

Hi Sebastian,

You are right, this is probably part of the reason.

Regarding Yoruba, it seems that this language was not included in the pre-training data of XLMR (it does not appear in the list of languages of the CC-100 corpus), whereas it was included in the mBERT pre-training data.

However, cross-lingual transfer from English to Japanese and Chinese is much better on the remaining tasks. The drop seems to be significant only on the token-level tasks (NER and POS).
