Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
Updated Apr 20, 2024 - Python
Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
Zero-shot and Translation Experiments on XQuAD, MLQA and TyDiQA
In this work, we apply multilingual zero-shot transfer to toxic comment detection. This approach lets a model trained only on a single-language dataset work in an arbitrary language, including low-resource ones.
Advancing Homepage2Vec with LLM-Generated Datasets for Multilingual Website Classification
mBERT and XLM-R for encoding Scandinavian languages
Collection of scripts used to create SRL datasets for Galician and Spanish using a verbal indexing method, as well as fine-tuned BERT and XLM-R models for SRL in each language.