low-resource-languages

Here are 74 public repositories matching this topic...

csebuetnlp / xl-sum

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

multilingual machine-learning deep-learning dataset text-summarization abstractive-text-summarization abstractive-summarization text-summarisation low-resource-languages multilinguality summarization-corpora summarization-dataset multilingual-text-summarization text-summarization-dataset text-summarization-model low-resource-summarization low-resource-text-summarizarion multilingual-summarization

Updated Mar 26, 2024
Python

cisnlp / GlotLID

💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023

language-detection multlingual language-detector language-recognition glot lid language-identification language-classification language-identification-toolkit low-resource-languages language-detection-library language-identifier language-detection-lib langid low-resource-nlp glotcc glotlid

Updated Jun 5, 2025
Python

csebuetnlp / banglanmt

This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.

machine-translation neural-machine-translation parallel-corpus parallel-corpora bangla-nlp low-resource-languages bangla-machine-translation bangla-dataset-machine-translation emnlp-2020 low-resource-nlp low-resource-machine-translation

Updated Oct 23, 2024
Python

back-kh / SADA-Ancient-Palm-Leaf-Manuscripts-Recognitions

[PRL 2025, APSIPA 2022] Syllable Analysis Data Augmentation (SADA). This project introduces a glyph dictionary and a grammar-aware augmentation strategy designed to improve Khmer palm leaf manuscript recognition. By modeling the language's grammatical structure, we support more robust OCR performance in low-resource settings.

text-recognition data-augmentation ancient-languages low-resource-languages

Updated Sep 5, 2025
Python

ljvmiranda921 / calamanCy

NLP pipelines for Tagalog using spaCy

nlp machine-learning natural-language-processing spacy computational-linguistics ner low-resource-languages low-resource-nlp

Updated Jul 20, 2025
Python

jcblaisecruz02 / Filipino-Text-Benchmarks

Open-source benchmark datasets and pretrained transformer models in the Filipino language.

benchmark deep-learning text-classification corpus transformer transfer-learning tagalog bert filipino electra nli low-resource-languages tagalog-transformers electra-models

Updated Aug 26, 2024
Python

Rumeysakeskin / Turkish-Text-to-Speech

Speech synthesis (TTS) in low-resource languages by training from scratch with FastPitch and fine-tuning with HiFi-GAN

pytorch tts speech-synthesis nvidia-docker waveform-generator low-resource-languages nvidia-nemo hifigan fastpitch turkish-text-to-speech phonetical-conversion spectrogram-generator

Updated Dec 5, 2023
Python

EveryVoiceTTS / EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language

python text-to-speech speech pytorch tts speech-synthesis speech-processing language-revitalization low-resource-languages pytorch-lightning

Updated Oct 31, 2025
Python

alexandra-chron / relm_unmt

Python source code for EMNLP 2020 paper "Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT".

transfer-learning language-models cross-lingual low-resource-languages residual-adapters pretraining unsupervised-machine-translation

Updated Mar 16, 2022
Python

luciusssss / mc2_corpus

[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)

multilingual natural-language-processing corpus mongolian tibetan tibetan-nlp uyghur kazakh low-resource-languages low-resource-nlp

Updated Jun 16, 2025
Python

luciusssss / ZhuangBench

[ACL'24 Findings] Teaching Large Language Models an Unseen Language on the Fly

low-resource-languages zhuang low-resource-nlp large-language-models llm

Updated Mar 13, 2025
Python

jhdeov / interlingual-MFA

Workflow for forced alignment between languages

forced-alignment cross-language low-resource-languages montreal-forced-aligner multilingual-alignment cross-language-alignment

Updated Feb 16, 2024
Python

CoEDL / vad-sli-asr

A pipeline to isolate and transcribe one language in mixed-language speech

automatic-speech-recognition endangered-languages voice-activity-detection low-resource-languages spoken-language-identification

Updated Oct 25, 2022
Python

BatsResearch / LexC-Gen

Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.

multilingual sentiment-analysis topic-modeling synthetic-data synthetic-dataset-generation low-resource-languages lexicon-based multilingual-nlp llm

Updated Oct 3, 2024
Python

Aditi138 / EntityTargetedActiveLearning

nlp named-entity-recognition transfer-learning active-learning low-resource-languages

Updated Aug 29, 2019
Python

khuangaf / CONCRETE

Official implementation of "CONCRETE: Improving Cross-lingual Fact Checking with Cross-lingual Retrieval" (COLING'22)

retrieval fact-checking low-resource-languages multilinguality cross-lingual-transfer

Updated Oct 13, 2022
Python

cisnlp / GlotWeb

🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction.

multilingual dataset glot low-resource-languages news-dataset awsome-list

Updated Aug 13, 2025
Python

alecokas / BiLatticeRNN-Confidence

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks (https://arxiv.org/abs/1910.11933 or https://ieeexplore.ieee.org/document/9053264)

pytorch lstm speech-recognition attention lattice speech-processing asr lattices confidence-estimation low-resource-languages pytorch-implementation confidence-scores confusion-networks latticernn confidence-estimates

Updated Apr 16, 2020
Python

NN-Project-2 / Emotion-TTS-Emebddings

This project explores zero-shot emotional speech synthesis using EMOD, a novel approach combining emotion and content embeddings for multilingual and cross-lingual emotion transfer. Built on a VITS-based TTS model, it preserves speaker identity while enhancing expressiveness, enabling emotion transfer across languages and genders efficiently.

text-to-speech end-to-end speech-synthesis zero-shot-learning few-shot few-shot-learning low-resource-languages emotional-speech-synthesis

Updated Oct 17, 2025
Python

fajri91 / minangNLP

Minangkabau NLP corpus. PACLIC 2020

nlp sentiment-analysis machine-translation corpus indonesian-language bert ethnicity low-resource-languages minangkabau-language paclic

Updated Jun 7, 2021
Python
