Multilingual sentence alignment using sentence embeddings
Multilingual and monolingual corpora focused on Caucasus languages, for Natural Language Processing (NLP)
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Neural Machine Translation on the Nepali-English language pair
OpusFilter - Parallel corpus processing toolkit
A simple and efficient tool for mining and aligning sentences with pre-trained models.
A very comprehensive parallel corpus of Classical Chinese (ancient texts) and Modern Chinese
Parallel corpus annotation and visualization
OPUS (opus.nlpl.eu) Python3 API
Framework for parallel corpora
Creating (parallel) corpora from scratch using Uplug tooling
Leeds University and King Saud University (LK) Hadith Corpus
Extracting present perfects (and related forms) from parallel corpora
A Python application that generates parallel corpora for any language pair; the output can be used to train neural machine translation (NMT) systems
ParTy2OPUS converts documents from the ParTy corpus to the OPUS format
An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. Useful, for instance, for comparing a translation with the original text, finding differences and similarities between two translations, or seeing how a machine translation differs from a reference translation.
Odia Wikipedia monolingual corpus extraction
🪱 PARASITE || A parallel sentence data preprocessing toolkit. Originally developed as part of the winning `en-ru` submission to the WMT20 Biomedical Translation Task.
Code to extract a multilingual parallel corpus from the Press Information Bureau (PIB) website.