This repository contains a dataset for English-to-Gujarati translation.

shahparth123/eng_guj_parallel_corpus

eng_guj_parallel_corpus

The repository contains a parallel corpus of around 65k English sentences translated into Gujarati.

The separator used is '\n'. Users can convert the data to a different separator to suit the needs of their application.
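As a minimal sketch of changing the separator, the Python below pairs newline-separated sentences and rewrites them as tab-separated lines. It assumes the corpus is stored as two line-aligned text files, one per language. That layout, the file names `english.txt`, `gujarati.txt` and `parallel.tsv`, and the helper names `split_lines`, `pair_lines` and `join_pairs` are all hypothetical and should be adjusted to the actual files in the repository.

```python
from pathlib import Path


def split_lines(text: str) -> list[str]:
    """Split text on '\\n' only, dropping the trailing newline and any '\\r'."""
    return [line.rstrip("\r") for line in text.rstrip("\n").split("\n")]


def pair_lines(src_text: str, tgt_text: str) -> list[tuple[str, str]]:
    """Pair the i-th source sentence with the i-th target sentence."""
    src = split_lines(src_text)
    tgt = split_lines(tgt_text)
    if len(src) != len(tgt):
        # Misaligned files would silently produce wrong translation pairs.
        raise ValueError(f"line counts differ: {len(src)} vs {len(tgt)}")
    return list(zip(src, tgt))


def join_pairs(pairs: list[tuple[str, str]], sep: str = "\t") -> str:
    """Write one 'source<sep>target' pair per line."""
    return "\n".join(f"{s}{sep}{t}" for s, t in pairs) + "\n"


if __name__ == "__main__":
    # Hypothetical file names; replace with the real paths in the repository.
    en_path = Path("english.txt")
    gu_path = Path("gujarati.txt")
    if en_path.exists() and gu_path.exists():
        pairs = pair_lines(
            en_path.read_text(encoding="utf-8"),
            gu_path.read_text(encoding="utf-8"),
        )
        Path("parallel.tsv").write_text(join_pairs(pairs), encoding="utf-8")
```

Splitting on '\n' explicitly, rather than using `str.splitlines()`, keeps other Unicode line-boundary characters inside a sentence from creating extra rows. Reading and writing with `encoding="utf-8"` is assumed here so that Gujarati script survives the round trip.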

About Dataset

The dataset was developed at the Language Processing Laboratory, Uka Tarsadia University, Gujarat, India, as part of ongoing research on Natural Language Processing and Machine Translation. It contains around 65,000 English sentences from the MSCOCO captioning dataset that were translated to Gujarati and converted to a parallel format.

Citation

P. Shah and V. Bakrola, "Neural Machine Translation System of Indic Languages - An Attention based Approach," 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), Gangtok, India, 2019, pp. 1-5, doi: 10.1109/ICACCP.2019.8882969. (IEEE Xplore, arXiv)
