Attention based English-Bodo Neural Machine Translation. Bodo is a scheduled Indian language with less NLP research.

maharajbrahma/bodo-nmt-attention

Attention based English-Bodo Neural Machine Translation

Introduction

Despite its potential, English-Bodo (Eng-Brx) Neural Machine Translation has had no prior research done on it. According to the 2011 Census of India, Bodo has 14,57,547 native speakers and 14,82,929 speakers in total. During the initial stage of this work we searched for an English-Bodo parallel corpus; to our surprise, we found only one source: the Indian Language Technology Proliferation and Deployment Centre (TDIL-DC).

Dataset

Tourism corpus: an English-Bodo parallel corpus in the tourism domain (20,901 sentences), provided by TDIL-DC.

The detailed cleaning and preprocessing steps are described in the paper.

All experiments were performed using the TensorFlow NMT framework by Thang Luong, Eugene Brevdo, and Rui Zhao.

Training

The training process is similar to that of TensorFlow NMT. For easier handling of hyper-parameters and execution, we provide a shell script, start.sh. The hyper-parameters can be changed in the start.sh file.

bash start.sh

or

chmod +x start.sh
./start.sh

The trained models are saved in the models/ directory.
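
As an illustration of how a wrapper like start.sh can keep hyper-parameters in one place, here is a minimal sketch. It is not the repository's actual script: the variable names, the `build_train_cmd` helper, the `vocab`, `train` and `tst2012` file prefixes, and all hyper-parameter values are assumptions (the values are the defaults from the TensorFlow NMT tutorial, not the settings used in this work). Only the `en`/`brx` suffixes, `nmt_data/tst2013` and the `models/` directory come from this README. The flags are those of the TensorFlow NMT command line.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a start.sh-style wrapper; NOT the repository's script.
# Hyper-parameters are plain variables so they can be edited in one place.
SRC=en                    # source language suffix
TGT=brx                   # target language suffix
DATA_DIR=nmt_data         # tst2013 is named in this README; other prefixes are assumed
OUT_DIR=models            # trained models are saved here
NUM_LAYERS=2              # placeholder value (TensorFlow NMT tutorial default)
NUM_UNITS=128             # placeholder value
DROPOUT=0.2               # placeholder value
NUM_TRAIN_STEPS=12000     # placeholder value
ATTENTION=scaled_luong    # one of the attention types TensorFlow NMT accepts

# Print the training command on a single line instead of running it (dry run).
build_train_cmd() {
  echo python -m nmt.nmt \
    --attention="$ATTENTION" \
    --src="$SRC" --tgt="$TGT" \
    --vocab_prefix="$DATA_DIR/vocab" \
    --train_prefix="$DATA_DIR/train" \
    --dev_prefix="$DATA_DIR/tst2012" \
    --test_prefix="$DATA_DIR/tst2013" \
    --out_dir="$OUT_DIR" \
    --num_train_steps="$NUM_TRAIN_STEPS" \
    --num_layers="$NUM_LAYERS" \
    --num_units="$NUM_UNITS" \
    --dropout="$DROPOUT" \
    --metrics=bleu
}

build_train_cmd
```

Printing the command first makes it easy to check the settings before committing to a long run; piping the output to `bash` would then launch training, assuming the TensorFlow NMT package is importable from the current directory.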

Testing

To test the trained model on the test set, execute out.sh.

  • Translating 2090 English sentences to Bodo sentences
bash out.sh

or

chmod +x out.sh
./out.sh
  • View the translated sentences
gedit output.brx

Terminal editors such as nano may not render Bodo characters properly, so it is better to view the file in gedit or leafpad.

  • Calculate BLEU score
perl multi-bleu.perl nmt_data/tst2013.brx < output.brx
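
multi-bleu.perl pairs the reference and the hypothesis line by line, so a quick line-count check before scoring can catch a truncated or misaligned output file. The sketch below is an optional addition, not part of the repository; the `same_line_count` helper is a hypothetical name, and the file names are the ones used in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if both files have the same number of lines.
same_line_count() {
  ref_lines=$(( $(wc -l < "$1") ))   # arithmetic expansion strips wc's padding
  hyp_lines=$(( $(wc -l < "$2") ))
  [ "$ref_lines" -eq "$hyp_lines" ]
}

REF=nmt_data/tst2013.brx   # reference translations
HYP=output.brx             # model output written by out.sh

# Score only when both files are present and aligned.
if [ -f "$REF" ] && [ -f "$HYP" ]; then
  if same_line_count "$REF" "$HYP"; then
    perl multi-bleu.perl "$REF" < "$HYP"
  else
    echo "Line counts differ between $REF and $HYP; BLEU would be unreliable." >&2
  fi
fi
```

With the test set described above, both files would be expected to contain 2090 lines.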

Translating a sentence

  • Enter the English sentence you want to translate in the test.en file
  • Change the model path in translate.sh
  • Generate translation [Eng->Brx]
bash translate.sh

or

chmod +x translate.sh
./translate.sh
  • See translated Bodo sentence
gedit out.brx
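
For orientation, the steps above roughly correspond to a TensorFlow NMT inference call. The sketch below shows what such a call could look like; it is an assumption about the general shape, not the contents of translate.sh. The `build_infer_cmd` helper and the variable names are hypothetical, while test.en, out.brx and the models directory are the names used in this README and the flags are those of the TensorFlow NMT command line.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a translate.sh-style call; NOT the repository's script.
INPUT_FILE=test.en     # English sentence(s) to translate, one per line
OUTPUT_FILE=out.brx    # Bodo translation is written here

# Print the inference command for the given model directory (default: models).
build_infer_cmd() {
  echo python -m nmt.nmt \
    --out_dir="${1:-models}" \
    --inference_input_file="$INPUT_FILE" \
    --inference_output_file="$OUTPUT_FILE"
}

# Dry run: show the command for the default model directory.
build_infer_cmd models
```

Passing a different directory to `build_infer_cmd` plays the role of changing the model path in the second step.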
