This project focuses on word-to-word transliteration between English and Indic languages, in both directions. It uses a seq2seq architecture with three model variants: LSTM, GRU, and LSTM with the Bahdanau attention mechanism.
- We gathered the dataset (text) from the Internet. The dataset is available in the input folder.
- The code for the seq2seq models (LSTM, GRU, and LSTM with attention) can be found in the src folder.
- We then trained 8 models (4 for Eng-Indic and 4 for Indic-Eng) for each architecture. The trained models can be found in the models folder.
- The webapp is hosted on Streamlit. Link for the website to try it out.
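The encoder-decoder idea behind all three variants can be sketched as below. This is a minimal, hypothetical PyTorch sketch (the repository's actual layer sizes, vocabularies, and framework choices live in config.py and the model files); the encoder compresses a source word into its final LSTM state, and the decoder generates target characters conditioned on that state.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the real character vocabularies come from the language config.
SRC_VOCAB, TGT_VOCAB, EMB, HID = 30, 60, 64, 128

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) of character ids
        _, (h, c) = self.rnn(self.emb(src))
        return h, c  # final state summarises the whole source word

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, tgt, state):
        # tgt: (batch, tgt_len); state: the encoder's (h, c)
        o, state = self.rnn(self.emb(tgt), state)
        return self.out(o), state  # logits over target characters

enc, dec = Encoder(), Decoder()
src = torch.randint(0, SRC_VOCAB, (2, 7))   # 2 source words, 7 chars each
tgt = torch.randint(0, TGT_VOCAB, (2, 9))
logits, _ = dec(tgt, enc(src))
print(logits.shape)  # torch.Size([2, 9, 60])
```

Swapping `nn.LSTM` for `nn.GRU` (and its single hidden tensor) gives the GRU variant; the attention variant additionally lets the decoder look back at all encoder states rather than only the final one.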
├── input
│ └── *.xml - datasets for English to Indic Languages.
│
│
├── models
│ └── * - 8 models for the 3 different architectures (LSTM, GRU and LSTM_attn).
│
│
├── src
│ └── config.py - configuration for different languages
│ └── dataset.py - dataset generator
│ └── language_preprocessing.py - preprocessing text
│ └── train.py - to train different models
│ └── webapp.py - streamlit webapp
│ └── gru.py - gru model
│ └── lstm.py - lstm model
│ └── lstm_attention.py - lstm_attn model
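To illustrate what the attention variant (lstm_attention.py) adds, here is a minimal NumPy sketch of additive (Bahdanau) attention with hypothetical dimensions and random weights standing in for the learned parameters W_enc, W_dec, and v:

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 7, 128                              # source word length, hidden size
enc_states = rng.standard_normal((T, H))   # one encoder state per character
dec_state = rng.standard_normal(H)         # current decoder hidden state
W_enc = rng.standard_normal((H, H)) * 0.1  # learned in the real model
W_dec = rng.standard_normal((H, H)) * 0.1
v = rng.standard_normal(H) * 0.1

# Additive score per source position: score_t = v . tanh(W_enc h_t + W_dec s)
scores = np.tanh(enc_states @ W_enc + dec_state @ W_dec) @ v
weights = np.exp(scores - scores.max())
weights /= weights.sum()                   # softmax over source positions
context = weights @ enc_states             # weighted sum fed to the decoder
print(context.shape)  # (128,)
```

At each decoding step the context vector lets the decoder focus on the source characters most relevant to the next output character, instead of relying only on the encoder's final state.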
This can be extended to transformer-based models as well; that work is currently in progress.
Any enhancement or contribution is welcome.