This project focuses on developing machine translation models that translate multiple languages into English. We explored various architectures and approaches for this purpose, starting with encoder-decoder models with attention, experimenting with transformers from scratch, and finally leveraging a pretrained model for language detection and translation. The deployment of different models was carried out using FastAPI and Streamlit.
-
Custom Model Development:
- Encoder-Decoder Architecture:
We experimented with building custom models using encoder-decoder architectures. - Attention Mechanism:
We incorporated attention mechanisms to improve the model's ability to focus on relevant parts of the input sequence during translation. - Languages:
We focused on Italian, Arabic, Japanese, and French for our initial experiments.
- Encoder-Decoder Architecture:
-
Transformer-Based Model:
- We developed a transformer-based model specifically for translating Portuguese into English. This required handling tokenization, positional encodings, and multi-head self-attention mechanisms manually.
-
Pretrained Model (m2m100_418M):
- For the final phase, we switched to the facebook/m2m100_418M pretrained model, which can handle language detection and translation across multiple languages.
- Language Detection:
We utilized the pretrained m2m100_418M model to detect the source language of the input text. - Translation:
Once the language was identified, we employed the model to translate the text into English. - Streamlit Deployment:
We deployed this solution using Streamlit for a user-friendly web interface.
- Used for the initial encoder-decoder and transformer models.
- Lightweight APIs exposed endpoints for translation.
- Supported input in Italian, Arabic, Japanese, French, and Portuguese.
- Used for the facebook/m2m100_418M model.
- Interactive web interface for users to input text in any language and view translations.
- Language detection and translation handled automatically.
- Python 3.8+
- Pip package manager
- Git
git clone https://github.com/yourusername/Multilingual_Machine_Translation.git
cd Multilingual_Machine_Translation
pip install -r requirements.txt
uvicorn main:app --reload
streamlit run translator_app.py
-
FastAPI:
-
Run the FastAPI server using the
uvicorn
command. -
Access the API via:
http://127.0.0.1:8000/translate
-
Example Request:
{ "text": "Olá, como vai?" }
-
-
Streamlit:
- Run the Streamlit app with the command above.
- Enter text in the input box, and the translation will appear along with detected language information.
This project is licensed under the MIT License - see the LICENSE file for details.
This markdown version maintains the structure and formatting of the original content. You can copy and paste it directly into your `README.md` file.