This project is a Multilingual Speech Translation application that transcribes audio input using OpenAI Whisper and translates the transcription into a selected target language using Facebook's MBart-50 model. The application is built using Gradio, Hugging Face Transformers, and LangChain.
- Speech-to-Text (ASR): Converts spoken audio to text using OpenAI Whisper.
- Language Detection: Automatically detects the spoken language.
- Text Translation: Translates detected text into a specified target language using MBart-50.
- Gradio UI: Provides a user-friendly web interface for audio input and translation.
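A rough sketch of the ASR and language-detection steps with the `openai-whisper` package is shown below; the model size and audio path are placeholders, and `app.py` may wire this up differently:

```python
import whisper

# Load a Whisper checkpoint; smaller checkpoints such as "tiny" run faster.
model = whisper.load_model("base")

# transcribe() returns both the text and the language Whisper detected in the audio.
result = model.transcribe("sample.wav")  # placeholder audio path
print("Transcription:", result["text"])
print("Detected language:", result["language"])
```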
- Python
- OpenAI Whisper (Automatic Speech Recognition)
- Facebook MBart-50 (Machine Translation)
- LangChain (LLM pipeline framework)
- Pydantic v2 (Data validation & serialization)
- Gradio (Web UI for easy interaction)
- Hugging Face API (For deploying models)
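The translation step follows the standard Hugging Face Transformers usage of the many-to-many MBart-50 checkpoint. The sketch below assumes English input and French output as an example; `app.py` may instead set the language codes from the Whisper detection result:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_name = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

# Source language of the transcription (example: English).
tokenizer.src_lang = "en_XX"
encoded = tokenizer("Hello, how are you?", return_tensors="pt")

# Force the decoder to start in the target language (example: French).
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```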
```
.
├── app.py                # Main application file
├── requirements.txt      # Python dependencies
├── working_field.ipynb   # Working details
└── README.md             # Documentation
```
```bash
git clone https://github.com/darkangrycoder/multilingual-speech-translation.git
cd multilingual-speech-translation
python -m venv venv
source venv/bin/activate   # For Linux/macOS
venv\Scripts\activate      # For Windows
pip install -r requirements.txt
python app.py
```
- Upload an audio file in the Gradio UI.
- Select the Target Language from the dropdown.
- Click Submit to transcribe and translate the audio.
- View the Transcription, Detected Language, and Translation results.
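For context, a minimal Gradio wiring that matches these steps might look like the sketch below; the function body, dropdown choices, and labels are placeholders, and the actual interface is defined in `app.py`:

```python
import gradio as gr

def transcribe_and_translate(audio_path, target_language):
    # Placeholder body: the real app runs Whisper ASR and MBart-50 translation here.
    transcription, detected_language, translation = "...", "...", "..."
    return transcription, detected_language, translation

demo = gr.Interface(
    fn=transcribe_and_translate,
    inputs=[
        gr.Audio(type="filepath", label="Audio"),
        gr.Dropdown(["French", "German", "Hindi"], label="Target Language"),  # example choices
    ],
    outputs=[
        gr.Textbox(label="Transcription"),
        gr.Textbox(label="Detected Language"),
        gr.Textbox(label="Translation"),
    ],
)
demo.launch()
```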
- Push your project to Hugging Face Hub:
```bash
huggingface-cli login
git push https://huggingface.co/spaces/tdnathmlenthusiast/multilingual_transcription
```
- Hugging Face will automatically install dependencies from requirements.txt and deploy the app.
```bash
docker build -t multilingual-translation .
docker run -p 7860:7860 multilingual-translation
```
- If you face dependency issues on Hugging Face Spaces, make sure you are using Pydantic v2 APIs rather than `root_validator` (deprecated in v2).
- If Whisper ASR is slow, try using `whisper.load_model("tiny")` instead of the `base` or `large` models.
- For GPU acceleration, use `torch` with CUDA, as in the sketch below.
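A minimal sketch of the GPU switch, assuming a CUDA-enabled `torch` build is installed (the MBart model can likewise be moved with `.to(device)`):

```python
import torch
import whisper

# Use CUDA when available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
asr_model = whisper.load_model("tiny", device=device)
```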
This project is licensed under the MIT License.
Contributions are welcome! Feel free to fork the repository, make changes, and submit a pull request.
For any inquiries, reach out to [debnathtirtha391@gmail.com](mailto:debnathtirtha391@gmail.com).