An API service for transcribing audio in English and Kinyarwanda, with bidirectional translation and caption generation capabilities.
- Multilingual Transcription: support for English and Kinyarwanda audio files
- Multiple Transcription Models:
  - OpenAI Whisper (English)
  - Whisper-Kinyarwanda (specialized for Kinyarwanda)
  - NVIDIA NeMo (high accuracy for Kinyarwanda)
- Bidirectional Translation:
  - Kinyarwanda → English
  - English → Kinyarwanda
- Caption Generation:
  - SRT captions for video subtitling
  - WebVTT captions for web videos
  - Bilingual caption support
- Full Processing Pipeline:
  - Comprehensive transcription → translation → caption workflow
  - Asynchronous processing with job status tracking
  - Smart chunking for more accurate timestamps
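The full pipeline can be pictured as a simple composition of stages. The sketch below is purely illustrative: the stage functions are stubs, and the real service calls the Whisper/NeMo models and translation services instead.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineResult:
    """Accumulates the output of each pipeline stage."""
    transcript: str = ""
    translation: str = ""
    captions: dict = field(default_factory=dict)

# Stubbed stages -- the real service runs ASR and translation models here.
def transcribe(audio_path: str) -> str:
    return f"transcript of {audio_path}"

def translate(text: str, target_language: str) -> str:
    return f"[{target_language}] {text}"

def generate_captions(text: str, formats: list) -> dict:
    return {fmt: f"captions ({fmt}) for: {text}" for fmt in formats}

def process_audio(audio_path: str, target_language: str, formats: list) -> PipelineResult:
    """Run the transcription -> translation -> captions pipeline."""
    result = PipelineResult()
    result.transcript = transcribe(audio_path)
    result.translation = translate(result.transcript, target_language)
    result.captions = generate_captions(result.transcript, formats)
    return result
```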
Minimum requirements:

- CPU: 4 cores (8 threads)
- RAM: 8 GB
- Storage: 50 GB SSD
- NVIDIA GPU with 8 GB+ VRAM (for optimal performance)

Recommended:

- CPU: 8+ cores
- RAM: 16 GB+
- Storage: 100 GB+ SSD
- NVIDIA GPU with 16 GB+ VRAM (T4, RTX 3080, or better)
- High-speed internet connection

Software:

- Ubuntu 20.04 LTS or later
- Python 3.8+ (3.9 recommended)
- NVIDIA drivers and CUDA (for GPU acceleration)
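Before installing, you can sanity-check a machine against these requirements with a short standard-library script (a sketch only; the thresholds mirror the minimum specs above, and GPU detection is left to `nvidia-smi`):

```python
import os
import shutil
import sys

def check_prerequisites(path: str = ".") -> list:
    """Return a list of warnings for minimum requirements that are not met."""
    warnings = []
    if sys.version_info < (3, 8):
        warnings.append("Python 3.8+ is required")
    if (os.cpu_count() or 0) < 4:
        warnings.append("At least 4 CPU cores are recommended")
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < 50:
        warnings.append(f"Only {free_gb:.0f} GB free; 50 GB+ recommended")
    return warnings

for warning in check_prerequisites():
    print("WARNING:", warning)
```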
- Clone the repository:

  ```bash
  git clone https://github.com/yuyunfrancis/asr_api_kin_to_en.git
  cd asr_api_kin_to_en
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install the NeMo toolkit (required for the Kinyarwanda NeMo model):

  ```bash
  sudo apt-get install build-essential python3-dev python3-pip
  pip install "nemo_toolkit[all]"
  ```

  The NeMo model is gated, so log in to Hugging Face before using it:

  ```bash
  huggingface-cli login
  ```

- Install the remaining dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables by creating a `.env` file:

  ```env
  # API Settings
  DEBUG=True

  # Model Settings
  MODELS_CACHE_DIR=./models_cache
  HUGGING_FACE_TOKEN=your_huggingface_token_here

  # File Storage
  TEMP_FILE_DIR=./temp_files
  TEMP_FILE_EXPIRY=24  # hours

  # Resource Management
  USE_GPU=True
  MAX_AUDIO_SIZE=52428800  # 50 MB
  ```
- Create the necessary directories:

  ```bash
  mkdir -p models_cache temp_files
  ```
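The application presumably reads these values from the environment at startup; a minimal sketch of that pattern (the variable names match the `.env` example above, but these parsing helpers are illustrative, not the project's actual code):

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean flag such as USE_GPU=True from the environment."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

def env_int(name: str, default: int) -> int:
    """Parse an integer setting, falling back to a default."""
    raw = os.getenv(name)
    return int(raw) if raw and raw.isdigit() else default

DEBUG = env_bool("DEBUG")
USE_GPU = env_bool("USE_GPU", default=True)
MAX_AUDIO_SIZE = env_int("MAX_AUDIO_SIZE", 52_428_800)  # 50 MB default
MODELS_CACHE_DIR = os.getenv("MODELS_CACHE_DIR", "./models_cache")
```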
For Docker deployment you will also need:

- Docker and Docker Compose
- NVIDIA Container Toolkit (for GPU support)

Install and start the Redis server:

```bash
sudo apt update
sudo apt install redis-server
sudo systemctl start redis
redis-cli ping  # should reply with PONG
```
Start a Celery worker in a separate terminal:

```bash
cd your_project_directory

# Activate your virtual environment if you're using one
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Start the Celery worker
celery -A app.workers.celery_app worker --loglevel=info
```
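The worker enables the asynchronous flow described in the features: a request creates a job, the worker processes it, and clients poll its status. A minimal in-memory sketch of that lifecycle (illustrative only; the real service uses Celery tasks with Redis as the broker, and these names are not the project's actual code):

```python
import uuid
from enum import Enum

class JobStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

jobs = {}  # in-memory stand-in for the Redis-backed job store

def create_job() -> str:
    """Register a new job and return its identifier."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": JobStatus.PENDING, "result": None}
    return job_id

def run_job(job_id: str, work) -> None:
    """What the worker does: mark processing, run the work, store the result."""
    jobs[job_id]["status"] = JobStatus.PROCESSING
    try:
        jobs[job_id]["result"] = work()
        jobs[job_id]["status"] = JobStatus.COMPLETED
    except Exception as exc:
        jobs[job_id] = {"status": JobStatus.FAILED, "result": str(exc)}

job_id = create_job()
run_job(job_id, lambda: "transcript text")
```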
- Clone the repository:

  ```bash
  git clone https://github.com/yuyunfrancis/asr_api_kin_to_en.git
  cd asr_api_kin_to_en
  ```

- Create the necessary directories:

  ```bash
  mkdir -p models_cache temp_files
  ```

- Create a `.env` file with your configuration (see the example above).

- Create or modify `docker-compose.yml`:
```yaml
version: "3.8"

services:
  api:
    build: .
    restart: always
    ports:
      - "8000:8000"
    volumes:
      - ./models_cache:/app/models_cache
      - ./temp_files:/app/temp_files
    environment:
      - HUGGING_FACE_TOKEN=${HUGGING_FACE_TOKEN}
      - MODELS_CACHE_DIR=/app/models_cache
      - TEMP_FILE_DIR=/app/temp_files
      - USE_GPU=True
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  redis:
    image: redis:6.2-alpine
    restart: always
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

  nginx:
    image: nginx:1.21-alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
    depends_on:
      - api

volumes:
  redis-data:
```
- Create an `nginx.conf` file:

  ```nginx
  server {
      listen 80;
      server_name _;

      location / {
          proxy_pass http://api:8000;
          proxy_set_header Host $host;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          client_max_body_size 50M;
          proxy_read_timeout 300s;
      }
  }
  ```
To run the application directly with Uvicorn:
```bash
# Activate the virtual environment if not already activated
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Run with Uvicorn
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
This command:
- Starts the FastAPI application
- Enables hot reloading for development
- Makes the API accessible from other devices on the network
- Uses port 8000
To run the application using Docker Compose:
```bash
# Build and start the containers
docker-compose up -d

# View the logs
docker-compose logs -f
```
The API will be accessible at http://localhost:8000, or at http://localhost if you are using the Nginx configuration.
Once the API is running, you can access the interactive API documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/process-audio` | POST | Complete pipeline: transcription → translation → captions |
| `/api/transcribe` | POST | Audio transcription only |
| `/api/translate` | POST | Text translation between languages |
| `/api/generate-captions` | POST | Generate captions from transcription results |
| `/api/jobs/{job_id}` | GET | Check the status of a job |
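Because processing is asynchronous, clients typically poll `/api/jobs/{job_id}` until the job finishes. A small generic polling helper (a sketch only; you supply the `fetch_status` callable, and the terminal status strings `"completed"`/`"failed"` are assumptions about the API's response format):

```python
import time

def wait_for_job(fetch_status, poll_interval: float = 2.0, timeout: float = 300.0) -> dict:
    """Poll fetch_status() until the job completes, fails, or times out.

    fetch_status should return a dict with a "status" key, e.g. the JSON
    body of GET /api/jobs/{job_id}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish in time")
```

For example, you might pass `lambda: requests.get(f"{base_url}/api/jobs/{job_id}").json()` as `fetch_status`.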
Using curl:

```bash
curl -X POST "http://localhost:8000/api/process-audio" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/your/audio.mp3" \
  -F "source_language=english" \
  -F "transcription_model=whisper" \
  -F "translation_options={\"enabled\":true,\"target_language\":\"kinyarwanda\"}" \
  -F "caption_options={\"enabled\":true,\"formats\":[\"srt\",\"vtt\"]}"
```
Using JavaScript fetch:

```javascript
async function processAudio(audioFile) {
  const formData = new FormData();
  formData.append("file", audioFile);
  formData.append("source_language", "english");
  formData.append("transcription_model", "whisper");
  formData.append(
    "translation_options",
    JSON.stringify({ enabled: true, target_language: "kinyarwanda" })
  );
  formData.append(
    "caption_options",
    JSON.stringify({ enabled: true, formats: ["srt", "vtt"] })
  );

  const response = await fetch("http://localhost:8000/api/process-audio", {
    method: "POST",
    body: formData,
  });
  const job = await response.json();
  return job.job_id; // Use this to check job status
}
```
Using curl:

```bash
curl -X GET "http://localhost:8000/api/jobs/YOUR_JOB_ID" -H "accept: application/json"
```

Using JavaScript fetch:

```javascript
async function checkJobStatus(jobId) {
  const response = await fetch(`http://localhost:8000/api/jobs/${jobId}`);
  return await response.json();
}
```
Sample parameters to test:

- Chunking: `{"chunk_size": 10, "overlap_size": 5, "use_mfa": false}`
- `translation_options`: `{"enabled": true, "target_language": "kinyarwanda"}`
- `caption_options`: `{"enabled": true, "formats": ["srt", "vtt"]}`
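When scripting requests, note that `translation_options` and `caption_options` are sent as JSON-encoded strings inside the multipart form, as in the curl example. A small helper that builds those fields (illustrative only, not part of the API):

```python
import json
from typing import List, Optional

def build_form_fields(source_language: str,
                      model: str,
                      target_language: Optional[str] = None,
                      caption_formats: Optional[List[str]] = None) -> dict:
    """Build the string-valued form fields for /api/process-audio."""
    return {
        "source_language": source_language,
        "transcription_model": model,
        "translation_options": json.dumps({
            "enabled": target_language is not None,
            "target_language": target_language,
        }),
        "caption_options": json.dumps({
            "enabled": bool(caption_formats),
            "formats": caption_formats or [],
        }),
    }
```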
For faster initial response times, you can "warm up" the models by making an initial request for each language you plan to support before heavy usage.
If you encounter GPU memory issues:
- Adjust the batch size and chunk parameters in your requests
- Consider using a GPU with more VRAM
- Add more worker processes for processing multiple files concurrently
The API automatically cleans up temporary files, but for high-volume deployments you may want to:
- Mount a larger volume for the temp_files directory
- Adjust the TEMP_FILE_EXPIRY setting in .env
- Set up a cron job to periodically clean old files
- Model loading errors:
  - Ensure your HUGGING_FACE_TOKEN is valid
  - Check that you have internet access to download the models
  - Verify there is enough disk space in the models_cache directory
- GPU not detected:
  - Check the NVIDIA drivers with `nvidia-smi`
  - For Docker: ensure nvidia-container-toolkit is properly configured
  - Try setting USE_GPU=False to fall back to CPU mode
- Empty caption files:
  - This is a known issue with Kinyarwanda → English translation
  - Check the chunk structure in the translation process
  - Ensure both languages use the correct field names
To view logs:

- Direct installation: check the console output
- Docker installation: `docker-compose logs -f api`
For a direct installation:

```bash
git pull
source venv/bin/activate
pip install -r requirements.txt
```

For a Docker installation:

```bash
git pull
docker-compose down
docker-compose up -d --build
```
The API is designed to support additional languages by:

- Adding new models in the `app/config.py` file
- Updating the `LANGUAGE_CODES` mapping
- Implementing language-specific processing in the services
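As an illustration, registering a new language might look like the following. The shape of `LANGUAGE_CODES` and the model registry here is hypothetical; check `app/config.py` for the project's actual structure.

```python
# Hypothetical shapes of the config mappings in app/config.py.
LANGUAGE_CODES = {
    "english": "en",
    "kinyarwanda": "rw",
}

TRANSCRIPTION_MODELS = {
    "english": ["whisper"],
    "kinyarwanda": ["whisper-kinyarwanda", "nemo"],
}

def add_language(name: str, code: str, models: list) -> None:
    """Register a new language and the models that can transcribe it."""
    LANGUAGE_CODES[name] = code
    TRANSCRIPTION_MODELS[name] = models

# Example: a hypothetical new language entry.
add_language("swahili", "sw", ["whisper"])
```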
- MbazaNLP for the Kinyarwanda models
- OpenAI for the Whisper model
- NVIDIA for the NeMo toolkit
- Hugging Face for model hosting and transformers library
For support or inquiries, please contact francisberi04@gmail.com.