# FastAPI ML API with GPU Acceleration

Production-ready FastAPI ML API with GPU acceleration for OCR, Translation, and Sentiment Analysis.

This project demonstrates how to deploy machine learning models in production using FastAPI with full GPU support, Docker containerization, and PostgreSQL database integration.

## Features
- 🔥 GPU Acceleration: Full NVIDIA CUDA support for ML inference
- 📝 OCR Service: Text recognition from images using TrOCR
- 🌍 Translation API: Bidirectional EN ↔ RU translation
- 😊 Sentiment Analysis: Russian text sentiment classification
- 🐳 Docker Ready: Production and development containers
- 📊 Database Integration: PostgreSQL with Tortoise ORM
- ⚡ High Performance: Optimized for production workloads
- 📚 Auto Documentation: Swagger UI and ReDoc included

## Tech Stack
- Backend: FastAPI + Uvicorn
- ML Framework: Transformers + PyTorch
- Database: PostgreSQL + Tortoise ORM
- Cache: Redis
- Containerization: Docker + Docker Compose
- GPU: NVIDIA CUDA support

## Prerequisites
- Docker and Docker Compose
- NVIDIA GPU with CUDA support (for GPU acceleration)
- NVIDIA Container Runtime

## Quick Start
```shell
git clone https://github.com/yourusername/fastapi-ml-gpu
cd fastapi-ml-gpu
```
Create a `.env` file:

```env
# Database
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_secure_password
POSTGRES_DB=ml_api
POSTGRES_HOST=postgres
POSTGRES_PORT=5432

# Redis
REDIS_HOST=redis
REDIS_PORT=6379

# Application
ENV=production
APP_HOST=0.0.0.0
APP_PORT=8000
```
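At startup the application can read these variables into a typed settings object. A minimal standard-library sketch; the `Settings` dataclass is illustrative, not the project's actual config module:

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    """Illustrative settings loader mirroring the .env variables above."""
    env: str = field(default_factory=lambda: os.getenv("ENV", "development"))
    app_host: str = field(default_factory=lambda: os.getenv("APP_HOST", "0.0.0.0"))
    app_port: int = field(default_factory=lambda: int(os.getenv("APP_PORT", "8000")))
    postgres_host: str = field(default_factory=lambda: os.getenv("POSTGRES_HOST", "localhost"))
    postgres_db: str = field(default_factory=lambda: os.getenv("POSTGRES_DB", "postgres"))
    redis_host: str = field(default_factory=lambda: os.getenv("REDIS_HOST", "localhost"))


settings = Settings()
```

The `default_factory` lambdas re-read the environment on each instantiation, so the defaults above stay in sync with the table of environment variables further down.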
```shell
# Production deployment with GPU
docker-compose up -d

# Development mode
docker-compose -f docker-compose.local.yml up -d

# Check GPU utilization
docker exec -it <container_name> nvidia-smi

# Test API health
curl http://localhost:8000/api/v1/ml/health
```
## API Documentation

Once deployed, the interactive documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
## Usage Examples

### OCR

```shell
# Extract text from image
curl -X POST "http://localhost:8000/api/v1/ml/ocr" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@image.jpg"
```

Response:

```json
{
  "text": "Recognized text from image",
  "processing_time_ms": 245.3,
  "device": "cuda"
}
```
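The `processing_time_ms` field in each response can be produced with a small timing wrapper around the inference call. A hedged sketch; the `timed` decorator and `fake_inference` handler are illustrative, not part of the project:

```python
import time
from functools import wraps


def timed(fn):
    """Attach a processing_time_ms field to a dict-returning handler."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        # perf_counter measures wall-clock time in seconds; convert to ms
        result["processing_time_ms"] = round((time.perf_counter() - start) * 1000, 1)
        return result
    return wrapper


@timed
def fake_inference(text: str) -> dict:
    """Stand-in for a real model call."""
    return {"text": text, "sentiment": "POSITIVE"}
```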
### Translation

```shell
# Translate English to Russian
curl -X POST "http://localhost:8000/api/v1/ml/translate" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world",
    "source_lang": "en",
    "target_lang": "ru"
  }'
```

Response:

```json
{
  "original_text": "Hello world",
  "translated_text": "Привет мир",
  "source_lang": "en",
  "target_lang": "ru",
  "processing_time_ms": 156.7
}
```
### Sentiment Analysis

```shell
# Analyze text sentiment
curl -X POST "http://localhost:8000/api/v1/ml/sentiment" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Этот продукт просто великолепен!"
  }'
```

Response:

```json
{
  "text": "Этот продукт просто великолепен!",
  "sentiment": "POSITIVE",
  "score": 0.9834,
  "processing_time_ms": 89.2
}
```
### OCR + Translation

```shell
# Extract text and translate in one request
curl -X POST "http://localhost:8000/api/v1/ml/ocr-translate" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@english_image.jpg" \
  -F "target_lang=ru"
```
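The curl calls above translate directly to Python. A standard-library sketch for the sentiment endpoint; the helper names are illustrative, and the base URL is taken from the examples above:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/api/v1/ml"  # base path from the curl examples


def build_sentiment_request(text: str) -> urllib.request.Request:
    """Build the POST request for the sentiment endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/sentiment",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_sentiment(text: str) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_sentiment_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```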
## Project Structure

```
├── app/
│   ├── api/                  # API routes
│   ├── config/               # Configuration
│   ├── db/                   # Database models
│   ├── server/               # FastAPI application
│   └── services/
│       └── ml_service.py     # ML models service
├── docker/
│   └── Dockerfile            # Multi-stage Docker build
├── docker-compose.yml        # Production deployment
├── docker-compose.local.yml  # Development setup
└── pyproject.toml            # Dependencies
```
## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `ENV` | Environment mode | `development` |
| `APP_HOST` | Application host | `0.0.0.0` |
| `APP_PORT` | Application port | `8000` |
| `POSTGRES_HOST` | Database host | `localhost` |
| `POSTGRES_DB` | Database name | `postgres` |
| `REDIS_HOST` | Redis host | `localhost` |
## GPU Support

The project automatically detects and uses NVIDIA GPUs when available. GPU memory usage is optimized for production workloads.

```python
# Automatic GPU detection
device = "cuda" if torch.cuda.is_available() else "cpu"
```
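For code paths that may also run on machines without PyTorch installed, the same detection can be wrapped in an import guard. A minimal sketch; the `pick_device` helper is illustrative:

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch sees a GPU, otherwise fall back to "cpu"."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # PyTorch not installed; CPU is the only option
        return "cpu"
```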
## Local Development

```shell
# Install dependencies
poetry install

# Run development server
poetry run python app/main.py

# Run with auto-reload
uvicorn app.server.server:app --reload --host 0.0.0.0 --port 8000
```
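The health endpoint exercised earlier with curl can be as small as a single route. A hedged sketch; the response shape here is an assumption, and the actual project may return more fields (such as the active device):

```python
try:
    from fastapi import FastAPI
    app = FastAPI()
except ImportError:  # allows the handler to be used without FastAPI installed
    app = None


def health() -> dict:
    """Liveness probe: the service is up and able to answer requests."""
    return {"status": "ok"}


if app is not None:
    # register the handler under the path used in the curl example above
    app.get("/api/v1/ml/health")(health)
```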
## Testing

```shell
# Run all tests
poetry run pytest

# Run with coverage
poetry run pytest --cov=app
```
## Database Migrations

```shell
# Initialize migrations
aerich init-db

# Create migration
aerich migrate

# Apply migrations
aerich upgrade
```
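Aerich needs a `TORTOISE_ORM` config dict telling it where the models and the database live. A hedged sketch; the `app.db.models` module path and the DSN are assumptions based on the project layout and `.env` values above:

```python
# Aerich reads this dict (pointed to via `aerich init -t <module>.TORTOISE_ORM`)
TORTOISE_ORM = {
    "connections": {
        # assumed DSN; match it to your .env values
        "default": "postgres://postgres:your_secure_password@localhost:5432/ml_api",
    },
    "apps": {
        "models": {
            # "aerich.models" must be included so Aerich can track migration state
            "models": ["app.db.models", "aerich.models"],
            "default_connection": "default",
        },
    },
}
```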
## Docker GPU Configuration

```yaml
# docker-compose.yml
services:
  server:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu, compute, utility]
```
⭐ Star this repository if it helped you deploy ML models with FastAPI and GPU support!