🚀 FastAPI ML Project with GPU Support

Production-ready FastAPI ML API with GPU acceleration for OCR, Translation, and Sentiment Analysis

This project demonstrates how to deploy machine learning models in production using FastAPI with full GPU support, Docker containerization, and PostgreSQL database integration.

🎯 Key Features

  • 🔥 GPU Acceleration: Full NVIDIA CUDA support for ML inference
  • 📝 OCR Service: Text recognition from images using TrOCR
  • 🌍 Translation API: Bidirectional EN ↔ RU translation
  • 😊 Sentiment Analysis: Russian text sentiment classification
  • 🐳 Docker Ready: Production and development containers
  • 📊 Database Integration: PostgreSQL with Tortoise ORM
  • ⚡ High Performance: Optimized for production workloads
  • 📚 Auto Documentation: Swagger UI and ReDoc included

🛠 Tech Stack

  • Backend: FastAPI + Uvicorn
  • ML Framework: Transformers + PyTorch
  • Database: PostgreSQL + Tortoise ORM
  • Cache: Redis
  • Containerization: Docker + Docker Compose
  • GPU: NVIDIA CUDA support

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • NVIDIA GPU with CUDA support (for GPU acceleration)
  • NVIDIA Container Runtime

1. Clone Repository

git clone https://github.com/RomanTuras/fastapi-gpu-docker-deploy
cd fastapi-gpu-docker-deploy

2. Environment Setup

Create a .env file in the project root:

# Database
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_secure_password
POSTGRES_DB=ml_api
POSTGRES_HOST=postgres
POSTGRES_PORT=5432

# Redis
REDIS_HOST=redis
REDIS_PORT=6379

# Application
ENV=production
APP_HOST=0.0.0.0
APP_PORT=8000

3. Deploy with GPU Support

# Production deployment with GPU
docker-compose up -d

# Development mode
docker-compose -f docker-compose.local.yml up -d

4. Verify GPU Access

# Check GPU utilization
docker exec -it <container_name> nvidia-smi

# Test API health
curl http://localhost:8000/api/v1/ml/health
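
The health endpoint is also a quick way to confirm that the service sees the GPU. A minimal sketch of what such a route might look like (the real implementation lives under app/api/; the body shown here is an assumption):

from fastapi import APIRouter
import torch

router = APIRouter(prefix="/api/v1/ml", tags=["ml"])

@router.get("/health")
async def health() -> dict:
    # Report liveness plus the device inference will run on.
    return {"status": "ok", "device": "cuda" if torch.cuda.is_available() else "cpu"}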

📖 API Documentation

Once deployed, access the interactive documentation at the standard FastAPI paths:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

🔧 API Endpoints

OCR - Text Recognition

# Extract text from image
curl -X POST "http://localhost:8000/api/v1/ml/ocr" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@image.jpg"

Response:

{
  "text": "Recognized text from image",
  "processing_time_ms": 245.3,
  "device": "cuda"
}
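
The same request from Python with the requests library (a sketch; image.jpg stands for any local image file):

import requests

# Send the image as multipart/form-data; requests builds the boundary header.
with open("image.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/api/v1/ml/ocr",
        files={"file": ("image.jpg", f, "image/jpeg")},
    )
resp.raise_for_status()
print(resp.json()["text"])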

Translation Service

# Translate English to Russian
curl -X POST "http://localhost:8000/api/v1/ml/translate" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world",
    "source_lang": "en",
    "target_lang": "ru"
  }'

Response:

{
  "original_text": "Hello world",
  "translated_text": "Привет мир",
  "source_lang": "en",
  "target_lang": "ru",
  "processing_time_ms": 156.7
}
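
The JSON endpoints follow the same pattern from Python; a minimal sketch:

import requests

resp = requests.post(
    "http://localhost:8000/api/v1/ml/translate",
    json={"text": "Hello world", "source_lang": "en", "target_lang": "ru"},
)
resp.raise_for_status()
print(resp.json()["translated_text"])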

Sentiment Analysis

# Analyze text sentiment
curl -X POST "http://localhost:8000/api/v1/ml/sentiment" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Этот продукт просто великолепен!"
  }'

Response:

{
  "text": "Этот продукт просто великолепен!",
  "sentiment": "POSITIVE",
  "score": 0.9834,
  "processing_time_ms": 89.2
}

Combined OCR + Translation

# Extract text and translate in one request
curl -X POST "http://localhost:8000/api/v1/ml/ocr-translate" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@english_image.jpg" \
  -F "target_lang=ru"

🏗 Project Structure

├── app/
│   ├── api/                    # API routes
│   ├── config/                 # Configuration
│   ├── db/                     # Database models
│   ├── server/                 # FastAPI application
│   └── services/
│       └── ml_service.py       # ML models service
├── docker/
│   └── Dockerfile              # Multi-stage Docker build
├── docker-compose.yml          # Production deployment
├── docker-compose.local.yml    # Development setup
└── pyproject.toml              # Dependencies

🎛 Configuration

Environment Variables

Variable        Description        Default
--------        -----------        -------
ENV             Environment mode   development
APP_HOST        Application host   0.0.0.0
APP_PORT        Application port   8000
POSTGRES_HOST   Database host      localhost
POSTGRES_DB     Database name      postgres
REDIS_HOST      Redis host         localhost
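
These variables are typically loaded into a typed settings object at startup. A sketch with pydantic-settings (field names mirror the table; the actual class under app/config/ may differ):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Values come from the environment or .env; matching is case-insensitive,
    # so APP_PORT populates app_port. Defaults mirror the table above.
    model_config = SettingsConfigDict(env_file=".env")

    env: str = "development"
    app_host: str = "0.0.0.0"
    app_port: int = 8000
    postgres_host: str = "localhost"
    postgres_db: str = "postgres"
    redis_host: str = "localhost"

settings = Settings()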

GPU Configuration

The project automatically detects and uses NVIDIA GPUs when available. GPU memory usage is optimized for production workloads.

# Automatic GPU detection
device = "cuda" if torch.cuda.is_available() else "cpu"

🧪 Development

Local Development Setup

# Install dependencies
poetry install

# Run development server
poetry run python app/main.py

# Run with auto-reload
uvicorn app.server.server:app --reload --host 0.0.0.0 --port 8000

Running Tests

# Run all tests
poetry run pytest

# Run with coverage
poetry run pytest --cov=app
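
A test against the health route might look like this (a sketch using FastAPI's TestClient; the import path is taken from the uvicorn command above):

from fastapi.testclient import TestClient

from app.server.server import app

client = TestClient(app)

def test_health() -> None:
    resp = client.get("/api/v1/ml/health")
    assert resp.status_code == 200
    assert resp.json()["device"] in ("cuda", "cpu")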

Database Migrations

# Initialize migrations
aerich init-db

# Create migration
aerich migrate

# Apply migrations
aerich upgrade
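
aerich needs a Tortoise ORM config to locate the models. A sketch of the dict it expects (the module path and connection string are assumptions based on the project layout and the .env example above):

# app/config/tortoise.py (hypothetical module path)
TORTOISE_ORM = {
    "connections": {
        # Credentials match the .env example from the Quick Start.
        "default": "postgres://postgres:your_secure_password@postgres:5432/ml_api",
    },
    "apps": {
        "models": {
            # aerich.models stores migration state next to the project's models.
            "models": ["app.db.models", "aerich.models"],
            "default_connection": "default",
        },
    },
}

Register it once with aerich init -t app.config.tortoise.TORTOISE_ORM before the first aerich init-db.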

🐳 Docker Deployment

Production Deployment

# docker-compose.yml
services:
  server:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu, compute, utility]

⭐ Star this repository if it helped you deploy ML models with FastAPI and GPU support!
