An intelligent system for conducting workshops with real-time speech transcription, AI-powered card generation, and an interactive canvas for content organization.
- 🎯 Real-time transcription with Google Speech-to-Text or Vosk (offline)
- 🤖 AI card generation with OpenAI GPT or Google Gemini
- 📊 Interactive canvas with drag-and-drop cards
- 💾 Persistent session storage in Redis
- 🐳 Container-native deployment with Docker
- 🌐 Production-ready with Traefik and Let's Encrypt
- Quick Start
- System Requirements
- Getting Access Keys
- Installation and Setup
- Running Modes
- Configuration
- Architecture
- Management Commands
- Troubleshooting
IMPORTANT: Get your API keys first before setup! See Getting Access Keys section below.
# 1. Clone the repository
git clone https://github.com/Real-AI-Engineering/vox-canvas.git
cd vox-canvas
# 2. Generate Python dependencies lock file
cd backend && uv lock && cd ..
# 3. Initial setup (creates .env.docker from example)
make setup
# 4. Configure your API keys in .env.docker
nano .env.docker
# 5. Place Google credentials file (if using Google STT)
cp ~/Downloads/service-account-key.json backend/google-credentials.json
# 6. Start in development mode
make dev

Done! Open http://localhost:5173 in your browser.
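To confirm the stack actually came up, the backend exposes a status route at `/api/status` (the same one used in Troubleshooting below). A minimal check with the standard library; the exact response shape is an assumption:

```python
"""Quick check that the backend answers after `make dev`."""
import sys
import urllib.request

URL = "http://localhost:8000/api/status"  # status route also used in Troubleshooting

try:
    with urllib.request.urlopen(URL, timeout=5) as resp:
        print(f"Backend answered HTTP {resp.status}: {resp.read().decode()[:200]}")
except OSError as exc:
    sys.exit(f"Backend not reachable at {URL}: {exc}")
```

The interactive API docs at http://localhost:8000/docs are another quick smoke test.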
- Docker 24.0+ and Docker Compose 2.0+
- Make for management commands
- Git for repository cloning
- 4 GB RAM minimum (8 GB recommended)
- Modern browser with WebRTC support (Chrome, Firefox, Safari)
- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable Speech-to-Text API:
- Navigate to "APIs & Services" → "Library"
- Search for "Cloud Speech-to-Text API"
- Click "Enable"
- Go to "IAM & Admin" → "Service Accounts"
- Click "Create Service Account"
- Set name: `vox-canvas-stt`
- Assign role: "Cloud Speech Client"
- Create and download JSON key
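Before copying the key into the project (next step), it is worth a quick sanity check that the download is a complete service-account key; a stdlib-only sketch, assuming the file landed in `~/Downloads` as in the commands below:

```python
"""Sanity-check a Google service-account JSON key before copying it into backend/."""
import json
import pathlib
import sys

KEY_PATH = pathlib.Path.home() / "Downloads" / "service-account-key.json"  # adjust to your download

data = json.loads(KEY_PATH.read_text())

# Fields every service-account key file is expected to contain.
required = {"type", "project_id", "private_key", "client_email"}
missing = required - data.keys()
if missing:
    sys.exit(f"Key file is missing fields: {sorted(missing)}")
if data["type"] != "service_account":
    sys.exit(f"Expected a service_account key, got type={data['type']!r}")

print(f"Looks OK: {data['client_email']} in project {data['project_id']}")
```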
# Place JSON key in the project
cp ~/Downloads/service-account-key.json backend/google-credentials.json
# Ensure the file won't be committed to git
echo "backend/google-credentials.json" >> .gitignore- Register at OpenAI Platform
- Go to "API Keys" → "Create new secret key"
- Copy the key (starts with `sk-...`)
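The key can be verified on its own before configuring the stack, for example by listing models against the public `GET /v1/models` endpoint. A stdlib sketch; exporting `OPENAI_API_KEY` in your shell first is assumed:

```python
"""Verify an OpenAI API key by listing available models."""
import json
import os
import sys
import urllib.error
import urllib.request

key = os.environ.get("OPENAI_API_KEY")
if not key:
    sys.exit("Set OPENAI_API_KEY first, e.g. `export OPENAI_API_KEY=sk-...`")

req = urllib.request.Request(
    "https://api.openai.com/v1/models",
    headers={"Authorization": f"Bearer {key}"},
)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        models = json.load(resp)["data"]
    print(f"Key accepted, {len(models)} models visible")
except urllib.error.HTTPError as exc:
    sys.exit(f"OpenAI rejected the key: HTTP {exc.code}")
```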
# Open configuration file
nano .env.docker
# Find and replace:
OPENAI_API_KEY=your-openai-api-key-here
# With your actual key:
OPENAI_API_KEY=sk-1234567890abcdef...
# Change card mode:
VOX_CARD_MODE=openai

- Go to Google AI Studio
- Click "Create API Key"
- Copy the key
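As with OpenAI, the key can be checked up front by listing models on the Generative Language API. A stdlib sketch, assuming `GEMINI_API_KEY` is exported in your shell:

```python
"""Verify a Google Gemini (AI Studio) API key by listing models."""
import json
import os
import sys
import urllib.error
import urllib.request

key = os.environ.get("GEMINI_API_KEY")
if not key:
    sys.exit("Set GEMINI_API_KEY first")

url = f"https://generativelanguage.googleapis.com/v1beta/models?key={key}"
try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        names = [m["name"] for m in json.load(resp).get("models", [])]
    print(f"Key accepted, {len(names)} models visible")
except urllib.error.HTTPError as exc:
    sys.exit(f"Gemini rejected the key: HTTP {exc.code}")
```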
nano .env.docker
# Configure Gemini:
GEMINI_API_KEY=your-gemini-api-key-here
VOX_CARD_MODE=gemini

# Clone the repository
git clone https://github.com/Real-AI-Engineering/vox-canvas.git
cd vox-canvas
# Verify Docker is running
docker --version
docker compose version

# Generate Python dependencies lock file
cd backend && uv lock && cd ..

# Create configuration file from example
make setup
# Open file for editing
nano .env.docker

Required settings for full functionality:
# STT Configuration
VOX_STT_MODE=google # or vosk for offline
VOX_LANGUAGE=ru-RU # recognition language
# Card Generation
VOX_CARD_MODE=openai # openai, gemini, or stub
OPENAI_API_KEY=sk-your-key-here # if using OpenAI
GEMINI_API_KEY=your-key-here # if using Gemini
# Redis (security)
REDIS_PASSWORD=your-secure-password-123
# Domains (for production)
DOMAIN=your-domain.com
LETSENCRYPT_EMAIL=admin@your-domain.com

# Place Google JSON credentials file
cp /path/to/your/service-account.json backend/google-credentials.json
# Verify the file is in place
ls -la backend/google-credentials.json

# Start in development mode
make dev
# Or directly in production
make up

Development mode (`make dev`):

- 🔧 Hot reload for frontend
- 📊 Extended logs and debugging
- 🌐 Access: http://localhost:5173
- 🔧 API: http://localhost:8000
Production mode (`make up`):

- 🚀 Optimized builds
- 🌐 Access via domains (configure DNS)
- 🔒 HTTPS with Let's Encrypt
- 📈 Monitoring and metrics
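Once DNS points at the server and Traefik has requested certificates, you can confirm Let's Encrypt issuance from any machine; a stdlib sketch, with the domain being whatever you set as `DOMAIN` in `.env.docker`:

```python
"""Check which certificate your production domain is serving."""
import datetime
import socket
import ssl

DOMAIN = "vox.your-domain.com"  # whatever you set as DOMAIN in .env.docker

ctx = ssl.create_default_context()
with socket.create_connection((DOMAIN, 443), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=DOMAIN) as tls:
        cert = tls.getpeercert()

issuer = dict(pair[0] for pair in cert["issuer"])
expires = datetime.datetime.fromtimestamp(
    ssl.cert_time_to_seconds(cert["notAfter"]), tz=datetime.timezone.utc
)
print(f"Issuer:  {issuer.get('organizationName')}")
print(f"Expires: {expires:%Y-%m-%d}")
```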
Local development without Docker (backend):

cd backend
uv sync --all-extras
uv run uvicorn app.main:app --reload --factory

Frontend:

cd frontend
pnpm install
pnpm dev

# === SPEECH RECOGNITION ===
VOX_STT_MODE=google # google, vosk, stub
VOX_LANGUAGE=ru-RU # recognition language
VOX_GOOGLE_SAMPLE_RATE=48000 # sampling rate
# === CARD GENERATION ===
VOX_CARD_MODE=openai # openai, gemini, stub
OPENAI_API_KEY=sk-... # OpenAI API key
GEMINI_API_KEY=... # Google Gemini key
VOX_OPENAI_MODEL=gpt-4o-mini # OpenAI model
# === SECURITY ===
REDIS_PASSWORD=secure-password-123
CORS_ORIGINS=http://localhost:5173,http://vox.local
# === DOMAINS (production) ===
DOMAIN=vox.your-domain.com
API_DOMAIN=api.vox.your-domain.com
LETSENCRYPT_EMAIL=admin@your-domain.com
# === PERFORMANCE ===
REDIS_URL=redis://:password@redis:6379/0
RATE_LIMIT=100 # requests per minute

Speech recognition modes (`VOX_STT_MODE`):

| Mode | Description | Requirements |
|---|---|---|
| `google` | Google Speech-to-Text API | Google Cloud credentials + internet |
| `vosk` | Offline recognition | Downloaded Vosk model |
| `stub` | Testing simulation | None |
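For the `vosk` row above, recognition runs entirely offline against a model you download yourself. The backend wires this up internally; purely as an illustration of the Vosk library itself, transcribing a 16-bit mono WAV file looks roughly like this (the `vosk` package, the model directory, and the sample file are assumptions):

```python
"""Illustration of offline recognition with Vosk (not the project's internal code)."""
import json
import wave

from vosk import KaldiRecognizer, Model  # pip install vosk

MODEL_DIR = "models/vosk-model-small-ru-0.22"  # assumed path to a downloaded model

model = Model(MODEL_DIR)

with wave.open("sample.wav", "rb") as wav:          # expected: 16-bit mono PCM
    rec = KaldiRecognizer(model, wav.getframerate())
    while chunk := wav.readframes(4000):
        rec.AcceptWaveform(chunk)

print(json.loads(rec.FinalResult())["text"])
```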
Card generation modes (`VOX_CARD_MODE`):

| Mode | Description | Requirements |
|---|---|---|
| `openai` | OpenAI GPT models | OpenAI API key |
| `gemini` | Google Gemini | Google AI API key |
| `stub` | Testing stubs | None |
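To make the `openai` mode concrete: card generation amounts to sending a transcript fragment to a chat model and asking for a short, card-sized summary back. The sketch below is a conceptual illustration using the official `openai` package and the `gpt-4o-mini` default from the configuration above; the backend's real prompt and card schema may differ.

```python
"""Conceptual sketch of card generation in `openai` mode (not the backend's actual prompt)."""
from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the environment

client = OpenAI()

transcript_fragment = (
    "So the key takeaway is that we should prototype the onboarding flow "
    "before committing to the full redesign."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # matches VOX_OPENAI_MODEL above
    messages=[
        {
            "role": "system",
            "content": "Summarize the speaker's point as a short workshop card: "
                       "a 3-5 word title and one sentence of body text.",
        },
        {"role": "user", "content": transcript_fragment},
    ],
)

print(response.choices[0].message.content)
```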
┌─────────────────────────────────────────────────────────────────┐
│ User │
└─────────────────────┬───────────────────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────────────────┐
│ Traefik │
│ (Reverse Proxy + Load Balancer) │
└─────────────┬─────────────────────────┬─────────────────────────┘
│ │
┌─────────────▼─────────────┐ ┌─────────▼─────────────────────────┐
│ Frontend (Caddy) │ │ Backend (FastAPI) │
│ │ │ │
│ • React + TypeScript │ │ • WebSocket for audio │
│ • Real-time UI │ │ • Speech-to-Text │
│ • Canvas with cards │ │ • AI card generation │
│ • WebSocket client │ │ • REST API │
└───────────────────────────┘ └─────────┬─────────────────────────┘
│
┌─────────────────────────▼─────────────────────────┐
│ Redis │
│ │
│ • Sessions and state │
│ • Transcription cache │
│ • Persistent storage │
└───────────────────────────────────────────────────┘
External APIs:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Google STT │ │ OpenAI GPT │ │ Google Gemini │
│ │ │ │ │ │
│ • Real-time │ │ • Card │ │ • Card │
│ speech │ │ generation │ │ generation │
│ recognition │ │ │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Frontend:

- Technologies: React 19, TypeScript, Tailwind CSS 4
- Functions: Canvas with cards, real-time transcription, session management
- Port: 5173 (dev) / 80,443 (prod)
Backend:

- Technologies: Python 3.12, FastAPI, WebSocket, structlog
- Functions: STT integration, AI generation, sessions, API
- Port: 8000
Traefik:

- Functions: Load balancing, HTTPS termination, automatic certificates
- Port: 80, 443, 8080 (dashboard)
Redis:

- Functions: Sessions, cache, persistent data
- Port: 6379
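The audio path in the diagram (browser to backend over WebSocket, transcript updates back) can also be exercised from a script. This is only a rough test client: the endpoint path, the raw-PCM framing, and the reply format are assumptions, since the actual contract is defined by the backend (check the API docs at /docs).

```python
"""Rough WebSocket test client; endpoint path and framing are assumptions."""
import asyncio
import wave

import websockets  # pip install websockets

WS_URL = "ws://localhost:8000/ws"  # hypothetical path; check the backend routes / API docs


async def stream(path: str) -> None:
    async with websockets.connect(WS_URL) as ws:
        with wave.open(path, "rb") as wav:
            while chunk := wav.readframes(4096):
                await ws.send(chunk)           # assumed: raw PCM frames as binary messages
                await asyncio.sleep(0.05)      # pace roughly like a live microphone
        # Assumed: the backend pushes transcript updates as text messages.
        reply = await asyncio.wait_for(ws.recv(), timeout=10)
        print("Server said:", reply)


asyncio.run(stream("sample.wav"))
```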
# Service Management
make setup # Initial setup
make dev # Start in development mode
make up # Start in production
make down # Stop all services
make restart # Restart services
# Monitoring and Debugging
make logs # View all logs
make logs-backend # Backend logs only
make logs-frontend # Frontend logs only
make health # Check service status
# Data Management
make backup # Redis backup
make restore BACKUP_FILE=backup.rdb # Restore
make clean # Clean temporary files
# Development
make lint # Code linting (backend)
make test # Run tests (backend)
make check # Lint + tests

# Image Rebuilding
make build # Rebuild all images
make build-frontend # Frontend only
make build-backend # Backend only
# Container Debugging
make shell-backend # Connect to backend container
make shell-frontend # Connect to frontend container
make shell-redis # Connect to Redis
# Resource Monitoring
make stats # Resource usage statistics
make top # Processes in containers

Development URLs:

- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Traefik Dashboard: http://localhost:8080
- Redis: localhost:6379
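Because Redis is published on localhost:6379 in development, you can inspect what the backend stores without entering the container; a sketch assuming the `redis` Python package and the `REDIS_PASSWORD` from `.env.docker` exported in your shell (key names are up to the backend, so this just scans whatever exists):

```python
"""Peek at what the stack stores in Redis during development."""
import os

import redis  # pip install redis

r = redis.Redis(
    host="localhost",
    port=6379,
    password=os.environ.get("REDIS_PASSWORD"),  # same value as REDIS_PASSWORD in .env.docker
    decode_responses=True,
)

print("PING ->", r.ping())
for key in r.scan_iter(count=100):
    print(f"{r.type(key):>6}  {key}")
```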
Production URLs:

- Application: http://vox.local (or your domain)
- API: http://api.vox.local
- Traefik Dashboard: http://traefik.vox.local
# Add to /etc/hosts for local production testing
echo "127.0.0.1 vox.local api.vox.local traefik.vox.local" | sudo tee -a /etc/hosts# Check that ports are free
lsof -i :6379 -i :8000 -i :5173 -i :8080
# Stop conflicting services
docker stop $(docker ps -q)
make down

# Check Docker status
docker system info
# Clean unused resources
docker system prune -f
# Recreate network
docker network prune -f
make down && make up

Microphone not working:

- Check browser permissions
- Ensure using HTTPS or localhost
- Check system microphone settings
# Check STT logs
make logs-backend | grep stt
# Check settings in .env.docker
grep VOX_STT .env.docker
grep GOOGLE .env.docker

# Check CORS settings in .env.docker
grep CORS_ORIGINS .env.docker
# Restart backend
docker restart vox-backend

# Check API key
grep OPENAI_API_KEY .env.docker
# Check balance on OpenAI Platform
# https://platform.openai.com/usage

# Check Gemini API key
grep GEMINI_API_KEY .env.docker
# Switch to stub mode for testing
sed -i 's/VOX_CARD_MODE=.*/VOX_CARD_MODE=stub/' .env.docker
make restart

# Check status of all services
make health
# Detailed logs with filtering
make logs | grep ERROR
make logs-backend | grep -i "websocket"
make logs-frontend | grep -i "cors"
# Check configuration
cat .env.docker | grep -v "^#" | grep -v "^$"
# Check API functionality
curl http://localhost:8000/api/status
# Check Redis
docker exec -it vox-redis redis-cli ping

- Docker Setup - Detailed Docker documentation
- Frontend README - Frontend documentation
- Backend README - Backend documentation
- Google Speech Setup - Google STT setup
If you encounter issues:
- Check Troubleshooting
- Review logs: `make logs`
- Create an Issue in the repository with:
- Problem description
- Logs (`make logs > logs.txt`)
- Configuration (without secret keys)
- Docker version (`docker --version`)
MIT License - see LICENSE file
Made with ❤️ to improve workshops and presentations