A browser-based real-time speech-to-text captioning system with minimal latency and high accuracy. Perfect for meetings, presentations, accessibility, and live events.
Try it out: Live Demo
- Real-time microphone audio capture
- Low-latency live captions (<500ms)
- High-accuracy transcription using Vosk
- Clean, accessible UI with customizable display
- Transcript export (TXT, SRT, VTT)
- Automatic reconnection and error handling
- Multi-language support
- Speaker identification (planned)
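As a sketch of what the transcript export might look like, here is a minimal SRT converter. The segment tuples and function names are illustrative assumptions, not the project's actual API:

```python
# Hypothetical helper: render caption segments as an SRT document.
# Assumes segments arrive as (start_seconds, end_seconds, text) tuples.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(segments) -> str:
    """Render (start, end, text) segments as numbered SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

VTT output differs mainly in the `WEBVTT` header and `.` instead of `,` in timestamps, so the same segment structure can drive both exporters.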
[Browser Audio Capture]
          │
          ▼  (WebSocket audio stream)
[FastAPI Backend + Vosk]
          │
          ▼  (real-time captions)
[Live Caption Display]
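The middle hop can be sketched as a per-connection loop: read PCM chunks from the WebSocket, feed them to a Vosk recognizer, and push partial or final captions back. This is a minimal sketch, not the project's actual `main.py`; the `caption_message` helper and its field choices are assumptions based on the message format in the API section:

```python
import json

SAMPLE_RATE = 16000  # must match the audio the browser sends

def caption_message(raw_result: str, is_final: bool) -> dict:
    """Shape a Vosk result JSON string into a caption message.
    Final results carry "text"; partial results carry "partial"."""
    parsed = json.loads(raw_result)
    return {
        "type": "transcription",
        "text": parsed.get("text") or parsed.get("partial", ""),
        "is_final": is_final,
    }

async def audio_endpoint(websocket, recognizer):
    """Per-connection loop. `websocket` is a FastAPI WebSocket;
    `recognizer` is a vosk.KaldiRecognizer built for SAMPLE_RATE."""
    await websocket.accept()
    while True:
        chunk = await websocket.receive_bytes()
        if recognizer.AcceptWaveform(chunk):
            await websocket.send_json(
                caption_message(recognizer.Result(), True))
        else:
            await websocket.send_json(
                caption_message(recognizer.PartialResult(), False))
```

Sending partial results on every chunk is what keeps end-to-end latency low; the final result simply replaces the last partial caption on the client.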
- Python 3.8+ (3.12 recommended)
- Node.js 16+ (18+ recommended)
- Modern browser with WebRTC support (Chrome, Firefox, Safari, Edge)
- Microphone for audio input
- ~500MB free space for Vosk models
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python main.py
cd frontend
npm install
npm start
- Open http://localhost:3000 in your browser
- Grant microphone permissions
- Start speaking - captions will appear in real-time
- Use the controls to adjust font size, contrast, and export transcripts
Import errors in IDE:
- Add `# type: ignore` comments to import statements
- Configure your IDE to use the virtual environment's Python interpreter
Audio not working:
- Ensure microphone permissions are granted
- Check browser console for WebRTC errors
- Try refreshing the page
Backend connection issues:
- Verify backend is running on port 8000
- Check firewall settings
- Ensure Vosk models are downloaded
Performance issues:
- Close other audio applications
- Use a wired microphone for better quality
- Check system resources
Create a `.env` file in the backend directory:
# Backend Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info
# Vosk Configuration
VOSK_MODEL_PATH=./models/vosk-model-small-en-us-0.15
SAMPLE_RATE=16000
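A minimal sketch of how the backend might read these values. In practice a library such as python-dotenv would load the `.env` file first; here we just read the process environment with the same defaults (the `load_settings` function is illustrative, not the project's actual code):

```python
import os

def load_settings(env=os.environ) -> dict:
    """Read backend configuration from the environment,
    falling back to the defaults shown in the .env example."""
    return {
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),
        "log_level": env.get("LOG_LEVEL", "info"),
        "model_path": env.get(
            "VOSK_MODEL_PATH", "./models/vosk-model-small-en-us-0.15"),
        "sample_rate": int(env.get("SAMPLE_RATE", "16000")),
    }
```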
- Language Models: Download different Vosk models for other languages
- Audio Quality: Adjust sample rate and chunk size in `useTranscription.js`
- UI Theme: Modify CSS variables in component stylesheets
├── backend/               # FastAPI + Vosk transcription service
│   ├── main.py            # WebSocket server
│   ├── transcription.py   # Vosk integration
│   └── requirements.txt   # Python dependencies
├── frontend/              # React frontend
│   ├── src/
│   │   ├── components/    # React components
│   │   ├── hooks/         # Custom hooks
│   │   └── utils/         # Utilities
│   └── package.json
└── docker-compose.yml     # Full stack deployment
| Endpoint | Method | Description |
|---|---|---|
| `/ws/audio` | WebSocket | Real-time audio streaming and transcription |
| `/ws/control` | WebSocket | Control messages (language, settings) |
| `/health` | GET | Health check endpoint |
| `/models` | GET | Available transcription models |
Audio Streaming:
{
"type": "transcription",
"text": "Hello world",
"is_final": true,
"confidence": 0.95,
"timestamp": 1640995200
}
Control Messages:
{
"type": "change_language",
"language": "en"
}
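On the backend, `/ws/control` messages might be validated roughly like this. The dispatcher and the supported-language set are assumptions for illustration; the field names follow the examples above:

```python
import json

# Which languages are available depends on the Vosk models downloaded.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "es"}  # illustrative set

def handle_control(raw: str) -> dict:
    """Validate a control message and return an ack or error reply."""
    msg = json.loads(raw)
    if msg.get("type") == "change_language":
        lang = msg.get("language")
        if lang not in SUPPORTED_LANGUAGES:
            return {"type": "error", "detail": f"unsupported language: {lang}"}
        return {"type": "ack", "language": lang}
    return {"type": "error", "detail": "unknown message type"}
```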
- Latency: <500ms end-to-end
- Accuracy: 95%+ with clear speech
- Supported Languages: 20+ languages via Vosk models
- Concurrent Users: 10+ simultaneous connections
- Requires stable internet connection
- Audio quality affects transcription accuracy
- Background noise may impact performance
- Limited to browser-supported audio formats
docker-compose up -d
- Deploy backend to your preferred cloud provider
- Build and deploy frontend to a static hosting service
- Configure CORS and WebSocket proxy settings
- Audio is processed in real-time and not stored
- All communication uses secure WebSocket connections
- No personal data is collected or transmitted
- Optional local-only mode available
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork and clone the repository
- Set up development environment:
# Backend
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Frontend
cd ../frontend
npm install
- Run development servers:
# Terminal 1: Backend
cd backend && python main.py

# Terminal 2: Frontend
cd frontend && npm start
- Python: Follow PEP 8
- JavaScript: Use ESLint configuration
- Commit messages: Use conventional commits
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2024 Live Captioning System Contributors