Visual Speech Recognition - Read lips, transcribe speech, all locally
A beautiful, open-source tool that reads your lips in real-time and transcribes silently mouthed speech using local ML models.
Quick Start • Contributing • Privacy • Security • Documentation • License
Chaplin-UI is a gentle, privacy-focused tool that reads lips and turns them into text. Simply record yourself speaking (or upload a video), and watch as your words appear on screen—all without making a sound. This project is based on Chaplin by Amanvir Parhar, with an added web interface and UI improvements. The VSR model achieves a 19.1% word error rate (WER) on LRS3. Perfect for:
- 🎤 Silent communication - Type without speaking, or transcribe existing videos
- 🔒 Privacy-first - Everything runs locally on your machine (Privacy Policy)
- 🌐 Web-based - Works in any modern browser—no installation needed
- 🎨 Beautiful UI - Clean, calming design that adapts to your system theme
I built this after a week of laryngitis—when I couldn't speak, I needed a way to communicate. If you've ever wanted to say something without making a sound, Chaplin-UI might help:
- Public places — Libraries, offices, late-night calls, or anywhere you want to stay quiet
- Deaf and hard-of-hearing — Mouth words to communicate when sign language isn't shared
- Medical conditions — ALS, aphonia, cerebral palsy, laryngectomy, Parkinson's, vocal cord paralysis, selective mutism
- Temporary voice loss — Laryngitis, recovery from throat surgery, or vocal strain
- Privacy — Situations where you'd rather not speak aloud but still need to get your words out
Apple just acquired a silent-speech company (Q.ai) for $2 billion—this space matters, and open-source tools like this keep the technology accessible.
- Python 3.12+ (check with `python3 --version`)
- Local LLM server – Ollama or LM Studio (the app finds one you have running)
- Modern web browser with camera access
1. Clone the repository:

       git clone https://github.com/loganngarcia/chaplin-ui.git
       cd chaplin-ui

2. Set up Python environment:

       python3 -m venv .venv
       source .venv/bin/activate   # On Windows: .venv\Scripts\activate
       pip install -r requirements.txt

3. Download model files:

       ./setup.sh

   This downloads the VSR model from Hugging Face (~500MB).

4. Start your LLM server (pick one: Ollama or LM Studio):

   Option A – Ollama

       ollama serve          # usually starts automatically
       ollama pull llama3.2  # or mistral, llama2, etc.

   Option B – LM Studio

   - Open LM Studio
   - Load a model (we recommend `zai-org/glm-4.6v-flash`)
   - Go to the Developer tab → Enable Local Server (port 1234)

5. Run the web app:

       ./run_web.sh

   The UI opens in ~1 second at http://localhost:8000. The model loads in the background (~30–60 sec first time); you can use the interface right away—buttons enable when ready.

6. Start transcribing:

   - Record live: Click "Start recording" to capture video from your camera
   - Upload a video: Click "Upload video" to transcribe an existing video file
   - Your transcription appears in both raw and corrected formats. (You can change the LLM in settings if needed.)
That's it! The app handles everything else for you.
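If you want to confirm the backend is running from a script or terminal, here is a minimal, hedged sketch. It assumes only the `/api/health` endpoint mentioned in the testing commands further down; the exact shape of the response body is not assumed.

```python
# Hedged sketch: poll the backend's health endpoint before recording.
# Only the /api/health route (see the testing section below) is assumed.
import urllib.request

with urllib.request.urlopen("http://localhost:8000/api/health", timeout=5) as resp:
    print(resp.status, resp.read().decode())  # 200 means the server is up
```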
chaplin-ui/
├── chaplin_ui/ # Core shared modules
│ └── core/ # Shared utilities, models, configs
│ ├── models.py # Pydantic data models
│ ├── constants.py # All configuration constants
│ ├── llm_client.py # LLM API wrapper
│ ├── video_processor.py # Video processing utilities
│ └── ...
├── web/ # Web app frontend
│ ├── index.html # Main HTML
│ ├── style.css # Styles (Apple HIG)
│ └── app.js # Frontend logic
├── web_app.py # FastAPI backend server
├── chaplin.py # CLI implementation
├── main.py # CLI entry point
└── pipelines/ # VSR model pipeline
Simply put, Chaplin-UI watches how your lips move and turns that into text. Here's what happens behind the scenes:
- You provide video - Either record yourself speaking or upload an existing video file
- Face detection - The app finds and tracks your face in the video
- Lip reading - A trained model watches your lip movements and creates initial text
- Text refinement - An AI language model cleans up the text, adds punctuation, and fixes any mistakes
- You see results - Both the raw transcription and the polished version appear on screen
You can copy the corrected text with one click, or review the raw output to see what the lip-reading model detected.
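If you prefer to see that flow as code, here is a minimal, hedged sketch of the stages. The function and class names are illustrative placeholders, not Chaplin-UI's internal API; only the order of the steps comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    raw: str        # what the lip-reading (VSR) model produced
    corrected: str  # after the LLM adds punctuation and fixes mistakes


def detect_and_track_face(frames: list) -> list:
    """Step 2: find and track the face, keeping the mouth region."""
    return frames  # placeholder


def run_vsr_model(mouth_frames: list) -> str:
    """Step 3: the VSR model turns lip movements into raw text."""
    return "hello world this is a test"  # placeholder


def refine_with_llm(raw_text: str) -> str:
    """Step 4: a local LLM cleans up the raw transcription."""
    return "Hello world, this is a test."  # placeholder


def transcribe(frames: list) -> TranscriptionResult:
    """Steps 1-5 end to end: video frames in, raw + corrected text out."""
    mouth_frames = detect_and_track_face(frames)
    raw = run_vsr_model(mouth_frames)
    return TranscriptionResult(raw=raw, corrected=refine_with_llm(raw))
```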
Chaplin-UI supports two local LLM backends. Both use OpenAI-compatible APIs:
| Provider  | Default URL                | Default Model | Setup                                                  |
|-----------|----------------------------|---------------|--------------------------------------------------------|
| Ollama    | http://localhost:11434/v1  | `llama3.2`    | `ollama serve`, then `ollama pull <model>`             |
| LM Studio | http://localhost:1234/v1   | `local`       | Load a model, enable Local Server in the Developer tab |
- Web app: Select provider in the "LLM Provider" dropdown and optionally override the model name.
- CLI: Use `llm_provider=ollama` or `llm_provider=lmstudio`, or run with `--config-name ollama` for Ollama defaults.
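Because both backends expose an OpenAI-compatible API, any OpenAI-style client can talk to them. The sketch below uses the `openai` Python package with the Ollama URL and model from the table above; the prompt text is illustrative only, and LM Studio works the same way with its URL and model name swapped in.

```python
from openai import OpenAI  # pip install openai

# Ollama defaults from the table above; for LM Studio use
# base_url="http://localhost:1234/v1" and model="local".
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[
        {"role": "system", "content": "Add punctuation and fix obvious errors."},
        {"role": "user", "content": "hello world this is a test"},
    ],
)
print(response.choices[0].message.content)
```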
- `chaplin_ui/core/` - Shared code used by the CLI and Web interfaces
- `web_app.py` - FastAPI server handling video uploads and processing
- `chaplin.py` - CLI version with keyboard typing
- `pipelines/` - VSR model inference pipeline
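Since `chaplin_ui/core` is an importable package (the test commands below rely on `from chaplin_ui.core import *`), your own scripts can reuse the same modules. The module names in this hedged sketch come from the project tree above; their internal functions and classes are not assumed here.

```python
# Hedged sketch: inspecting the shared core from a custom script.
from chaplin_ui.core import constants, llm_client, video_processor

# List the public names each shared module exposes, without assuming any of them.
for module in (constants, llm_client, video_processor):
    print(module.__name__, [name for name in dir(module) if not name.startswith("_")])
```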
Web App:

    source .venv/bin/activate
    python web_app.py

CLI:

    source .venv/bin/activate
    python main.py config_filename=./configs/LRS3_V_WER19.1.ini detector=mediapipe

    # With Ollama:
    python main.py --config-name ollama
    # Or: python main.py llm_provider=ollama llm_model=mistral

We follow Python best practices:
- Type hints on all functions
- Docstrings (Google style) for all public functions
- Logging instead of print statements
- Constants centralized in `chaplin_ui/core/constants.py`
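As a concrete illustration of that style, here is a hedged sketch of what a function in this codebase is expected to look like; the function itself is made up for the example and is not part of the project.

```python
import logging

logger = logging.getLogger(__name__)


def word_error_rate(errors: int, total_words: int) -> float:
    """Compute a word error rate.

    Args:
        errors: Total substitution, insertion, and deletion errors.
        total_words: Number of words in the reference transcript.

    Returns:
        The WER as a fraction (e.g. 0.191 for 19.1%).

    Raises:
        ValueError: If ``total_words`` is not positive.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    wer = errors / total_words
    logger.info("WER computed: %.3f", wer)  # logging, not print
    return wer
```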
To smoke-test your setup:

    # Test imports
    python -c "from chaplin_ui.core import *; print('✓ All imports work')"

    # Test web app
    python web_app.py &
    curl http://localhost:8000/api/health

We love contributions! Whether it's:
- 🐛 Bug fixes
- ✨ New features
- 📝 Documentation improvements
- 🎨 UI/UX enhancements
- 🔧 Code refactoring
See our Contributing Guide for details on:
- How to set up your development environment
- Code style guidelines
- How to submit pull requests
- Where to ask questions
First time contributing? Check out our good first issues!
This project is licensed under the MIT License - see LICENSE for details.
Chaplin-UI is based on Chaplin by Amanvir Parhar. We're grateful for the original work that made this project possible!
- VSR Model: Based on Auto-AVSR by mpc001 (19.1% WER on LRS3)
- Dataset: Lip Reading Sentences 3
- LLM: Uses Ollama or LM Studio for local text correction (both OpenAI-compatible)
- 🐛 Found a bug? Open an issue
- 💡 Have an idea? Start a discussion
- 📧 Questions? Check our FAQ or open a discussion
Made with ❤️ by the open source community
