A crash-safe Rust pipeline for generating multi-language subtitles from your video library: fast, resumable, and designed for batch jobs.
- Extract audio from MP4/MKV video files
- Transcribe speech to text using Whisper
- Translate subtitles to multiple languages (English, Spanish, German)
- Output in SRT and VTT formats
- Resumable checkpoints (stage-level + translation segment-level) to avoid redoing work
- Built-in cleanup (`subtitles clean`) to reclaim checkpoint storage

- Never redo hours of work: crash-safe checkpoints resume from where you left off by default.
- Local-first by design: translate offline with Ollama or plug in OpenAI.
- Built for libraries: batch processing with configurable concurrency.
- Predictable outputs: clear naming conventions and standards-compliant SRT/VTT.
Build the release binary, then generate English, Spanish, and German subtitles for a video:

`cargo build --release`

`./target/release/subtitles generate movie.mp4 --languages en,es,de`

Resume is enabled by default. To force a clean run, pass `--no-resume`.
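A minimal sketch of how stage-level resume could behave, assuming each completed stage leaves a marker file in a checkpoint directory: with resume on, a stage whose marker exists is skipped; with resume off (the `--no-resume` case), it always reruns. `Stage`, `marker`, `run_stage`, and the `<stage>.done` layout are hypothetical; this README does not describe the real checkpoint format, and segment-level translation checkpoints would apply the same idea per segment rather than per stage.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical pipeline stages, in execution order.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Stage {
    ExtractAudio,
    Transcribe,
    Translate,
}

impl Stage {
    fn name(self) -> &'static str {
        match self {
            Stage::ExtractAudio => "extract_audio",
            Stage::Transcribe => "transcribe",
            Stage::Translate => "translate",
        }
    }
}

/// Hypothetical marker layout: one "<stage>.done" file per finished stage.
fn marker(dir: &Path, stage: Stage) -> PathBuf {
    dir.join(format!("{}.done", stage.name()))
}

/// Runs `work` for `stage` unless resume is on and the stage already finished.
/// Returns Ok(true) if the work ran, Ok(false) if it was skipped.
fn run_stage<F>(dir: &Path, stage: Stage, resume: bool, work: F) -> io::Result<bool>
where
    F: FnOnce() -> io::Result<()>,
{
    let done = marker(dir, stage);
    if resume && done.exists() {
        return Ok(false);
    }
    fs::create_dir_all(dir)?;
    // If the work fails, no marker is written, so the stage reruns next time.
    work()?;
    // Write to a temp file, then rename, so a crash never leaves a half-written marker.
    let tmp = done.with_extension("tmp");
    fs::write(&tmp, b"done")?;
    fs::rename(&tmp, &done)?;
    Ok(true)
}
```

The marker is written only after the stage's work succeeds, and on POSIX filesystems a rename within one directory is atomic, so a crash leaves either no marker (the stage reruns) or a complete one (the stage is skipped).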
To reclaim checkpoint storage, run `subtitles clean`.

- Requirements: user stories and acceptance criteria
- Design: technical architecture and implementation details
Glossary
| Term | Definition |
|---|---|
| SRT | SubRip Subtitle format. A simple text-based subtitle format with sequential numbering, timestamps, and text. Widely supported by media players. |
| VTT | WebVTT (Web Video Text Tracks). A subtitle format designed for HTML5 video, supporting styling and positioning. Used by web browsers and streaming platforms. |
| STT | Speech-to-Text. The process of converting spoken audio into written text. Also called automatic speech recognition (ASR). |
| TTS | Text-to-Speech. The inverse of STT: converting written text into spoken audio. Not used in this project but often confused with STT. |
| Whisper | An open-source speech recognition model by OpenAI. Supports multiple languages and produces timestamped transcriptions. |
| Ollama | A tool for running large language models locally. Used here as a translation backend that doesn't require internet access or API keys. |
| FFmpeg | A multimedia framework for handling video, audio, and other multimedia files. Used here to extract audio tracks from video containers. |
| Segment | A single unit of subtitle text with a start time, end time, and content. Multiple segments make up a complete subtitle file. |
| Plex | A media server platform for organizing and streaming personal media collections. This tool generates subtitles compatible with Plex's subtitle discovery conventions. |
| ISO 639 | An international standard for language codes. We use ISO 639-1 two-letter codes (e.g., en for English, es for Spanish, de for German) throughout this project. |
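To make the SRT, VTT, and Segment entries concrete, here is a sketch that renders the same segments in both formats. It assumes segments carry millisecond timestamps; `Segment`, `timestamp`, `to_srt`, and `to_vtt` are illustrative names, not this project's types. The differences it shows come from the formats themselves: SRT numbers cues from 1 and writes `HH:MM:SS,mmm`, while WebVTT begins with a `WEBVTT` line and writes `HH:MM:SS.mmm`.

```rust
use std::fmt::Write;

/// One subtitle cue: start and end in milliseconds, plus its text.
#[derive(Debug, Clone)]
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Formats milliseconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for VTT).
fn timestamp(ms: u64, frac_sep: char) -> String {
    let (h, rem) = (ms / 3_600_000, ms % 3_600_000);
    let (m, rem) = (rem / 60_000, rem % 60_000);
    let (s, millis) = (rem / 1_000, rem % 1_000);
    format!("{h:02}:{m:02}:{s:02}{frac_sep}{millis:03}")
}

fn to_srt(segments: &[Segment]) -> String {
    let mut out = String::new();
    for (i, seg) in segments.iter().enumerate() {
        // Cue number, timing line, text, then a blank line between cues.
        // Writing to a String cannot fail, so unwrap is safe here.
        writeln!(
            out,
            "{}\n{} --> {}\n{}\n",
            i + 1,
            timestamp(seg.start_ms, ','),
            timestamp(seg.end_ms, ','),
            seg.text
        )
        .unwrap();
    }
    out
}

fn to_vtt(segments: &[Segment]) -> String {
    // WebVTT requires the "WEBVTT" signature line; cue identifiers are optional.
    let mut out = String::from("WEBVTT\n\n");
    for seg in segments {
        writeln!(
            out,
            "{} --> {}\n{}\n",
            timestamp(seg.start_ms, '.'),
            timestamp(seg.end_ms, '.'),
            seg.text
        )
        .unwrap();
    }
    out
}
```

For a segment from 1,000 ms to 2,500 ms, `to_srt` produces a cue timed `00:00:01,000 --> 00:00:02,500` and `to_vtt` produces `00:00:01.000 --> 00:00:02.500`.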
License: MIT