🎬 Crash-safe Rust pipeline for generating multi-language subtitles from video files: fast, resumable, and designed for batch jobs

kevinmichaelchen/subtitles

Subtitles

A crash-safe Rust pipeline for generating multi-language subtitles from your video library: fast, resumable, and designed for batch jobs.

Features

  • 🎬 Extract audio from MP4/MKV video files
  • 🗣️ Transcribe speech to text using Whisper
  • 🌍 Translate subtitles to multiple languages (English, Spanish, German)
  • 📄 Output in SRT and VTT formats
  • ♻️ Resumable checkpoints (stage-level + translation segment-level) to avoid redoing work
  • 🧹 Built-in cleanup (subtitles clean) to reclaim checkpoint storage
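
To make the two checkpoint levels concrete, here is a minimal in-memory sketch of how a resume decision could work. Everything in it is an assumption for illustration: the `Stage` names, the `Checkpoint` struct, and its fields are hypothetical and are not taken from the project's source, which also persists checkpoints to disk and would likely track translated segments per language.

```rust
use std::collections::BTreeSet;

/// Pipeline stages in execution order (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Stage {
    ExtractAudio,
    Transcribe,
    Translate,
    WriteSubtitles,
}

pub const STAGES: [Stage; 4] = [
    Stage::ExtractAudio,
    Stage::Transcribe,
    Stage::Translate,
    Stage::WriteSubtitles,
];

/// Hypothetical checkpoint record for one video.
#[derive(Debug, Default)]
pub struct Checkpoint {
    /// Stage-level progress: stages that finished successfully.
    pub completed_stages: BTreeSet<Stage>,
    /// Segment-level progress: indices of segments already translated.
    pub translated_segments: BTreeSet<usize>,
}

impl Checkpoint {
    /// First stage that still has to run, or `None` when every stage is done.
    /// With `resume == false` the checkpoint is ignored and work restarts.
    pub fn next_stage(&self, resume: bool) -> Option<Stage> {
        if !resume {
            return Some(STAGES[0]);
        }
        STAGES
            .iter()
            .copied()
            .find(|stage| !self.completed_stages.contains(stage))
    }

    /// Segment indices that still need translating out of `total` segments.
    pub fn pending_segments(&self, total: usize, resume: bool) -> Vec<usize> {
        (0..total)
            .filter(|i| !resume || !self.translated_segments.contains(i))
            .collect()
    }
}
```

The segment-level set is what lets an interrupted translation stage pick up partway through a file instead of re-translating segments that were already finished.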

Why It's Different ✨

  • 💾 Never redo hours of work: crash-safe checkpoints resume from where you left off by default.
  • 🏠 Local-first by design: translate offline with Ollama or plug in OpenAI.
  • 📚 Built for libraries: batch processing with configurable concurrency.
  • ✅ Predictable outputs: clear naming conventions and standards-compliant SRT/VTT.
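
As an illustration of what a predictable naming convention can look like, the sketch below derives a sidecar subtitle path next to the video. It assumes the common `<name>.<lang>.<ext>` layout that Plex recognizes; the README does not spell out the project's exact scheme, so treat the `subtitle_path` function and the `SubtitleFormat` enum as hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Output formats named in the README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubtitleFormat {
    Srt,
    Vtt,
}

impl SubtitleFormat {
    /// File extension for this format.
    pub fn extension(self) -> &'static str {
        match self {
            SubtitleFormat::Srt => "srt",
            SubtitleFormat::Vtt => "vtt",
        }
    }
}

/// Builds `<dir>/<stem>.<lang>.<ext>` next to the video file.
/// Returns `None` if the path has no file name or is not valid UTF-8.
pub fn subtitle_path(video: &Path, lang: &str, format: SubtitleFormat) -> Option<PathBuf> {
    let stem = video.file_stem()?.to_str()?;
    let name = format!("{}.{}.{}", stem, lang, format.extension());
    Some(video.with_file_name(name))
}
```

Under this assumed scheme, `movie.mp4` translated to Spanish would produce `movie.es.srt` in the same directory.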

Quick Start

cargo build --release
./target/release/subtitles generate movie.mp4 --languages en,es,de

Resume is enabled by default. To force a clean run, pass --no-resume.
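
The `--languages en,es,de` argument in the command above is a comma-separated list of ISO 639-1 codes. A rough sketch of how such a value could be parsed and sanity-checked follows; `parse_languages` is a hypothetical helper, not the project's actual parser, and it only checks the two-letter shape rather than membership in the ISO 639-1 registry.

```rust
/// Splits a comma-separated language list into lowercase two-letter codes.
/// Duplicates are dropped while preserving first-seen order.
pub fn parse_languages(arg: &str) -> Result<Vec<String>, String> {
    let mut langs: Vec<String> = Vec::new();
    for raw in arg.split(',') {
        let code = raw.trim().to_ascii_lowercase();
        // Shape check only: exactly two ASCII letters.
        if code.len() != 2 || !code.chars().all(|c| c.is_ascii_lowercase()) {
            return Err(format!("not a two-letter language code: {:?}", raw));
        }
        if !langs.contains(&code) {
            langs.push(code);
        }
    }
    Ok(langs)
}
```

Rejecting malformed codes up front avoids spending time on audio extraction and transcription before discovering that a target language was mistyped.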

Maintenance 🧹

subtitles clean

Documentation

  • 📋 Requirements: User stories and acceptance criteria
  • 🏗️ Design: Technical architecture and implementation details

Glossary

| Term | Definition |
| --- | --- |
| SRT | SubRip Subtitle format. A simple text-based subtitle format with sequential numbering, timestamps, and text. Widely supported by media players. |
| VTT | WebVTT (Web Video Text Tracks). A subtitle format designed for HTML5 video, supporting styling and positioning. Used by web browsers and streaming platforms. |
| STT | Speech-to-Text. The process of converting spoken audio into written text. Also called automatic speech recognition (ASR). |
| TTS | Text-to-Speech. The inverse of STT: converting written text into spoken audio. Not used in this project but often confused with STT. |
| Whisper | An open-source speech recognition model by OpenAI. Supports multiple languages and produces timestamped transcriptions. |
| Ollama | A tool for running large language models locally. Used here as a translation backend that doesn't require internet access or API keys. |
| FFmpeg | A multimedia framework for handling video, audio, and other multimedia files. Used here to extract audio tracks from video containers. |
| Segment | A single unit of subtitle text with a start time, end time, and content. Multiple segments make up a complete subtitle file. |
| Plex | A media server platform for organizing and streaming personal media collections. This tool generates subtitles compatible with Plex's subtitle discovery conventions. |
| ISO 639 | An international standard for language codes. We use ISO 639-1 two-letter codes (e.g., en for English, es for Spanish, de for German) throughout this project. |
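
The Segment, SRT, and VTT entries above fit together as follows: a list of segments is rendered into either format, and the two differ mainly in the file header, the cue numbering, and the millisecond separator. The sketch below is a simplified illustration with a hypothetical `Segment` struct; it ignores styling, positioning, and text escaping.

```rust
/// One subtitle cue: start time, end time, and text (hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Formats milliseconds as HH:MM:SS<sep>mmm.
fn timestamp(ms: u64, sep: char) -> String {
    let hours = ms / 3_600_000;
    let minutes = ms / 60_000 % 60;
    let seconds = ms / 1_000 % 60;
    let millis = ms % 1_000;
    format!("{:02}:{:02}:{:02}{}{:03}", hours, minutes, seconds, sep, millis)
}

/// SRT: numbered cues, comma before the milliseconds.
pub fn to_srt(segments: &[Segment]) -> String {
    let mut out = String::new();
    for (i, seg) in segments.iter().enumerate() {
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            i + 1,
            timestamp(seg.start_ms, ','),
            timestamp(seg.end_ms, ','),
            seg.text
        ));
    }
    out
}

/// WebVTT: "WEBVTT" header, dot before the milliseconds, numbering optional.
pub fn to_vtt(segments: &[Segment]) -> String {
    let mut out = String::from("WEBVTT\n\n");
    for seg in segments {
        out.push_str(&format!(
            "{} --> {}\n{}\n\n",
            timestamp(seg.start_ms, '.'),
            timestamp(seg.end_ms, '.'),
            seg.text
        ));
    }
    out
}
```

Keeping times as integer milliseconds in the segment avoids floating-point rounding when the same cue is written to both formats.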

License

MIT
