Convert any SRT subtitle file into a fully timed voice-over. The tool reads subtitles, generates speech with your chosen TTS provider (OpenAI, ElevenLabs, or Google Cloud), and outputs a single synced audio track. It supports caching, multiple jobs, and optional Serbian Latin→Cyrillic transliteration for better pronunciation.
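For orientation, the timing information the tool works from is the start and end timestamp on each subtitle cue. The following is a minimal sketch of reading those cues, not the project's actual parser; `parse_srt` and its tuple layout are hypothetical names chosen for illustration.

```python
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (a "." separator is tolerated too).
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ms(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(text):
    """Return a list of (start_ms, end_ms, text) tuples, one per cue.

    Hypothetical helper: cue blocks are assumed to be separated by blank lines.
    """
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                body = " ".join(l.strip() for l in lines[index + 1:] if l.strip())
                cues.append((_to_ms(*fields[:4]), _to_ms(*fields[4:]), body))
                break  # one timing line per block
    return cues


sample = "1\n00:00:01,000 --> 00:00:02,500\nHello\nworld\n\n2\n00:00:03,000 --> 00:00:04,000\nBye\n"
print(parse_srt(sample))
```

Each tuple gives the slot that the synthesized speech for that cue has to fit into.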
- Install dependencies:
  pip install -r requirements.txt
- Important: Copy .env.example to .env and populate it with your real credentials:
  cp .env.example .env
  Open the .env file and replace the placeholder values with your actual API keys and preferences. Each variable is documented with a description and its valid options. At minimum, you must set the credential for your chosen provider:
  - For OpenAI: Set OPENAI_API_KEY
  - For ElevenLabs: Set ELEVENLABS_API_KEY
  - For Google Cloud: Set GOOGLE_APPLICATION_CREDENTIALS to the path of your credentials file
- Prepare your input SRT file:
  Place your subtitle file as input.srt in the project root, or specify a custom path using the TTS_SRT_PATH environment variable or a command-line argument. An example file is provided in the examples/ directory for reference.
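Before the first run it can help to confirm that the credential for your provider is actually visible. The sketch below is illustrative and not part of the tool: the variable names come from the setup notes above, while `REQUIRED_ENV` and `missing_credential` are hypothetical. It inspects the process environment only, so it assumes the values from .env have already been exported or loaded.

```python
import os

# Variable names are taken from the setup notes; the mapping itself is illustrative.
REQUIRED_ENV = {
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "google": "GOOGLE_APPLICATION_CREDENTIALS",
}


def missing_credential(provider, env=None):
    """Return the name of the unset variable for `provider`, or None if it is set.

    Hypothetical helper; raises KeyError for a provider not listed above.
    """
    env = os.environ if env is None else env
    name = REQUIRED_ENV[provider]
    value = env.get(name, "")
    return None if value.strip() else name


for provider in REQUIRED_ENV:
    missing = missing_credential(provider)
    status = "ok" if missing is None else f"missing {missing}"
    print(f"{provider}: {status}")
```

Only the provider you intend to use needs to report "ok".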
Run a basic conversion (replace with your file and job name):
python3 main.py --srt_path examples/basic/input.srt --provider openai --job-name your_job_name
This generates an audio file in the output/ directory with the naming pattern: {job_name}-{provider}-voiceover.mp3.
Choose your TTS provider:
# Use ElevenLabs
python3 main.py --provider elevenlabs
# Use Google Cloud TTS
python3 main.py --provider google
# Use OpenAI
python3 main.py --provider openai
Or set TTS_PROVIDER in your .env file.
Job names are important because they:
- Organize cache per job and provider at .cache/{provider}/{job_name}/
- Form the final output filename: output/{job_name}-{provider}-voiceover.mp3
Use them to keep different projects/versions separate and reproducible:
# Process a specific project
python3 main.py --job-name contest-training
# Output: output/contest-training-openai-voiceover.mp3
# Cache: .cache/openai/contest-training/
# Different job with different provider
python3 main.py --job-name production-narration --provider elevenlabs
# Output: output/production-narration-elevenlabs-voiceover.mp3
# Cache: .cache/elevenlabs/production-narration/
Job names default to "default" if not specified. Set TTS_JOB_NAME in .env for a persistent default.
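The naming scheme for cache and output locations can be sketched as follows. This is an illustration of the documented pattern rather than the tool's own code: `job_paths` is a hypothetical helper, and the "openai" default provider is an assumption inferred from the first example.

```python
from pathlib import PurePosixPath


def job_paths(job_name="default", provider="openai", cache_dir=".cache", out_dir="output"):
    """Return (cache_directory, output_file) for a job, following the documented pattern.

    Hypothetical helper; the default provider is an assumption.
    """
    cache = PurePosixPath(cache_dir) / provider / job_name
    output = PurePosixPath(out_dir) / f"{job_name}-{provider}-voiceover.mp3"
    return cache, output


cache, output = job_paths("contest-training")
print(cache)   # .cache/openai/contest-training
print(output)  # output/contest-training-openai-voiceover.mp3
```

Because both paths include the job name and the provider, two jobs never share cached chunks or overwrite each other's output.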
Specify a custom SRT file path:
# Using command-line argument
python3 main.py path/to/your/subtitles.srt
# Using environment variable (in .env)
TTS_SRT_PATH=path/to/your/subtitles.srt
Override the default output location:
python3 main.py -o custom/path/my-audio.mp3
When using -o, the job name prefix is not added, so your exact filename is preserved.
When the SRT text is Serbian in Latin script, we automatically transliterate it to Cyrillic (configurable via TTS_TRANSLITERATE). Some TTS models recognize and pronounce Serbian more accurately from Cyrillic text, improving prosody and proper names. The transliterated file is cached at .cache/{provider}/{job_name}/input-transliterated.srt.
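To show what the transliteration step involves, here is a minimal sketch of Serbian Latin to Cyrillic conversion. It is not the tool's implementation; `to_cyrillic` is a hypothetical name. It treats every "lj", "nj" and "dž" as a digraph, which is wrong for the few words where those letters belong to separate sounds (for example "injekcija"), and it leaves characters outside the Serbian alphabet untouched.

```python
import unicodedata

# The three digraphs must be matched before single letters.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}

# Single-letter mapping; the two strings are aligned position by position.
SINGLES = dict(zip("abcčćdđefghijklmnoprsštuvzž", "абцчћдђефгхијклмнопрсштувзж"))


def to_cyrillic(text):
    """Transliterate Serbian Latin text to Cyrillic (illustrative sketch)."""
    # Normalize so letters like "č" are single code points, not letter + combining mark.
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLES.get(ch.lower())
        if cyr is None:
            out.append(ch)  # digits, punctuation, foreign letters pass through
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


print(to_cyrillic("Njegoš i džem, 12:30"))  # Његош и џем, 12:30
```

Digits and punctuation pass through unchanged, so cue numbers and timestamps in an SRT file are not affected by this mapping.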
- --provider: Selects the TTS backend (openai, elevenlabs, google)
- --job-name: Organizes cache and output by project/version name
- -o, --out: Custom output file path
- --model, --voice, --instructions, --force-language: OpenAI-specific overrides
- --format: Request a different output format/extension when supported
- --no-fill: Skip padding to subtitle ends
- --hard-cut: Trim overruns instead of speeding up
- --max-speedup: Cap speech acceleration (default 1.15x)
- --pad-start, --pad-end: Add leading/trailing silence in milliseconds
- --max-chars: Limit characters per TTS call
- --cache-dir: Override base cache directory
python3 main.py \
--provider elevenlabs \
--job-name my-project \
--max-speedup 1.2 \
--pad-start 1000 \
--pad-end 1000 \
path/to/subtitles.srt
This command:
- Uses ElevenLabs for synthesis
- Organizes cache under .cache/elevenlabs/my-project/
- Outputs to output/my-project-elevenlabs-voiceover.mp3
- Allows up to 20% speed increase if needed
- Adds 1 second of silence at start and end
- Processes the specified SRT file
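The interaction between --max-speedup, --hard-cut and --no-fill can be pictured with a small planning function. This is a sketch of one plausible reading of those options, not the tool's actual algorithm: `plan_clip` and its return fields are hypothetical, and what the tool does with any overrun that remains after the speed cap is not stated here, so the sketch only reports it.

```python
def plan_clip(clip_ms, slot_ms, max_speedup=1.15, hard_cut=False, fill=True):
    """Decide how a synthesized clip is fitted into its subtitle slot.

    Hypothetical helper. Returns a dict with the playback speed, the resulting
    duration, the silence to append, and any remaining overrun (all in ms).
    """
    if slot_ms <= 0:
        raise ValueError("slot_ms must be positive")

    if clip_ms <= slot_ms:
        # Clip fits: optionally pad with silence up to the subtitle end.
        pad = slot_ms - clip_ms if fill else 0
        return {"speed": 1.0, "duration_ms": clip_ms, "pad_ms": pad, "overrun_ms": 0}

    if hard_cut:
        # Clip is too long and trimming was requested: cut at the slot boundary.
        return {"speed": 1.0, "duration_ms": slot_ms, "pad_ms": 0, "overrun_ms": 0}

    # Clip is too long: speed up just enough to fit, but never beyond the cap.
    speed = min(clip_ms / slot_ms, max_speedup)
    duration = round(clip_ms / speed)
    return {
        "speed": speed,
        "duration_ms": duration,
        "pad_ms": 0,
        "overrun_ms": max(0, duration - slot_ms),
    }


print(plan_clip(1800, 2000))                 # fits, 200 ms of padding
print(plan_clip(3000, 2000))                 # capped at 1.15x, still overruns
print(plan_clip(3000, 2000, hard_cut=True))  # trimmed to the slot
```

Under these assumptions, raising the cap to 1.2 as in the command above trades slightly faster speech for fewer overruns.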
ℹ️ Copy .env.example to .env and populate it with your real provider credentials before running the app.
- Input: Your SRT can live anywhere (not just the project root). To get started, we suggest keeping a copy under an examples/ folder, e.g. examples/basic/input.srt. You can replace it with your own file at any time.
- Cache: .cache/{provider}/{job_name}/ holds cached TTS chunks and the transliterated SRT, for fast reruns and isolation across jobs/providers.
- Output: output/{job_name}-{provider}-{output_name}.mp3 is the final generated audio. The folder is created automatically and ignored by Git.
This structure keeps projects tidy, enables multiple jobs side-by-side, and makes outputs easy to find.
- Missing input file: Ensure input.srt exists in the project root, or provide a path via TTS_SRT_PATH or a command-line argument.
- API errors: Verify your API keys are correctly set in .env for your chosen provider.
- Audio processing issues: Install ffmpeg for audio processing: sudo apt install ffmpeg (Linux) or brew install ffmpeg (macOS).
- Cache issues: Clear the provider/job cache to force re-synthesis: rm -rf .cache/{provider}/{job_name}/
- Google Cloud TTS: Confirm your service account has the roles/cloudtexttospeech.user permission and GOOGLE_APPLICATION_CREDENTIALS points to a valid JSON file.
- ElevenLabs voices: Find available voice IDs at https://elevenlabs.io/app/voice-library