A lightweight command-line wrapper around edge-tts that turns plain text or SubRip (.srt) subtitle files into high-quality spoken audio. The tool can optionally regenerate subtitle timing to match synthesized speech, making it convenient for creating narrated videos, accessibility assets, or localized voice-overs.
- 🔊 Convert plain text documents or `.srt` caption files into MP3 or WAV audio.
- 🗣️ Choose from the full catalog of Microsoft Edge neural voices built into `edge-tts`.
- ✂️ Automatically insert configurable silence between synthesized segments.
- 🧾 Optionally regenerate `.srt` caption files when starting from plain text input.
- ♻️ Robust retry logic to handle transient synthesis errors gracefully.
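The retry behavior can be pictured with a small sketch. The helper name `synthesize_with_retry` and the `retries`/`backoff` defaults are illustrative stand-ins, not the actual internals of `tts_converter.py`:

```python
import asyncio

async def synthesize_with_retry(synthesize, text, retries=3, backoff=1.0):
    """Retry an async synthesis call with exponential backoff.

    `synthesize` is a hypothetical async callable; `retries` and
    `backoff` are illustrative defaults, not the tool's real values.
    """
    for attempt in range(retries):
        try:
            return await synthesize(text)
        except Exception:
            if attempt == retries - 1:
                raise  # give up after the final attempt
            # wait backoff, 2*backoff, 4*backoff, ... before retrying
            await asyncio.sleep(backoff * 2 ** attempt)
```

Transient network or service errors from the synthesis backend are absorbed by the earlier attempts; only a persistent failure propagates to the caller.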
- Python 3.9 or later.
- A system installation of FFmpeg (required by `pydub` for audio processing).
- The Python packages listed in `requirements.txt`, including `edge-tts` and `pydub`.
- Clone the repository and move into the project directory:

  ```bash
  git clone https://github.com/<your-username>/edgeTtsCommandLine.git
  cd edgeTtsCommandLine
  ```

- (Optional) Create and activate a virtual environment:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Make sure FFmpeg is available on your PATH. On macOS you can install it with Homebrew (`brew install ffmpeg`); on Ubuntu, use `sudo apt install ffmpeg`.
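If you want to fail fast when FFmpeg is missing, a quick check from Python could look like this (a convenience sketch, not part of the tool itself):

```python
import shutil

def ffmpeg_available():
    """Return True if an `ffmpeg` executable can be found on PATH.

    pydub shells out to FFmpeg when exporting audio, so running the
    converter without it will fail at export time.
    """
    return shutil.which("ffmpeg") is not None
```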
Run the converter by pointing it at an input file, selecting a voice, and providing the desired output path. The output format is inferred from the file extension (.mp3 or .wav).
```bash
python tts_converter.py --input input.txt \
    --voice en-US-AriaNeural \
    --output output/audio.mp3 \
    --generate-srt \
    --silence 500 \
    --rate +0% --volume +0% --pitch +0Hz
```

- Plain text (`.txt`) – Each non-empty line becomes a separate utterance. The tool can optionally produce an `.srt` file aligned to the generated speech when `--generate-srt` is passed.
- SubRip (`.srt`) – Existing captions are synthesized in sequence. Silence between entries is enforced according to `--silence`, which acts as the minimum gap. Passing `--generate-srt` has no effect in this mode.
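The timing rules above can be illustrated with a short sketch. The helper names and the `(start_ms, end_ms)` tuple representation are assumptions for illustration; the actual internals of `tts_converter.py` may differ:

```python
def parse_srt_time(stamp):
    """Convert an SRT timestamp such as '00:01:02,500' to milliseconds."""
    hms, ms = stamp.split(",")
    h, m, s = (int(part) for part in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)

def enforce_min_gap(entries, min_gap_ms):
    """Shift (start_ms, end_ms) entries forward so consecutive entries
    are separated by at least min_gap_ms, mirroring the minimum-gap
    behavior of --silence in SubRip mode."""
    adjusted = []
    for start, end in entries:
        if adjusted and start - adjusted[-1][1] < min_gap_ms:
            # too close to the previous entry: push this one later
            shift = adjusted[-1][1] + min_gap_ms - start
            start, end = start + shift, end + shift
        adjusted.append((start, end))
    return adjusted
```

Entries that already satisfy the minimum gap pass through unchanged; only entries that would overlap or crowd the previous one are shifted later.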
- `--input` – Path to the source text or subtitle file.
- `--voice` – Name of the Edge neural voice to use. A comprehensive list is embedded in `tts_converter.py`; you can also refer to the `edge-tts` documentation for descriptions.
- `--output` – Destination file ending in `.mp3` or `.wav`.
| Flag | Description |
|---|---|
| `--generate-srt` | Emit a new `.srt` file (only when starting from text input). |
| `--silence <ms>` | Amount of silence in milliseconds between segments (default: 750). |
| `--rate <percent>` | Adjust the speaking rate, e.g. `-10%` to slow down. |
| `--volume <percent>` | Adjust the output volume, e.g. `+5%` to increase it. |
| `--pitch <hz>` | Adjust the voice pitch, e.g. `+2Hz`. |
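The prosody flags use signed-offset strings, as in the examples above. A small validator like the following (an illustrative sketch, not code from `tts_converter.py`) shows the expected shapes:

```python
import re

# Expected shapes for the prosody flags, e.g. "+10%", "-5%", "+2Hz".
_PATTERNS = {
    "rate": r"[+-]\d+%",
    "volume": r"[+-]\d+%",
    "pitch": r"[+-]\d+Hz",
}

def validate_prosody(rate="+0%", volume="+0%", pitch="+0Hz"):
    """Check prosody flag values before handing them to the synthesizer."""
    values = {"rate": rate, "volume": volume, "pitch": pitch}
    for name, value in values.items():
        if not re.fullmatch(_PATTERNS[name], value):
            raise ValueError(f"{name} must match {_PATTERNS[name]!r}, got {value!r}")
    return values
```

Catching a malformed value such as `2Hz` (missing its sign) up front gives a clearer error than letting it fail deep inside synthesis.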
Convert a text script to MP3 and matching captions:
```bash
python tts_converter.py --input scripts/lesson.txt \
    --voice en-GB-LibbyNeural \
    --output build/lesson.mp3 \
    --generate-srt
```

Render an existing subtitle file into WAV audio while enforcing a 1-second minimum gap:

```bash
python tts_converter.py --input captions/show.srt \
    --voice en-US-GuyNeural \
    --output build/show.wav \
    --silence 1000
```

- Run `python tts_converter.py --help` to see the complete list of CLI options and defaults.
- Feel free to extend the script with new features or integrate it into larger automation pipelines.
Contributions, bug reports, and feature requests are welcome! Please open an issue to discuss major changes, and submit a pull request when you're ready.
This project is released under the MIT License.