RunAndRead-Audiobook is an open-source project aimed at generating high-quality text-to-speech (TTS) audiobooks using open-source models like Zyphra/Zonos.
The ultimate goal is to make Run & Read, the audiobook player app, sound more natural by using high-quality voices. Currently, it relies on the standard voices embedded in Apple and Android devices, which are still not perfect. Starting from Android v1.5 (6) and iOS v1.6 (18), Run & Read supports MP3 audiobooks generated using the RANDR pipeline in this repository. See instructions here.
Download and try the apps for free!
App Store: Run & Read for Apple Devices
Google Play: Run & Read for Android
Generate high-quality audiobooks at home using open-source AI models! We've built a pipeline using MLX-AUDIO to create audiobooks in the RANDR format, optimized for playback in the Run & Read app.
Dedicated document with step-by-step instructions
- Convert EPUB to JSON for text extraction.
- Generate audio files using the Zonos TTS model.
- Generate audio files using Kokoro-TTS via MLX-AUDIO.
- Clone voices from a provided MP3 sample.
- Play audio clips sequentially while displaying text in the terminal.
- Merge audio clips into one file.
- Zyphra API support for cloud-based TTS.
- Deepgram API support for cloud-based TTS.
- Wrap the produced audio and JSON files into a ZIP file readable by the Run & Read app.
- Transfer audio files to a mobile phone and play them in the Run & Read app.
- Calculate the self-cost of complete book generation: cloud vs. local.
- On-device TTS model for mobile apps (Android/iOS).
Here are some audiobook samples generated using RunAndRead-Audiobook with Zonos TTS voice cloning:
[Sample 1 - Alice in Wonderland]
You can find examples under the audio/pg11/ folder, and generate your own samples using the steps outlined in the Usage section below.
- Python 3.9+
- Zyphra/Zonos (open-source TTS engine)
- ffmpeg (audio conversion)
- EbookLib (EPUB parsing)
- PyAudio / playsound (for playback)
- yt-dlp (to download MP3 files from YouTube for voice cloning)
pip install -r requirements.txt
Follow the official installation instructions from Zyphra/Zonos. Using a uv virtual environment is recommended for running RunAndRead scripts. After installing the Zonos project, run the sample.py script:
uv run sample.py
This will download the "Zyphra/Zonos-v0.1-transformer" base model from Hugging Face and store it in your environment.
- macOS:
brew install ffmpeg
- Ubuntu:
sudo apt install ffmpeg
- Windows: Download from ffmpeg.org and add to system PATH.
To clone a voice with Zonos, you'll need an MP3 sample of the speaker. A 10-20 minute video with a single speaker (e.g., a tutorial or audiobook) is recommended. You can download an MP3 track from YouTube using yt-dlp:
yt-dlp -x --audio-format mp3 "https://www.youtube.com/watch?v=MkLBNUMc26Y" -o "assets/exampleaudio.mp3"
The Zonos model uses this exampleaudio.mp3 file as the reference sample from which it clones the voice before synthesis.
First, run this script with 0 as the third parameter:
python epub_to_json.py epub/pg11.epub library/pg11.json 0
Check the terminal output to find how many lines should be skipped, then rerun the script with the number of the first line to keep:
python epub_to_json.py epub/pg11.epub library/pg11.json 10
This ensures that the book starts from the correct position, e.g.:
10: CHAPTER I. Down the Rabbit-Hole
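The two-pass workflow above can be pictured roughly as follows. This is a minimal sketch, not the actual epub_to_json.py: the function names, the zero-based line numbering, and the JSON layout (a single `text` list) are assumptions made for illustration.

```python
import json


def number_lines(lines):
    """Return 'index: text' rows so the first real line of the book can be located."""
    return [f"{i}: {line}" for i, line in enumerate(lines)]


def keep_from(lines, first_line):
    """Drop everything before `first_line`; 0 keeps the whole book."""
    return lines[first_line:]


def to_book_json(lines, first_line=0):
    # Hypothetical schema: the JSON written by epub_to_json.py may look different.
    return json.dumps({"text": keep_from(lines, first_line)}, ensure_ascii=False)


sample = [
    "The Project Gutenberg eBook of Alice's Adventures in Wonderland",
    "Contents",
    "CHAPTER I. Down the Rabbit-Hole",
    "Alice was beginning to get very tired of sitting by her sister on the bank",
]

# First pass: print numbered lines and look for the first line worth keeping.
for row in number_lines(sample):
    print(row)

# Second pass: keep everything from that line onward.
print(to_book_json(sample, first_line=2))
```

The first pass only prints; the second pass is the one that decides where the audiobook starts.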
Note: Without an NVIDIA GPU, converting an entire book to audio takes a long time. A 30-second audio clip takes approximately 3 minutes to generate on a MacBook Pro with an M1 processor, so a full book can take dozens of hours. For example, Alice's Adventures in Wonderland is 3 hours long, which means about 18 hours of processing on that machine. However, the make_abook script can be interrupted at any time, and it will resume from the position where it was stopped.
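The estimate in the note is simple arithmetic: generation time divided by clip length gives a slowdown factor, and the factor scales the length of the finished audiobook. A small sketch using the figures quoted above (the function names are illustrative only):

```python
def slowdown_factor(clip_seconds, generation_seconds):
    """How many seconds of processing one second of audio costs."""
    return generation_seconds / clip_seconds


def estimate_processing_hours(audio_hours, factor):
    """Processing time for a book whose finished audio lasts `audio_hours`."""
    return audio_hours * factor


factor = slowdown_factor(clip_seconds=30, generation_seconds=3 * 60)
print(factor)                                # 6.0
print(estimate_processing_hours(3, factor))  # 18.0
```

Measuring one or two clips on your own hardware and plugging in the numbers gives a rough idea of how long a full book will take.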
uv run python make_abook.py library/pg21279.json assets/kurt_v.mp3
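One common way to make a long generation job resumable is to skip clips whose output file already exists. The sketch below illustrates that idea only; it is not taken from make_abook.py, and the file naming (`0.mp3`, `1.mp3`, ...) and the `synthesize` callable are hypothetical stand-ins.

```python
import tempfile
from pathlib import Path


def pending_clips(total_clips, out_dir, ext="mp3"):
    """Indices of clips that have no output file yet."""
    out = Path(out_dir)
    return [i for i in range(total_clips) if not (out / f"{i}.{ext}").exists()]


def generate_all(texts, out_dir, synthesize):
    """Generate only the missing clips; `synthesize` maps text to audio bytes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in pending_clips(len(texts), out):
        final = out / f"{i}.mp3"
        partial = final.with_suffix(".part")
        # Write to a temporary name first so an interrupted run never
        # leaves a half-written file that looks finished.
        partial.write_bytes(synthesize(texts[i]))
        partial.replace(final)


with tempfile.TemporaryDirectory() as tmp:
    texts = ["Down the Rabbit-Hole", "Alice was beginning", "to get very tired"]
    fake_tts = lambda text: text.encode("utf-8")  # stand-in, not real audio
    (Path(tmp) / "1.mp3").write_bytes(b"already done")
    print(pending_clips(len(texts), tmp))  # [0, 2]
    generate_all(texts, tmp, fake_tts)
    print(pending_clips(len(texts), tmp))  # []
```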
python play_audio.py audio/pg11 mp3
python merge_audio_clips.py library/pg11.json audio/pg11 mp3
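When clips are merged, each piece of text needs a start offset in the combined file so the player can keep text and audio in sync. A minimal sketch of that bookkeeping, assuming the clip durations are already known (for example, read with ffprobe); the `text`/`start`/`end` field names are hypothetical and may not match the JSON that merge_audio_clips.py writes:

```python
import json


def build_timeline(clips):
    """clips: list of (text, duration_seconds) pairs in playback order."""
    timeline = []
    start = 0.0
    for text, duration in clips:
        end = start + duration
        timeline.append({"text": text, "start": round(start, 3), "end": round(end, 3)})
        start = end  # the next clip begins where this one ends
    return timeline


clips = [("CHAPTER I. Down the Rabbit-Hole", 2.5), ("Alice was beginning to get very tired", 4.0)]
print(json.dumps(build_timeline(clips), indent=2))
```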
# YouTube
ffmpeg -loop 1 -i assets/ic_launcher.png -i audio/pg11/merged_output.mp3 -c:v libx264 -c:a aac -b:a 192k -shortest output.mp4
# LinkedIn
ffmpeg -loop 1 -i appGoogle.png -i merged_output.mp3 -vf "scale=1080:1080,format=yuv420p" -c:v libx264 -tune stillimage -c:a aac -b:a 192k -shortest output.mp4
# X
ffmpeg -loop 1 -i appGoogle.png -i merged_output.mp3 -vf "scale=1080:1080,format=yuv420p" -c:v libx264 -tune stillimage -c:a aac -b:a 192k -pix_fmt yuv420p -shortest output.mp4
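If you render these videos often, the ffmpeg arguments can be assembled in Python instead of being retyped. The helper below is a hypothetical convenience, not part of the repository; it only uses flags that appear in the commands above and builds the argument list without running anything. Note that it always adds `-tune stillimage`, which the YouTube command above omits.

```python
import shlex


def still_image_video_cmd(image, audio, output, square=None):
    """Build an ffmpeg command that pairs one still image with an audio track.

    square: optional side length in pixels (e.g. 1080) for a square video.
    """
    cmd = ["ffmpeg", "-loop", "1", "-i", image, "-i", audio]
    if square is not None:
        cmd += ["-vf", f"scale={square}:{square},format=yuv420p"]
    cmd += ["-c:v", "libx264", "-tune", "stillimage",
            "-c:a", "aac", "-b:a", "192k", "-shortest", output]
    return cmd


cmd = still_image_video_cmd("appGoogle.png", "merged_output.mp3", "output.mp4", square=1080)
print(shlex.join(cmd))
```

To actually render the file, pass the list to `subprocess.run(cmd, check=True)`.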
# Zyphra
export ZYPHRA_API_KEY="your-zyphra-api-key"
python zyphra_api.py library/pg11.json
# Deepgram
export DEEPGRAM_API_KEY="your-deepgram-api-key"
python deepgram_api.py library/pg11.json
# OpenAI MINI TTS
export OPENAI_API_KEY="your-openai-api-key"
python make_abook_open_ai.py library/pg11.json
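Each cloud script expects its key in an environment variable. A script can check for the key up front and fail with a clear message instead of a late API error. The sketch below shows one way to do that; the `require_api_key` helper is hypothetical and is not claimed to be how the repository's scripts read their keys.

```python
import os

# Variable names taken from the export commands above.
KEY_VARS = {
    "zyphra": "ZYPHRA_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def require_api_key(provider, env=None):
    """Return the API key for `provider`, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    name = KEY_VARS[provider]
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the {provider} script")
    return value


try:
    require_api_key("deepgram", env={})  # empty mapping simulates a missing key
except RuntimeError as err:
    print(err)
```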
pip install -e ~/projects/voice/mlx-audio
Note: The Kokoro-82M TTS model skips names and other out-of-dictionary (OOD) words when espeak-ng, the external grapheme-to-phoneme (g2p) conversion tool it relies on, is not properly installed or detected by the system.
To prevent Kokoro-82M from skipping names and OOD words, install espeak-ng and point ESPEAK_DATA_PATH at its data directory (the path below is the Homebrew location on Apple Silicon):
echo 'export ESPEAK_DATA_PATH=/opt/homebrew/share/espeak-ng-data' >> ~/.zshrc
source ~/.zshrc
# make audio book
python make_abook_mlx.py library/pg2680.json
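Before starting a long Kokoro run, it can help to confirm that espeak-ng is visible to the system. The check below is a rough sketch: it only looks for the `espeak-ng` executable on PATH and verifies that the directory named by ESPEAK_DATA_PATH exists. It does not prove that the TTS library will pick either of them up.

```python
import os
import shutil


def espeak_status(env=None):
    """Report whether espeak-ng and its data directory can be found."""
    env = os.environ if env is None else env
    data_path = env.get("ESPEAK_DATA_PATH")
    return {
        "binary": shutil.which("espeak-ng"),  # None if not on PATH
        "data_path": data_path,
        "data_path_exists": bool(data_path) and os.path.isdir(data_path),
    }


status = espeak_status()
print(status)
if status["binary"] is None or not status["data_path_exists"]:
    print("espeak-ng may not be set up correctly; names and OOD words could be skipped.")
```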
python make_randr.py audio/pg20203/
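Conceptually, this step bundles the generated clips and the book JSON into a single ZIP archive with a .randr extension. The sketch below shows that packaging idea with the standard zipfile module. The flat internal layout and the choice of which files to include are assumptions; the real format expected by the Run & Read app is defined by make_randr.py.

```python
import tempfile
import zipfile
from pathlib import Path


def make_archive(folder, archive_path=None):
    """Zip the .mp3 and .json files in `folder` into <folder name>.randr."""
    folder = Path(folder)
    archive = Path(archive_path) if archive_path else folder.parent / (folder.name + ".randr")
    files = sorted(p for p in folder.iterdir() if p.suffix.lower() in {".mp3", ".json"})
    # MP3 is already compressed, so the files are stored without recompression.
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
        for path in files:
            zf.write(path, arcname=path.name)  # assumed flat layout
    return archive


with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "pg20203"
    book.mkdir()
    (book / "0.mp3").write_bytes(b"fake audio")
    (book / "pg20203.json").write_text("{}", encoding="utf-8")
    (book / "notes.txt").write_text("ignored", encoding="utf-8")
    archive = make_archive(book)
    with zipfile.ZipFile(archive) as zf:
        print(archive.name, zf.namelist())
```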
runandread-audiobook/
├── epub_to_json.py          # Extracts text from EPUB into JSON
├── make_abook.py            # Converts text into audio files with Zonos TTS
├── make_abook_mlx.py        # Converts text into audio files using the Kokoro-82M TTS model with mlx-audio (optimized for Apple M-series processors)
├── make_randr.py            # Wraps the produced audio and JSON files into a ZIP file readable by the Run & Read app
├── play_audio.py            # Plays audio clips sequentially while displaying text
├── merge_audio_clips.py     # Merges audio files into one and generates a timestamped JSON file
├── word_tokens_tools.py     # Utility to normalize the text before passing it to the TTS
├── test_scan_next.py        # Unit tests to make sure text normalization works as expected
├── zyphra_api.py            # Converts text into audio files with the Zyphra SDK/REST API
├── deepgram_api.py          # Converts text into audio files with the Deepgram SDK/REST API
├── make_abook_open_ai.py    # Converts text into audio files with OpenAI TTS
├── assets/                  # Stores MP3 files for voice cloning
├── epub/                    # EPUB books from Project Gutenberg
├── audio/                   # Output audio files
├── audiobooks/              # RANDR audiobook samples
│   ├── pg2680.randr         # Meditations by Emperor of Rome Marcus Aurelius
│   └── pg20203.randr        # Autobiography of Benjamin Franklin
├── library/                 # Output JSON book files
├── README.md                # Documentation
├── requirements.txt         # Dependencies
└── LICENSE                  # Open-source license
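The structure above lists a utility that normalizes text before it reaches the TTS model. As an illustration of what such normalization can involve, here is a small sketch. It is not the code in word_tokens_tools.py, and the specific replacements are assumptions chosen for the example.

```python
import re
import unicodedata

# Typographic characters that some TTS front ends handle poorly (illustrative choice).
_REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2014": ", ",                 # long dash read as a pause
}


def normalize_for_tts(text):
    """Fold typographic characters and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    for src, dst in _REPLACEMENTS.items():
        text = text.replace(src, dst)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_for_tts("Alice\u2019s   Adventures\nin Wonderland"))
# Alice's Adventures in Wonderland
```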
Contributions are welcome! Feel free to open an issue or submit a pull request.
- Ultimate goal: on-device TTS model.
- Zonos - Open-source TTS model.
- MLX-AUDIO - A text-to-speech (TTS) and speech-to-speech (STS) library built on Apple's MLX framework.
- Kokoro-TTS - An open-weight TTS model with 82 million parameters.
- Deepgram - Commercial cloud-based TTS.
- EbookLib - EPUB parsing in Python.
- yt-dlp - YouTube audio downloader for voice cloning.
- Project Gutenberg - A library of over 75,000 free eBooks.
- Python Simplified, MariyaSha - Python Simplified. Kudos to Mariya for her beautiful voice, which I cloned from one of her videos.
- Sergey N - Connect and follow me on LinkedIn.
This project is open-source and available under the MIT License.