
🚀 Open-source project for creating high-quality AI TTS-narrated audiobooks at home using models like Zonos, Kokoro-82M, or services like Deepgram and Eleven Labs. Tested on Apple Silicon M1 (32GB RAM). 📖🎧


sergenes/runandread-audiobook


RunAndRead-Audiobook


📖 Overview

RunAndRead-Audiobook is an open-source project aimed at generating high-quality text-to-speech (TTS) audiobooks using open-source models like Zyphra/Zonos.

The ultimate goal is to make Run & Read, the audiobook player app, sound more natural by using high-quality voices. Currently, it relies on the standard voices embedded in Apple and Android devices, which are still not perfect. Starting from Android v1.5 (6) and iOS v1.6 (18), Run & Read supports MP3 audiobooks generated using the RANDR pipeline in this repository. See instructions here.

Download and try the apps for free!

🍏 App Store: Run & Read for Apple Devices
🤖 Google Play: Run & Read for Android

---

📢 New: Create Audiobooks with AI (RANDR Format)

Generate high-quality audiobooks at home using open-source AI models! We've built a pipeline using MLX-AUDIO to create audiobooks in the RANDR format, optimized for playback in the Run & Read app.

📖 Dedicated document with step-by-step instructions

🚀 Features

✅ A fully functional pipeline for generating audiobooks compatible with the Run & Read app.

✅ Convert EPUB to JSON for text extraction.
✅ Generate audio files using the Zonos TTS model.
✅ Generate audio files using Kokoro-TTS via MLX-AUDIO.
✅ Clone voices from a provided MP3 sample.
✅ Play audio clips sequentially while displaying text in the terminal.
✅ Merge audio clips into one file.
✅ Zyphra API support for cloud-based TTS.
✅ Deepgram API support for cloud-based TTS.
✅ Wrap the produced audio and JSON files into a ZIP file readable by the Run & Read app.
✅ Transfer audio files to a mobile phone and play them in the Run & Read app.
🔜 Calculate the self-cost of complete book generation: cloud vs. local.
🔜 On-device TTS model for mobile apps (Android/iOS).


🎧 Audio Samples

Here are some audiobook samples generated using RunAndRead-Audiobook with Zonos TTS voice cloning:

[Sample 1 - Alice in Wonderland]

📌 You can find examples under the audio/pg11/ folder, and generate your own samples using the steps outlined in the Usage section below.


📦 Dependencies & Technologies

  • Python 3.9+
  • Zyphra/Zonos (open-source TTS engine)
  • ffmpeg (audio conversion)
  • EbookLib (EPUB parsing)
  • PyAudio / playsound (for playback)
  • yt-dlp (to download MP3 files from YouTube for voice cloning)

🛠 Installation

1️⃣ Install Python Dependencies

pip install -r requirements.txt

2️⃣ Set Up Zyphra/Zonos

Follow the official installation instructions from Zyphra/Zonos. Using a uv virtual environment is recommended for running RunAndRead scripts. After installing the Zonos project, run the sample.py script:

uv run sample.py

This will download the "Zyphra/Zonos-v0.1-transformer" base model from Hugging Face and store it in your environment.

3️⃣ Set Up ffmpeg

4️⃣ Download a Voice Sample from YouTube

To clone a voice with Zonos, you'll need an MP3 sample of the speaker. A 10-20 minute video with a single speaker (e.g., a tutorial or audiobook) is recommended. You can download an MP3 track from YouTube using yt-dlp:

yt-dlp -x --audio-format mp3 "https://www.youtube.com/watch?v=MkLBNUMc26Y" -o "assets/exampleaudio.mp3"

The Zonos model uses this exampleaudio.mp3 file as the reference voice for cloning before the actual synthesis.


📚 Usage

Step 1: Convert EPUB to JSON

First, run this script with 0 as the third parameter:

python epub_to_json.py epub/pg11.epub library/pg11.json 0

Check the terminal output to find how many lines should be skipped, then rerun the script with the number of the first line to keep:

python epub_to_json.py epub/pg11.epub library/pg11.json 10

This ensures that the book starts from the correct position, e.g.:

10: CHAPTER I. Down the Rabbit-Hole
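
The two-pass workflow above can be pictured with a small sketch. This is not the code in epub_to_json.py: the function names `number_lines` and `drop_front_matter` are hypothetical, and the sketch assumes the extracted text is already a list of lines and that the third parameter is the zero-based index of the first line to keep.

```python
from typing import List


def number_lines(lines: List[str]) -> List[str]:
    """Prefix each line with its index, like the listing printed on the first pass."""
    return [f"{index}: {text}" for index, text in enumerate(lines)]


def drop_front_matter(lines: List[str], first_line: int) -> List[str]:
    """Keep everything from `first_line` onward (0 keeps the whole book)."""
    if first_line < 0 or first_line > len(lines):
        raise ValueError(f"first_line must be between 0 and {len(lines)}")
    return lines[first_line:]
```

On the first pass you would read the numbered listing, pick the index of the first chapter heading, and pass that index on the second run.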

🚨 Note: Without an NVIDIA GPU, converting an entire book to audio takes a long time. On a MacBook Pro with an M1 processor, a 30-second audio clip takes approximately 3 minutes to generate, about six times slower than real time, so a full book can take dozens of hours. For example, Alice's Adventures in Wonderland is 3 hours long, which means about 18 hours of processing on that machine. However, the make_abook script can be interrupted at any time, and it will resume from the position where it was stopped.
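
The estimate in the note is simple scaling, and a short sketch makes it easy to plug in your own measurements. The function name `estimate_generation_hours` is hypothetical; the default values are the M1 figures quoted above.

```python
def estimate_generation_hours(audio_hours: float,
                              clip_seconds: float = 30.0,
                              generation_seconds: float = 180.0) -> float:
    """Scale the audiobook length by the measured generation-time ratio."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    slowdown = generation_seconds / clip_seconds  # 180 / 30 = 6x slower than real time
    return audio_hours * slowdown


print(estimate_generation_hours(3.0))  # 3 hours of audio at 6x -> 18.0
```

Time one short clip on your own hardware and pass the measured values to get a rough figure before committing to a full book.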

Step 2: Generate TTS Audio Files

uv run python make_abook.py library/pg21279.json assets/kurt_v.mp3
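
One common way to make a long generation job resumable is to skip clips whose output file already exists. The sketch below illustrates that idea only; it is not taken from make_abook.py, and the function name `pending_clips` and the zero-padded file naming scheme are assumptions.

```python
from pathlib import Path
from typing import List


def pending_clips(clip_count: int, output_dir: Path, extension: str = "mp3") -> List[int]:
    """Return the indices of clips that still need to be synthesized.

    A clip counts as done when a non-empty output file already exists,
    so a rerun after an interruption only generates the missing ones.
    """
    pending = []
    for index in range(clip_count):
        clip_path = output_dir / f"{index:05d}.{extension}"  # hypothetical naming scheme
        if not clip_path.is_file() or clip_path.stat().st_size == 0:
            pending.append(index)
    return pending
```

Treating empty files as unfinished guards against a clip that was cut off while it was being written.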

Step 3: Play Audiobook in CLI

python play_audio.py audio/pg11 mp3

Step 4: Merge a set of audio clips into one audio file

python merge_audio_clips.py library/pg11.json audio/pg11 mp3
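
When clips are merged, each piece of text needs a start offset in the combined file so a player can keep text and audio in sync. Here is a minimal sketch of that bookkeeping, assuming each clip's duration in seconds is already known; the function name `build_timeline` and the `text`/`duration`/`start` field names are hypothetical and may not match the JSON that merge_audio_clips.py writes.

```python
import json
from typing import Dict, List


def build_timeline(clips: List[Dict]) -> List[Dict]:
    """Assign each clip its start offset in the merged file.

    Each input item is assumed to look like {"text": str, "duration": float},
    with the duration in seconds.
    """
    timeline = []
    offset = 0.0
    for clip in clips:
        timeline.append({"text": clip["text"], "start": round(offset, 3)})
        offset += clip["duration"]
    return timeline


clips = [
    {"text": "CHAPTER I.", "duration": 2.5},
    {"text": "Down the Rabbit-Hole", "duration": 3.0},
    {"text": "Alice was beginning to get very tired", "duration": 4.25},
]
print(json.dumps(build_timeline(clips), indent=2))
```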

Step 5: Prepare audio clip for YouTube/LinkedIn

# YouTube
ffmpeg -loop 1 -i assets/ic_launcher.png -i audio/pg11/merged_output.mp3 -c:v libx264 -c:a aac -b:a 192k -shortest output.mp4 
# LinkedIn
ffmpeg -loop 1 -i appGoogle.png -i merged_output.mp3 -vf "scale=1080:1080,format=yuv420p" -c:v libx264 -tune stillimage -c:a aac -b:a 192k -shortest output.mp4

# X
ffmpeg -loop 1 -i appGoogle.png -i merged_output.mp3 -vf "scale=1080:1080,format=yuv420p" -c:v libx264 -tune stillimage -c:a aac -b:a 192k -pix_fmt yuv420p -shortest output.mp4
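
If you prefer to drive these commands from Python, the arguments can be assembled as a list. This is a sketch that only reuses the ffmpeg flags shown above; the function name `still_image_video_args` and its `square` parameter are hypothetical.

```python
import shlex
from typing import List


def still_image_video_args(image: str, audio: str, output: str,
                           square: bool = False) -> List[str]:
    """Build the ffmpeg arguments for a still image plus an audio track."""
    args = ["ffmpeg", "-loop", "1", "-i", image, "-i", audio]
    if square:
        # 1080x1080 with yuv420p, as in the LinkedIn and X commands
        args += ["-vf", "scale=1080:1080,format=yuv420p"]
    args += ["-c:v", "libx264"]
    if square:
        args += ["-tune", "stillimage"]
    args += ["-c:a", "aac", "-b:a", "192k", "-shortest", output]
    return args


args = still_image_video_args("assets/ic_launcher.png",
                              "audio/pg11/merged_output.mp3", "output.mp4")
print(shlex.join(args))  # pass `args` to subprocess.run(args, check=True) to execute
```

Building a list instead of a shell string avoids quoting problems when file names contain spaces.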

Step 6: Set Up the Zyphra/Deepgram/OpenAI REST SDKs

# Zyphra
export ZYPHRA_API_KEY="your-zyphra-api-key"
python zyphra_api.py library/pg11.json
# Deepgram
export DEEPGRAM_API_KEY="your-deepgram-api-key"
python deepgram_api.py library/pg11.json
# OpenAI MINI TTS
export OPENAI_API_KEY="your-openai-api-key"
python make_abook_open_ai.py library/pg11.json
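
A missing key usually surfaces as an opaque authentication error from the cloud service, so it can help to check the environment first. The helper below is a sketch, not part of the repository scripts; `require_api_key` is a hypothetical name.

```python
import os


def require_api_key(variable: str) -> str:
    """Return the key stored in `variable`, or fail with a clear message."""
    key = os.environ.get(variable, "").strip()
    if not key:
        raise RuntimeError(f"{variable} is not set; export it before running the script")
    return key
```

For example, calling `require_api_key("DEEPGRAM_API_KEY")` at the top of a script stops early when the export step was skipped.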

Step 7: Set Up MLX-AUDIO (cloned local repo)

pip install -e ~/projects/voice/mlx-audio

🚨 Note: The Kokoro-82M TTS model relies on an external grapheme-to-phoneme (g2p) conversion tool, espeak-ng, to pronounce names and other out-of-dictionary (OOD) words. When espeak-ng is not properly installed or cannot be detected by the system, Kokoro-82M skips those words. To prevent this, install espeak-ng and make its data directory visible to the system:

echo 'export ESPEAK_DATA_PATH=/opt/homebrew/share/espeak-ng-data' >> ~/.zshrc
source ~/.zshrc
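
Before starting a long run, a quick check can reveal whether espeak-ng is likely to be found. This sketch rests on two assumptions: that an `espeak-ng` executable on PATH is a reasonable sign of a working install, and that ESPEAK_DATA_PATH is the variable that matters, as in the export above. The function name `espeak_problems` is hypothetical.

```python
import os
import shutil
from typing import List


def espeak_problems() -> List[str]:
    """Collect reasons why espeak-ng might not be found."""
    problems = []
    if shutil.which("espeak-ng") is None:
        problems.append("espeak-ng executable is not on PATH")
    data_path = os.environ.get("ESPEAK_DATA_PATH")
    if not data_path:
        problems.append("ESPEAK_DATA_PATH is not set")
    elif not os.path.isdir(data_path):
        problems.append(f"ESPEAK_DATA_PATH does not exist: {data_path}")
    return problems


for problem in espeak_problems():
    print("warning:", problem)
```

An empty list does not prove that Kokoro-82M will pronounce every name, but any warning is worth fixing before generating a whole book.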

# make audio book
python make_abook_mlx.py library/pg2680.json 

Step 8: Make RANDR Audiobook

python make_randr.py audio/pg20203/
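
Conceptually, this step bundles a book's audio and JSON files into a single ZIP archive. The sketch below shows that idea with the standard library only. The real RANDR layout is defined by make_randr.py and may differ, so the flat file layout, the `.randr` file name, and the function name `pack_book` are all assumptions.

```python
import zipfile
from pathlib import Path


def pack_book(book_dir: Path, archive_path: Path) -> Path:
    """Zip every .mp3 and .json file in `book_dir` into `archive_path`."""
    files = sorted(p for p in book_dir.iterdir() if p.suffix in {".mp3", ".json"})
    if not files:
        raise FileNotFoundError(f"no .mp3 or .json files in {book_dir}")
    # MP3 data is already compressed, so the files are stored without deflating.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as archive:
        for path in files:
            archive.write(path, arcname=path.name)  # flat layout is an assumption
    return archive_path
```

A call such as `pack_book(Path("audio/pg20203"), Path("audiobooks/pg20203.randr"))` would then produce one file to transfer to the phone.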

📂 Project Structure

runandread-audiobook/
├── epub_to_json.py       # Extracts text from EPUB into JSON
├── make_abook.py         # Converts text into audio files with Zonos TTS
├── make_abook_mlx.py     # Converts text into audio files using the Kokoro-82M TTS model with mlx-audio (optimized for Apple M-series processors)
├── make_randr.py         # Wraps the produced audio and JSON files into a ZIP file readable by the Run & Read app
├── play_audio.py         # Plays audio clips sequentially while displaying text
├── merge_audio_clips.py  # Merges audio files into one and generates a timestamped JSON file
├── word_tokens_tools.py  # Utility to normalize the text before passing it to the TTS
├── test_scan_next.py     # Unit tests to make sure text normalization works as expected
├── zyphra_api.py         # Converts text into audio files with the Zyphra SDK/REST API
├── deepgram_api.py       # Converts text into audio files with the Deepgram SDK/REST API
├── make_abook_open_ai.py # Converts text into audio files with OpenAI TTS
├── assets/               # Stores MP3 files for voice cloning
├── epub/                 # EPUB books from Project Gutenberg
├── audio/                # Output audio files
├── audiobooks/           # RANDR audiobook samples
│   ├── pg2680.randr      # Meditations by Marcus Aurelius, Emperor of Rome
│   └── pg20203.randr     # Autobiography of Benjamin Franklin
├── library/              # Output JSON book files
├── README.md             # Documentation
├── requirements.txt      # Dependencies
└── LICENSE               # Open-source license

🤝 Contributions

Contributions are welcome! Feel free to open an issue or submit a pull request.


πŸ›£οΈ Roadmap

  • 🎯 Ultimate Goal: On-device TTS model.

📜 References & Kudos

  • Zonos - Open-source TTS model.
  • MLX-AUDIO - A TTS and STS library built on Apple's MLX framework.
  • Kokoro-TTS - An open-weight TTS model with 82 million parameters.
  • Deepgram - Commercial cloud-based TTS.
  • EbookLib - EPUB parsing in Python.
  • yt-dlp - YouTube audio downloader for voice cloning.
  • Project Gutenberg - A library of over 75,000 free eBooks.
  • Python Simplified, MariyaSha - Kudos to Mariya for her beautiful voice, which I cloned from one of her videos.

📞 Contact

  • Sergey N - Connect and follow me on LinkedIn.

📄 License

This project is open-source and available under the MIT License.
