A production-ready EPUB-to-TTS pipeline that converts EPUB files into optimized text and audio using multiple TTS engines (Kokoro, ElevenLabs, Hume AI) and a local VLM for image descriptions.
- Modern EPUB Processing: Production-ready EPUB parsing with OmniParser (8.7x faster than Pandoc)
- Advanced Text Processing: Modern NLP-based processing with spaCy, plus legacy regex fallback
- Chapter Segmentation: Intelligent chapter detection using TOC data and ML confidence scoring
- Multi-Engine TTS Integration: Support for three TTS engines:
  - Kokoro TTS: Local, high-quality audio with MLX optimization
  - ElevenLabs: Premium cloud-based voices with natural inflection
  - Hume AI: Ultra-low latency (<200ms), emotion-aware synthesis with multilingual support (11 languages)
- Local VLM Integration: Image content description using local vision-language models (Gemma, LLaVA)
- Split-Window Terminal UI: Advanced real-time progress tracking with live stats and activity logs
- Batch Processing: Parallel processing of multiple files with comprehensive error handling
- Resume Capability: Smart resume functionality for interrupted processing
- Flexible Output: Support for text, SSML, and JSON formats with comprehensive metadata
- Auto-loading Models: Automatic TTS and VLM model detection and loading
- Production Ready: Comprehensive logging, error handling, and performance monitoring
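The legacy regex fallback mentioned above can be pictured with a small sketch. The helper name and the specific rules here are illustrative assumptions, not the repo's actual cleaning code:

```python
import re

def legacy_clean(text):
    """Illustrative regex-based fallback cleanup (hypothetical helper)."""
    text = text.replace("\u00ad", "")        # drop soft hyphens
    text = re.sub(r"-\n(\w)", r"\1", text)   # rejoin words hyphenated across lines
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)   # squeeze blank-line runs
    return text.strip()

print(legacy_clean("hel-\nlo   world"))  # hello world
```

The modern spaCy path replaces rules like these with sentence-aware segmentation; the regex path stays available for environments where the NLP models are not installed.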
# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync
# Basic text extraction
uv run python scripts/process_epub.py book.epub
# Full pipeline with Kokoro TTS (local)
uv run python scripts/process_epub.py book.epub --tts --voice "bf_lily"
# Full pipeline with ElevenLabs TTS (cloud)
uv run python scripts/process_epub.py book.epub --tts --engine elevenlabs --voice "Rachel"
# Full pipeline with Hume AI TTS (ultra-low latency)
uv run python scripts/process_epub.py book.epub --tts --engine hume --voice "Female English Actor"
# Full pipeline with advanced UI
uv run python scripts/process_epub.py book.epub --tts --images --ui-mode split-window
# Batch processing
uv run python scripts/batch_convert.py ./ebooks/*.epub --parallel 4
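The `--parallel 4` flag implies concurrent fan-out over files. A minimal sketch of the idea, with a stand-in `convert` function (the real logic lives in `scripts/batch_convert.py` and may differ):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def convert(epub_path):
    """Stand-in for the real per-file conversion."""
    return Path(epub_path).stem + ".txt"

epubs = ["alpha.epub", "beta.epub", "gamma.epub"]
with ThreadPoolExecutor(max_workers=4) as pool:  # mirrors --parallel 4
    outputs = list(pool.map(convert, epubs))
print(outputs)  # ['alpha.txt', 'beta.txt', 'gamma.txt']
```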
# Alternative: Use the main CLI interface
uv run python src/cli.py convert book.epub --tts --voice "bf_lily"
uv run python src/cli.py batch ./ebooks/*.epub --parallel 4
uv run python src/cli.py test # Test system setup
uv run python src/cli.py info book.epub  # Get EPUB information

- Python 3.10+ (required for Kokoro TTS)
- UV (Python package manager)
- OmniParser (EPUB parsing - included as dependency)
- FFmpeg (for audio processing)
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install ffmpeg
# macOS
brew install ffmpeg
# Windows (using chocolatey)
choco install ffmpeg

git clone https://github.com/AutumnsGrove/epub2tts.git
cd epub2tts
# Install UV if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies and create virtual environment
uv sync
# Verify installation
uv run python scripts/process_epub.py --help

Create a custom configuration file:
# config/custom_config.yaml
processing:
  temp_dir: "/tmp/epub2tts"

text_processing:
  processor_mode: "modern"  # "modern" (spacy+nlp), "legacy" (regex), "hybrid"
  spacy_model: "en_core_web_sm"

tts:
  engine: "kokoro"  # "kokoro", "elevenlabs", or "hume"
  voice: "bf_lily"
  speed: 1.1
  model_path: "./models/Kokoro-82M-8bit"
  use_mlx: true

  # Kokoro TTS settings (local engine)
  kokoro:
    voice: "bf_lily"
    speed: 1.1
    model_path: "./models/Kokoro-82M-8bit"
    use_mlx: true

  # ElevenLabs settings (cloud engine)
  elevenlabs:
    api_key: "${ELEVENLABS_API_KEY}"  # Set via environment variable
    voice: "Rachel"
    model: "eleven_multilingual_v2"
    stability: 0.5
    similarity_boost: 0.75

  # Hume AI settings (ultra-low latency engine)
  hume:
    api_key: "${HUME_API_KEY}"  # Set via environment variable
    voice: "Female English Actor"
    language: "en"  # en, es, fr, de, it, pt, ru, ja, ko, hi, ar
    model: "octave-2"  # Latest model (40% faster, 50% lower cost)
    streaming: false

image_description:
  enabled: true
  model: "gemma-3n-e4b"
  api_url: "http://127.0.0.1:1234"

ui:
  mode: "split-window"  # "classic" or "split-window"
  show_progress_bars: true

output:
  text_format: "plain"  # "plain", "ssml", "json"
  save_intermediate: true
  generate_toc: true

Use with:

uv run python scripts/process_epub.py book.epub -c config/custom_config.yaml

- Modern Mode (default): Uses spaCy NLP for intelligent text processing and chapter detection
- Legacy Mode: Traditional regex-based cleaning for maximum compatibility
- Hybrid Mode: Combines both approaches for optimal results
- OmniParser: Production-ready EPUB parser with native TOC support and high-accuracy chapter detection (8.7x faster than legacy Pandoc)
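How a confidence score for chapter boundaries could combine heading patterns with TOC matches can be shown with a toy heuristic. The repo's ML confidence scoring is not reproduced here; the weights and pattern below are invented for illustration:

```python
import re

CHAPTER_PAT = re.compile(r"^(chapter|part|book)\s+([ivxlc]+|\d+)\b", re.IGNORECASE)

def chapter_confidence(line, toc_titles):
    """Toy confidence score: regex heading match plus exact TOC match."""
    score = 0.0
    if CHAPTER_PAT.match(line.strip()):
        score += 0.6                  # looks like a chapter heading
    if line.strip() in toc_titles:
        score += 0.4                  # confirmed by the TOC
    return min(score, 1.0)

print(chapter_confidence("Chapter IV", {"Chapter IV"}))  # 1.0
```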
- Classic Mode: Traditional command-line progress bars
- Split-Window Mode: Advanced terminal UI with real-time stats, progress tracking, and activity logs
- bf_lily: Female British English (default)
- am_michael: Male American English
- bf_emma: Female British English (alternative)
- am_sarah: Female American English
- Rachel: Female American English (conversational)
- Adam: Male American English (deep)
- Bella: Female American English (soft)
- Antoni: Male American English (warm)
- Custom voice cloning available
Ultra-low latency (<200ms) with emotion-aware synthesis:
- Female English Actor: Expressive female English voice
- Male English Actor: Expressive male English voice
- Female Spanish Actor: Expressive female Spanish voice
- Male Spanish Actor: Expressive male Spanish voice
- Supports 11 languages: en, es, fr, de, it, pt, ru, ja, ko, hi, ar
- Voice cloning support available
- 40% faster than previous generation (Octave 1)
- 50% lower cost than Octave 1
- gemma-3n-e4b: Lightweight vision model (default)
- llava: More comprehensive but resource-intensive
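Since the config points `api_url` at a local server (port 1234 is the LM Studio default), the image step presumably posts OpenAI-style chat requests with base64-encoded images. A sketch of building such a payload — the exact request format the repo uses is an assumption:

```python
import base64

def build_vlm_request(image_bytes, model="gemma-3n-e4b"):
    """Assemble an OpenAI-compatible vision request for a local VLM server."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this image for an audiobook listener."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64," + b64}},
            ],
        }],
    }

payload = build_vlm_request(b"fake-image-bytes")
# would be POSTed to api_url + "/v1/chat/completions"
```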
┌─────────────┐     ┌──────────────┐     ┌────────────────┐     ┌─────────────┐
│ EPUB Input  │────▶│ OmniParser   │────▶│ Modern Text    │────▶│ Clean Text  │
└─────────────┘     │ + TOC        │     │ Pipeline       │     │ + Chapters  │
                    │ Extraction   │     │ (spaCy+NLP)    │     │ Output      │
                    └──────────────┘     └────────────────┘     └─────────────┘
                          │                     │
                  ┌───────▼────────┐    ┌───────▼─────────┐
                  │ Image Pipeline │    │  TTS Pipeline   │
                  │ (Gemma/LLaVA)  │    │  (Multi-Eng)    │
                  └────────────────┘    └─────────────────┘
                          │                     │
                  ┌───────▼────────┐    ┌───────▼─────────┐
                  │Image Desc Text │    │ ┌─────────────┐ │
                  └────────────────┘    │ │Kokoro (MLX) │ │
                                        │ │ ElevenLabs  │ │
                                        │ │  Hume AI    │ │
                                        │ └─────────────┘ │
                                        │  Audio Files    │
                                        │  + Merged Book  │
                                        └─────────────────┘
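The multi-engine box in the diagram implies a shared interface the three backends plug into. A hypothetical sketch — class and method names here are invented, not the repo's actual API:

```python
from abc import ABC, abstractmethod

class TTSEngine(ABC):
    """Common surface a multi-engine TTS stage might dispatch through."""
    @abstractmethod
    def synthesize(self, text: str, voice: str) -> bytes: ...

class KokoroEngine(TTSEngine):
    def synthesize(self, text, voice="bf_lily"):
        return b"<audio>"  # would run the local MLX model here

# "elevenlabs" and "hume" implementations would register alongside "kokoro"
ENGINES = {"kokoro": KokoroEngine}

def get_engine(name):
    return ENGINES[name]()

audio = get_engine("kokoro").synthesize("Hello", "bf_lily")
```

Keeping engine selection behind a registry like this is what lets `--engine elevenlabs` or `--engine hume` swap backends without touching the rest of the pipeline.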
# All tests
uv run pytest tests/
# Unit tests only
uv run pytest tests/unit/
# With coverage
uv run pytest tests/ --cov=src --cov-report=html
# Test setup and dependencies
uv run python src/cli.py test

# Format Python code
uv run black .
# Check formatting
uv run black --check .
# Run linting
uv run flake8 src/ tests/
# Type checking
uv run mypy src/

- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add tests
- Run tests: uv run pytest tests/
- Format code: uv run black .
- Commit changes: git commit -m "feat: add amazing feature"
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
MIT License - see LICENSE file for details.
- Text extraction: < 5 seconds for 100MB EPUB
- Text cleaning: < 10 seconds for 500KB text
- TTS generation: 2x faster than playback speed
- Memory usage: < 500MB for typical novel
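To check a run against these targets, a small timer around each stage is enough. A sketch (the repo's own performance monitoring may differ):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, budget_s):
    """Time a pipeline stage and flag it if it blows its budget."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    status = "OK" if elapsed <= budget_s else "OVER BUDGET"
    print(f"{label}: {elapsed:.2f}s ({status})")

with timed("text extraction", budget_s=5.0):  # the < 5 s target above
    sum(range(1000))  # stand-in for the real work
```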
# Process single EPUB file
uv run python src/cli.py convert book.epub [OPTIONS]
# Batch process multiple files
uv run python src/cli.py batch *.epub [OPTIONS]
# Get EPUB information
uv run python src/cli.py info book.epub
# Validate EPUB file
uv run python src/cli.py validate book.epub
# Test system setup
uv run python src/cli.py test
# Show configuration
uv run python src/cli.py config --show-config

# Process EPUB (with more examples in help)
uv run python scripts/process_epub.py book.epub [OPTIONS]
# Batch convert with advanced options
uv run python scripts/batch_convert.py *.epub [OPTIONS]

# Default local TTS
uv run python scripts/process_epub.py book.epub --tts
# With specific voice
uv run python scripts/process_epub.py book.epub --tts --voice "bf_lily"
# With speed adjustment
uv run python scripts/process_epub.py book.epub --tts --voice "am_michael" --speed 1.2

# Basic ElevenLabs usage (requires ELEVENLABS_API_KEY)
uv run python scripts/process_epub.py book.epub --tts --engine elevenlabs --voice "Rachel"
# With custom model and settings
uv run python scripts/process_epub.py book.epub --tts --engine elevenlabs --voice "Adam" --model "eleven_turbo_v2"

# Basic Hume usage (requires HUME_API_KEY)
uv run python scripts/process_epub.py book.epub --tts --engine hume
# With specific voice
uv run python scripts/process_epub.py book.epub --tts --engine hume --voice "Female English Actor"
# With different language
uv run python scripts/process_epub.py book.epub --tts --engine hume --voice "Male Spanish Actor" --language es
# With streaming mode for ultra-low latency
uv run python scripts/process_epub.py book.epub --tts --engine hume --streaming
# French audiobook
uv run python scripts/process_epub.py livre.epub --tts --engine hume --language fr
# Japanese audiobook
uv run python scripts/process_epub.py book.epub --tts --engine hume --language ja

# Set up ElevenLabs API key
export ELEVENLABS_API_KEY="your_elevenlabs_api_key_here"
# Set up Hume AI API key
export HUME_API_KEY="your_hume_api_key_here"
# Or add to your shell profile (~/.bashrc, ~/.zshrc, etc.)
echo 'export ELEVENLABS_API_KEY="your_key_here"' >> ~/.bashrc
echo 'export HUME_API_KEY="your_key_here"' >> ~/.bashrc

# Copy the template to create your secrets file
cp secrets_template.json secrets.json
# Edit secrets.json with your API keys
# {
# "elevenlabs_api_key": "your_elevenlabs_api_key_here",
# "hume_api_key": "your_hume_api_key_here",
# "anthropic_api_key": "your_anthropic_api_key_here",
# "comment": "Add your API keys here. This file should be kept private."
# }

Note: The secrets.json file is already in .gitignore to prevent accidentally committing API keys to version control.
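With both environment variables and secrets.json supported, key lookup presumably checks one and falls back to the other. A sketch of that resolution order — the precedence the repo actually uses, and the helper name, are assumptions:

```python
import json
import os
from pathlib import Path

def resolve_api_key(name, secrets_path="secrets.json"):
    """Resolve an API key: environment first, then secrets.json, else None."""
    env_val = os.environ.get(name.upper() + "_API_KEY")
    if env_val:
        return env_val
    path = Path(secrets_path)
    if path.exists():
        return json.loads(path.read_text()).get(name.lower() + "_api_key")
    return None

key = resolve_api_key("elevenlabs")  # e.g. reads ELEVENLABS_API_KEY
```

Preferring the environment over the file keeps CI and containerized runs working without shipping a secrets.json.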
See docs/TROUBLESHOOTING.md for common issues and solutions.
See docs/API.md for detailed API documentation.