Turn slides/PDFs into narrated videos — transcripts, TTS, subtitles, and optional avatars.
This repository now contains the FastAPI backend that powers SlideSpeaker. It exposes the task orchestration pipeline, handles transcription/TTS jobs, and serves generated media back to clients. The React/Next.js frontend has moved to its own directory, slide-speaker-web/, which is ready to be published as a separate git repository.
SlideSpeaker is under active development. Expect rapid iteration, breaking changes, and incomplete tooling while we work toward production readiness.
- Automated script generation from slide decks or PDFs
- Natural-sounding text-to-speech narration with configurable voices
- Optional AI avatars synced to narration for presenter-style videos
- Podcast-ready audio exports for sharing beyond video platforms
- Subtitle outputs in VTT/SRT formats aligned to the narration
- Task-based API that coordinates the full processing pipeline end-to-end
- Responsive light, dark, and auto themes with per-user preferences
- Global language switcher with localized UI labels and stored preferences
- Hybrid authentication powered by NextAuth (Google OAuth + email/password) backed by FastAPI endpoints
- WCAG 2.1 AA compliance with enhanced accessibility features
- High contrast themes for both light and dark modes
- Support for additional languages: Thai, Korean, and Japanese
- Optimized task creation page and improved processing display
- Enhanced web performance for better user experience
- Modern state management with Zustand for improved frontend performance
- Enhanced theme system with proper high contrast support
cd api
uv sync # Install base dependencies
cp .env.example .env # Create config file
# Edit .env to add your API keys
make dev # Start development server (port 8000)

Start the background workers:

cd api
make master-worker # Start master process that spawns workers

Manage user accounts with the CLI:

cd api
python scripts/user_cli.py list
python scripts/user_cli.py create --email you@example.com --password secret --name "You"

Other subcommands include show, set-password, and delete; pass --help to any subcommand to see its options.
The Next.js/React UI now lives in slide-speaker-web/ (generated beside this repository). Move it to its own git project and follow the instructions in slide-speaker-web/README.md to continue frontend development.
SlideSpeaker is committed to providing an inclusive experience for all users:
- WCAG 2.1 AA compliance for web accessibility standards
- High contrast themes available for both light and dark modes
- Enhanced focus indicators for keyboard navigation
- Screen reader friendly interface
- Support for multiple languages to serve a diverse user base
Visit http://localhost:8000/docs for the auto-generated API documentation while the server is running.
- LLM (OpenAI): required for transcript generation
  - OPENAI_API_KEY (required)
  - Optional: OPENAI_BASE_URL (for custom endpoints)
  - Optional: OPENAI_TIMEOUT, OPENAI_RETRIES, OPENAI_BACKOFF
- Text-to-Speech
  - TTS_SERVICE=openai|elevenlabs (defaults to openai)
  - ElevenLabs requires ELEVENLABS_API_KEY
- Avatar Generation (optional)
  - HeyGen: HEYGEN_API_KEY
  - OpenAI DALL-E: uses your OPENAI_API_KEY
- Storage
  - Defaults to the local filesystem
  - For cloud storage, configure S3 or OSS in .env
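The LLM and TTS variables can be checked once at startup so that a missing key fails fast instead of mid-pipeline. This is a minimal sketch under stated assumptions, not SlideSpeaker's settings module: the fallback values for OPENAI_TIMEOUT, OPENAI_RETRIES, and OPENAI_BACKOFF are hypothetical, and reading OPENAI_BACKOFF as a base delay in seconds that doubles per retry is an assumed interpretation.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

# Hypothetical fallback values; the project's real defaults are not documented here.
DEFAULT_TIMEOUT_S = 60.0
DEFAULT_RETRIES = 3
DEFAULT_BACKOFF_S = 1.0


@dataclass(frozen=True)
class ProviderSettings:
    openai_api_key: str
    openai_base_url: Optional[str]
    timeout_s: float
    retries: int
    backoff_s: float
    tts_service: str


def load_settings(env: Mapping[str, str] = os.environ) -> ProviderSettings:
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Transcript generation cannot run without an OpenAI key.
        raise ValueError("OPENAI_API_KEY is required")

    # TTS_SERVICE defaults to openai; elevenlabs needs its own key.
    tts = env.get("TTS_SERVICE", "openai").strip().lower()
    if tts not in ("openai", "elevenlabs"):
        raise ValueError(f"Unsupported TTS_SERVICE: {tts!r}")
    if tts == "elevenlabs" and not env.get("ELEVENLABS_API_KEY"):
        raise ValueError("TTS_SERVICE=elevenlabs requires ELEVENLABS_API_KEY")

    return ProviderSettings(
        openai_api_key=api_key,
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
        timeout_s=float(env.get("OPENAI_TIMEOUT", DEFAULT_TIMEOUT_S)),
        retries=int(env.get("OPENAI_RETRIES", DEFAULT_RETRIES)),
        backoff_s=float(env.get("OPENAI_BACKOFF", DEFAULT_BACKOFF_S)),
        tts_service=tts,
    )


def retry_delays(settings: ProviderSettings) -> List[float]:
    # Assumed interpretation: OPENAI_BACKOFF is a base delay in seconds that
    # doubles after each failed attempt (exponential backoff).
    return [settings.backoff_s * (2 ** attempt) for attempt in range(settings.retries)]
```

Passing a plain dict instead of os.environ makes the validation easy to exercise in tests without touching the real environment.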
SlideSpeaker supports multiple storage backends:
- Local: default; stores files in api/output/
- AWS S3: configure AWS_S3_BUCKET_NAME and credentials
- Aliyun OSS: configure OSS_BUCKET_NAME and credentials
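One way to picture the pluggable storage layer is a small common interface with the local filesystem as the fallback. The sketch below is hypothetical: the StorageBackend protocol, the LocalStorage class, and the rule that a configured bucket name opts into that cloud backend are illustrative assumptions, not the project's actual selection logic.

```python
from pathlib import Path
from typing import Mapping, Protocol


class StorageBackend(Protocol):
    """Hypothetical interface every backend (local, S3, OSS) would implement."""

    def save(self, key: str, data: bytes) -> str: ...


class LocalStorage:
    """Writes files under a base directory (api/output/ by default)."""

    def __init__(self, base_dir: str = "api/output") -> None:
        self.base_dir = Path(base_dir)

    def save(self, key: str, data: bytes) -> str:
        base = self.base_dir.resolve()
        path = (base / key).resolve()
        # Reject keys such as "../x" that would escape the storage root.
        if base != path and base not in path.parents:
            raise ValueError(f"Key escapes storage root: {key!r}")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return str(path)


def choose_backend(env: Mapping[str, str]) -> str:
    # Hypothetical rule: a configured bucket name opts into that cloud backend;
    # otherwise fall back to the local filesystem.
    if env.get("AWS_S3_BUCKET_NAME"):
        return "s3"
    if env.get("OSS_BUCKET_NAME"):
        return "oss"
    return "local"
```

Keeping every backend behind the same save signature means pipeline steps never need to know where their output lands.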
- API (FastAPI)
  - Password hashing uses PBKDF2-HMAC-SHA256; no additional secrets are required.
- Next.js (web/.env)
  - NEXTAUTH_SECRET – signing key for NextAuth JWT sessions
  - NEXTAUTH_URL – base URL of the Next.js app (e.g. http://localhost:3000)
  - NEXT_PUBLIC_API_BASE_URL – base URL of the FastAPI backend (defaults to http://localhost:8000 for local dev)
- NextAuth providers
  - Optional Google OAuth: set GOOGLE_CLIENT_ID / GOOGLE_CLIENT_SECRET
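PBKDF2-HMAC-SHA256 needs no server-side secret because each hash carries its own random salt and iteration count. The sketch below shows the general technique with Python's standard library; the iteration count, salt length, and the "$"-separated storage format are hypothetical choices, not SlideSpeaker's documented parameters.

```python
import base64
import hashlib
import hmac
import secrets

# Hypothetical parameters; the project's actual iteration count and
# storage format are not documented here.
ITERATIONS = 600_000
SALT_BYTES = 16


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store algorithm, iteration count, salt, and digest together so the
    # hash can be verified later without any server-side secret.
    return "$".join([
        "pbkdf2_sha256",
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    ])


def verify_password(password: str, stored: str) -> bool:
    algorithm, iterations, salt_b64, digest_b64 = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```

Because the iteration count travels with each hash, raising ITERATIONS later does not invalidate hashes that were stored earlier.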
- Installation Guide - Detailed setup instructions
- API Installation Guide - Backend-specific installation and configuration
- Backend Technical Stack - Python/FastAPI architecture
- API Documentation - Auto-generated API docs (when running)
- API Reference - Complete API reference and endpoints
- Pipeline Overview - High-level processing pipeline architecture
- Step Definitions - Detailed breakdown of processing steps
- Data Flow - Data flow and state management
- Configuration - Environment variables reference
- High Contrast Themes Improvements - Details about accessibility enhancements
- Claude Code Guide - Guidance for AI coding assistants working with this repository
MIT License - see the LICENSE file for details
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a pull request
For issues and feature requests, please open an issue on GitHub.