SlideSpeaker API

Turn slides or PDFs into narrated videos, complete with transcripts, TTS, subtitles, and optional avatars.

This repository contains the FastAPI backend that powers SlideSpeaker. It exposes the task orchestration pipeline, handles transcription/TTS jobs, and serves generated media back to clients. The React/Next.js frontend has been split out into slide-speaker-web/, ready to be published as a separate git project.

⚠️ Project Status

SlideSpeaker is under active development. Expect rapid iteration, breaking changes, and incomplete tooling while we work toward production readiness.

✨ Features

Automated script generation from slide decks or PDFs
Natural-sounding text-to-speech narration with configurable voices
Optional AI avatars synced to narration for presenter-style videos
Podcast-ready audio exports for sharing beyond video platforms
Subtitle outputs in VTT/SRT formats aligned to the narration
Task-based API that coordinates the full processing pipeline end-to-end
Responsive light, dark, and auto themes with per-user preferences
Global language switcher with localized UI labels and stored preferences
Hybrid authentication powered by NextAuth (Google OAuth + email/password) backed by FastAPI endpoints
WCAG 2.1 AA compliance with enhanced accessibility features
High contrast themes for both light and dark modes
Support for additional languages: Thai, Korean, and Japanese
Optimized task creation page and improved processing display
Enhanced web performance for better user experience
Modern state management with Zustand for improved frontend performance
Enhanced theme system with proper high contrast support

🚀 Quick Start (API)

cd api
uv sync                      # Install base dependencies
cp .env.example .env         # Create config file
# Edit .env to add your API keys
make dev                     # Start development server (port 8000)

Background Workers

cd api
make master-worker          # Start master process that spawns workers

User Management CLI

cd api
python scripts/user_cli.py list
python scripts/user_cli.py create --email you@example.com --password secret --name "You"

Other subcommands include show, set-password, and delete. Pass --help to any subcommand to see its options.

🌐 Frontend (Separate Repo)

The Next.js/React UI now lives in slide-speaker-web/ (generated beside this repository). Move it to its own git project and follow the instructions in slide-speaker-web/README.md to continue frontend development.

♿ Accessibility

SlideSpeaker is committed to providing an inclusive experience for all users:

WCAG 2.1 AA compliance for web accessibility standards
High contrast themes available for both light and dark modes
Enhanced focus indicators for keyboard navigation
Screen reader friendly interface
Support for multiple languages to serve a diverse user base

With the development server running, visit:

http://localhost:8000/docs - API documentation

🛠️ Configuration

Essential API Keys

LLM (OpenAI) - Required for transcript generation
- OPENAI_API_KEY (required)
- Optional: OPENAI_BASE_URL (for custom endpoints)
- Optional: OPENAI_TIMEOUT, OPENAI_RETRIES, OPENAI_BACKOFF
Text-to-Speech
- TTS_SERVICE=openai|elevenlabs (defaults to openai)
- ElevenLabs requires ELEVENLABS_API_KEY
Avatar Generation (optional)
- HeyGen: HEYGEN_API_KEY
- OpenAI DALL-E: Uses your OPENAI_API_KEY
Storage
- Defaults to local filesystem
- For cloud storage, configure S3 or OSS in .env
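
The key requirements above can be checked before starting the server. The following is a minimal sketch, not part of the SlideSpeaker codebase: the function name check_config is hypothetical, and it only encodes the rules stated in this section (OPENAI_API_KEY is required, TTS_SERVICE defaults to openai, and ElevenLabs needs ELEVENLABS_API_KEY).

```python
import os
from typing import List, Mapping


def check_config(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable problems found in the environment (hypothetical helper)."""
    problems = []

    # OPENAI_API_KEY is documented as required for transcript generation.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is missing")

    # TTS_SERVICE is documented as defaulting to "openai" when unset.
    tts = env.get("TTS_SERVICE", "openai")
    if tts not in ("openai", "elevenlabs"):
        problems.append(f"TTS_SERVICE must be openai or elevenlabs, got {tts!r}")
    elif tts == "elevenlabs" and not env.get("ELEVENLABS_API_KEY"):
        problems.append("TTS_SERVICE=elevenlabs requires ELEVENLABS_API_KEY")

    return problems


if __name__ == "__main__":
    for problem in check_config():
        print(f"config error: {problem}")
```

Passing a plain dict instead of os.environ makes the checks easy to exercise without touching the real environment.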

Storage Options

SlideSpeaker supports multiple storage backends:

Local - Default, stores files in api/output/
AWS S3 - Configure AWS_S3_BUCKET_NAME and credentials
Aliyun OSS - Configure OSS_BUCKET_NAME and credentials
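
As a rough illustration of how the three backends relate, the sketch below picks one from the documented bucket variables. This is an assumption about selection logic, not the project's actual mechanism: pick_storage_backend is a hypothetical name, and the real code may rely on an explicit setting rather than inferring the backend from which bucket is configured.

```python
import os
from typing import Mapping


def pick_storage_backend(env: Mapping[str, str] = os.environ) -> str:
    """Guess the storage backend from the documented bucket variables.

    Assumption: a backend counts as configured when its bucket variable is
    set. The actual project may use a dedicated setting to choose instead.
    """
    if env.get("AWS_S3_BUCKET_NAME"):
        return "s3"
    if env.get("OSS_BUCKET_NAME"):
        return "oss"
    # Documented default: files are stored locally in api/output/.
    return "local"


if __name__ == "__main__":
    print(pick_storage_backend())
```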

Authentication

API (FastAPI)
- Password hashing uses PBKDF2-HMAC-SHA256; no additional secrets required.
Next.js (web/.env)
- NEXTAUTH_SECRET – signing key for NextAuth JWT sessions
- NEXTAUTH_URL – base URL of the Next.js app (e.g. http://localhost:3000)
- NEXT_PUBLIC_API_BASE_URL – base URL of the FastAPI backend (defaults to http://localhost:8000 for local dev)
NextAuth providers
- Optional Google OAuth: set GOOGLE_CLIENT_ID / GOOGLE_CLIENT_SECRET
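
For readers unfamiliar with the PBKDF2-HMAC-SHA256 scheme mentioned for the API, here is a minimal sketch using only the Python standard library. It illustrates the general technique and is not SlideSpeaker's implementation: the function names, the 16-byte salt, and the iteration count are all assumptions, and the stored format used by the project is not documented here.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Hypothetical value; the project's actual iteration count is not documented here.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Because the salt is stored next to the digest, verification needs no server-side secret, which is consistent with the note that no additional secrets are required.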

📚 Documentation

Installation Guide - Detailed setup instructions
API Installation Guide - Backend-specific installation and configuration
Backend Technical Stack - Python/FastAPI architecture
API Documentation - Auto-generated API docs (when running)
API Reference - Complete API reference and endpoints
Pipeline Overview - High-level processing pipeline architecture
Step Definitions - Detailed breakdown of processing steps
Data Flow - Data flow and state management
Configuration - Environment variables reference
High Contrast Themes Improvements - Details about accessibility enhancements
Claude Code Guide - Guidance for AI coding assistants working with this repository

📄 License

MIT License - see LICENSE file for details

🤝 Contributing

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a pull request

🆘 Support

For issues and feature requests, please open an issue on GitHub.
