Skip to content

SlideSpeaker is an AI app that converts your slides into engaging videos and Podcasts with narration and avatars.

License

Notifications You must be signed in to change notification settings

droxer/slide-speaker-core

Repository files navigation

SlideSpeaker API

Turn slides/PDFs into narrated videos — transcripts, TTS, subtitles, and optional avatars.

This repository now contains the FastAPI backend that powers SlideSpeaker. It exposes the task orchestration pipeline, handles transcription/TTS jobs, and serves generated media back to clients. The React/Next.js frontend has been moved into its own repository (slide-speaker-web/), ready to be published as a separate git project.

⚠️ Project Status

SlideSpeaker is under active development. Expect rapid iteration, breaking changes, and incomplete tooling while we work toward production readiness.

✨ Features

  • Automated script generation from slide decks or PDFs
  • Natural-sounding text-to-speech narration with configurable voices
  • Optional AI avatars synced to narration for presenter-style videos
  • Podcast-ready audio exports for sharing beyond video platforms
  • Subtitle outputs in VTT/SRT formats aligned to the narration
  • Task-based API that coordinates the full processing pipeline end-to-end
  • Responsive light, dark, and auto themes with per-user preferences
  • Global language switcher with localized UI labels and stored preferences
  • Hybrid authentication powered by NextAuth (Google OAuth + email/password) backed by FastAPI endpoints
  • WCAG 2.1 AA compliance with enhanced accessibility features
  • High contrast themes for both light and dark modes
  • Support for additional languages: Thai, Korean, and Japanese
  • Optimized task creation page and improved processing display
  • Enhanced web performance for better user experience
  • Modern state management with Zustand for improved frontend performance
  • Enhanced theme system with proper high contrast support

🚀 Quick Start (API)

cd api
uv sync                      # Install base dependencies
cp .env.example .env         # Create config file
# Edit .env to add your API keys
make dev                     # Start development server (port 8000)

Background Workers

cd api
make master-worker          # Start master process that spawns workers

User Management CLI

cd api
python scripts/user_cli.py list
python scripts/user_cli.py create --email you@example.com --password secret --name "You"

Use --help on any subcommand to see additional options (show, set-password, delete).

🌐 Frontend (Separate Repo)

The Next.js/React UI now lives in slide-speaker-web/ (generated beside this repository). Move it to its own git project and follow the instructions in slide-speaker-web/README.md to continue frontend development.

♿ Accessibility

SlideSpeaker is committed to providing an inclusive experience for all users:

  • WCAG 2.1 AA compliance for web accessibility standards
  • High contrast themes available for both light and dark modes
  • Enhanced focus indicators for keyboard navigation
  • Screen reader friendly interface
  • Support for multiple languages to serve a diverse user base

Visit:

  • http://localhost:8000/docs - API documentation

🛠️ Configuration

Essential API Keys

  • LLM (OpenAI) - Required for transcript generation

    • OPENAI_API_KEY (required)
    • Optional: OPENAI_BASE_URL (for custom endpoints)
    • Optional: OPENAI_TIMEOUT, OPENAI_RETRIES, OPENAI_BACKOFF
  • Text-to-Speech

    • TTS_SERVICE=openai|elevenlabs (defaults to openai)
    • ElevenLabs requires ELEVENLABS_API_KEY
  • Avatar Generation (optional)

    • HeyGen: HEYGEN_API_KEY
    • OpenAI DALL-E: Uses your OPENAI_API_KEY
  • Storage

    • Defaults to local filesystem
    • For cloud storage, configure S3 or OSS in .env

Storage Options

SlideSpeaker supports multiple storage backends:

  • Local - Default, stores files in api/output/
  • AWS S3 - Configure AWS_S3_BUCKET_NAME and credentials
  • Aliyun OSS - Configure OSS_BUCKET_NAME and credentials

Authentication

  • API (FastAPI)
    • Password hashing uses PBKDF2-HMAC-SHA256; no additional secrets required.
  • Next.js (web/.env)
    • NEXTAUTH_SECRET – signing key for NextAuth JWT sessions
    • NEXTAUTH_URL – base URL of the Next.js app (e.g. http://localhost:3000)
    • NEXT_PUBLIC_API_BASE_URL – base URL of the FastAPI backend (defaults to http://localhost:8000 for local dev)
  • NextAuth providers
    • Optional Google OAuth: set GOOGLE_CLIENT_ID / GOOGLE_CLIENT_SECRET

📚 Documentation

📄 License

MIT License - see LICENSE file for details

🤝 Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a pull request

🆘 Support

For issues and feature requests, please open an issue on GitHub.

About

SlideSpeaker is an AI app that converts your slides into engaging videos and Podcasts with narration and avatars.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages