
feat: expand to multi-capability studio server #2

Merged
willgriffin merged 2 commits into main from feat/modular-backends
Jan 26, 2026

@willgriffin
Contributor

Summary

Transforms tts-server into a multi-capability studio server with modular backends.

New Capabilities

| Capability | Backend | Description |
|---|---|---|
| TTS | Qwen3-TTS | Text-to-speech with voice cloning |
| Face | InsightFace | Face embedding extraction for IP-Adapter FaceID |
| Transcription | Whisper | Audio transcription with word-level timestamps |

Architecture

  • Modular backend system with abstract base classes in backends/
  • Each capability can be enabled/disabled via environment variables
  • Mock TTS backend for development without a GPU
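To illustrate how an abstract base class plus a GPU-free mock can fit together, here is a minimal sketch. The class names `TTSBackend` and `MockTTSBackend` and the `list_speakers`/`synthesize` method signatures are hypothetical and are not taken from the PR's `backends/` package; the mock simply returns silent 16-bit mono WAV audio whose length scales with the input text.

```python
import io
import wave
from abc import ABC, abstractmethod


class TTSBackend(ABC):
    """Hypothetical interface that every TTS backend would implement."""

    @abstractmethod
    def list_speakers(self) -> list[str]:
        """Return the speaker names this backend can synthesize."""

    @abstractmethod
    def synthesize(self, text: str, speaker: str) -> bytes:
        """Return WAV-encoded audio for `text` spoken by `speaker`."""


class MockTTSBackend(TTSBackend):
    """GPU-free stand-in: returns silence, 0.05 s per input character."""

    sample_rate = 16_000

    def list_speakers(self) -> list[str]:
        return ["mock"]

    def synthesize(self, text: str, speaker: str) -> bytes:
        if speaker not in self.list_speakers():
            raise ValueError(f"unknown speaker: {speaker}")
        # 0.05 s == 1/20 s; integer math avoids float rounding in the frame count.
        n_frames = self.sample_rate * len(text) // 20
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)   # mono
            wav.setsampwidth(2)   # 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(b"\x00\x00" * n_frames)
        return buf.getvalue()
```

Because the mock satisfies the same interface as a real backend, route handlers can be exercised in tests without loading any model weights.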

API Endpoints

TTS:

  • GET /v1/tts/speakers - List available speakers
  • POST /v1/tts/extract - Extract voice prompt from audio
  • POST /v1/tts/synthesize - Synthesize speech
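A client could build requests for these TTS routes with only the standard library. This is a sketch under assumptions: the base URL is a placeholder, and the JSON field names `text` and `speaker` are guesses rather than the server's documented request schema. `/v1/tts/extract` is omitted because it accepts audio, whose upload format the PR does not specify.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; point at your deployment


def list_speakers_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET /v1/tts/speakers request."""
    return urllib.request.Request(f"{base_url}/v1/tts/speakers", method="GET")


def synthesize_request(
    text: str, speaker: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST /v1/tts/synthesize request (body field names are assumed)."""
    body = json.dumps({"text": text, "speaker": speaker}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/tts/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending one is then `with urllib.request.urlopen(synthesize_request("Hello", "alice")) as resp: audio = resp.read()`, where `"alice"` stands in for a name returned by the speakers endpoint.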

Face:

  • POST /v1/face/embed - Extract a face embedding
  • POST /v1/face/embed-all - Extract embeddings for every detected face
  • POST /v1/face/compare - Compare face embeddings
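The PR does not say how `/v1/face/compare` scores a pair. Cosine similarity is the usual metric for ArcFace-style embeddings such as those InsightFace produces, so the sketch below assumes it. The `same_person` helper and its `0.4` threshold are hypothetical placeholders that would need tuning on real data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def same_person(a: Sequence[float], b: Sequence[float], threshold: float = 0.4) -> bool:
    # Hypothetical decision rule; the threshold is a placeholder, not a tuned value.
    return cosine_similarity(a, b) >= threshold
```

Normalizing inside the function means callers can pass raw or L2-normalized vectors and get the same score.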

Transcription:

  • POST /v1/transcribe - Transcribe audio with word timings
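Word-level timestamps make it straightforward to produce subtitles. The sketch below assumes each word arrives as a dict with `word`, `start`, and `end` (seconds), mirroring the shape openai-whisper emits with `word_timestamps=True`; the actual `/v1/transcribe` response schema is not shown in the PR. It groups words into SRT cues of at most `max_words` words each.

```python
from typing import TypedDict


class Word(TypedDict):
    word: str     # token text; Whisper often includes a leading space
    start: float  # seconds
    end: float    # seconds


def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        text = " ".join(w["word"].strip() for w in chunk)
        start, end = fmt_ts(chunk[0]["start"]), fmt_ts(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(cues)
```

Splitting by word count is the simplest policy; splitting on pauses between `end` and the next `start` would follow speech rhythm more closely.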

Breaking Changes

  • Removed legacy backwards-compatibility endpoints (/v1/audio/speech, /v1/voice/extract, /v1/speakers)
  • Histrio client updated to use the new routes
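Other clients still calling the removed routes will need the same migration. The mapping below pairs each legacy path with the new route whose name and purpose match most closely; that correspondence is an assumption based on the endpoint names, not a table published in the PR, so verify request and response shapes before relying on it.

```python
# Assumed legacy -> new route correspondence (inferred from endpoint names).
LEGACY_ROUTES = {
    "/v1/audio/speech": "/v1/tts/synthesize",
    "/v1/voice/extract": "/v1/tts/extract",
    "/v1/speakers": "/v1/tts/speakers",
}


def migrate_path(path: str) -> str:
    """Return the replacement for a removed legacy route; other paths pass through."""
    return LEGACY_ROUTES.get(path, path)


def find_legacy_routes(source: str) -> list[str]:
    """List legacy routes that appear verbatim in a client's source text."""
    return [old for old in LEGACY_ROUTES if old in source]
```

Note that `/v1/tts/speakers` does not contain the substring `/v1/speakers`, so already-migrated calls are not flagged.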

Test Plan

  • All 12 tests pass
  • Integration test with histrio dev server
  • Deploy to K8s cluster

Transforms tts-server into studio-server with modular backends:

## New Capabilities
- **TTS**: Text-to-speech with voice cloning (Qwen3-TTS)
- **Face**: Face embedding extraction for IP-Adapter FaceID (InsightFace)
- **Transcription**: Audio transcription with word timings (Whisper)

## Architecture
- Modular backend system with abstract base classes
- Each capability can be enabled/disabled via environment variables
- Mock TTS backend for development without a GPU

## API Changes
- New endpoints: /v1/tts/*, /v1/face/*, /v1/transcribe
- Removed legacy backwards-compatibility endpoints
- REST routes organized by capability

## Environment Variables
- TTS_BACKEND: qwen3-tts (default) or mock
- FACE_ENABLED: true/false
- TRANSCRIPTION_ENABLED: true/false
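A startup routine could read these variables into a typed config and fail fast on bad values. This is a minimal sketch: the `StudioConfig` dataclass and `load_config` function are hypothetical, and the `false` defaults for `FACE_ENABLED` and `TRANSCRIPTION_ENABLED` are assumptions, since the PR only states the default for `TTS_BACKEND`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TTS_BACKENDS = {"qwen3-tts", "mock"}


def _parse_bool(name: str, raw: str) -> bool:
    """Accept 'true'/'false' (case-insensitive); reject anything else."""
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")


@dataclass(frozen=True)
class StudioConfig:
    tts_backend: str
    face_enabled: bool
    transcription_enabled: bool


def load_config(env: Optional[Mapping[str, str]] = None) -> StudioConfig:
    env = os.environ if env is None else env
    backend = env.get("TTS_BACKEND", "qwen3-tts")  # documented default
    if backend not in TTS_BACKENDS:
        raise ValueError(f"TTS_BACKEND must be one of {sorted(TTS_BACKENDS)}, got {backend!r}")
    return StudioConfig(
        tts_backend=backend,
        # Assumed defaults: optional capabilities stay off unless enabled.
        face_enabled=_parse_bool("FACE_ENABLED", env.get("FACE_ENABLED", "false")),
        transcription_enabled=_parse_bool(
            "TRANSCRIPTION_ENABLED", env.get("TRANSCRIPTION_ENABLED", "false")
        ),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.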

willgriffin merged commit 3b8e619 into main on Jan 26, 2026
2 checks passed
willgriffin deleted the feat/modular-backends branch on January 26, 2026 at 21:57