I cloned my voice, gave it a knowledge base, and have it answer intro chats on my behalf.
Will someone use my voice to trick my mom? Hopefully not. I just wanted to play around with multimodal AI: spice up audio by having it hold a conversation, reference a data source, and talk to my friends.
- Clone the repo and install dependencies: `yarn install`
- Add environment variables; see `.env.example` in each package.
- Visit `http://localhost:3000` and watch the magic unfold.
Standard Next.js web app. The frontend creates a Daily.co room, makes some Postgres writes, and facilitates the exchange of audio between client and server.
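The room-creation step can be sketched like this, assuming the Daily REST API (`POST /v1/rooms`) and an API key in `DAILY_API_KEY`; the actual app does this in a Next.js route, but the call shape is the same:

```python
import json
import os
import time
import urllib.request

def create_daily_room(ttl_seconds: int = 3600) -> str:
    """Create a short-lived Daily room and return its join URL."""
    payload = json.dumps(
        # `exp` is a Unix timestamp after which the room expires.
        {"properties": {"exp": int(time.time()) + ttl_seconds}}
    ).encode()
    req = urllib.request.Request(
        "https://api.daily.co/v1/rooms",
        data=payload,
        headers={
            "Authorization": f"Bearer {os.environ['DAILY_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["url"]
```

The returned URL is what the client joins to start exchanging audio.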
Dockerized FastAPI app. It calls out to various AI providers: OpenAI for GPT-4o, ElevenLabs for TTS, and Deepgram for realtime audio transcription.
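One turn of the respond-and-speak pipeline might look like the sketch below, using the providers' plain REST endpoints. The knowledge-base string, voice ID, and function names are illustrative; the real app wires these through FastAPI and streams the audio back over Daily:

```python
import json
import os
import urllib.request

def _post(url: str, headers: dict, body: bytes) -> bytes:
    """Tiny helper: POST a body and return the raw response bytes."""
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def answer(transcript: str, knowledge: str) -> str:
    """Ask GPT-4o to reply in my persona, grounded in the knowledge base."""
    body = json.dumps({
        "model": "gpt-4o",
        "messages": [
            {"role": "system",
             "content": f"You answer intro chats on my behalf. Context:\n{knowledge}"},
            {"role": "user", "content": transcript},
        ],
    }).encode()
    out = _post(
        "https://api.openai.com/v1/chat/completions",
        {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
         "Content-Type": "application/json"},
        body,
    )
    return json.loads(out)["choices"][0]["message"]["content"]

def speak(text: str, voice_id: str) -> bytes:
    """Synthesize the reply with the cloned ElevenLabs voice; returns audio bytes."""
    return _post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        {"xi-api-key": os.environ["ELEVENLABS_API_KEY"],
         "Content-Type": "application/json"},
        json.dumps({"text": text}).encode(),
    )
```

Deepgram sits in front of this: its realtime transcript of the caller's audio becomes the `transcript` argument.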
Uses Postgres for obvious reasons, and the Prisma ORM because it offers both Node and Python clients, so the same schema can be reused across multiple packages.
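A hypothetical sliver of what that shared `schema.prisma` could look like; the `Conversation` model and its fields are made up for illustration, not the repo's actual models:

```prisma
datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

// One generator per client keeps the Next.js app and the
// FastAPI app in sync with the same models.
generator node {
  provider = "prisma-client-js"
}

generator python {
  provider = "prisma-client-py"
}

model Conversation {
  id         Int      @id @default(autoincrement())
  roomUrl    String
  transcript String?
  createdAt  DateTime @default(now())
}
```

Running `prisma generate` then produces a typed client for each package from the single schema.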

