🎙️ Virtual Podcast Studio

A full-stack workspace for turning research papers into AI-assisted podcast episodes. The project combines a FastAPI backend for arXiv ingestion with a Next.js frontend that handles realtime AI conversations and guides creators from discovery through post-production.

✨ Key Capabilities

Research Hub – Accessible topic toggles, deduplicated arXiv results, and a handoff flow that persists the selected paper to the Audio Studio.
Audio Studio – Microphone capture with real-time speech-to-text transcription (server VAD), AI responses from OpenAI Realtime API via Server-Sent Events streams, live transcript display with typing animations, synchronized audio playback, and export options (transcript download, WAV/ZIP bundle with separate host/AI tracks).
Post-production dashboards – Video Studio timeline editor, Library, Publisher, and Analytics pages that consume saved conversations for mock editing/publishing workflows.
Workspace settings – Sidebar collapse state and API provider/credential management stored via React contexts, with localStorage hydration for persistent preferences.
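
The WAV/ZIP export described under Audio Studio can be pictured with a small sketch. The Studio presumably builds this bundle in the browser; the Python below only illustrates one plausible file layout. The file names `host.wav` and `ai.wav`, the 24 kHz sample rate, and the mono 16-bit PCM format are all assumptions, not values taken from the project.

```python
import io
import wave
import zipfile

SAMPLE_RATE = 24_000  # assumed sample rate for mono 16-bit PCM audio


def pcm16_to_wav(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Wrap raw mono 16-bit PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def build_bundle(host_pcm: bytes, ai_pcm: bytes) -> bytes:
    """Zip the host and AI tracks as two separate WAV files (hypothetical names)."""
    bundle = io.BytesIO()
    with zipfile.ZipFile(bundle, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("host.wav", pcm16_to_wav(host_pcm))
        archive.writestr("ai.wav", pcm16_to_wav(ai_pcm))
    return bundle.getvalue()
```

Keeping the two speakers in separate files lets an editor adjust levels or trim one track without touching the other.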

📁 Repository Structure

| Path | Description |
| --- | --- |
| `backend/` | FastAPI app (`main.py`) with `/api/papers` and `/health` endpoints. |
| `podcast-studio/` | Next.js 15 App Router frontend (Research Hub, Audio Studio, Video Studio, Library, Publisher, Analytics) with realtime API routes that manage OpenAI connections server-side. |
| `quick_health_check.py` | CLI helper that verifies `backend/.env` includes `OPENAI_API_KEY` and confirms the FastAPI `/health` endpoint is reachable. |
| `README.md` | This guide. |

🔄 Architecture at a Glance

Browser (Research Hub & Studios)
   │  
   │  fetch /api/papers
   ▼
Next.js API route ──► FastAPI (`/api/papers`) ──► arXiv API
   │
   ├─► Realtime routes (`/api/rt/*`) ──► RT session manager ──► OpenAI Realtime API
   │     • POST /api/rt/start         (start session)
   │     • POST /api/rt/audio-append  (send mic audio)
   │     • GET  /api/rt/audio         (SSE: AI audio)
   │     • GET  /api/rt/transcripts   (SSE: AI text)
   │     • GET  /api/rt/user-transcripts (SSE: user speech)
   │     • POST /api/rt/stop          (end session)
   │
   └─► sessionStorage (selected paper / saved conversation)

Flow

Topic discovery – The Research Hub sends POST /api/papers. A Next.js proxy validates the payload and forwards it to FastAPI, which sanitises each topic, queries arXiv, de-duplicates results, and returns the newest papers first.
Realtime session – The Audio Studio starts a session via POST /api/rt/start, then opens SSE streams for audio and transcripts. Microphone audio is captured client-side and sent via POST /api/rt/audio-append. The Next.js server maintains a WebSocket connection to OpenAI's Realtime API and bridges events to the client via SSE streams.
Conversation storage – Finished sessions are serialised with src/lib/conversationStorage.ts, saved in sessionStorage, and consumed by the Video Studio, Library, and Publisher pages.
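
The topic discovery step above (sanitise, de-duplicate, newest first) can be sketched roughly as follows. This is not the code in `backend/main.py`: the helper names, the allowed character set, the length limit, and the `id` / `published` field names are assumptions made for illustration. Sorting on the raw string also assumes every `published` value is an ISO 8601 timestamp in the same timezone.

```python
import re
from typing import Dict, List

MAX_TOPIC_LENGTH = 50  # hypothetical limit, not taken from the project


def sanitise_topic(raw: str) -> str:
    """Keep letters, digits, dots, hyphens and spaces; collapse whitespace."""
    cleaned = re.sub(r"[^A-Za-z0-9.\- ]", " ", raw)
    return re.sub(r"\s+", " ", cleaned).strip()[:MAX_TOPIC_LENGTH]


def dedupe_newest_first(papers: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop repeated paper ids, then sort by publication time, newest first."""
    unique: Dict[str, Dict[str, str]] = {}
    for paper in papers:
        unique.setdefault(paper["id"], paper)  # first occurrence wins
    # ISO 8601 timestamps in one timezone sort correctly as plain strings.
    return sorted(unique.values(), key=lambda p: p["published"], reverse=True)
```

De-duplication matters here because the same paper can match more than one selected topic.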

⚙️ Setup

Prerequisites

Node.js 18.18+ (20 recommended; Next.js 15 requires at least 18.18)
Python 3.8+ (only needed for papers API)
OpenAI API key with access to the Realtime API

1. Configure Environment Variables

Create podcast-studio/.env.local with:

OPENAI_API_KEY=sk-...          # Required for realtime conversations
# Optional: OPENAI_REALTIME_MODEL, OPENAI_REALTIME_VOICE
# Optional: BACKEND_URL, NEXT_PUBLIC_BACKEND_URL (default http://localhost:8000)

Create backend/.env (only needed for papers API):

OPENAI_API_KEY=sk-...          # Optional, only if using backend directly
# Optional: ALLOWED_ORIGINS

2. Install Dependencies

# Frontend (required)
cd podcast-studio
npm install

# Backend (optional, only for papers API)
cd ../backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

3. Run the Stack

# Terminal 1 – Next.js frontend (required)
cd podcast-studio
npm run dev

# Terminal 2 – FastAPI backend (optional, only for papers)
cd backend
source venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Visit http://localhost:3000/studio to open the Audio Studio (or / for the Research Hub).

Note: The Python backend is only required for the papers API. Realtime conversations are handled entirely by the Next.js server.

4. Quick Health Check

Run python quick_health_check.py from the repo root to verify the backend is running and the OpenAI key is configured. The frontend also exposes GET /api/test-openai to confirm credentials can list models.
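
For readers who want to see what such a check involves, here is a minimal sketch of the two steps. It is not the contents of `quick_health_check.py`; the function names and the `.env` parsing rules are assumptions. Only the `/health` path, the `OPENAI_API_KEY` variable, and the default port come from this guide.

```python
import urllib.request
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments and inline comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def has_openai_key(env_text: str) -> bool:
    """True when OPENAI_API_KEY is present and non-empty."""
    return bool(parse_env(env_text).get("OPENAI_API_KEY"))


def backend_is_healthy(base_url: str = "http://localhost:8000",
                       timeout: float = 3.0) -> bool:
    """True when GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers refused connections, timeouts and HTTP errors
        return False
```

A caller would read `backend/.env` as text, pass it to `has_openai_key`, and then call `backend_is_healthy()` while uvicorn is running.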

🧠 Realtime Workflow Cheat Sheet

1. Select a paper in the Research Hub. The card's "Start Audio Studio" button stores the selection in sessionStorage and navigates to /studio.
2. In the Audio Studio, click Start Live Session. The page will:
   - Call POST /api/rt/start to initialize the session
   - Open SSE streams for audio and transcripts
   - Start microphone capture
3. Speak naturally – your audio is sent to /api/rt/audio-append, segmented into turns by server-side voice activity detection (VAD), and transcribed by OpenAI. The AI responds with audio and text streamed back via SSE.
4. Use the controls to export transcripts, download audio bundles, or send the session to the Video Studio for post-production.
5. Click End Session when finished – this stops microphone capture, tears down SSE streams, and saves the conversation for downstream pages.
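
The SSE streams used in this workflow follow the standard `text/event-stream` wire format, which is handy to know when debugging outside the browser. The sketch below parses that format from any iterable of lines. In the browser the Studio would rely on `EventSource` rather than code like this. The event names and payloads emitted by the `/api/rt/*` routes are not documented here, so the sample events in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {'event': ..., 'data': ...} for each event in an SSE line stream."""
    event_type = "message"  # default event type when no "event:" field is sent
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_parts:
                yield {"event": event_type, "data": "\n".join(data_parts)}
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one optional space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
```

For example, feeding it the lines `event: transcript`, `data: Hello`, and a blank line yields a single event with type `transcript` and data `Hello`.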

🧪 Development Scripts

| Location | Command | Purpose |
| --- | --- | --- |
| `podcast-studio/` | `npm run dev` | Start the Next.js dev server (Turbopack). |
| | `npm run build` | Verify the production bundle. |
| | `npm run lint` | Run ESLint (flat config). |
| | `npm run start` | Serve the production build. |
| `backend/` | `uvicorn main:app --reload` | Run the FastAPI server with auto-reload. |
| Repo root | `python quick_health_check.py` | Validate configuration + backend health. |

🛠️ Troubleshooting

Backend not reachable – Ensure uvicorn is running on port 8000. If you changed the port, update BACKEND_URL / NEXT_PUBLIC_BACKEND_URL. Note: The backend is only needed for the papers API.
Realtime session errors – Confirm /api/rt/start returns { ok: true } in the network tab and that your API key has access to OpenAI Realtime. Missing keys surface as HTTP 400/503 responses.
No audio or transcript – Check browser microphone permissions and verify SSE endpoints (audio, transcripts, user transcripts) are open in the network inspector. Look for EventSource connections in the Network tab with readyState: 1 (open).
Transcription not appearing – Verify the realtime session is active and SSE streams are connected. Check the browser console for connection errors.
Workspace settings not persisting – Only provider + model selections persist via localStorage. API keys intentionally reset on refresh for security.

🤝 Contributing

Fork the repository and create a feature branch (no force pushes to main).
Keep backend/frontend schemas in sync and update the relevant AGENT.md + README sections.
Run npm run lint and exercise the realtime workflow before opening a PR.
Submit the PR with a clear summary of changes.

📄 License

MIT License – see LICENSE for details.
