Problem
The desktop app (macOS) bundles vendor API keys (`DEEPGRAM_API_KEY`, `GEMINI_API_KEY`) in the app bundle's `.env` file and calls external APIs directly from the client:
- Deepgram STT: `TranscriptionService.swift` connects directly to `wss://api.deepgram.com/v1/listen` with the API key in the WebSocket auth header
- Gemini: `GeminiClient.swift` and `EmbeddingService.swift` call Google APIs with the key in URL query parameters (`?key=<KEY>`)
Security risks:
- Keys are extractable from the app bundle (`Contents/Resources/.env` is plain text)
- Keys are visible to anyone inspecting the app's own network traffic (auth headers, URL params), e.g. through a local TLS-intercepting proxy
- No per-user attribution, rate limiting, or revocation granularity
- Blast radius: a leaked key exposes the full vendor account's billing
Architectural inconsistency:
- Mobile app routes ALL audio through the Python backend's `/v4/listen` WebSocket; API keys stay server-side
- Desktop app bypasses the backend entirely for STT; keys ship in the client
- Desktop misses backend features: VAD gate (~75% Deepgram cost savings), speech profiles, speaker identification, unified billing/monitoring
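The cost argument for a VAD gate can be illustrated with a minimal sketch. Assuming the STT vendor bills by the duration of audio streamed to it, dropping non-speech frames before they leave the backend makes billed time proportional to the fraction of frames treated as speech. This is a plain energy-threshold gate, not the logic in `backend/utils/stt/vad_gate.py`; the frame length, threshold, and 16 kHz mono int16 input format are assumptions chosen for illustration.

```python
import array
import math
import sys
from typing import Iterable, Iterator

SAMPLE_RATE = 16_000   # assumption: 16 kHz mono, little-endian int16 PCM
FRAME_MS = 30          # arbitrary frame length for this sketch
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 480 samples per frame
RMS_THRESHOLD = 500.0  # hypothetical threshold; many production VADs use a trained model


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of int16 PCM."""
    samples = array.array("h")
    samples.frombytes(frame)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; convert to native order
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def vad_gate(frames: Iterable[bytes], threshold: float = RMS_THRESHOLD) -> Iterator[bytes]:
    """Forward only frames loud enough to be treated as speech; drop the rest."""
    for frame in frames:
        if frame_rms(frame) >= threshold:
            yield frame


def streamed_seconds(frames: Iterable[bytes]) -> float:
    """Audio duration represented by the frames (2 bytes per mono int16 sample)."""
    return sum(len(f) for f in frames) / 2 / SAMPLE_RATE
```

With one loud 30 ms frame followed by three silent ones, only a quarter of the audio would be forwarded. Real savings depend on how much of a session is actually speech; the ~75% figure above is the issue's own estimate, not something a toy input demonstrates.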
Proposed Solution
Phase 1: Route desktop STT through /v4/listen
The Python backend already has a fully-featured /v4/listen WebSocket endpoint with Firebase auth, used by all mobile clients. Desktop should use it too.
Swift changes:
- Replace the direct Deepgram WebSocket connection in `TranscriptionService.swift` with a WebSocket connection to the backend's `/v4/listen` (or `/v4/web/listen`, which supports first-message token auth)
- Remove `DEEPGRAM_API_KEY` from the client-side `.env`
- Desktop gets VAD gate, speech profiles, and speaker ID for free
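As a hedged sketch of the client-side change, the snippet below builds the backend URL and a first-message auth payload in Python (the desktop code itself would be Swift). The host, the query parameter names, and the JSON shape `{"type": "auth", "token": ...}` are hypothetical; the actual `/v4/web/listen` handshake lives in `backend/routers/transcribe.py` and is not described in this issue. What it shows is that the client holds only the user's Firebase ID token, and no vendor key appears in the URL or headers.

```python
import json
from urllib.parse import urlencode, urlunsplit


def build_listen_url(host: str, path: str = "/v4/web/listen", **params: str) -> str:
    """Build the backend WebSocket URL; parameter names here are hypothetical."""
    query = urlencode(sorted(params.items()))  # sorted for a stable, testable URL
    return urlunsplit(("wss", host, path, query, ""))


def first_auth_message(firebase_id_token: str) -> str:
    """Hypothetical first frame carrying the user's Firebase ID token (not a vendor key)."""
    return json.dumps({"type": "auth", "token": firebase_id_token})
```

Sending the token as the first message keeps it out of the URL, where it could end up in proxy or access logs. Browser WebSocket APIs cannot set custom headers, which is a common reason for this pattern; a native client can set an `Authorization` header through `URLRequest`, so either endpoint may be usable depending on how `/v4/listen` expects the token.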
Backend changes:
- May need minor adjustments to handle the desktop audio format (16 kHz stereo PCM vs. mobile's opus/pcm8)
- Add a `source=desktop` parameter for monitoring/billing segmentation
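If the backend pipeline expects mono 16-bit PCM (an assumption; the issue only says minor adjustments may be needed), one option is to downmix the desktop's 16 kHz stereo stream before it reaches the STT provider. A minimal sketch, assuming interleaved little-endian int16 samples:

```python
import array
import sys


def stereo_to_mono_pcm16(data: bytes) -> bytes:
    """Downmix interleaved little-endian int16 stereo PCM to mono by averaging L and R."""
    samples = array.array("h")
    samples.frombytes(data)  # raises ValueError if len(data) is not a multiple of 2
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; convert to native order
    if len(samples) % 2:
        raise ValueError("stereo PCM must contain whole L/R sample pairs")
    # Average each (left, right) pair; the mean of two int16 values always fits in int16.
    mono = array.array(
        "h", ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2))
    )
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian for the wire
    return mono.tobytes()
```

Downmixing also halves upstream bandwidth: 16,000 samples/s × 2 channels × 2 bytes = 64,000 B/s for stereo versus 32,000 B/s for mono. If the two channels carry different sources (for example microphone and system audio), averaging would discard that separation, and the backend would instead need to process each channel on its own.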
Phase 2: Route Gemini through backend endpoints
- Add backend API endpoints for the proactive assistant operations that currently call Gemini directly (embeddings, generation)
- Remove `GEMINI_API_KEY` from the client-side `.env`
- This enables server-side rate limiting, cost tracking, and prompt governance
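Once Gemini calls go through the backend, per-user limits become enforceable. A minimal token-bucket sketch keyed by user ID; the capacity and refill rate are placeholders, and `cost` could stand for one request or an estimated token count. State is kept in process memory, so a backend running several workers would need a shared store instead.

```python
import time
from typing import Callable, Dict, Tuple


class PerUserRateLimiter:
    """Token bucket per user ID: bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        # user_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, user_id: str, cost: float = 1.0) -> bool:
        """Spend `cost` tokens for this user if available; return whether the call may proceed."""
        now = self.clock()
        tokens, last = self._buckets.get(user_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[user_id] = (tokens, now)
        return allowed
```

Injecting the clock keeps the limiter deterministic under test; in production the default `time.monotonic` avoids surprises from wall-clock adjustments.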
Phase 3: Decommission direct API paths
- Remove direct Deepgram/Gemini code paths from the desktop app
- Remove `.env` bundling of vendor keys from the build pipeline
- Add a CI check to block shipping vendor API keys in app bundles
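The CI check could be as simple as scanning the built app bundle for `.env`-style files that assign the vendor key names listed in this issue, and failing the build if any are found. A minimal sketch; the script name, the `.env*` file pattern, and the exit-code convention are assumptions. Because it matches key names rather than key values, it would not catch a key compiled into the binary or stored under another name; a pattern-based secret scanner could complement it.

```python
import re
import sys
from pathlib import Path
from typing import List

# Vendor key names named in this issue; extend as more providers are proxied.
FORBIDDEN_KEYS = ("DEEPGRAM_API_KEY", "GEMINI_API_KEY")
# Matches `NAME=value` (optionally `export NAME=value`) with a non-empty value.
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(\S.*)$")


def find_vendor_keys(bundle: Path) -> List[str]:
    """Return `file:line: NAME` for every forbidden key assigned in a .env* file under bundle."""
    findings: List[str] = []
    for env_file in sorted(bundle.rglob(".env*")):
        if not env_file.is_file():
            continue
        text = env_file.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = ASSIGNMENT.match(line)
            if match and match.group(1) in FORBIDDEN_KEYS:
                findings.append(f"{env_file.relative_to(bundle)}:{lineno}: {match.group(1)}")
    return findings


def main(argv: List[str]) -> int:
    """CI entry point: return 1 if any vendor key is found, else 0."""
    findings = find_vendor_keys(Path(argv[0]))
    for finding in findings:
        print(f"vendor API key in bundle: {finding}", file=sys.stderr)
    return 1 if findings else 0


# Hypothetical CI usage: python check_bundle_keys.py path/to/App.app
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1:]))
```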
Benefits
| | Current (direct) | Proposed (backend proxy) |
|---|---|---|
| API key exposure | Client-side, extractable | Server-side only |
| Cost visibility | Invisible to backend | Unified monitoring |
| VAD gate savings | Not available | ~75% Deepgram cost reduction |
| Speech profiles | Not available | Speaker identification |
| Rate limiting | None | Per-user/device/session |
| Key rotation | Requires app update | Server-side, instant |
| Provider flexibility | Hardcoded Deepgram | Backend can switch STT providers |
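The last two rows follow from keeping vendor selection and credentials on the server. A hedged sketch: the backend chooses the STT provider and reads its API key from server-side configuration when each session opens, so switching providers or rotating a key is a config change that applies to new sessions without a client release (already-open streams keep their old credentials until they reconnect). `STT_PROVIDER`, `OTHER_STT_API_KEY`, and `RecordingSession` are hypothetical stand-ins, not names from `backend/utils/stt/streaming.py`.

```python
import os
from typing import Dict, List, Mapping, Protocol


class STTSession(Protocol):
    provider: str

    def send_audio(self, chunk: bytes) -> None: ...


class RecordingSession:
    """Stand-in for a real vendor stream: it only records what would be sent."""

    def __init__(self, provider: str, api_key: str) -> None:
        self.provider = provider
        self.api_key = api_key
        self.chunks: List[bytes] = []

    def send_audio(self, chunk: bytes) -> None:
        self.chunks.append(chunk)


# Hypothetical mapping: provider name -> server-side env var holding its API key.
PROVIDER_KEY_VARS: Dict[str, str] = {
    "deepgram": "DEEPGRAM_API_KEY",
    "other": "OTHER_STT_API_KEY",
}


def open_stt_session(env: Mapping[str, str] = os.environ) -> STTSession:
    """Pick the provider and its key from server config at session-open time."""
    provider = env.get("STT_PROVIDER", "deepgram")
    key_var = PROVIDER_KEY_VARS[provider]  # KeyError for an unknown provider
    return RecordingSession(provider, env[key_var])  # KeyError if the key is unset
```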
Latency Consideration
Routing through the backend adds a network hop and therefore some latency. In practice, with persistent WebSocket connections and regionally colocated infrastructure, the added delay should be modest relative to STT model inference and endpointing delays. It can be further mitigated with dedicated streaming workers and autoscaling (the same infrastructure mobile already uses).
References
- `desktop/Desktop/Sources/TranscriptionService.swift`: direct Deepgram connection
- `desktop/Desktop/Sources/ProactiveAssistants/Core/GeminiClient.swift`: direct Gemini calls
- `backend/routers/transcribe.py`: existing `/v4/listen` endpoint
- `backend/utils/stt/streaming.py`: server-side STT providers
- `backend/utils/stt/vad_gate.py`: VAD gate (active on mobile)
by AI for @beastoin