Desktop: move proactive AI to /v4/listen, remove GEMINI_API_KEY #5396

@beastoin

Description

Problem

The desktop macOS app bundles GEMINI_API_KEY in a plain-text .env file and calls the Google Gemini API directly from the client for all proactive AI features:

  • GeminiClient.swift (1,450 lines) — 9 callers across ProactiveAssistants + LiveNotes. Calls generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key=<KEY>. Uses structured JSON output, tool-calling loops, image+text, streaming SSE.
  • EmbeddingService.swift (315 lines) — Calls embedContent and batchEmbedContents with key in URL. Used by OCREmbeddingService + TaskAssistant.
  • Local SQLite stores all results (tasks, memories, focus sessions, embeddings) — should use Firestore/Pinecone like mobile.

Security risks: Same as #5393 Phase 1 (extractable keys, no per-user attribution, blast radius = full vendor billing).

Architectural inconsistency: Mobile routes ALL AI through backend. Desktop bypasses backend entirely, duplicating server-side capabilities that already exist in production.

Proposed Solution

Extend /v4/listen WebSocket to handle desktop's proactive AI needs. Desktop becomes a thin client — same pattern as mobile.

Why /v4/listen (not new endpoints)

New WebSocket Message Types

Client → Server:

| Message Type | Purpose | Payload |
|---|---|---|
| `screen_frame` | Screenshot for analysis | `{frame_id, image_b64, app_name, window_title, ocr_text?, analyze: ["focus","tasks","memories","advice"]}` |
| `live_notes_text` | Transcript → note | `{text, session_context}` |
| `profile_request` | Generate user profile | `{}` |
| `task_rerank` | Re-prioritize tasks | `{}` |
| `task_dedup` | Deduplicate tasks | `{}` |

Server → Client:

| Message Type | Purpose | Payload |
|---|---|---|
| `focus_result` | Focus detection | `{frame_id, status, app_or_site, description, message}` |
| `tasks_extracted` | Tasks from screenshot | `{frame_id, tasks: [{id, description, priority, confidence, source_app, due_at}]}` |
| `memories_extracted` | Memories from screenshot | `{frame_id, memories: [{id, content, category, confidence}]}` |
| `advice_extracted` | Proactive advice | `{frame_id, advice: {id, content, category, confidence}}` |
| `live_note` | Generated note | `{text}` |
| `profile_updated` | User profile | `{profile_text}` |
| `rerank_complete` | Tasks re-ranked | `{updated_tasks: [{id, new_position}]}` |
| `dedup_complete` | Duplicates removed | `{deleted_ids, reason}` |
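Routing these server → client messages can be a flat table lookup on the message type rather than a growing if/elif chain. A minimal sketch, assuming each frame is JSON text with a `type` field (the envelope field is an assumption, as above):

```python
import json
from typing import Any, Callable


def dispatch(raw: str, handlers: dict[str, Callable[[dict], Any]]) -> Any:
    """Route an incoming WebSocket text frame to its handler by message type.

    `handlers` maps a message-type string (e.g. "focus_result") to a
    callable that receives the decoded payload dict.
    """
    msg = json.loads(raw)
    handler = handlers.get(msg.get("type"))
    if handler is None:
        raise ValueError(f"unknown message type: {msg.get('type')!r}")
    return handler(msg)
```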

Storage Migration

| Desktop SQLite | Cloud Storage | Status |
|---|---|---|
| action_items | users/{uid}/action_items (Firestore) | EXISTS |
| memories (incl. advice) | users/{uid}/memories (Firestore) | EXISTS |
| conversations | users/{uid}/conversations (Firestore) | EXISTS |
| goals | users/{uid}/goals (Firestore) | EXISTS |
| focus_sessions | users/{uid}/focus_sessions (Firestore) | NEW |
| action_items.embedding | Pinecone vectors | REUSE existing infra |
| screenshots.embedding | Pinecone ns3 | REUSE (already syncs) |
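The migration for the one NEW collection is mostly a row-to-document mapping. A sketch of that mapping for `focus_sessions`; the column and field names here are illustrative, since the issue does not specify the SQLite schema:

```python
def focus_session_to_doc(row: dict) -> tuple[str, dict]:
    """Map a desktop SQLite focus_sessions row to a Firestore document
    destined for users/{uid}/focus_sessions.

    The collection path comes from the storage table above; the column
    names (id, started_at, ...) are hypothetical placeholders.
    """
    doc_id = str(row["id"])
    doc = {
        "started_at": row["started_at"],
        "ended_at": row.get("ended_at"),   # None while the session is open
        "app_or_site": row["app_or_site"],
        "status": row["status"],
    }
    return doc_id, doc
```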

Backend Reuse

| Desktop Feature | Backend Equivalent (PRODUCTION) |
|---|---|
| Memory extraction | `new_memories_extractor()` in utils/llm/memories.py |
| Action item extraction + dedup | `extract_action_items()` in utils/llm/conversation_processing.py |
| Goal progress detection | `extract_and_update_goal_progress()` in utils/llm/goals.py |
| User profile | Persona generation in utils/llm/persona.py |
| Data protection | AES-256-GCM encryption in utils/encryption.py |
| Vector search | Pinecone via database/vector_db.py |

New backend work: Vision LLM handlers for screenshot analysis (focus, task extraction, memory extraction, advice).

Subtasks

Backend (Python)

  • Add message dispatcher for new types in _stream_handler() (transcribe.py)
  • Implement handle_screen_frame() — routes to analysis handlers in parallel
  • Implement focus analysis (vision LLM → focus_result)
  • Implement task extraction (vision LLM + Firestore dedup + Pinecone similarity → tasks_extracted)
  • Implement memory extraction from screenshots (vision LLM → memories_extracted)
  • Implement advice extraction (vision LLM → advice_extracted)
  • Implement live notes handler (text LLM → live_note)
  • Implement task re-ranking handler (Firestore fetch + LLM → rerank_complete)
  • Implement task dedup handler (Firestore + Pinecone + LLM → dedup_complete)
  • Implement profile generation handler (multi-source fetch + LLM → profile_updated)
  • Add focus_sessions Firestore collection with data protection decorators
  • Add frame_id + idempotency for duplicate frame handling
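The first two backend subtasks ("routes to analysis handlers in parallel" plus "frame_id + idempotency") can be sketched together. This is illustrative, not the actual `_stream_handler()` code; the in-memory `seen_frames` set stands in for whatever dedup store the implementation chooses:

```python
import asyncio
from typing import Awaitable, Callable

Analyzer = Callable[[dict], Awaitable[dict]]


async def handle_screen_frame(msg: dict,
                              analyzers: dict[str, Analyzer],
                              seen_frames: set[str]) -> list[dict]:
    """Fan a screen_frame out to the requested analyzers in parallel.

    Duplicate frame_ids are skipped (idempotency), so a client retry
    after a dropped connection does not re-run the vision LLMs.
    """
    frame_id = msg["frame_id"]
    if frame_id in seen_frames:
        return []
    seen_frames.add(frame_id)
    tasks = [analyzers[name](msg)
             for name in msg.get("analyze", []) if name in analyzers]
    return await asyncio.gather(*tasks)
```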

Desktop (Swift)

  • Add sendJSON() method to BackendTranscriptionService for text messages
  • Add response handlers for all new server→client message types
  • Rewrite 9 assistants as thin WebSocket message senders
  • Remove GeminiClient.swift (1,450 lines)
  • Remove EmbeddingService.swift (315 lines)
  • Remove GEMINI_API_KEY from .env.example and loadEnvironment()
  • Replace local SQLite reads with Firestore-cached data where applicable

Testing

  • End-to-end test per analysis type (focus, tasks, memories, advice, notes)
  • Latency benchmarks (focus detection target: <3s including network hop)
  • Load test screenshot bandwidth (adaptive quality/cadence)

Codex Review Summary

Scores: Correctness 6/10, Simplicity 3/10, Completeness 5/10

Key gaps to address during implementation:

  1. Protocol versioning and typed schemas per message type
  2. Backpressure — audio and vision on same WS need priority lanes
  3. Bandwidth strategy — adaptive screenshot quality/cadence, skip unchanged context
  4. Failure modes — partial outages, retries, idempotency
  5. Monolith risk — refactor transcribe.py into message dispatcher + per-capability handlers
  6. Local cache for offline mode (SQLite stays as cache, Firestore is source of truth)
  7. Privacy controls — PII/sensitive window filtering, user consent for screenshot upload
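Gap 2 (backpressure) reduces to a scheduling decision on the shared socket. A toy sketch of the priority-lane idea, with plain lists standing in for the real send queues: audio frames are always drained before vision frames so screenshots can never starve transcription:

```python
def drain_in_priority(audio: list, vision: list, budget: int) -> list:
    """Select up to `budget` outbound frames, always preferring the
    audio lane over the vision lane.

    Illustrative only; a real implementation would use per-lane
    asyncio queues behind the single WebSocket writer.
    """
    out = []
    while len(out) < budget and (audio or vision):
        out.append(audio.pop(0) if audio else vision.pop(0))
    return out
```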

