fix: goal extraction O(N) LLM calls causing +51% OpenAI cost spike #4789

@beastoin

Description

Summary

OpenAI costs spiked +51% in two days (Feb 10 → Feb 12) while DAU remained flat. Root-cause investigation traced the spike to two code changes: one is a call-multiplier bug, the other is expected new-feature cost.

Root Cause #1 (MAJOR): Goal extraction O(N) loop — PR #4778

extract_and_update_goal_progress() in backend/utils/llm/goals.py:238-304 now loops over ALL active goals, making a separate llm_mini (gpt-4.1-mini) call per goal:

# goals.py:249-275 — ONE LLM call per goal in the loop
for goal in goals:  # Was: single goal lookup via get_user_goal()
    prompt = f"""Analyze this message... Goal: "{goal_title}"..."""
    with track_usage(uid, Features.GOALS):
        response = llm_mini.invoke(prompt).content  # N calls instead of 1

Why this matters

Called from two hot paths on every user interaction:

  • routers/chat.py:109 — Every chat message (fires in background thread)
  • utils/conversations/process_conversation.py:369 — Every conversation processing

Before #4778: 1 LLM call per message/conversation
After #4778: N calls where N = user's active goals (typically 1-3)

This directly explains:

  • gpt-4.1-mini requests: +33%
  • gpt-4.1-mini input tokens: +59% — each extra goal sends a full prompt
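As a back-of-the-envelope sanity check (plain arithmetic, not code from the repo), the observed request delta is consistent with a small average active-goal count per user:

```python
# Hypothetical model of the call multiplier introduced by PR #4778.
# Before: 1 gpt-4.1-mini call per message on this path; after: N calls,
# where N = the user's active goal count.
def request_multiplier(avg_active_goals: float) -> float:
    """Factor by which per-message request volume grows after the change."""
    return avg_active_goals / 1.0

# An average of ~1.33 active goals per user would explain a +33% request
# count on this path; input tokens grow faster than requests because each
# extra call re-sends the full prompt preamble.
print(f"+{(request_multiplier(1.33) - 1) * 100:.0f}%")  # → +33%
```

The 1.33 figure is an inferred average, not a measured one; the point is that even "typically 1-3 goals" is enough to produce the observed jump.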

Proposed fix

Rewrite extract_and_update_goal_progress to evaluate ALL goals in one LLM call using structured output:

# Instead of N separate calls:
prompt = f"""Analyze this message for progress toward ANY of these goals:
{json.dumps([{"id": g["id"], "title": g["title"], ...} for g in goals])}

User Message: "{text[:500]}"

Return JSON array: [{{"goal_id": "...", "found": true/false, "value": number_or_null}}]
"""
response = llm_mini.invoke(prompt).content  # 1 call regardless of goal count

Estimated savings: 15-25% of gpt-4.1-mini daily spend.
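A minimal end-to-end sketch of the batched approach (hypothetical function and parameter names; `invoke` stands in for `llm_mini.invoke(...).content` so the logic is testable without the LLM client; JSON parsing is shown with a defensive `json.loads` fallback rather than true structured output):

```python
import json
from typing import Any, Callable


def extract_goal_progress_batched(
    goals: list[dict[str, Any]],
    text: str,
    invoke: Callable[[str], str],
) -> list[dict[str, Any]]:
    """Evaluate ALL goals in a single LLM call instead of one call per goal."""
    if not goals:
        return []  # no active goals -> skip the LLM call entirely
    prompt = (
        "Analyze this message for progress toward ANY of these goals:\n"
        + json.dumps([{"id": g["id"], "title": g["title"]} for g in goals])
        + f'\n\nUser Message: "{text[:500]}"\n\n'
        'Return JSON array: [{"goal_id": "...", "found": true/false, '
        '"value": number_or_null}]'
    )
    raw = invoke(prompt)  # 1 call regardless of goal count
    try:
        results = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed model output: treat as "no progress found"
    known_ids = {g["id"] for g in goals}
    # keep only well-formed entries for goals we actually asked about
    return [r for r in results if isinstance(r, dict) and r.get("goal_id") in known_ids]
```

In production the `track_usage(uid, Features.GOALS)` context manager would wrap the single `invoke` call, preserving per-feature cost attribution.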


Root Cause #2 (MODERATE): Proactive mentor notifications — PR #4735

New feature (merged Feb 11) adds function-calling triggers during real-time conversation streaming. This is expected new-feature cost, not a bug.

Call chain: transcribe.py / pusher.py → trigger_realtime_integrations() → process_mentor_notification() → _process_proactive_notification(), which makes two LLM calls per trigger:

  1. _process_triggers() → llm_mini.bind_tools() with tool_choice="auto"
  2. get_proactive_message() — notification text generation

Rate limited at 30s per user (PROACTIVE_NOTI_LIMIT_SECONDS), but across all active users this adds significant volume.

Optimization opportunity

When _process_triggers() returns no results (confidence below 0.7), the code still calls get_proactive_message(). Consider skipping the second call when triggers don't fire.
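The proposed short-circuit is essentially a one-line guard. A sketch with the two LLM calls passed in as callables (hypothetical wiring that mirrors the call chain above; the real functions take different arguments):

```python
from typing import Any, Callable, Optional


def process_proactive_notification(
    process_triggers: Callable[[], list[Any]],
    get_proactive_message: Callable[[list[Any]], str],
) -> Optional[str]:
    """Skip the second LLM call when no trigger clears the confidence threshold.

    `process_triggers` stands in for _process_triggers() (LLM call #1, tool
    selection); `get_proactive_message` for the notification-text generation
    (LLM call #2).
    """
    triggers = process_triggers()  # LLM call #1
    if not triggers:
        return None  # below-threshold: save LLM call #2 entirely
    return get_proactive_message(triggers)  # LLM call #2
```

Since sub-threshold triggers are presumably the common case, this halves the per-trigger LLM cost on the no-fire path.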


Root Cause #3 (MINOR): gpt-5.2 model appearing

A small number of requests appear under a model that is not referenced anywhere in the codebase. Possible sources:

  • LangSmith omi-agentic-system prompt A/B test specifying a different model
  • OpenAI auto-routing gpt-5.1 → 5.2 during model upgrade rollout
  • External app integration using the org API key

Needs investigation in OpenAI dashboard and LangSmith.


Impact breakdown

| Cause | PR | Model | Type | Priority |
| --- | --- | --- | --- | --- |
| Goal extraction O(N) loop | #4778 | gpt-4.1-mini | Bug — call multiplier | Fix immediately |
| Proactive mentor notifications | #4735 | gpt-4.1-mini | Expected (new feature) | Optimize |
| gpt-5.2 mystery | Unknown | gpt-5.2 | Unknown | Investigate |
| Larger goal section in prompt | #4778 | gpt-5.1 | Token increase | Low |

Acceptance criteria

Metadata


    Labels

    intelligenceLayer (Summaries, insights, action items), p1 (Priority: Critical, score 22-29)
