## Summary
OpenAI costs spiked +51% in 2 days (Feb 10 → Feb 12) while DAU remained flat. Root-cause investigation traced the spike to two code changes — a call-multiplier bug and expected new-feature cost — plus a small amount of unexplained gpt-5.2 usage.
## Root Cause #1 (MAJOR): Goal extraction O(N) loop — PR #4778
`extract_and_update_goal_progress()` in `backend/utils/llm/goals.py:238-304` now loops over ALL active goals, making a separate `llm_mini` (gpt-4.1-mini) call per goal:
```python
# goals.py:249-275 — ONE LLM call per goal in the loop
for goal in goals:  # Was: single goal lookup via get_user_goal()
    prompt = f"""Analyze this message... Goal: "{goal_title}"..."""
    with track_usage(uid, Features.GOALS):
        response = llm_mini.invoke(prompt).content  # N calls instead of 1
```

**Why this matters**
Called from two hot paths on every user interaction:

- `routers/chat.py:109` — every chat message (fires in a background thread)
- `utils/conversations/process_conversation.py:369` — every conversation processed
- Before #4778: 1 LLM call per message/conversation
- After #4778: N calls, where N = the user's active goal count (typically 1-3)
This directly explains:
- gpt-4.1-mini requests: +33%
- gpt-4.1-mini input tokens: +59% — each extra goal sends a full prompt
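As a sanity check that the loop alone can account for the request delta, a quick back-of-envelope (the average-goals figure below is an assumption for illustration, not a measured value):

```python
# Back-of-envelope: if the average user has ~1.33 active goals, the loop
# multiplies per-message llm_mini calls by ~1.33, consistent with the
# observed +33% request increase. (Input tokens grow faster than requests
# because each extra goal re-sends the full message prompt.)
avg_active_goals = 1.33          # assumption, not measured
calls_before = 1.0               # one llm_mini call per message
calls_after = avg_active_goals   # one call per active goal
request_increase = calls_after / calls_before - 1
assert 0.30 < request_increase < 0.36  # in line with the observed +33%
```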
### Proposed fix
Rewrite `extract_and_update_goal_progress` to evaluate ALL goals in one LLM call using structured output:
```python
# Instead of N separate calls:
prompt = f"""Analyze this message for progress toward ANY of these goals:
{json.dumps([{"id": g["id"], "title": g["title"], ...} for g in goals])}

User Message: "{text[:500]}"

Return JSON array: [{{"goal_id": "...", "found": true/false, "value": number_or_null}}]
"""
response = llm_mini.invoke(prompt).content  # 1 call regardless of goal count
```

Estimated savings: 15-25% of gpt-4.1-mini daily spend.
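A minimal sketch of the batched approach, assuming goals are dicts with `id`/`title` keys (the helper names below are illustrative, not the actual functions in `goals.py`; response parsing is done defensively since the model may return malformed JSON):

```python
import json


def build_batched_goal_prompt(goals, text, max_text_len=500):
    """Build ONE prompt covering all active goals (illustrative sketch)."""
    goal_specs = json.dumps([{"id": g["id"], "title": g["title"]} for g in goals])
    return (
        "Analyze this message for progress toward ANY of these goals:\n"
        f"{goal_specs}\n\n"
        f'User Message: "{text[:max_text_len]}"\n\n'
        "Return a JSON array: "
        '[{"goal_id": "...", "found": true/false, "value": number_or_null}]'
    )


def parse_goal_updates(raw):
    """Defensively parse the model's JSON-array response."""
    try:
        updates = json.loads(raw)
    except json.JSONDecodeError:
        return []
    return [u for u in updates if isinstance(u, dict) and "goal_id" in u]
```

With this shape, the single response can be fanned out to the existing per-goal update logic, so only the LLM-call count changes, not the downstream persistence code.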
## Root Cause #2 (MODERATE): Proactive mentor notifications — PR #4735
New feature (merged Feb 11) adds function-calling triggers during real-time conversation streaming. This is expected new-feature cost, not a bug.
Call chain: `transcribe.py` / `pusher.py` → `trigger_realtime_integrations()` → `process_mentor_notification()` → `_process_proactive_notification()`, which makes two LLM calls per trigger:

1. `_process_triggers()` — `llm_mini.bind_tools()` with `tool_choice="auto"`
2. `get_proactive_message()` — notification text generation
Rate-limited to one notification per user per 30s (`PROACTIVE_NOTI_LIMIT_SECONDS`), but across all active users this still adds significant volume.
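For reference, the per-user throttle described above amounts to the following (a sketch under assumptions — the in-memory dict and function name are illustrative; the real implementation may use a shared store):

```python
import time

PROACTIVE_NOTI_LIMIT_SECONDS = 30
_last_sent = {}  # uid -> timestamp of last proactive notification


def allowed_to_notify(uid, now=None):
    """Return True at most once per PROACTIVE_NOTI_LIMIT_SECONDS per user."""
    now = time.time() if now is None else now
    last = _last_sent.get(uid)
    if last is not None and now - last < PROACTIVE_NOTI_LIMIT_SECONDS:
        return False
    _last_sent[uid] = now
    return True
```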
### Optimization opportunity
When `_process_triggers()` returns no results (confidence below 0.7), the code still calls `get_proactive_message()`. Consider skipping the second call when no trigger fires.
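A sketch of the proposed guard (function and field names are assumptions, not the actual signatures): only invoke the message-generation call when at least one trigger clears the confidence threshold.

```python
CONFIDENCE_THRESHOLD = 0.7


def maybe_send_proactive(triggers, generate_message):
    """Skip LLM call #2 when no trigger is confident enough.

    triggers: list of dicts like {"confidence": float, ...} (assumed shape)
    generate_message: callable standing in for get_proactive_message
    """
    confident = [t for t in triggers if t.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD]
    if not confident:
        return None  # second LLM call avoided entirely
    return generate_message(confident)
```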
## Root Cause #3 (MINOR): gpt-5.2 model appearing
A small number of requests are appearing under a model that is not referenced anywhere in the codebase. Possible sources:

- A LangSmith `omi-agentic-system` prompt A/B test specifying a different model
- OpenAI auto-routing gpt-5.1 → gpt-5.2 during a model upgrade rollout
- An external app integration using the org API key
Needs investigation in the OpenAI dashboard and LangSmith.
## Impact breakdown
| Cause | PR | Model | Type | Priority |
|---|---|---|---|---|
| Goal extraction O(N) loop | #4778 | gpt-4.1-mini | Bug — call multiplier | Fix immediately |
| Proactive mentor notifications | #4735 | gpt-4.1-mini | Expected (new feature) | Optimize |
| gpt-5.2 mystery | Unknown | gpt-5.2 | Unknown | Investigate |
| Larger goal section in prompt | #4778 | gpt-5.1 | Token increase | Low |
## Acceptance criteria
- [ ] `extract_and_update_goal_progress` makes exactly 1 LLM call regardless of goal count
- [ ] All existing goal extraction tests pass
- [ ] gpt-4.1-mini daily request percentage drops back toward the pre-#4778 baseline