fix: goal extraction O(N) LLM calls causing +51% OpenAI cost spike #4789

@beastoin

Description

Summary

OpenAI costs spiked +51% in two days (Feb 10 → Feb 12) while DAU remained flat. Root-cause investigation traced the spike to two code changes: one is a call-multiplier bug, the other is expected new-feature cost.

Root Cause #1 (MAJOR): Goal extraction O(N) loop — PR #4778

extract_and_update_goal_progress() in backend/utils/llm/goals.py:238-304 now loops over ALL active goals, making a separate llm_mini (gpt-4.1-mini) call per goal:

# goals.py:249-275 — ONE LLM call per goal in the loop
for goal in goals:  # Was: single goal lookup via get_user_goal()
    prompt = f"""Analyze this message... Goal: "{goal_title}"..."""
    with track_usage(uid, Features.GOALS):
        response = llm_mini.invoke(prompt).content  # N calls instead of 1

Why this matters

Called from two hot paths on every user interaction:

  • routers/chat.py:109 — Every chat message (fires in background thread)
  • utils/conversations/process_conversation.py:369 — Every conversation processing

Before #4778: 1 LLM call per message/conversation
After #4778: N calls where N = user's active goals (typically 1-3)

This directly explains:

  • gpt-4.1-mini requests: +33%
  • gpt-4.1-mini input tokens: +59% — each extra goal sends a full prompt
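As a back-of-the-envelope sanity check (plain arithmetic, not code from the repo), the observed request delta is consistent with a small average active-goal count per user:

```python
# Hypothetical model of the call multiplier introduced by PR #4778.
# Before: 1 gpt-4.1-mini call per message on this path; after: N calls,
# where N = the user's active goal count.
def request_multiplier(avg_active_goals: float) -> float:
    """Factor by which per-message request volume grows after the change."""
    return avg_active_goals / 1.0

# An average of ~1.33 active goals per user would explain a +33% request
# count on this path; input tokens grow faster than requests because each
# extra call re-sends the full prompt preamble.
print(f"+{(request_multiplier(1.33) - 1) * 100:.0f}%")  # → +33%
```

The 1.33 figure is an inferred average, not a measured one; the point is that even "typically 1-3 goals" is enough to produce the observed jump.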

Proposed fix

Rewrite extract_and_update_goal_progress to evaluate ALL goals in one LLM call using structured output:

# Instead of N separate calls:
prompt = f"""Analyze this message for progress toward ANY of these goals:
{json.dumps([{"id": g["id"], "title": g["title"], ...} for g in goals])}

User Message: "{text[:500]}"

Return JSON array: [{{"goal_id": "...", "found": true/false, "value": number_or_null}}]
"""
response = llm_mini.invoke(prompt).content  # 1 call regardless of goal count

Estimated savings: 15-25% of gpt-4.1-mini daily spend.
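A minimal end-to-end sketch of the batched approach (hypothetical function and parameter names; `invoke` stands in for `llm_mini.invoke(...).content` so the logic is testable without the LLM client; JSON parsing is shown with a defensive `json.loads` fallback rather than true structured output):

```python
import json
from typing import Any, Callable


def extract_goal_progress_batched(
    goals: list[dict[str, Any]],
    text: str,
    invoke: Callable[[str], str],
) -> list[dict[str, Any]]:
    """Evaluate ALL goals in a single LLM call instead of one call per goal."""
    if not goals:
        return []  # no active goals -> skip the LLM call entirely
    prompt = (
        "Analyze this message for progress toward ANY of these goals:\n"
        + json.dumps([{"id": g["id"], "title": g["title"]} for g in goals])
        + f'\n\nUser Message: "{text[:500]}"\n\n'
        'Return JSON array: [{"goal_id": "...", "found": true/false, '
        '"value": number_or_null}]'
    )
    raw = invoke(prompt)  # 1 call regardless of goal count
    try:
        results = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed model output: treat as "no progress found"
    known_ids = {g["id"] for g in goals}
    # keep only well-formed entries for goals we actually asked about
    return [r for r in results if isinstance(r, dict) and r.get("goal_id") in known_ids]
```

In production the `track_usage(uid, Features.GOALS)` context manager would wrap the single `invoke` call, preserving per-feature cost attribution.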


Root Cause #2 (MODERATE): Proactive mentor notifications — PR #4735

New feature (merged Feb 11) adds function-calling triggers during real-time conversation streaming. This is expected new-feature cost, not a bug.

Call chain: transcribe.py / pusher.py → trigger_realtime_integrations() → process_mentor_notification() → _process_proactive_notification(), which makes two LLM calls per trigger:

  1. _process_triggers() → llm_mini.bind_tools() with tool_choice="auto"
  2. get_proactive_message() — notification text generation

Rate limited at 30s per user (PROACTIVE_NOTI_LIMIT_SECONDS), but across all active users this adds significant volume.

Optimization opportunity

When _process_triggers() returns no results (confidence below 0.7), the code still calls get_proactive_message(). Consider skipping the second call when triggers don't fire.
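The proposed short-circuit is essentially a one-line guard. A sketch with the two LLM calls passed in as callables (hypothetical wiring that mirrors the call chain above; the real functions take different arguments):

```python
from typing import Any, Callable, Optional


def process_proactive_notification(
    process_triggers: Callable[[], list[Any]],
    get_proactive_message: Callable[[list[Any]], str],
) -> Optional[str]:
    """Skip the second LLM call when no trigger clears the confidence threshold.

    `process_triggers` stands in for _process_triggers() (LLM call #1, tool
    selection); `get_proactive_message` for the notification-text generation
    (LLM call #2).
    """
    triggers = process_triggers()  # LLM call #1
    if not triggers:
        return None  # below-threshold: save LLM call #2 entirely
    return get_proactive_message(triggers)  # LLM call #2
```

Since sub-threshold triggers are presumably the common case, this halves the per-trigger LLM cost on the no-fire path.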


Root Cause #3 (MINOR): gpt-5.2 model appearing

A small number of requests appear under a model that is not referenced anywhere in the codebase. Possible sources:

  • LangSmith omi-agentic-system prompt A/B test specifying a different model
  • OpenAI auto-routing gpt-5.1 → 5.2 during model upgrade rollout
  • External app integration using the org API key

Needs investigation in OpenAI dashboard and LangSmith.


Impact breakdown

| Cause | PR | Model | Type | Priority |
| --- | --- | --- | --- | --- |
| Goal extraction O(N) loop | #4778 | gpt-4.1-mini | Bug — call multiplier | Fix immediately |
| Proactive mentor notifications | #4735 | gpt-4.1-mini | Expected (new feature) | Optimize |
| gpt-5.2 mystery | Unknown | gpt-5.2 | Unknown | Investigate |
| Larger goal section in prompt | #4778 | gpt-5.1 | Token increase | Low |

Acceptance criteria

Metadata


    Labels

    intelligenceLayer (Summaries, insights, action items), p1 (Priority: Critical, score 22-29)
