
Optimize conversation processing LLM costs (62% of total spend) #4635

@beastoin


Context

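A fresh BigQuery (BQ) export of users/{uid}/llm_usage shows that the conversation processing pipeline accounts for 62% of total LLM spend. Breakdown (the four conversation-pipeline rows sum to 62.2%; "other / untracked" is not part of that figure):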

| Feature | Share of total LLM spend | Model |
|---|---|---|
| conv_action_items | 19.8% | gpt-5.1 |
| conv_structure | 15.5% | gpt-5.1 |
| other / untracked | 15.5% | mixed |
| conversation_processing (legacy umbrella) | 14.7% | gpt-5.1 |
| conv_apps | 12.2% | gpt-5.1 |
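As a quick check of the headline number, the sketch below sums the pipeline rows of the breakdown table. It assumes, as the arithmetic suggests, that the 62% figure covers the four conversation-pipeline rows and excludes "other / untracked"; the dictionary only restates the table.

```python
# Shares of total LLM spend (percent), copied from the breakdown table.
FEATURE_SHARE_PCT = {
    "conv_action_items": 19.8,
    "conv_structure": 15.5,
    "other / untracked": 15.5,
    "conversation_processing": 14.7,
    "conv_apps": 12.2,
}

# Assumption: these four rows make up the conversation processing pipeline.
PIPELINE_FEATURES = (
    "conv_action_items",
    "conv_structure",
    "conversation_processing",
    "conv_apps",
)


def pipeline_share_pct(shares: dict[str, float], features: tuple[str, ...]) -> float:
    """Sum the spend share of the given features, rounded to one decimal place."""
    return round(sum(shares[f] for f in features), 1)


if __name__ == "__main__":
    print(pipeline_share_pct(FEATURE_SHARE_PCT, PIPELINE_FEATURES))  # 62.2
```

Adding the "other / untracked" row would give 77.7%, so it is clearly outside the 62% figure.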

Root Cause Analysis

Every non-discarded conversation triggers 5-6 LLM calls, 3 of which hit gpt-5.1 with the full transcript:

| Step | Feature | Model | Relative cost |
|---|---|---|---|
| 1 | conv_discard | gpt-4.1-mini | cheap |
| 2 | conv_structure | gpt-5.1 | expensive |
| 3 | conv_action_items | gpt-5.1 | expensive |
| 4 | conv_folder | gpt-4.1-mini | cheap |
| 5 | conv_apps (suggest) | gpt-4.1-mini | cheap |
| 6 | conv_apps (execute) | gpt-5.1 | expensive |

The same full transcript is sent to gpt-5.1 three separate times (structure, action_items, app_result).
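To make the repetition concrete, here is a rough per-conversation model of the step table. It is a sketch under stated assumptions: every step in the table runs, each gpt-5.1 step receives the full transcript exactly once, and prompt templates, extra context, and output tokens are ignored. The `Step` type and helper functions are hypothetical, not backend code.

```python
from dataclasses import dataclass

EXPENSIVE_MODEL = "gpt-5.1"


@dataclass(frozen=True)
class Step:
    number: int
    feature: str
    model: str


# Restates the step table; assumes every step runs for a non-discarded conversation.
PIPELINE = (
    Step(1, "conv_discard", "gpt-4.1-mini"),
    Step(2, "conv_structure", "gpt-5.1"),
    Step(3, "conv_action_items", "gpt-5.1"),
    Step(4, "conv_folder", "gpt-4.1-mini"),
    Step(5, "conv_apps (suggest)", "gpt-4.1-mini"),
    Step(6, "conv_apps (execute)", "gpt-5.1"),
)


def gpt51_calls(pipeline: tuple[Step, ...] = PIPELINE) -> int:
    """Count the steps that run on the expensive model."""
    return sum(1 for step in pipeline if step.model == EXPENSIVE_MODEL)


def gpt51_transcript_tokens(transcript_tokens: int, pipeline: tuple[Step, ...] = PIPELINE) -> int:
    """Transcript tokens billed on gpt-5.1 for one conversation.

    Assumes each gpt-5.1 step receives the full transcript once and ignores
    prompt templates, dedup context, and output tokens.
    """
    return gpt51_calls(pipeline) * transcript_tokens


if __name__ == "__main__":
    # A hypothetical 4,000-token transcript is billed three times on gpt-5.1.
    print(gpt51_transcript_tokens(4_000))  # 12000
```

The transcript size is multiplied by the number of gpt-5.1 steps, so long conversations pay the repetition in full on every expensive step.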

Overuse paths

  • sync.py:606 and postprocess_conversation.py:115 both call process_conversation(force_process=True) on conversations that have already been processed, so synced and postprocessed conversations incur 2-3x the LLM cost.
  • _trigger_apps() calls get_suggested_apps_for_conversation() even when the user has a preferred app set, wasting an LLM call.
  • extract_action_items() fetches the 50 most recent action items as deduplication context on every call, adding thousands of input tokens.
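Each overuse path suggests a simple guard. The sketch below is illustrative only: `Conversation`, `processed_fingerprint`, `should_run_llm_steps`, `select_apps`, `dedup_context`, and `DEDUP_CONTEXT_LIMIT` are hypothetical names, not the actual backend code, and the cap of 10 is an arbitrary example. It also assumes that a forced re-run is only needed when the transcript has changed since the last processing, and that when a preferred app is set, that app is the one that runs.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical cap; the issue only states that 50 items are fetched today.
DEDUP_CONTEXT_LIMIT = 10


@dataclass
class Conversation:
    transcript: str
    # Hypothetical field: transcript fingerprint at the last LLM processing run.
    processed_fingerprint: Optional[str] = None


def fingerprint(transcript: str) -> str:
    """Stable hash of the transcript text."""
    return hashlib.sha256(transcript.encode("utf-8")).hexdigest()


def should_run_llm_steps(conv: Conversation, force_process: bool) -> bool:
    """Path 1: skip the gpt-5.1 steps when a forced run would see the same transcript."""
    if conv.processed_fingerprint is None:
        return True  # never processed before
    if not force_process:
        return False
    return conv.processed_fingerprint != fingerprint(conv.transcript)


def select_apps(preferred_app: Optional[str], suggest: Callable[[], list[str]]) -> list[str]:
    """Path 2: call the suggestion LLM only when no preferred app is set."""
    if preferred_app is not None:
        return [preferred_app]
    return suggest()


def dedup_context(recent_items: list[str], limit: int = DEDUP_CONTEXT_LIMIT) -> list[str]:
    """Path 3: cap the action items sent as dedup context (assumes newest first)."""
    return recent_items[:limit]
```

Hashing the transcript keeps the guard cheap compared with any LLM call, while still letting postprocessing re-run the pipeline when it actually changes the text.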

Sub-issues (ordered by estimated impact)

See linked sub-issues below for individual optimization tasks.

Expected Outcome

Target: 25-30% reduction in conversation pipeline LLM costs without degrading output quality.
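For scale, the target can be expressed as a share of total LLM spend. This sketch assumes the 62% pipeline share from the BQ export and that spend outside the pipeline stays constant; `total_spend_reduction` is a hypothetical helper.

```python
PIPELINE_SHARE = 0.62  # conversation pipeline share of total LLM spend


def total_spend_reduction(pipeline_share: float, pipeline_cut: float) -> float:
    """Fraction of total LLM spend saved when pipeline cost drops by pipeline_cut.

    Assumes spend outside the pipeline is unchanged.
    """
    return pipeline_share * pipeline_cut


if __name__ == "__main__":
    for cut in (0.25, 0.30):
        saved = total_spend_reduction(PIPELINE_SHARE, cut)
        print(f"{cut:.0%} pipeline cut -> {saved:.1%} of total spend")
```

Under those assumptions, a 25-30% pipeline reduction corresponds to roughly 15.5-18.6% of total LLM spend.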

Data Source

BQ table: based-hardware:llm_usage.raw

Labels

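intelligence (Layer: Summaries, insights, action items), maintainer (Lane: High-risk, cross-system changes), p1 (Priority: Critical, score 22-29)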
