Downgrade conv_action_items to gpt-4.1-mini #4637

@beastoin

Description

Parent: #4635

Problem

extract_action_items() currently runs on llm_medium_experiment (gpt-5.1), but action item extraction is a well-constrained task with clear rules; gpt-4.1-mini may produce comparable quality at a fraction of the cost.

File: backend/utils/llm/conversation_processing.py:531

chain = prompt | llm_medium_experiment | action_items_parser  # gpt-5.1

Proposed Change

  1. Switch extract_action_items() to use llm_mini (gpt-4.1-mini)
  2. Run quality comparison on ~100 recent conversations (compare action items from both models)
  3. If quality delta is acceptable (<5% difference), ship the change
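The comparison in step 2 needs a concrete score for "quality delta". A minimal sketch of one possible metric, set overlap of normalized items (the `quality_delta` helper and its normalization are illustrative, not part of the codebase):

```python
def normalize(item: str) -> str:
    # Collapse case and whitespace so trivial phrasing
    # differences don't count as mismatches.
    return " ".join(item.lower().split())

def quality_delta(items_a: list[str], items_b: list[str]) -> float:
    """Fraction of action items that differ between two model outputs:
    1 - Jaccard similarity of the normalized item sets."""
    a = {normalize(i) for i in items_a}
    b = {normalize(i) for i in items_b}
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

# Lists identical up to casing score a 0.0 delta.
delta = quality_delta(["Send the report", "book flights"],
                      ["send the report", "Book flights"])
```

Exact-match overlap is strict; in practice a fuzzy match (or an LLM judge) may be fairer to near-identical wordings, but an exact metric keeps the <5% threshold unambiguous.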

Impact

  • conv_action_items accounts for 19.8% of total spend; the model swap could cut this by ~80%
  • Estimated 10-15% reduction in total LLM costs

Risk

Medium: action item quality may degrade for complex conversations. Needs an A/B quality comparison before shipping.

Validation

Query based-hardware:llm_usage.raw for conv_action_items to get baseline token counts, then compare costs after the swap.
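Once baseline token counts are in hand, the cost comparison is just counts times per-token rates. A sketch with placeholder prices (the rates below are assumptions for illustration; substitute the provider's current per-million-token pricing):

```python
def call_cost(input_tokens: int, output_tokens: int,
              rate_in: float, rate_out: float) -> float:
    """Cost of one call, given USD rates per 1M tokens."""
    return input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out

# Placeholder (rate_in, rate_out) in USD per 1M tokens -- not authoritative.
GPT51 = (1.25, 10.00)
GPT41_MINI = (0.40, 1.60)

# Token counts per call would come from the llm_usage.raw query;
# these are example values.
tokens_in, tokens_out = 4_000, 300
before = call_cost(tokens_in, tokens_out, *GPT51)
after = call_cost(tokens_in, tokens_out, *GPT41_MINI)
savings = 1 - after / before
```

Under these placeholder rates the per-call saving works out to roughly 74%, which is the kind of figure the ~80% estimate in Impact assumes; the real number depends on actual pricing and the input/output token mix.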

Metadata

Assignees

No one assigned

    Labels

    intelligence (Layer: Summaries, insights, action items), maintainer (Lane: High-risk, cross-system changes), p1 (Priority: Critical, score 22-29)
