Closed as not planned
Labels
intelligence (Layer: Summaries, insights, action items) · maintainer (Lane: High-risk, cross-system changes) · p1 (Priority: Critical, score 22-29)
Description
Parent: #4635
Problem
`extract_action_items()` uses `llm_medium_experiment` (gpt-5.1), but action item extraction is a well-constrained task with clear rules. gpt-4.1-mini may produce comparable quality at a fraction of the cost.
File: `backend/utils/llm/conversation_processing.py:531`

```python
chain = prompt | llm_medium_experiment | action_items_parser  # gpt-5.1
```

Proposed Change
- Switch `extract_action_items()` to use `llm_mini` (gpt-4.1-mini)
- Run a quality comparison on ~100 recent conversations (compare action items from both models)
- If the quality delta is acceptable (<5% difference), ship the change
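The comparison step above could be scored with a simple set-overlap metric. This is a hypothetical sketch, not code from the repo: `quality_delta` and `jaccard` are names invented here, and the canned `pairs` data stands in for action items actually extracted by the two models.

```python
# Sketch of the proposed A/B quality check: run both models over the same
# conversations, then measure how much the extracted action items overlap.
# The pairs below are canned illustrative outputs, not real model results.

def jaccard(a: set, b: set) -> float:
    """Set-overlap score between two collections of normalized action items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def normalize(items: list) -> set:
    """Case/whitespace-insensitive comparison of action item strings."""
    return {i.strip().lower() for i in items}

def quality_delta(pairs: list) -> float:
    """Mean disagreement (1 - Jaccard) across sampled conversations."""
    scores = [jaccard(normalize(big), normalize(mini)) for big, mini in pairs]
    return 1.0 - sum(scores) / len(scores)

# Each pair: (gpt-5.1 items, gpt-4.1-mini items) for one conversation.
pairs = [
    (["Email Bob", "Book flight"], ["email bob", "book flight"]),
    (["Send report", "Schedule review"], ["send report"]),
]
delta = quality_delta(pairs)
print(f"quality delta: {delta:.1%}")  # ship if < 5%
```

A stricter variant could weight misses by item importance or use an LLM judge, but exact-match Jaccard over normalized strings is a cheap first pass for a well-constrained task like this.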
Impact
- `conv_action_items` is 19.8% of total spend; the model swap could cut this line item by ~80%
- Estimated 10-15% reduction in total LLM costs
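The headline estimate follows from multiplying the two figures above; a quick sanity check (using only the numbers stated in this issue) shows the ceiling:

```python
# Back-of-envelope check of the Impact numbers.
share = 0.198   # conv_action_items share of total LLM spend (from cost analysis)
cut = 0.80      # assumed per-call cost reduction from the model swap
total_reduction = share * cut
print(f"max total reduction: {total_reduction:.1%}")  # ~15.8%
```

So ~15.8% is the upper bound; the 10-15% range leaves headroom for retries or partial rollout if quality issues surface on complex conversations.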
Risk
Medium — action item quality may degrade for complex conversations. Needs A/B quality comparison before shipping.
Validation
Query `based-hardware:llm_usage.raw` for `conv_action_items` to get baseline token counts, then compare costs after the swap.
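Once token counts are pulled from the usage table, the before/after cost comparison is straightforward. A minimal sketch, assuming a row schema and per-million-token prices that are illustrative only (the real schema and pricing would come from the usage table and the provider's rate card):

```python
# Hypothetical validation helper: turn llm_usage rows into a dollar cost,
# computed once for the baseline and once after the model swap.
# Field names and prices below are illustrative assumptions.

PRICE_PER_1M = {
    "gpt-5.1": (1.25, 10.00),       # (input, output) USD per 1M tokens, illustrative
    "gpt-4.1-mini": (0.40, 1.60),   # illustrative
}

def cost(rows: list) -> float:
    """Sum cost over usage rows with model, prompt_tokens, completion_tokens."""
    total = 0.0
    for r in rows:
        p_in, p_out = PRICE_PER_1M[r["model"]]
        total += r["prompt_tokens"] / 1e6 * p_in
        total += r["completion_tokens"] / 1e6 * p_out
    return total

# Same hypothetical workload priced under each model.
baseline = [{"model": "gpt-5.1", "prompt_tokens": 2_000_000, "completion_tokens": 100_000}]
after = [{"model": "gpt-4.1-mini", "prompt_tokens": 2_000_000, "completion_tokens": 100_000}]
print(f"baseline ${cost(baseline):.2f} -> after ${cost(after):.2f}")
```

Token counts should stay roughly constant across the swap (same prompt, same parser), so the cost delta reduces to the price-per-token ratio between the two models.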