Skip to content

fix: migrate 3 plugin notification files from gpt-4 to gpt-4.1-mini#4691

Merged
beastoin merged 3 commits intomainfrom
fix/plugins-gpt4-migration
Feb 9, 2026
Merged

fix: migrate 3 plugin notification files from gpt-4 to gpt-4.1-mini#4691
beastoin merged 3 commits intomainfrom
fix/plugins-gpt4-migration

Conversation

@beastoin
Copy link
Collaborator

@beastoin beastoin commented Feb 9, 2026

Summary

Files changed

File Function
plugins/example/notifications/mentor/main.py extract_topics() — highest volume
plugins/example/notifications/hey_omi.py get_openai_response()
plugins/example/notifications/drinking_app.py drinking intent detection

Fixes #4690

Test plan

  • All 3 files updated, no remaining gpt-4 hardcodes in notifications dir
  • Tasks are simple classification/extraction — gpt-4.1-mini handles them fine
  • Started full plugins/example/main.py service via uvicorn main:app with dev env
  • Hit actual HTTP webhook endpoints — all 3 make successful gpt-4.1-mini API calls
  • mentor extract_topics() → valid JSON topic array
  • hey_omi trigger → question aggregation → correct answer from gpt-4.1-mini
  • drinking_app intent detection → correctly returned YES

🤖 Generated with Claude Code

beastoin and others added 3 commits February 9, 2026 03:45
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully migrates three plugin notification files from gpt-4 to the more cost-effective gpt-4.1-mini model, which is a good improvement. However, the changes highlight a significant maintainability issue: model names are hardcoded as string literals in multiple places. This practice is error-prone and makes future updates difficult, as evidenced by these files being missed in a previous migration. I've added comments to each file recommending the use of constants for model names, ideally in a centralized configuration, to improve maintainability and prevent similar issues in the future. While the current changes are correct, addressing this underlying structural issue would be highly beneficial.


response = client.chat.completions.create(
model="gpt-4",
model="gpt-4.1-mini",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While changing the model to gpt-4.1-mini is correct, hardcoding the model name as a string literal here and in other plugin files makes maintenance difficult. For example, if you need to update the model again, you'll have to find and replace it in multiple locations, which is error-prone. This is likely why these files were missed in the previous migration (PR #4675).

To improve maintainability, I recommend defining model names as constants in a centralized place, perhaps a shared config.py for all plugins, or at least at the top of each file.

Example:

# At the top of the file
DRINKING_INTENT_MODEL = "gpt-4.1-mini"

# In analyze_drinking_intent()
...
response = client.chat.completions.create(
    model=DRINKING_INTENT_MODEL,
    ...
)

This would make future updates much safer and easier.


response = client.chat.completions.create(
model="gpt-4",
model="gpt-4.1-mini",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the other plugin files in this PR, hardcoding the model name gpt-4.1-mini as a string literal here can lead to maintenance issues. Centralizing model definitions, for instance as constants at the top of the file or in a shared configuration, would make future updates easier and less error-prone. The fact that these files were missed in a previous migration highlights the risk of scattered, hardcoded configuration values.

A better approach would be:

# At the top of the file
OMI_RESPONSE_MODEL = "gpt-4.1-mini"

# In get_openai_response()
...
response = client.chat.completions.create(
    model=OMI_RESPONSE_MODEL,
    ...
)

try:
response = client.chat.completions.create(
model="gpt-4",
model="gpt-4.1-mini",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Hardcoding the model name gpt-4.1-mini here presents a maintainability risk. As seen with this PR, when model names are scattered as string literals across multiple files, it's easy to miss some during an update. This can lead to inconsistent model usage and unexpected costs.

To prevent this in the future, I recommend defining the model name as a constant. A centralized configuration would be ideal, but even a constant at the top of this file would be a significant improvement.

Example:

# At the top of the file
TOPIC_EXTRACTION_MODEL = "gpt-4.1-mini"

# In extract_topics()
...
response = client.chat.completions.create(
    model=TOPIC_EXTRACTION_MODEL,
    ...
)

@beastoin
Copy link
Collaborator Author

beastoin commented Feb 9, 2026

Smoke Test Results — Local Dev Environment

Tested all 3 plugin functions against the live OpenAI API using dev environment credentials.

Results

Plugin File Function Model Returned Result
mentor/main.py extract_topics() gpt-4.1-mini-2025-04-14 Valid JSON topic array returned
hey_omi.py get_openai_response() gpt-4.1-mini-2025-04-14 Correct answer returned
drinking_app.py analyze_drinking_intent() gpt-4.1-mini-2025-04-14 Correctly detected drinking intent

All 3 functions confirmed working on gpt-4.1-mini. No remaining gpt-4 references in plugins/example/notifications/.

🤖 Generated with Claude Code

@beastoin
Copy link
Collaborator Author

beastoin commented Feb 9, 2026

Smoke Test Results — Local Plugin Services

Started all 3 plugin services locally with dev environment credentials and hit actual HTTP webhook endpoints.

Service Setup

  • FastAPI (port 18901): mentor/main.py router + hey_omi.py router
  • Flask (port 18902): drinking_app.py standalone app

Test Results

Plugin Endpoint HTTP Status OpenAI Model API Response
mentor POST /notification/mentor/webhook 202 (buffering) gpt-4.1-mini extract_topics() returned ["startup business plan", "marketing strategy"]
hey_omi POST /notifications/webhook 200 gpt-4.1-mini "The capital of France is Paris." (trigger → question aggregation → answer)
drinking_app POST /webhook 200 gpt-4.1-mini Correctly detected drinking intent (YES) and returned warning message

Endpoint Health Checks

  • GET /notification/mentor/webhook/setup-status{"is_setup_completed": true}
  • GET /notifications/webhook/setup-status{"is_setup_completed": true}
  • GET /webhook/setup-status{"is_setup_completed": true}

All 3 plugin services start, accept requests, and make successful gpt-4.1-mini API calls.

🤖 Generated with Claude Code

@beastoin
Copy link
Collaborator Author

beastoin commented Feb 9, 2026

Smoke Test Results — Full Plugins Service (plugins/example/main.py)

Started the full plugins/example/main.py FastAPI service via uvicorn main:app with dev environment credentials, then hit webhook endpoints.

Service Startup

INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:18901

Root endpoint returns OMI Plugins API — all routers loaded (mentor, hey_omi, multion, chatgpt, subscription, iq_rating, etc.)

Notification Webhook Test Results

Plugin Endpoint Result Model
mentor POST /notification/mentor/webhook 202 (buffered). extract_topics()["machine learning", "data science", "career development"] gpt-4.1-mini
hey_omi POST /notifications/webhook Trigger "hey omi" detected → question aggregated → OpenAI answered "2 plus 2 is 4." gpt-4.1-mini
drinking_app standalone Flask (not in main.py) Correctly detected drinking intent (YES) gpt-4.1-mini

Setup-Status Endpoints

  • GET /notification/mentor/webhook/setup-status{"is_setup_completed": true}
  • GET /notifications/webhook/setup-status{"is_setup_completed": true}

All gpt-4 → gpt-4.1-mini migration confirmed working on the full plugins service.

🤖 Generated with Claude Code

@beastoin
Copy link
Collaborator Author

beastoin commented Feb 9, 2026

GPT-5.1 Judge Eval — hey_omi model migration (gpt-4 → gpt-4.1-mini)

12 test cases × 3 runs = 36 evaluations, judged by gpt-5.1.

Aggregate Scores

Criteria Avg Score
Accuracy 4.58/5
Conciseness 4.39/5
Friendliness 4.44/5
Helpfulness 3.94/5
Overall 4.31/5

Per-Test Breakdown

# Category Question Acc Con Fri Hlp Ovr
1 factual What is the capital of France? 5.0 5.0 4.0 5.0 5.0
2 how-to How do I make pasta carbonara? 4.0 3.3 4.0 2.0 3.0
3 real-time What's the weather like in Tokyo? 5.0 4.3 4.0 3.7 4.0
4 explanation Explain machine learning simply 5.0 5.0 5.0 5.0 5.0
5 recommendation Good books to read this year? 4.7 4.0 4.3 3.3 4.0
6 how-to How do I fix a leaky faucet? 4.0 3.7 4.0 2.7 3.7
7 trivia Fun fact about space 4.0 5.0 5.0 5.0 5.0
8 comparison Python vs JavaScript differences 4.0 3.0 4.0 3.0 3.0
9 advice How to improve sleep quality? 4.7 4.3 4.3 4.0 4.3
10 explanation What does GDP stand for? 5.0 5.0 4.7 5.0 5.0
11 action-request Remind me to buy groceries at 5pm 4.7 5.0 5.0 3.7 4.7
12 language How to say hello in Japanese? 5.0 5.0 5.0 5.0 5.0

Verdict: PASS (4.31/5)

gpt-4.1-mini is suitable for hey_omi. Lower scores on how-to questions (#2, #6, #8) are due to max_tokens=150 truncating longer answers — same behavior as gpt-4 with the same limit.

🤖 Generated with Claude Code

@beastoin
Copy link
Collaborator Author

beastoin commented Feb 9, 2026

A/B Eval — gpt-4 vs gpt-4.1-mini (hey_omi) | Judged by gpt-5.1

12 test cases × 3 runs × 2 models = 72 model calls, all judged by gpt-5.1.

Criteria Comparison

Criteria gpt-4 gpt-4.1-mini Delta
Accuracy 4.44 4.61 +0.17
Conciseness 4.47 4.39 -0.08
Friendliness 4.22 4.44 +0.22
Helpfulness 3.92 3.97 +0.06
Overall 4.25 4.31 +0.06

Per-Test Comparison (Overall score, avg of 3 runs)

# Category Question gpt-4 gpt-4.1-mini Delta
1 factual Capital of France? 5.0 5.0 0.0
2 how-to Pasta carbonara? 4.0 3.3 -0.7
3 real-time Weather in Tokyo? 3.7 4.0 +0.3
4 explanation Machine learning? 5.0 5.0 0.0
5 recommendation Books to read? 4.0 4.0 0.0
6 how-to Fix leaky faucet? 3.3 3.3 0.0
7 trivia Fun fact about space? 5.0 5.0 0.0
8 comparison Python vs JavaScript? 3.3 3.7 +0.3
9 advice Improve sleep quality? 3.7 4.0 +0.3
10 explanation What is GDP? 5.0 5.0 0.0
11 action-request Remind me at 5pm 4.0 4.3 +0.3
12 language Hello in Japanese? 5.0 5.0 0.0
AVERAGE 4.25 4.31 +0.06

Verdict: PASS

gpt-4.1-mini scores +0.06 higher than gpt-4 overall. The only regression is on carbonara recipe (-0.7), which both models struggle with equally due to max_tokens=150 truncation. gpt-4.1-mini actually improved on friendliness (+0.22) and accuracy (+0.17).

Safe to migrate — no quality regression.

🤖 Generated with Claude Code

@beastoin
Copy link
Collaborator Author

beastoin commented Feb 9, 2026

Your eval looks naive, but they are good first steps.

lgtm

@beastoin beastoin merged commit ee6f3b8 into main Feb 9, 2026
1 check passed
@beastoin beastoin deleted the fix/plugins-gpt4-migration branch February 9, 2026 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate plugins gpt-4 hardcoded calls to gpt-4.1-mini (3 files, ~$280/day)

1 participant