fix: migrate 3 plugin notification files from gpt-4 to gpt-4.1-mini by beastoin · Pull Request #4691 · BasedHardware/omi

beastoin · 2026-02-09T03:45:47Z

Summary

Change model="gpt-4" → model="gpt-4.1-mini" in 3 plugin notification files
These are the remaining source of ~400 reqs/hr of gpt-4-0613 calls ($30/1M input)
Same migration as PR fix: migrate mentor topic extraction from gpt-4 to gpt-4.1-mini (#4671) #4675 did for the backend copy, but for the plugin copies

Files changed

File	Function
`plugins/example/notifications/mentor/main.py`	`extract_topics()` — highest volume
`plugins/example/notifications/hey_omi.py`	`get_openai_response()`
`plugins/example/notifications/drinking_app.py`	drinking intent detection

Fixes #4690

Test plan

All 3 files updated, no remaining gpt-4 hardcodes in notifications dir
Tasks are simple classification/extraction — gpt-4.1-mini handles them fine
Started full plugins/example/main.py service via uvicorn main:app with dev env
Hit actual HTTP webhook endpoints — all 3 make successful gpt-4.1-mini API calls
mentor extract_topics() → valid JSON topic array
hey_omi trigger → question aggregation → correct answer from gpt-4.1-mini
drinking_app intent detection → correctly returned YES

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request successfully migrates three plugin notification files from gpt-4 to the more cost-effective gpt-4.1-mini model, which is a good improvement. However, the changes highlight a significant maintainability issue: model names are hardcoded as string literals in multiple places. This practice is error-prone and makes future updates difficult, as evidenced by these files being missed in a previous migration. I've added comments to each file recommending the use of constants for model names, ideally in a centralized configuration, to improve maintainability and prevent similar issues in the future. While the current changes are correct, addressing this underlying structural issue would be highly beneficial.

gemini-code-assist · 2026-02-09T03:47:02Z

plugins/example/notifications/drinking_app.py


        response = client.chat.completions.create(
-            model="gpt-4",
+            model="gpt-4.1-mini",


While changing the model to gpt-4.1-mini is correct, hardcoding the model name as a string literal here and in other plugin files makes maintenance difficult. For example, if you need to update the model again, you'll have to find and replace it in multiple locations, which is error-prone. This is likely why these files were missed in the previous migration (PR #4675).

To improve maintainability, I recommend defining model names as constants in a centralized place, perhaps a shared config.py for all plugins, or at least at the top of each file.

Example:

# At the top of the file DRINKING_INTENT_MODEL = "gpt-4.1-mini" # In analyze_drinking_intent() ... response = client.chat.completions.create( model=DRINKING_INTENT_MODEL, ... )

This would make future updates much safer and easier.

gemini-code-assist · 2026-02-09T03:47:03Z

plugins/example/notifications/hey_omi.py


        response = client.chat.completions.create(
-            model="gpt-4",
+            model="gpt-4.1-mini",


Similar to the other plugin files in this PR, hardcoding the model name gpt-4.1-mini as a string literal here can lead to maintenance issues. Centralizing model definitions, for instance as constants at the top of the file or in a shared configuration, would make future updates easier and less error-prone. The fact that these files were missed in a previous migration highlights the risk of scattered, hardcoded configuration values.

A better approach would be:

# At the top of the file OMI_RESPONSE_MODEL = "gpt-4.1-mini" # In get_openai_response() ... response = client.chat.completions.create( model=OMI_RESPONSE_MODEL, ... )

gemini-code-assist · 2026-02-09T03:47:03Z

plugins/example/notifications/mentor/main.py

    try:
        response = client.chat.completions.create(
-            model="gpt-4",
+            model="gpt-4.1-mini",


Hardcoding the model name gpt-4.1-mini here presents a maintainability risk. As seen with this PR, when model names are scattered as string literals across multiple files, it's easy to miss some during an update. This can lead to inconsistent model usage and unexpected costs.

To prevent this in the future, I recommend defining the model name as a constant. A centralized configuration would be ideal, but even a constant at the top of this file would be a significant improvement.

Example:

# At the top of the file TOPIC_EXTRACTION_MODEL = "gpt-4.1-mini" # In extract_topics() ... response = client.chat.completions.create( model=TOPIC_EXTRACTION_MODEL, ... )

beastoin · 2026-02-09T03:49:05Z

Smoke Test Results — Local Dev Environment

Tested all 3 plugin functions against the live OpenAI API using dev environment credentials.

Results

Plugin File	Function	Model Returned	Result
`mentor/main.py`	`extract_topics()`	`gpt-4.1-mini-2025-04-14`	Valid JSON topic array returned
`hey_omi.py`	`get_openai_response()`	`gpt-4.1-mini-2025-04-14`	Correct answer returned
`drinking_app.py`	`analyze_drinking_intent()`	`gpt-4.1-mini-2025-04-14`	Correctly detected drinking intent

All 3 functions confirmed working on gpt-4.1-mini. No remaining gpt-4 references in plugins/example/notifications/.

🤖 Generated with Claude Code

beastoin · 2026-02-09T03:54:28Z

Smoke Test Results — Local Plugin Services

Started all 3 plugin services locally with dev environment credentials and hit actual HTTP webhook endpoints.

Service Setup

FastAPI (port 18901): mentor/main.py router + hey_omi.py router
Flask (port 18902): drinking_app.py standalone app

Test Results

Plugin	Endpoint	HTTP Status	OpenAI Model	API Response
mentor	`POST /notification/mentor/webhook`	202 (buffering)	`gpt-4.1-mini`	`extract_topics()` returned `["startup business plan", "marketing strategy"]`
hey_omi	`POST /notifications/webhook`	200	`gpt-4.1-mini`	"The capital of France is Paris." (trigger → question aggregation → answer)
drinking_app	`POST /webhook`	200	`gpt-4.1-mini`	Correctly detected drinking intent (`YES`) and returned warning message

Endpoint Health Checks

GET /notification/mentor/webhook/setup-status → {"is_setup_completed": true}
GET /notifications/webhook/setup-status → {"is_setup_completed": true}
GET /webhook/setup-status → {"is_setup_completed": true}

All 3 plugin services start, accept requests, and make successful gpt-4.1-mini API calls.

🤖 Generated with Claude Code

beastoin · 2026-02-09T03:59:35Z

Smoke Test Results — Full Plugins Service (`plugins/example/main.py`)

Started the full plugins/example/main.py FastAPI service via uvicorn main:app with dev environment credentials, then hit webhook endpoints.

Service Startup

INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:18901

Root endpoint returns OMI Plugins API — all routers loaded (mentor, hey_omi, multion, chatgpt, subscription, iq_rating, etc.)

Notification Webhook Test Results

Plugin	Endpoint	Result	Model
mentor	`POST /notification/mentor/webhook`	202 (buffered). `extract_topics()` → `["machine learning", "data science", "career development"]`	`gpt-4.1-mini`
hey_omi	`POST /notifications/webhook`	Trigger "hey omi" detected → question aggregated → OpenAI answered "2 plus 2 is 4."	`gpt-4.1-mini`
drinking_app	standalone Flask (not in `main.py`)	Correctly detected drinking intent (`YES`)	`gpt-4.1-mini`

Setup-Status Endpoints

GET /notification/mentor/webhook/setup-status → {"is_setup_completed": true}
GET /notifications/webhook/setup-status → {"is_setup_completed": true}

All gpt-4 → gpt-4.1-mini migration confirmed working on the full plugins service.

🤖 Generated with Claude Code

beastoin · 2026-02-09T04:13:02Z

GPT-5.1 Judge Eval — hey_omi model migration (gpt-4 → gpt-4.1-mini)

12 test cases × 3 runs = 36 evaluations, judged by gpt-5.1.

Aggregate Scores

Criteria	Avg Score
Accuracy	4.58/5
Conciseness	4.39/5
Friendliness	4.44/5
Helpfulness	3.94/5
Overall	4.31/5

Per-Test Breakdown

#	Category	Question	Acc	Con	Fri	Hlp	Ovr
1	factual	What is the capital of France?	5.0	5.0	4.0	5.0	5.0
2	how-to	How do I make pasta carbonara?	4.0	3.3	4.0	2.0	3.0
3	real-time	What's the weather like in Tokyo?	5.0	4.3	4.0	3.7	4.0
4	explanation	Explain machine learning simply	5.0	5.0	5.0	5.0	5.0
5	recommendation	Good books to read this year?	4.7	4.0	4.3	3.3	4.0
6	how-to	How do I fix a leaky faucet?	4.0	3.7	4.0	2.7	3.7
7	trivia	Fun fact about space	4.0	5.0	5.0	5.0	5.0
8	comparison	Python vs JavaScript differences	4.0	3.0	4.0	3.0	3.0
9	advice	How to improve sleep quality?	4.7	4.3	4.3	4.0	4.3
10	explanation	What does GDP stand for?	5.0	5.0	4.7	5.0	5.0
11	action-request	Remind me to buy groceries at 5pm	4.7	5.0	5.0	3.7	4.7
12	language	How to say hello in Japanese?	5.0	5.0	5.0	5.0	5.0

Verdict: PASS (4.31/5)

gpt-4.1-mini is suitable for hey_omi. Lower scores on how-to questions (#2, #6, #8) are due to max_tokens=150 truncating longer answers — same behavior as gpt-4 with the same limit.

🤖 Generated with Claude Code

beastoin · 2026-02-09T04:21:01Z

A/B Eval — gpt-4 vs gpt-4.1-mini (hey_omi) | Judged by gpt-5.1

12 test cases × 3 runs × 2 models = 72 model calls, all judged by gpt-5.1.

Criteria Comparison

Criteria	gpt-4	gpt-4.1-mini	Delta
Accuracy	4.44	4.61	+0.17
Conciseness	4.47	4.39	-0.08
Friendliness	4.22	4.44	+0.22
Helpfulness	3.92	3.97	+0.06
Overall	4.25	4.31	+0.06

Per-Test Comparison (Overall score, avg of 3 runs)

#	Category	Question	gpt-4	gpt-4.1-mini	Delta
1	factual	Capital of France?	5.0	5.0	0.0
2	how-to	Pasta carbonara?	4.0	3.3	-0.7
3	real-time	Weather in Tokyo?	3.7	4.0	+0.3
4	explanation	Machine learning?	5.0	5.0	0.0
5	recommendation	Books to read?	4.0	4.0	0.0
6	how-to	Fix leaky faucet?	3.3	3.3	0.0
7	trivia	Fun fact about space?	5.0	5.0	0.0
8	comparison	Python vs JavaScript?	3.3	3.7	+0.3
9	advice	Improve sleep quality?	3.7	4.0	+0.3
10	explanation	What is GDP?	5.0	5.0	0.0
11	action-request	Remind me at 5pm	4.0	4.3	+0.3
12	language	Hello in Japanese?	5.0	5.0	0.0
	AVERAGE		4.25	4.31	+0.06

Verdict: PASS

gpt-4.1-mini scores +0.06 higher than gpt-4 overall. The only regression is on carbonara recipe (-0.7), which both models struggle with equally due to max_tokens=150 truncation. gpt-4.1-mini actually improved on friendliness (+0.22) and accuracy (+0.17).

Safe to migrate — no quality regression.

🤖 Generated with Claude Code

beastoin · 2026-02-09T04:22:28Z

Your eval looks naive, but they are good first steps.

lgtm

beastoin and others added 3 commits February 9, 2026 03:45

fix: migrate mentor plugin from gpt-4 to gpt-4.1-mini (#4690)

1355fcc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: migrate hey_omi plugin from gpt-4 to gpt-4.1-mini (#4690)

b82ffe8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: migrate drinking_app plugin from gpt-4 to gpt-4.1-mini (#4690)

eeae02b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist bot reviewed Feb 9, 2026

View reviewed changes

beastoin merged commit ee6f3b8 into main Feb 9, 2026
1 check passed

beastoin deleted the fix/plugins-gpt4-migration branch February 9, 2026 04:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: migrate 3 plugin notification files from gpt-4 to gpt-4.1-mini#4691

fix: migrate 3 plugin notification files from gpt-4 to gpt-4.1-mini#4691
beastoin merged 3 commits intomainfrom
fix/plugins-gpt4-migration

beastoin commented Feb 9, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Feb 9, 2026

Uh oh!

gemini-code-assist bot Feb 9, 2026

Uh oh!

gemini-code-assist bot Feb 9, 2026

Uh oh!

beastoin commented Feb 9, 2026

Uh oh!

beastoin commented Feb 9, 2026

Uh oh!

beastoin commented Feb 9, 2026

Uh oh!

beastoin commented Feb 9, 2026

Uh oh!

beastoin commented Feb 9, 2026

Uh oh!

beastoin commented Feb 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

beastoin commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Files changed

Test plan

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

beastoin commented Feb 9, 2026

Smoke Test Results — Local Dev Environment

Results

Uh oh!

beastoin commented Feb 9, 2026

Smoke Test Results — Local Plugin Services

Service Setup

Test Results

Endpoint Health Checks

Uh oh!

beastoin commented Feb 9, 2026

Smoke Test Results — Full Plugins Service (plugins/example/main.py)

Service Startup

Notification Webhook Test Results

Setup-Status Endpoints

Uh oh!

beastoin commented Feb 9, 2026

GPT-5.1 Judge Eval — hey_omi model migration (gpt-4 → gpt-4.1-mini)

Aggregate Scores

Per-Test Breakdown

Verdict: PASS (4.31/5)

Uh oh!

beastoin commented Feb 9, 2026

A/B Eval — gpt-4 vs gpt-4.1-mini (hey_omi) | Judged by gpt-5.1

Criteria Comparison

Per-Test Comparison (Overall score, avg of 3 runs)

Verdict: PASS

Uh oh!

beastoin commented Feb 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

beastoin commented Feb 9, 2026 •

edited

Loading

Smoke Test Results — Full Plugins Service (`plugins/example/main.py`)