fix: prevent Groq message mutation in structured output fallback (#562) #591
Serhan-Asad wants to merge 3 commits into promptdriven:main
Conversation
Add unit tests and E2E tests that detect the bug where the Groq structured output code path in llm_invoke.py mutates the shared formatted_messages list in-place, corrupting fallback model messages.
- 5 unit tests covering both mutation vectors (list .insert() and dict content overwrite) plus cumulative corruption
- 3 E2E tests exercising real CSV loading and model selection with only litellm.completion mocked

All 8 tests fail on current code and will pass once the fix (copy.deepcopy at line 1966) is applied.

Fixes promptdriven#562
…utation corrupting fallback models (promptdriven#562) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This pull request adds comprehensive test coverage for issue #562, where Groq's structured output implementation mutates the shared formatted_messages list, causing fallback models to receive corrupted messages containing Groq-specific JSON schema instructions. The PR includes both unit tests and E2E tests that verify the fix.
Changes:
- Added copy.deepcopy() fix at line 1967 to prevent message mutation
- Added 5 unit tests in test_llm_invoke_groq_message_mutation.py
- Added 3 E2E tests in test_llm_invoke_groq_message_mutation_e2e.py
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pdd/llm_invoke.py | Added import copy and changed line 1967 to use copy.deepcopy(formatted_messages) instead of passing by reference |
| tests/test_llm_invoke_groq_message_mutation.py | Added 5 unit tests covering both mutation vectors (list insert and dict overwrite) with mocked dependencies |
| tests/test_llm_invoke_groq_message_mutation_e2e.py | Added 3 E2E tests exercising real code paths with only litellm.completion mocked |
gltanaka left a comment
Great work investigating and reproducing this bug! The root cause analysis is spot-on — the shared formatted_messages reference getting mutated by the Groq structured output path is a real issue, and copy.deepcopy is the right fix.
A few suggestions to make this even better:
1. Consider splitting into two PRs
This PR currently bundles three things: tests for #562, the deepcopy fix for #562, and the context window validation feature for #559. Splitting the context window validation into its own PR would make each easier to review and merge independently.
2. Use litellm's built-in context window lookup
Rather than maintaining a hardcoded MODEL_CONTEXT_LIMITS dict in token_counter.py, consider using litellm.get_max_tokens(model) — it already tracks context windows for hundreds of models across all providers, handles Bedrock/Vertex naming conventions, and stays current as litellm is updated. This would eliminate the need for the custom dict and the tricky default value question.
Similarly, litellm.token_counter(model, messages=messages) picks the correct tokenizer per model, which would be more accurate than always using tiktoken's cl100k_base.
The only special case you'd need to handle is Claude's 1M context override (since llm_invoke.py already adds the anthropic-beta: context-1m-2025-08-07 header).
3. Default context limit (128K → 1M)
The change of MODEL_CONTEXT_LIMITS["default"] from 128K to 1M effectively turns off validation for any model not in the dict. Using litellm.get_max_tokens() as suggested above would make this a non-issue, since you'd only need a fallback for truly unknown models (where skipping validation entirely is probably the right call anyway).
4. Test files could be trimmed
The two new test files (~770 lines) are thorough but quite large for a 1-line fix. Consider adding a few focused regression tests to the existing test_llm_invoke.py instead — it already has all the mock infrastructure set up.
Overall, nice detective work on the mutation bug and solid test coverage. The deepcopy fix itself is clean and correct. 👍
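As a sketch of what suggestion 2 above could look like: the validation helper below delegates limit lookup and token counting to injected callables so it runs standalone; in token_counter.py those would be litellm.get_max_tokens and litellm.token_counter. The function name, parameters, and numeric values here are illustrative, not the actual token_counter.py API.

```python
# Sketch of review suggestion 2: delegate context-window lookup to litellm
# instead of a hardcoded MODEL_CONTEXT_LIMITS dict. The lookups are passed
# in as callables so this sketch has no litellm dependency.
CLAUDE_1M_OVERRIDE = 1_000_000  # matches the anthropic-beta: context-1m-2025-08-07 header

def validate_context(model, messages, get_max_tokens, count_tokens,
                     use_claude_1m=False):
    """Return (fits, used, limit); skip validation for unknown models."""
    try:
        limit = get_max_tokens(model)
    except Exception:
        # Truly unknown model: skip validation entirely, per review point 3.
        return (True, None, None)
    if use_claude_1m and model.startswith("claude"):
        limit = CLAUDE_1M_OVERRIDE  # the one special case litellm can't know about
    used = count_tokens(model, messages)
    return (used <= limit, used, limit)

# Stub lookups standing in for litellm (values are illustrative):
fake_limits = {"groq/llama-3.1-8b-instant": 131072}
fits, used, limit = validate_context(
    "groq/llama-3.1-8b-instant",
    [{"role": "user", "content": "hi"}],
    get_max_tokens=lambda m: fake_limits[m],
    count_tokens=lambda m, msgs: sum(len(x["content"].split()) for x in msgs),
)
print(fits, used, limit)
```

The key design point is that the fallback path (unknown model) returns "fits" rather than guessing a default limit, which is what makes the 128K-vs-1M default question go away.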
…nvoke.py (promptdriven#562) Replace 770 lines across 2 standalone test files with 3 focused regression tests (~80 lines) in the existing test_llm_invoke.py, per PR promptdriven#591 review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Adds failing tests that detect the bug reported in #562: the Groq structured output path mutates the shared formatted_messages, corrupting fallback models.

Test Files
- tests/test_llm_invoke_groq_message_mutation.py (5 tests)
- tests/test_llm_invoke_groq_message_mutation_e2e.py (3 tests)

What This PR Contains
- .insert(0, ...) injects a Groq schema system message into the shared formatted_messages list
- messages_list[0]["content"] = schema + "\n\n" + ... overwrites the system message content in-place

Root Cause
llm_invoke.py:1966 passes formatted_messages by reference (not a copy) into litellm_kwargs. The Groq structured output workaround at lines 2125-2129 then mutates this shared reference in-place via .insert(0, ...) and dict content overwrite. Since all variables point to the same Python objects, every subsequent fallback model candidate receives corrupted messages containing Groq's JSON schema instruction.

Next Steps
- Apply copy.deepcopy(formatted_messages) at line 1966

Fixes #562
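The aliasing described in the root cause can be reproduced in isolation. This is a minimal sketch — the variable names mirror llm_invoke.py, but the schema string and workaround function are illustrative — showing both mutation vectors and why copy.deepcopy (not a shallow copy) is the fix:

```python
import copy

def groq_structured_output_workaround(messages_list):
    # Mimics the two mutation vectors from the root cause analysis:
    # 1) insert a schema system message at index 0
    # 2) overwrite the first message's content in place
    schema = "Respond with JSON matching this schema: {...}"  # illustrative
    messages_list.insert(0, {"role": "system", "content": schema})
    messages_list[0]["content"] = schema + "\n\n" + messages_list[0]["content"]

# Buggy: passing the list by reference lets the workaround corrupt the
# original, so every later fallback model sees the Groq schema message.
formatted_messages = [{"role": "user", "content": "Hello"}]
litellm_kwargs = {"messages": formatted_messages}
groq_structured_output_workaround(litellm_kwargs["messages"])
assert len(formatted_messages) == 2  # original list was mutated

# Fixed: deep-copy so the nested message dicts are independent too.
formatted_messages = [{"role": "user", "content": "Hello"}]
litellm_kwargs = {"messages": copy.deepcopy(formatted_messages)}
groq_structured_output_workaround(litellm_kwargs["messages"])
assert formatted_messages == [{"role": "user", "content": "Hello"}]

# Note: a shallow copy (list(formatted_messages)) would stop the insert
# from reaching the original, but the in-place content overwrite still
# would, because both lists would share the same message dicts.
print("original intact:", formatted_messages)
```

This is also why the tests cover both vectors separately: a fix that only defends against the list insert would still leave the dict-content overwrite corrupting fallbacks.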
Generated by PDD agentic bug workflow