
Groq structured output path mutates shared formatted_messages, corrupting fallback models #562

@Serhan-Asad


Bug Description

When a Groq model is selected as the first candidate and structured output (output_pydantic) is requested, the Groq code path in llm_invoke.py mutates the shared formatted_messages list in-place. If the Groq call fails and falls back to another model, the fallback model receives corrupted messages containing Groq's JSON schema instruction as an injected system message.

Root Cause

llm_invoke.py:1966 passes formatted_messages by reference rather than copying it:

litellm_kwargs: Dict[str, Any] = {
    "model": model_name_litellm,
    "messages": formatted_messages,  # <-- REFERENCE to shared list
}

Then llm_invoke.py:2125-2129 mutates the list in-place:

messages_list = litellm_kwargs.get("messages", [])  # same reference
if messages_list and messages_list[0].get("role") == "system":
    messages_list[0]["content"] = schema_instruction + "\n\n" + messages_list[0]["content"]
else:
    messages_list.insert(0, {"role": "system", "content": schema_instruction})

Since messages_list, litellm_kwargs["messages"], and formatted_messages all point to the same list object, the .insert(0, ...) mutates formatted_messages permanently. Every subsequent model candidate in the fallback loop receives the corrupted messages.
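The aliasing can be demonstrated in isolation. A minimal standalone sketch (variable names mirror the snippets above; the string literals are placeholders, not the real schema instruction):

```python
# Dict construction stores a reference to the list, not a copy, so an
# insert through any alias is visible through every other name.
formatted_messages = [{"role": "user", "content": "What is 2 + 2?"}]

litellm_kwargs = {"messages": formatted_messages}   # reference, not copy
messages_list = litellm_kwargs.get("messages", [])  # still the same object

messages_list.insert(0, {"role": "system", "content": "schema instruction"})

assert messages_list is formatted_messages          # one object, three names
assert formatted_messages[0]["role"] == "system"    # original list mutated
```

This is why the corruption persists across the fallback loop: each candidate rebuilds litellm_kwargs, but from the same already-mutated formatted_messages.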

Reproduction

import os, sys, json
sys.path.insert(0, "/path/to/pdd")
os.environ["GROQ_API_KEY"] = "your-key"

from pydantic import BaseModel
import litellm
from pdd.llm_invoke import llm_invoke

# Add groq model with high Elo to ~/.pdd/llm_model.csv:
# Groq,groq/nonexistent-model,0.01,0.01,1500,,GROQ_API_KEY,0,True,none,

class SimpleResult(BaseModel):
    answer: str
    confidence: float

# Monkey-patch litellm.completion to inspect the messages each candidate receives
_orig = litellm.completion
def spy(**kw):
    model = kw.get("model")
    for m in kw.get("messages", []):
        content = m.get("content", "")
        # content may also be a list of content parts; only check plain strings
        if isinstance(content, str) and "You must respond with valid JSON" in content:
            print(f"BUG: {model} received Groq's JSON schema instruction!")
    return _orig(**kw)
litellm.completion = spy

result = llm_invoke(
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    strength=1.0, temperature=0.0, time=0.0,
    output_pydantic=SimpleResult,
)

Output:

BUG: groq/nonexistent-model received Groq's JSON schema instruction!
BUG: gpt-4o-mini received Groq's JSON schema instruction!    <-- CORRUPTED

Impact

  • Fallback models receive redundant/conflicting JSON schema instructions (once in system message, once in response_format API parameter)
  • Extra token cost on every fallback attempt (177 tokens vs ~20 in the test case)
  • Potential for confused model responses from conflicting instructions
  • Silent corruption — no error is raised, results may just be subtly wrong
  • Every subsequent model in the fallback chain inherits the corruption

Suggested Fix

Deep copy formatted_messages when building litellm_kwargs at line 1966:

import copy

litellm_kwargs: Dict[str, Any] = {
    "model": model_name_litellm,
    "messages": copy.deepcopy(formatted_messages),  # isolate per-model
}

Alternatively, a more targeted fix: deep copy only in the Groq path, immediately before the mutation.
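The targeted variant could look like the following sketch. Names (litellm_kwargs, schema_instruction) mirror the snippets above; the setup lines stand in for the surrounding function context, which is assumed:

```python
import copy

# Stand-ins for state that already exists in llm_invoke at this point
formatted_messages = [{"role": "user", "content": "What is 2 + 2?"}]
litellm_kwargs = {"messages": formatted_messages}
schema_instruction = "You must respond with valid JSON..."

# Groq branch: copy before mutating, then rebind kwargs to the copy
messages_list = copy.deepcopy(litellm_kwargs.get("messages", []))
if messages_list and messages_list[0].get("role") == "system":
    messages_list[0]["content"] = schema_instruction + "\n\n" + messages_list[0]["content"]
else:
    messages_list.insert(0, {"role": "system", "content": schema_instruction})
litellm_kwargs["messages"] = messages_list

assert formatted_messages[0]["role"] == "user"  # shared list untouched
```

This keeps the per-call overhead of deepcopy confined to the Groq branch, at the cost of every future in-place mutation site needing the same discipline; the unconditional copy at line 1966 is the safer default.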

Environment

  • PDD version: 0.0.145
  • File: pdd/llm_invoke.py, lines 1966, 2125-2129
  • Affects any command using output_pydantic with Groq as a candidate model
