check if prompt is within the context window before running pdd #559

@Serhan-Asad

Description

Feature description

llm_invoke.py sends prompts to LLMs without checking if the assembled prompt fits within the model's context window. When the prompt exceeds the limit, the LLM returns a raw API error with no actionable guidance — no token count, no model limit, no suggestion to reduce prompt size.

PDD has a token counting utility (pdd/server/token_counter.py) with count_tokens() and get_context_limit(), but it's only used in the server API layer — not wired into the LLM invocation pipeline.

Current behavior

  1. Prompt is formatted via _format_messages() into formatted_messages (line 1852 or 1856)
  2. Passed into litellm_kwargs["messages"] (line 1966) inside the per-model candidate loop
  3. Sent to litellm.completion() (line 2419) with no token validation
  4. If prompt exceeds context window → raw API error, no helpful message

There is zero handling for context window overflow errors anywhere in llm_invoke.py.

Expected behavior

Before calling litellm.completion(), validate the assembled prompt against the model's context limit:

  • Over limit → raise a clear error with: token count, model limit, usage percentage, and which prompt/file caused it
  • >90% of limit + verbose → log a warning with token metrics
  • Verbose mode → always log token count so users can understand prompt size
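The over-limit error described above could look something like the following. This is a minimal sketch: the exception and function names are hypothetical, not existing PDD code.

```python
# Sketch only: ContextWindowExceededError and validate_context_window are
# hypothetical names, not part of the current pdd codebase.
class ContextWindowExceededError(ValueError):
    """Raised when the assembled prompt exceeds the model's context window."""

def validate_context_window(token_count, context_limit, model,
                            source="assembled prompt", verbose=False):
    pct = 100.0 * token_count / context_limit
    if token_count > context_limit:
        raise ContextWindowExceededError(
            f"{model}: prompt is {token_count:,} tokens, over the "
            f"{context_limit:,}-token context window ({pct:.0f}% of limit). "
            f"Source: {source}. Reduce the prompt or the files it includes."
        )
    if verbose:
        print(f"[pdd] {model}: {token_count:,}/{context_limit:,} tokens ({pct:.1f}%)")
        if pct > 90:
            print(f"[pdd] warning: prompt uses more than 90% of {model}'s context window")
```

This gives the user every number they need to act: the count, the limit, the percentage, and which input caused it.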

Affected code

File: pdd/llm_invoke.py

  • formatted_messages assigned at line 1852 (from messages param) or 1856 (from _format_messages())
  • Used at line 1966: litellm_kwargs["messages"] = formatted_messages
  • LLM called at line 2419: litellm.completion(**litellm_kwargs, timeout=LLM_CALL_TIMEOUT)
  • Validation should go inside the per-model loop (after line 1966) since the model name is needed for get_context_limit()

Existing utility: pdd/server/token_counter.py

  • count_tokens(text) → tiktoken cl100k_base token count
  • get_context_limit(model) → context limit by model family prefix
  • MODEL_CONTEXT_LIMITS → GPT-4 (128K), GPT-5 (200K), Claude (200K), Gemini (1M), default (128K)
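From the description above, the lookup is presumably prefix-based, roughly along these lines. This is a sketch; the actual keys and values in token_counter.py may differ.

```python
# Sketch of the prefix matching described above; treat the keys and limits
# as illustrative, not a copy of pdd/server/token_counter.py.
MODEL_CONTEXT_LIMITS = {
    "gpt-4": 128_000,
    "gpt-5": 200_000,
    "claude": 200_000,
    "gemini": 1_000_000,
}
DEFAULT_CONTEXT_LIMIT = 128_000  # fallback for unrecognized models

def get_context_limit(model: str) -> int:
    model_lower = model.lower()
    for prefix, limit in MODEL_CONTEXT_LIMITS.items():
        if prefix in model_lower:
            return limit
    return DEFAULT_CONTEXT_LIMIT
```

Whether a Bedrock-style name like anthropic.claude-opus-4-6-v1 matches depends on the exact keys used, which is the concern raised in the implementation notes.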

Implementation notes

  1. Claude 1M context mismatch — llm_invoke.py lines 2013-2014 already add anthropic-beta: context-1m-2025-08-07 for Claude models, extending the effective context to 1M. But token_counter.py lists Claude at 200K. The validation must use the actual effective limit (1M for Claude when the beta header is present).

  2. Bedrock model names — token_counter.py does prefix matching (e.g., "claude-3" in model_lower), but Bedrock models use names like anthropic.claude-opus-4-6-v1. The matching logic needs to handle these.

  3. No context column in llm_model.csv — The model CSV (pdd/data/llm_model.csv) has cost/elo/provider but no context window column. Consider adding one as the source of truth, or keep the hardcoded dict in token_counter.py.

  4. Insertion point — The check goes inside the for model_info in candidate_models: loop (line 1941), after litellm_kwargs is built (line 1966) and model-specific headers are set (line 2014), but before litellm.completion() (line 2419).
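Notes 1 and 2 could be addressed with two small helpers, sketched here with hypothetical names. Only the anthropic-beta header value comes from the issue; the vendor-prefix list and function names are assumptions.

```python
# Hypothetical helpers; only the anthropic-beta header value is from the issue.
CLAUDE_1M_BETA = "context-1m-2025-08-07"
_BEDROCK_VENDOR_PREFIXES = ("anthropic.", "amazon.", "meta.", "mistral.", "cohere.")

def normalize_model_name(model: str) -> str:
    """Strip routing and vendor prefixes so family matching works for
    Bedrock-style names like 'bedrock/anthropic.claude-opus-4-6-v1'."""
    name = model.lower().split("/")[-1]  # drop a litellm route prefix, e.g. "bedrock/"
    for prefix in _BEDROCK_VENDOR_PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name

def effective_context_limit(model: str, base_limit: int, extra_headers=None) -> int:
    """Return the 1M limit when the Claude long-context beta header is set."""
    headers = extra_headers or {}
    if ("claude" in normalize_model_name(model)
            and headers.get("anthropic-beta") == CLAUDE_1M_BETA):
        return 1_000_000
    return base_limit
```

normalize_model_name only strips a known vendor prefix rather than splitting on every ".", so names like gpt-4.1 pass through untouched.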
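Putting note 4 together, the shape of the check inside the per-model loop might look like this. Every function here is a stub standing in for the real llm_invoke.py / token_counter.py code; the line references in comments point back to the issue's description, not to this snippet.

```python
# All names are stand-ins for the real code described in this issue.
def count_tokens(text):                # stub for token_counter.count_tokens
    return len(text.split())

def get_context_limit(model):          # stub for token_counter.get_context_limit
    return 8

def completion(**kwargs):              # stub for litellm.completion
    return "ok"

candidate_models = ["gpt-4o", "claude-sonnet-4"]
formatted_messages = [{"role": "user", "content": "one two three four five"}]

responses = []
for model in candidate_models:                                         # per-model loop (~line 1941)
    litellm_kwargs = {"model": model, "messages": formatted_messages}  # ~line 1966
    # ... model-specific headers set here (~line 2014) ...
    prompt_text = " ".join(m["content"] for m in litellm_kwargs["messages"])
    tokens = count_tokens(prompt_text)
    limit = get_context_limit(litellm_kwargs["model"])
    if tokens > limit:                                                 # new check, before ~line 2419
        raise RuntimeError(f"{model}: {tokens} tokens exceeds {limit}-token limit")
    responses.append(completion(**litellm_kwargs))                     # ~line 2419
```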
