Feature description
llm_invoke.py sends prompts to LLMs without checking if the assembled prompt fits within the model's context window. When the prompt exceeds the limit, the LLM returns a raw API error with no actionable guidance — no token count, no model limit, no suggestion to reduce prompt size.
PDD has a token counting utility (pdd/server/token_counter.py) with count_tokens() and get_context_limit(), but it's only used in the server API layer — not wired into the LLM invocation pipeline.
Current behavior
- Prompt is formatted via `_format_messages()` → `formatted_messages` (line 1852 or 1856)
- Passed into `litellm_kwargs["messages"]` (line 1966) inside the per-model candidate loop
- Sent to `litellm.completion()` (line 2419) with no token validation
- If the prompt exceeds the context window → raw API error, no helpful message
There is zero handling for context window overflow errors anywhere in llm_invoke.py.
Expected behavior
Before calling litellm.completion(), validate the assembled prompt against the model's context limit:
- Over limit → raise a clear error with: token count, model limit, usage percentage, and which prompt/file caused it
- >90% of limit + verbose → log a warning with token metrics
- Verbose mode → always log token count so users can understand prompt size
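A minimal sketch of the check described above. `count_tokens` and `get_context_limit` are the existing utilities from `pdd/server/token_counter.py`; `ContextWindowExceededError` and `validate_prompt_size` are hypothetical names for the new pieces, not anything that exists in PDD today:

```python
# Sketch of the proposed pre-flight validation. The token-counting callables
# are passed in so this stays decoupled from pdd/server/token_counter.py;
# ContextWindowExceededError and validate_prompt_size are hypothetical names.
import logging

logger = logging.getLogger(__name__)


class ContextWindowExceededError(ValueError):
    """Raised when the assembled prompt exceeds the model's context window."""


def validate_prompt_size(prompt_text, model, count_tokens, get_context_limit,
                         verbose=False, source="<prompt>"):
    tokens = count_tokens(prompt_text)
    limit = get_context_limit(model)
    pct = 100.0 * tokens / limit
    if tokens > limit:
        # Over limit: fail with token count, model limit, usage %, and source
        raise ContextWindowExceededError(
            f"Prompt from {source} is {tokens:,} tokens, exceeding the "
            f"{limit:,}-token context window of {model} ({pct:.0f}% of limit). "
            f"Reduce the prompt size or choose a larger-context model."
        )
    if verbose:
        # Always log token metrics in verbose mode; warn above 90% usage
        logger.info("Prompt size: %s tokens (%.0f%% of %s limit for %s)",
                    f"{tokens:,}", pct, f"{limit:,}", model)
        if pct > 90:
            logger.warning("Prompt uses over 90%% of the context window")
    return tokens, limit
```

The callables-as-parameters shape also makes the check trivial to unit-test with stub counters, without pulling tiktoken into the test.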
Affected code
File: pdd/llm_invoke.py
- `formatted_messages` assigned at line 1852 (from the `messages` param) or 1856 (from `_format_messages()`)
- Used at line 1966: `litellm_kwargs["messages"] = formatted_messages`
- LLM called at line 2419: `litellm.completion(**litellm_kwargs, timeout=LLM_CALL_TIMEOUT)`
- Validation should go inside the per-model loop (after line 1966), since the model name is needed for `get_context_limit()`
Existing utility: `pdd/server/token_counter.py`
- `count_tokens(text)` → tiktoken cl100k_base token count
- `get_context_limit(model)` → context limit by model family prefix
- `MODEL_CONTEXT_LIMITS` → GPT-4 (128K), GPT-5 (200K), Claude (200K), Gemini (1M), default (128K)
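The utility's described behavior can be approximated as follows. This is a sketch from the issue's description, not the actual module; the real `token_counter.py` reportedly matches more specific prefixes (e.g. `"claude-3"`), which is exactly what breaks on Bedrock names:

```python
# Approximation of pdd/server/token_counter.py as described in this issue.
# The real implementation may differ; limits mirror MODEL_CONTEXT_LIMITS above.
MODEL_CONTEXT_LIMITS = {
    "gpt-4": 128_000,
    "gpt-5": 200_000,
    "claude": 200_000,
    "gemini": 1_000_000,
}
DEFAULT_CONTEXT_LIMIT = 128_000


def count_tokens(text: str) -> int:
    """Count tokens with tiktoken's cl100k_base encoding."""
    import tiktoken  # imported lazily; requires tiktoken to be installed
    return len(tiktoken.get_encoding("cl100k_base").encode(text))


def get_context_limit(model: str) -> int:
    """Return the context limit for the model's family prefix, else the default."""
    model_lower = model.lower()
    for prefix, limit in MODEL_CONTEXT_LIMITS.items():
        if prefix in model_lower:
            return limit
    return DEFAULT_CONTEXT_LIMIT
```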
Implementation notes
- Claude 1M context mismatch — `llm_invoke.py` lines 2013-2014 already add `anthropic-beta: context-1m-2025-08-07` for Claude models, extending the effective context to 1M. But `token_counter.py` lists Claude at 200K. The validation must use the actual effective limit (1M for Claude when the beta header is present).
- Bedrock model names — `token_counter.py` does prefix matching (e.g., `"claude-3" in model_lower`), but Bedrock models use names like `anthropic.claude-opus-4-6-v1`. The matching logic needs to handle these.
- No context column in `llm_model.csv` — the model CSV (`pdd/data/llm_model.csv`) has cost/elo/provider columns but no context window column. Consider adding one as the source of truth, or keep the hardcoded dict in `token_counter.py`.
- Insertion point — the check goes inside the `for model_info in candidate_models:` loop (line 1941), after `litellm_kwargs` is built (line 1966) and model-specific headers are set (line 2014), but before `litellm.completion()` (line 2419).
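One possible fix for the Bedrock naming issue is to normalize the model ID before family matching, stripping the litellm provider prefix and the Bedrock vendor namespace. `normalize_model_name` is a hypothetical helper, and the limits here mirror the issue's description:

```python
# Sketch: make family matching tolerant of Bedrock-style model IDs such as
# "bedrock/anthropic.claude-opus-4-6-v1". Names and limits are assumptions
# drawn from this issue, not PDD's actual code.
MODEL_CONTEXT_LIMITS = {
    "claude": 200_000,
    "gpt-5": 200_000,
    "gpt-4": 128_000,
    "gemini": 1_000_000,
}
DEFAULT_CONTEXT_LIMIT = 128_000


def normalize_model_name(model: str) -> str:
    """Reduce 'bedrock/anthropic.claude-opus-4-6-v1' to 'claude-opus-4-6-v1'."""
    name = model.lower().rsplit("/", 1)[-1]  # drop litellm provider prefix
    name = name.split(".", 1)[-1]            # drop Bedrock vendor namespace
    return name


def get_context_limit(model: str) -> int:
    """Match the normalized name against known model families."""
    name = normalize_model_name(model)
    for family, limit in MODEL_CONTEXT_LIMITS.items():
        if name.startswith(family) or family in name:
            return limit
    return DEFAULT_CONTEXT_LIMIT
```

Whatever matching scheme is chosen, the Claude entry would still need the 1M override when the `anthropic-beta: context-1m-2025-08-07` header is in play, per the first note above.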