check if prompt is within the context window before running pdd #559

@Serhan-Asad

Description

Feature description

llm_invoke.py sends prompts to LLMs without checking if the assembled prompt fits within the model's context window. When the prompt exceeds the limit, the LLM returns a raw API error with no actionable guidance — no token count, no model limit, no suggestion to reduce prompt size.

PDD has a token counting utility (pdd/server/token_counter.py) with count_tokens() and get_context_limit(), but it's only used in the server API layer — not wired into the LLM invocation pipeline.

Current behavior

  1. Prompt is formatted via _format_messages() into formatted_messages (line 1852 or 1856)
  2. Passed into litellm_kwargs["messages"] (line 1966) inside the per-model candidate loop
  3. Sent to litellm.completion() (line 2419) with no token validation
  4. If prompt exceeds context window → raw API error, no helpful message

There is zero handling for context window overflow errors anywhere in llm_invoke.py.

Expected behavior

Before calling litellm.completion(), validate the assembled prompt against the model's context limit:

  • Over limit → raise a clear error with: token count, model limit, usage percentage, and which prompt/file caused it
  • >90% of limit + verbose → log a warning with token metrics
  • Verbose mode → always log token count so users can understand prompt size
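The over-limit error described above could look something like the following. This is a minimal sketch: the exception and function names are hypothetical, not existing PDD code.

```python
# Sketch only: ContextWindowExceededError and validate_context_window are
# hypothetical names, not part of the current pdd codebase.
class ContextWindowExceededError(ValueError):
    """Raised when the assembled prompt exceeds the model's context window."""

def validate_context_window(token_count, context_limit, model,
                            source="assembled prompt", verbose=False):
    pct = 100.0 * token_count / context_limit
    if token_count > context_limit:
        raise ContextWindowExceededError(
            f"{model}: prompt is {token_count:,} tokens, over the "
            f"{context_limit:,}-token context window ({pct:.0f}% of limit). "
            f"Source: {source}. Reduce the prompt or the files it includes."
        )
    if verbose:
        print(f"[pdd] {model}: {token_count:,}/{context_limit:,} tokens ({pct:.1f}%)")
        if pct > 90:
            print(f"[pdd] warning: prompt uses more than 90% of {model}'s context window")
```

This gives the user every number they need to act: the count, the limit, the percentage, and which input caused it.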

Affected code

File: pdd/llm_invoke.py

  • formatted_messages assigned at line 1852 (from messages param) or 1856 (from _format_messages())
  • Used at line 1966: litellm_kwargs["messages"] = formatted_messages
  • LLM called at line 2419: litellm.completion(**litellm_kwargs, timeout=LLM_CALL_TIMEOUT)
  • Validation should go inside the per-model loop (after line 1966) since the model name is needed for get_context_limit()

Existing utility: pdd/server/token_counter.py

  • count_tokens(text) → tiktoken cl100k_base token count
  • get_context_limit(model) → context limit by model family prefix
  • MODEL_CONTEXT_LIMITS → GPT-4 (128K), GPT-5 (200K), Claude (200K), Gemini (1M), default (128K)
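From the description above, the lookup is presumably prefix-based, roughly along these lines. This is a sketch; the actual keys and values in token_counter.py may differ.

```python
# Sketch of the prefix matching described above; treat the keys and limits
# as illustrative, not a copy of pdd/server/token_counter.py.
MODEL_CONTEXT_LIMITS = {
    "gpt-4": 128_000,
    "gpt-5": 200_000,
    "claude": 200_000,
    "gemini": 1_000_000,
}
DEFAULT_CONTEXT_LIMIT = 128_000  # fallback for unrecognized models

def get_context_limit(model: str) -> int:
    model_lower = model.lower()
    for prefix, limit in MODEL_CONTEXT_LIMITS.items():
        if prefix in model_lower:
            return limit
    return DEFAULT_CONTEXT_LIMIT
```

Whether a Bedrock-style name like anthropic.claude-opus-4-6-v1 matches depends on the exact keys used, which is the concern raised in the implementation notes.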

Implementation notes

  1. Claude 1M context mismatch — llm_invoke.py lines 2013-2014 already add anthropic-beta: context-1m-2025-08-07 for Claude models, extending the effective context to 1M. But token_counter.py lists Claude at 200K. The validation must use the actual effective limit (1M for Claude when the beta header is present).

  2. Bedrock model names — token_counter.py does prefix matching (e.g., "claude-3" in model_lower), but Bedrock models use names like anthropic.claude-opus-4-6-v1. The matching logic needs to handle these.

  3. No context column in llm_model.csv — The model CSV (pdd/data/llm_model.csv) has cost/elo/provider but no context window column. Consider adding one as the source of truth, or keep the hardcoded dict in token_counter.py.

  4. Insertion point — The check goes inside the for model_info in candidate_models: loop (line 1941), after litellm_kwargs is built (line 1966) and model-specific headers are set (line 2014), but before litellm.completion() (line 2419).
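Notes 1 and 2 could be addressed with two small helpers, sketched here with hypothetical names. Only the anthropic-beta header value comes from the issue; the vendor-prefix list and function names are assumptions.

```python
# Hypothetical helpers; only the anthropic-beta header value is from the issue.
CLAUDE_1M_BETA = "context-1m-2025-08-07"
_BEDROCK_VENDOR_PREFIXES = ("anthropic.", "amazon.", "meta.", "mistral.", "cohere.")

def normalize_model_name(model: str) -> str:
    """Strip routing and vendor prefixes so family matching works for
    Bedrock-style names like 'bedrock/anthropic.claude-opus-4-6-v1'."""
    name = model.lower().split("/")[-1]  # drop a litellm route prefix, e.g. "bedrock/"
    for prefix in _BEDROCK_VENDOR_PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name

def effective_context_limit(model: str, base_limit: int, extra_headers=None) -> int:
    """Return the 1M limit when the Claude long-context beta header is set."""
    headers = extra_headers or {}
    if ("claude" in normalize_model_name(model)
            and headers.get("anthropic-beta") == CLAUDE_1M_BETA):
        return 1_000_000
    return base_limit
```

normalize_model_name only strips a known vendor prefix rather than splitting on every ".", so names like gpt-4.1 pass through untouched.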
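Putting note 4 together, the shape of the check inside the per-model loop might look like this. Every function here is a stub standing in for the real llm_invoke.py / token_counter.py code; the line references in comments point back to the issue's description, not to this snippet.

```python
# All names are stand-ins for the real code described in this issue.
def count_tokens(text):                # stub for token_counter.count_tokens
    return len(text.split())

def get_context_limit(model):          # stub for token_counter.get_context_limit
    return 8

def completion(**kwargs):              # stub for litellm.completion
    return "ok"

candidate_models = ["gpt-4o", "claude-sonnet-4"]
formatted_messages = [{"role": "user", "content": "one two three four five"}]

responses = []
for model in candidate_models:                                         # per-model loop (~line 1941)
    litellm_kwargs = {"model": model, "messages": formatted_messages}  # ~line 1966
    # ... model-specific headers set here (~line 2014) ...
    prompt_text = " ".join(m["content"] for m in litellm_kwargs["messages"])
    tokens = count_tokens(prompt_text)
    limit = get_context_limit(litellm_kwargs["model"])
    if tokens > limit:                                                 # new check, before ~line 2419
        raise RuntimeError(f"{model}: {tokens} tokens exceeds {limit}-token limit")
    responses.append(completion(**litellm_kwargs))                     # ~line 2419
```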
