Commit 3e833e5

Author: matdev83
Commit message: WIP; fixes
1 parent b8a2849 commit 3e833e5

15 files changed: +860 −233 lines changed

CHANGELOG.md

Lines changed: 8 additions & 0 deletions

```diff
@@ -486,6 +486,14 @@ This document outlines significant changes and updates to the LLM Interactive Pr
 - **Structured Error Responses**: Returns detailed 400 Bad Request responses with measured vs. limit token counts and error codes.
 - **Configuration Integration**: CLI override takes precedence over config file settings while maintaining compatibility with existing configurations.
 - **Environment Variable Support**: Sets `FORCE_CONTEXT_WINDOW` environment variable for downstream processes.
+
+## 2025-10-31 - ZAI Coding Plan GLM 4.6 Support
+
+- **Model Updates**: ZAI coding plan now preserves the client-provided model and defaults to `glm-4.6`, keeping `claude-sonnet-4-20250514` available as a legacy option.
+- **Anthropic Routing**: Chat controller now forwards the resolved model name to the Anthropic compatibility path instead of forcing `claude-sonnet-4-20250514`.
+- **API Headers**: ZAI connector overrides `get_headers` to include the current KiloCode metadata required by the upstream service.
+- **Capabilities**: Model capability registry exposes entries for `glm-4.6`, `zai-coding-plan`, and the legacy Claude variant with updated metadata.
+- **Testing & Docs**: Unit/integration tests and documentation refreshed to reflect GLM 4.6, with new coverage ensuring headers and payload models are preserved.
 - **Schema Validation**: Updated YAML schema to support the new `context_window_override` field.
 - **Comprehensive Testing**: Full test coverage for CLI argument parsing, enforcement logic, and edge cases.
 - **Documentation**: Enhanced README with detailed examples, use cases, and troubleshooting guidance.
```

QWEN_REASONING_EFFORT_FEATURE.md

Lines changed: 56 additions & 0 deletions (new file)

# Qwen OAuth Reasoning Effort Feature

## Overview
Enhanced the Qwen OAuth connector to support reasoning effort levels by automatically appending " /think" to messages when reasoning effort is set to medium or high.

## Implementation Details

### Changes Made
1. **Modified `src/connectors/qwen_oauth.py`**:
   - Updated the `chat_completions()` method to detect the `reasoning_effort` parameter
   - When `reasoning_effort` is "medium" or "high", appends " /think" to the last client message
   - Only appends to user or system messages, not tool responses
   - Handles both Pydantic models and dict message formats

### How It Works
- The connector checks whether `reasoning_effort` is set to "medium" or "high"
- It finds the last client message (user or system role, skipping tool responses)
- It appends " /think" to the content of that message
- The suffix triggers Qwen's extended reasoning mode for more thoughtful responses

### Usage Example
```python
request = ChatRequest(
    model="qwen-turbo",
    messages=[
        ChatMessage(role="user", content="What is 2+2?")
    ],
    reasoning_effort="medium"  # or "high"
)
```

The message will be transformed to: "What is 2+2? /think"

### Test Coverage
Created comprehensive test suite in `tests/unit/test_qwen_oauth_reasoning_effort.py`:
- ✅ Test reasoning_effort="medium" appends " /think"
- ✅ Test reasoning_effort="high" appends " /think"
- ✅ Test reasoning_effort="low" does NOT append
- ✅ Test no reasoning_effort does NOT append
- ✅ Test skips tool response messages
- ✅ Test works with system messages
- ✅ Test works with multiple messages (only last user message modified)
- ✅ Test works with Pydantic ChatMessage objects

All 132 qwen-related tests pass, including the 8 new tests.

## Behavior
- **reasoning_effort="low"**: No modification (standard behavior)
- **reasoning_effort="medium"**: Appends " /think" to the last client message
- **reasoning_effort="high"**: Appends " /think" to the last client message
- **No reasoning_effort**: No modification (standard behavior)

## Notes
- The " /think" suffix is only appended to regular messages, not tool call responses
- The modification happens before the message is sent to the Qwen API
- This feature is specific to the Qwen OAuth connector and leverages Qwen's native reasoning capabilities

docs/zai-max-tokens-implementation.md

Lines changed: 18 additions & 18 deletions

````diff
@@ -7,9 +7,9 @@ Both ZAI connectors (`zai` and `zai-coding-plan`) now enforce a 128K (131,072 to
 ## Implementation Details
 
 ### Default Behavior
-- **Default max_tokens**: 131,072 (128K)
-- This is the maximum supported by ZAI's backend models
-- Used when client doesn't explicitly specify max_tokens or provides invalid values (None, 0, negative)
+- **Default max_tokens**: 131,072 (128K)
+- This is the maximum supported by ZAI's backend models
+- Used when client doesn't explicitly specify max_tokens or provides invalid values (None, 0, negative)
 
 ### Client Override Rules
 Clients can override the default by explicitly setting `max_tokens` in their request:
````
````diff
@@ -25,10 +25,10 @@ Clients can override the default by explicitly setting `max_tokens` in their req
 
 ### Code Locations
 
-#### ZaiCodingPlanBackend
-- File: `src/connectors/zai_coding_plan.py`
-- Method: `_prepare_anthropic_payload()`
-- Inherits from: `AnthropicBackend`
+#### ZaiCodingPlanBackend
+- File: `src/connectors/zai_coding_plan.py`
+- Method: `_prepare_payload()`
+- Inherits from: `OpenAIConnector`
 
 #### ZAIConnector
 - File: `src/connectors/zai.py`
````
````diff
@@ -38,19 +38,19 @@ Clients can override the default by explicitly setting `max_tokens` in their req
 ## Examples
 
 ### Example 1: No max_tokens specified
-```python
-request = {
-    "model": "zai-coding-plan:claude-sonnet-4-20250514",
-    "messages": [{"role": "user", "content": "Hello"}],
-    # max_tokens not specified
-}
-# Result: max_tokens = 131072 (128K)
-```
+```python
+request = {
+    "model": "zai-coding-plan:glm-4.6",
+    "messages": [{"role": "user", "content": "Hello"}],
+    # max_tokens not specified
+}
+# Result: max_tokens = 131072 (128K)
+```
 
 ### Example 2: Explicit valid value
 ```python
 request = {
-    "model": "zai-coding-plan:claude-sonnet-4-20250514",
+    "model": "zai-coding-plan:glm-4.6",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 4096
 }
````
````diff
@@ -60,7 +60,7 @@ request = {
 ### Example 3: Value below minimum
 ```python
 request = {
-    "model": "zai-coding-plan:claude-sonnet-4-20250514",
+    "model": "zai-coding-plan:glm-4.6",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 512
 }
````
````diff
@@ -70,7 +70,7 @@ request = {
 ### Example 4: Value above maximum
 ```python
 request = {
-    "model": "zai-coding-plan:claude-sonnet-4-20250514",
+    "model": "zai-coding-plan:glm-4.6",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 200000
 }
````
