Commit 3e833e5

Author: matdev83
Commit message: WIP; fixes
1 parent b8a2849 commit 3e833e5

15 files changed: +860 −233 lines changed

CHANGELOG.md

Lines changed: 8 additions & 0 deletions

```diff
@@ -486,6 +486,14 @@ This document outlines significant changes and updates to the LLM Interactive Pr
 - **Structured Error Responses**: Returns detailed 400 Bad Request responses with measured vs. limit token counts and error codes.
 - **Configuration Integration**: CLI override takes precedence over config file settings while maintaining compatibility with existing configurations.
 - **Environment Variable Support**: Sets `FORCE_CONTEXT_WINDOW` environment variable for downstream processes.
+
+## 2025-10-31 - ZAI Coding Plan GLM 4.6 Support
+
+- **Model Updates**: ZAI coding plan now preserves the client-provided model and defaults to `glm-4.6`, keeping `claude-sonnet-4-20250514` available as a legacy option.
+- **Anthropic Routing**: Chat controller now forwards the resolved model name to the Anthropic compatibility path instead of forcing `claude-sonnet-4-20250514`.
+- **API Headers**: ZAI connector overrides `get_headers` to include the current KiloCode metadata required by the upstream service.
+- **Capabilities**: Model capability registry exposes entries for `glm-4.6`, `zai-coding-plan`, and the legacy Claude variant with updated metadata.
+- **Testing & Docs**: Unit/integration tests and documentation refreshed to reflect GLM 4.6, with new coverage ensuring headers and payload models are preserved.
 - **Schema Validation**: Updated YAML schema to support the new `context_window_override` field.
 - **Comprehensive Testing**: Full test coverage for CLI argument parsing, enforcement logic, and edge cases.
 - **Documentation**: Enhanced README with detailed examples, use cases, and troubleshooting guidance.
```

QWEN_REASONING_EFFORT_FEATURE.md

Lines changed: 56 additions & 0 deletions (new file)

# Qwen OAuth Reasoning Effort Feature

## Overview
Enhanced the Qwen OAuth connector to support reasoning effort levels by automatically appending " /think" to messages when reasoning effort is set to medium or high.

## Implementation Details

### Changes Made
1. **Modified `src/connectors/qwen_oauth.py`**:
   - Updated the `chat_completions()` method to detect the `reasoning_effort` parameter
   - When `reasoning_effort` is "medium" or "high", appends " /think" to the last client message
   - Only appends to user or system messages, not tool responses
   - Handles both Pydantic models and dict message formats

### How It Works
- The connector checks whether `reasoning_effort` is set to "medium" or "high"
- It finds the last client message (user or system role, skipping tool responses)
- It appends " /think" to the content of that message
- The suffix triggers Qwen's extended reasoning mode for more thoughtful responses

### Usage Example
```python
request = ChatRequest(
    model="qwen-turbo",
    messages=[
        ChatMessage(role="user", content="What is 2+2?")
    ],
    reasoning_effort="medium"  # or "high"
)
```

The message will be transformed to: "What is 2+2? /think"

### Test Coverage
Created comprehensive test suite in `tests/unit/test_qwen_oauth_reasoning_effort.py`:
- ✅ Test reasoning_effort="medium" appends " /think"
- ✅ Test reasoning_effort="high" appends " /think"
- ✅ Test reasoning_effort="low" does NOT append
- ✅ Test no reasoning_effort does NOT append
- ✅ Test skips tool response messages
- ✅ Test works with system messages
- ✅ Test works with multiple messages (only last user message modified)
- ✅ Test works with Pydantic ChatMessage objects

All 132 qwen-related tests pass, including the 8 new tests.

## Behavior
- **reasoning_effort="low"**: No modification (standard behavior)
- **reasoning_effort="medium"**: Appends " /think" to the last client message
- **reasoning_effort="high"**: Appends " /think" to the last client message
- **No reasoning_effort**: No modification (standard behavior)

## Notes
- The " /think" suffix is only appended to regular messages, not tool call responses
- The modification happens before the message is sent to the Qwen API
- This feature is specific to the Qwen OAuth connector and leverages Qwen's native reasoning capabilities

docs/zai-max-tokens-implementation.md

Lines changed: 18 additions & 18 deletions

````diff
@@ -7,9 +7,9 @@ Both ZAI connectors (`zai` and `zai-coding-plan`) now enforce a 128K (131,072 to
 ## Implementation Details
 
 ### Default Behavior
-- **Default max_tokens**: 131,072 (128K)
-- This is the maximum supported by ZAI's backend models
-- Used when client doesn't explicitly specify max_tokens or provides invalid values (None, 0, negative)
+- **Default max_tokens**: 131,072 (128K)
+- This is the maximum supported by ZAI's backend models
+- Used when client doesn't explicitly specify max_tokens or provides invalid values (None, 0, negative)
 
 ### Client Override Rules
 Clients can override the default by explicitly setting `max_tokens` in their request:
````
````diff
@@ -25,10 +25,10 @@ Clients can override the default by explicitly setting `max_tokens` in their req
 
 ### Code Locations
 
-#### ZaiCodingPlanBackend
-- File: `src/connectors/zai_coding_plan.py`
-- Method: `_prepare_anthropic_payload()`
-- Inherits from: `AnthropicBackend`
+#### ZaiCodingPlanBackend
+- File: `src/connectors/zai_coding_plan.py`
+- Method: `_prepare_payload()`
+- Inherits from: `OpenAIConnector`
 
 #### ZAIConnector
 - File: `src/connectors/zai.py`
````
````diff
@@ -38,19 +38,19 @@ Clients can override the default by explicitly setting `max_tokens` in their req
 ## Examples
 
 ### Example 1: No max_tokens specified
-```python
-request = {
-    "model": "zai-coding-plan:claude-sonnet-4-20250514",
-    "messages": [{"role": "user", "content": "Hello"}],
-    # max_tokens not specified
-}
-# Result: max_tokens = 131072 (128K)
-```
+```python
+request = {
+    "model": "zai-coding-plan:glm-4.6",
+    "messages": [{"role": "user", "content": "Hello"}],
+    # max_tokens not specified
+}
+# Result: max_tokens = 131072 (128K)
+```
 
 ### Example 2: Explicit valid value
 ```python
 request = {
-    "model": "zai-coding-plan:claude-sonnet-4-20250514",
+    "model": "zai-coding-plan:glm-4.6",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 4096
 }
````
````diff
@@ -60,7 +60,7 @@ request = {
 ### Example 3: Value below minimum
 ```python
 request = {
-    "model": "zai-coding-plan:claude-sonnet-4-20250514",
+    "model": "zai-coding-plan:glm-4.6",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 512
 }
````
````diff
@@ -70,7 +70,7 @@ request = {
 ### Example 4: Value above maximum
 ```python
 request = {
-    "model": "zai-coding-plan:claude-sonnet-4-20250514",
+    "model": "zai-coding-plan:glm-4.6",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 200000
 }
````
