Commit 205f003

Mateusz and claude committed
Update Qwen OAuth reasoning behavior and enhance ZAI coding plan backend

- Change Qwen OAuth to append " /think" by default (except when reasoning_effort="low")
- Improve ZAI coding plan backend with better error handling, logging, and model discovery
- Add comprehensive debugging capabilities and masked API key logging
- Update wire capture services with enhanced functionality
- Add new test scripts for proxy and ZAI direct testing
- Move Qwen reasoning documentation to docs/ folder
- Update test suite state and various test files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

1 parent 3e833e5 commit 205f003

18 files changed, +1025 -205 lines changed

QWEN_REASONING_EFFORT_FEATURE.md

Lines changed: 0 additions & 56 deletions
This file was deleted.

data/test_suite_state.json

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 {
-  "test_count": 4528,
+  "test_count": 4543,
   "last_updated": "1761604243.5415785"
 }

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Qwen OAuth Reasoning Effort Feature

## Overview
Enhanced the Qwen OAuth connector to automatically append " /think" to messages to trigger Qwen's extended reasoning mode by default. The suffix is only skipped when reasoning effort is explicitly set to "low".

## Implementation Details

### Changes Made
1. **Modified `src/connectors/qwen_oauth.py`**:
   - Updated `chat_completions()` method to detect `reasoning_effort` parameter
   - **By default**, appends " /think" to the last client message
   - Only skips appending when `reasoning_effort` is explicitly set to "low"
   - Only appends to user or system messages, not tool responses
   - Handles both Pydantic models and dict message formats

### How It Works
- The connector checks if `reasoning_effort` is explicitly set to "low"
- If NOT "low" (including None, empty string, or any other value), it appends " /think"
- It finds the last client message (user or system role, skipping tool responses)
- Appends " /think" to the content of that message
- This triggers Qwen's extended reasoning mode for more thoughtful responses
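
The steps above can be sketched on plain dict messages. This is a minimal illustration rather than the connector's actual code: the helper name `append_think_suffix` and the constant `THINK_SUFFIX` are assumptions made for this example, and the Pydantic message path is left out.

```python
from __future__ import annotations

from typing import Any, Optional

THINK_SUFFIX = " /think"  # hypothetical constant name, used only in this sketch


def append_think_suffix(
    messages: list[dict[str, Any]], reasoning_effort: Optional[str] = None
) -> list[dict[str, Any]]:
    """Return a copy of `messages` with " /think" on the last client message."""
    result = list(messages)  # shallow copy so the caller's list is not mutated
    if reasoning_effort == "low":
        return result  # the only value that opts out
    # Walk backwards to find the last user/system message, skipping tool responses.
    for idx in range(len(result) - 1, -1, -1):
        msg = result[idx]
        if msg.get("role") in ("user", "system"):
            content = msg.get("content")
            if isinstance(content, str):
                # Replace the entry with a modified copy instead of editing in place.
                result[idx] = {**msg, "content": content + THINK_SUFFIX}
            break
    return result


if __name__ == "__main__":
    demo = [
        {"role": "user", "content": "What is 2+2?"},
        {"role": "tool", "content": "4"},
    ]
    print(append_think_suffix(demo))
```

Copying the message before changing it keeps the caller's request object intact, which matches the "modified copy" approach the connector takes for dict messages.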

### Usage Examples

**Default behavior (appends " /think"):**
```python
request = ChatRequest(
    model="qwen-turbo",
    messages=[
        ChatMessage(role="user", content="What is 2+2?")
    ]
    # No reasoning_effort specified - will append " /think"
)
```
Result: "What is 2+2? /think"

**Explicitly disable reasoning mode:**
```python
request = ChatRequest(
    model="qwen-turbo",
    messages=[
        ChatMessage(role="user", content="Simple question")
    ],
    reasoning_effort="low"  # Only "low" prevents appending
)
```
Result: "Simple question" (no modification)

**Explicit reasoning modes (also append):**
```python
request = ChatRequest(
    model="qwen-turbo",
    messages=[
        ChatMessage(role="user", content="Complex problem")
    ],
    reasoning_effort="high"  # or "medium"
)
```
Result: "Complex problem /think"

### Test Coverage
Created comprehensive test suite in `tests/unit/test_qwen_oauth_reasoning_effort.py`:
- ✅ Test default (no reasoning_effort) appends " /think"
- ✅ Test reasoning_effort="medium" appends " /think"
- ✅ Test reasoning_effort="high" appends " /think"
- ✅ Test reasoning_effort="low" does NOT append
- ✅ Test reasoning_effort=None appends " /think"
- ✅ Test reasoning_effort="" (empty string) appends " /think"
- ✅ Test skips tool response messages
- ✅ Test works with system messages
- ✅ Test works with multiple messages (only last user message modified)
- ✅ Test works with Pydantic ChatMessage objects

All qwen-related tests pass, including the 10 new tests.

## Behavior Summary
- **Default (no reasoning_effort)**: Appends " /think" ✅
- **reasoning_effort=None**: Appends " /think" ✅
- **reasoning_effort=""**: Appends " /think" ✅
- **reasoning_effort="low"**: Does NOT append ❌
- **reasoning_effort="medium"**: Appends " /think" ✅
- **reasoning_effort="high"**: Appends " /think" ✅
- **Any other value**: Appends " /think" ✅
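
The summary reduces to a single comparison, `reasoning_effort != "low"`, which is how the connector diff computes `should_append_think`. The small table-driven check below wraps that expression in a function for illustration. The function form and the sample values are assumptions of this sketch. One consequence of the exact string comparison is that a differently cased value such as "LOW" falls under "any other value" and still appends.

```python
from typing import Optional


def should_append_think(reasoning_effort: Optional[str]) -> bool:
    """Mirror of the decision rule: everything except the exact string "low" appends."""
    return reasoning_effort != "low"


if __name__ == "__main__":
    # Sample values chosen to cover each row of the behavior summary.
    for value in (None, "", "low", "medium", "high", "LOW"):
        action = "append" if should_append_think(value) else "skip"
        print(f"{value!r:>10} -> {action}")
```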

## Notes
- The " /think" suffix is only appended to regular messages, not tool call responses
- The modification happens before the message is sent to the Qwen API
- This feature is specific to the Qwen OAuth connector and leverages Qwen's native reasoning capabilities
- The default behavior enables extended reasoning for better response quality
- Users can opt-out by explicitly setting `reasoning_effort="low"`

scripts/proxy_test.py

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
#!/usr/bin/env python
"""
Quick check of the local proxy with OpenAI-compatible client.
"""

from __future__ import annotations

import sys

from openai import OpenAI


def main() -> None:
    sys.stdout.reconfigure(encoding="utf-8")
    client = OpenAI(
        api_key="test-placeholder",
        base_url="http://127.0.0.1:8000/v1",
    )
    import httpx

    request_payload = {
        "model": "glm-4.6",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Return the string `ok` and nothing else."},
        ],
        "stream": False,
    }
    with httpx.Client(base_url="http://127.0.0.1:8000") as client_raw:
        resp = client_raw.post(
            "/v1/chat/completions",
            json=request_payload,
            headers={"Authorization": "Bearer test-placeholder"},
        )
        print("status", resp.status_code)
        print("headers", resp.headers)
        print("body bytes", resp.content[:200])


if __name__ == "__main__":
    main()

scripts/zai_direct_test.py

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
#!/usr/bin/env python
"""
Minimal OpenAI-compatible client for testing ZAI Coding Plan access.

This script sends a simple non-streaming chat completion request directly
to https://api.z.ai/api/coding/paas/v4 using the provided API key.
"""

from __future__ import annotations

import sys

from openai import OpenAI


def main() -> None:
    sys.stdout.reconfigure(encoding="utf-8")

    client = OpenAI(
        api_key="your-zai-api-key-here",
        base_url="https://api.z.ai/api/coding/paas/v4",
    )

    response = client.chat.completions.create(
        model="glm-4.6",
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Return the string `ok` and nothing else."},
        ],
        max_tokens=64,
        stream=False,
    )

    print(response.model_dump_json(indent=2))


if __name__ == "__main__":
    main()
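
The script above uses a placeholder API key, and the commit message mentions masked API key logging without showing it in this excerpt. One way to combine the two when running such a script is sketched below. The environment variable name `ZAI_API_KEY` and the helper `mask_api_key` are hypothetical names chosen for this example, not taken from the repository.

```python
import os


def mask_api_key(key: str, visible: int = 4) -> str:
    """Keep the first and last `visible` characters and star out the rest."""
    if len(key) <= visible * 2:
        # Too short to show anything safely, so hide every character.
        return "*" * len(key)
    hidden = len(key) - visible * 2
    return key[:visible] + "*" * hidden + key[-visible:]


if __name__ == "__main__":
    # ZAI_API_KEY is an assumed variable name; falls back to the script's placeholder.
    api_key = os.environ.get("ZAI_API_KEY", "your-zai-api-key-here")
    print("Using API key:", mask_api_key(api_key))
```

Reading the key from the environment keeps real credentials out of the committed file, and the masked form is still enough to tell which key was used when reading logs.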

src/connectors/qwen_oauth.py

Lines changed: 15 additions & 12 deletions
@@ -1013,10 +1013,10 @@ async def chat_completions(
         """Handle chat completions using Qwen OAuth API.
 
         This overrides the parent class method to ensure credentials are valid before API call.
-
+
         Special handling for reasoning_effort:
-        - When reasoning_effort is set to "medium" or "high", this method appends " /think"
-          to the last client message (user or system role, not tool responses).
+        - By default, this method appends " /think" to the last client message (user or system role).
+        - The suffix is NOT appended only when reasoning_effort is explicitly set to "low".
         - This triggers Qwen's extended reasoning mode for more thoughtful responses.
         - The " /think" suffix is only appended to regular messages, not tool call responses.
         """
@@ -1041,13 +1041,17 @@ async def chat_completions(
         )
 
         # Handle reasoning_effort by appending " /think" to the last user message
+        # Append by default unless explicitly set to "low"
         reasoning_effort = None
         if hasattr(request_data, "reasoning_effort"):
             reasoning_effort = request_data.reasoning_effort
         elif isinstance(request_data, dict):
             reasoning_effort = request_data.get("reasoning_effort")
 
-        if reasoning_effort in ("medium", "high") and processed_messages:
+        # Append " /think" unless reasoning_effort is explicitly "low"
+        should_append_think = reasoning_effort != "low"
+
+        if should_append_think and processed_messages:
             # Find the last message from the client (user or system role, not tool responses)
             last_client_message_idx = None
             for idx in range(len(processed_messages) - 1, -1, -1):
@@ -1057,24 +1061,24 @@ async def chat_completions(
                     role = msg.role
                 elif isinstance(msg, dict):
                     role = msg.get("role")
-
+
                 # Skip tool response messages
                 if role in ("user", "system"):
                     last_client_message_idx = idx
                     break
-
+
             if last_client_message_idx is not None:
                 # Append " /think" to the content of the last client message
                 msg = processed_messages[last_client_message_idx]
-
+
                 # Handle different message formats
                 if hasattr(msg, "content"):
                     content = msg.content
                     if isinstance(content, str):
                         # Create a modified copy of the message
                         if hasattr(msg, "model_copy"):
-                            processed_messages[last_client_message_idx] = msg.model_copy(
-                                update={"content": content + " /think"}
+                            processed_messages[last_client_message_idx] = (
+                                msg.model_copy(update={"content": content + " /think"})
                             )
                         elif hasattr(msg, "copy"):
                             modified_msg = msg.copy()
@@ -1084,7 +1088,7 @@ async def chat_completions(
                             # Fallback: modify in place
                             msg.content = content + " /think"
                         logger.info(
-                            f"Appended ' /think' to last client message due to reasoning_effort={reasoning_effort}"
+                            f"Appended ' /think' to last client message (reasoning_effort={reasoning_effort or 'default'})"
                         )
                 elif isinstance(msg, dict):
                     content = msg.get("content")
@@ -1094,10 +1098,9 @@ async def chat_completions(
                         modified_msg["content"] = content + " /think"
                         processed_messages[last_client_message_idx] = modified_msg
                         logger.info(
-                            f"Appended ' /think' to last client message due to reasoning_effort={reasoning_effort}"
+                            f"Appended ' /think' to last client message (reasoning_effort={reasoning_effort or 'default'})"
                         )
 
-
         try:
             # Use the effective model and properly extract just the model name part
             # Strip any backend prefix (like "qwen-oauth:", "gemini-cli-oauth-personal:", etc.)
