
Commit 8ac8f02

Author: matdev83
Commit message: WIP; fixes
Parent: f6beb2e

21 files changed: +1604 −288 lines

DEBUGGING_EMPTY_RESPONSES.md

Lines changed: 101 additions & 0 deletions
# Debugging Empty Streaming Responses from ZAI

## Current Issue

Client reports: "The model's response ended unexpectedly (no assistant messages). This may be a sign of rate limiting."

## Evidence from Logs

### Wire Capture (`logs/wire_capture.log`)

Shows that streaming responses contain only empty deltas:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "created": 1761854814,
  "model": "claude-3-opus-20240229",
  "choices": [{
    "index": 0,
    "delta": {},  // ← EMPTY!
    "finish_reason": null
  }]
}
```

Each request receives exactly two chunks, both with empty deltas, and then the stream ends.

### Proxy Log (`logs/proxy.log`)

Shows that the request is processed and a streaming response is returned, but no errors are logged.

## Root Cause Analysis

The translation is working (chunks are in the correct OpenAI format), but the chunks contain no content. This means one of the following:

1. **Either**: the ZAI backend is sending events without content (ping events, metadata events, etc.)
2. **Or**: our translation function is not extracting content from the ZAI response format
3. **Or**: the ZAI backend is ending the stream prematurely without sending actual content

## Debugging Steps Added

Added logging to `src/connectors/anthropic.py` (see the sketch below) to capture:
- Raw chunks from the ZAI backend before translation
- Translated chunk deltas after translation
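
A minimal, self-contained sketch of the before/after logging described above. The helper names (`translate_chunk`, `log_chunk_translation`) are illustrative assumptions, not the actual code added to `src/connectors/anthropic.py`.

```python
import json
import logging

logger = logging.getLogger("anthropic_connector.debug")

def translate_chunk(raw_sse: str) -> dict:
    """Stand-in for the real Anthropic-to-domain translation call (assumed shape)."""
    return {"choices": [{"index": 0, "delta": {}, "finish_reason": None}]}

def log_chunk_translation(raw_sse: str) -> dict:
    # Raw chunk from the ZAI backend, before any translation
    logger.debug("ZAI raw chunk: %r", raw_sse)
    translated = translate_chunk(raw_sse)
    # Translated delta, after conversion to the internal OpenAI-style format
    logger.debug("Translated delta: %s", json.dumps(translated["choices"][0]["delta"]))
    return translated
```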
## Resolution

**FOUND THE ISSUE**: The ZAI backend is returning error events instead of content:

```
event: error
data: {"type": "error", "error": {"type": "1113", "message": "Insufficient balance or no resource package. Please recharge."}, "request_id": "..."}
```

### Root Cause

The ZAI API account has insufficient balance or no resource package. This is a **billing/account issue**, not a code issue.

### Why the Client Shows "No Assistant Messages"

1. ZAI returns error events instead of content events
2. Our translation correctly converts error events to empty deltas (no content)
3. The client receives only empty chunks and reports "no assistant messages"

### Solution

**Recharge the ZAI API account** or ensure it has an active resource package.

### Code Status

The streaming translation fix is working correctly. The translation properly handles:
- ✅ Anthropic SSE format parsing
- ✅ Error event handling (now raises `BackendError` with a clear message)
- ✅ Content event handling (would extract text if present)

### Improvements Made

Added proper error handling in `src/connectors/anthropic.py` (see the sketch below):
- Detects error events in streaming responses
- Extracts the error message and type from error events
- Raises `BackendError` with a clear error message instead of silently returning empty responses
- Includes error details for debugging

Now when ZAI returns an error such as "Insufficient balance", the client receives a proper error message instead of "no assistant messages".
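
A minimal sketch of the error-event detection described above. `BackendError` is named in the text; the parsing helper and its exact behavior here are illustrative, not the actual implementation in the connector.

```python
import json

class BackendError(Exception):
    """Stand-in for the project's BackendError exception."""

def raise_on_error_event(sse_event: str) -> None:
    """Detect an `event: error` SSE block and raise with its type and message."""
    lines = sse_event.strip().splitlines()
    if not any(line.strip() == "event: error" for line in lines):
        return
    for line in lines:
        if line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            err = payload.get("error", {})
            raise BackendError(f"ZAI backend error {err.get('type')}: {err.get('message')}")

# Example: the error captured in the wire logs above
try:
    raise_on_error_event(
        'event: error\n'
        'data: {"type": "error", "error": {"type": "1113", '
        '"message": "Insufficient balance or no resource package. Please recharge."}}'
    )
except BackendError as exc:
    print(exc)  # ZAI backend error 1113: Insufficient balance or no resource package. Please recharge.
```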
### Tests Added

Created `tests/unit/connectors/test_anthropic_error_handling.py` with 3 tests:
- ✅ Test error event handling in the Anthropic connector
- ✅ Test generic error handling
- ✅ Test that zai-coding-plan inherits error handling

All tests pass.

## Expected Anthropic SSE Format

Standard Anthropic streaming should include events like:
```
event: message_start
data: {"type":"message_start","message":{"role":"assistant"}}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}
```

If ZAI is sending a different format, we need to adjust the translation accordingly.

STREAMING_TRANSLATION_FIX.md

Lines changed: 118 additions & 0 deletions
# Anthropic/ZAI Streaming Translation Fix

## Problem

The Anthropic connector (and by inheritance, the zai-coding-plan connector) was not translating streaming chunks to the internal domain format. This caused Anthropic-formatted SSE chunks to flow through the system untranslated, breaking cross-API compatibility.

### Root Cause

**OpenAI Connector** (`src/connectors/openai.py` lines 629-637):
- ✅ Translates each streaming chunk using `translation_service.to_domain_stream_chunk()`
- Converts OpenAI/Responses API format → domain format (OpenAI-compatible)

**Anthropic Connector** (`src/connectors/anthropic.py` lines 530-540):
- ❌ Did NOT translate streaming chunks
- Just passed through raw Anthropic SSE chunks wrapped in `ProcessedResponse`

**zai-coding-plan Connector**:
- Inherits from `AnthropicBackend`
- Does not override `_handle_streaming_response()`
- Therefore inherited the broken streaming behavior
## Solution

### 1. Fixed Anthropic Connector Streaming (`src/connectors/anthropic.py`)

Updated the `event_stream()` function to translate each chunk:

```python
async def event_stream() -> AsyncGenerator[ProcessedResponse, None]:
    try:
        async for chunk in response.aiter_text():
            _capture_message_id(chunk)

            # Translate Anthropic SSE chunk to domain format
            domain_chunk = self.translation_service.to_domain_stream_chunk(
                chunk, "anthropic"
            )
            yield ProcessedResponse(content=domain_chunk)

        # Translate final [DONE] marker
        done_chunk = self.translation_service.to_domain_stream_chunk(
            "data: [DONE]\n\n", "anthropic"
        )
        yield ProcessedResponse(content=done_chunk)
    # (except/finally clauses omitted from this excerpt)
```
### 2. Enhanced Translation Function (`src/core/domain/translation.py`)

Updated `anthropic_to_domain_stream_chunk()` to handle SSE format:

**Before**: Only accepted parsed JSON dicts
**After**: Accepts both SSE-formatted strings and JSON dicts

Key improvements (see the sketch below):
- Parses multi-line SSE events (with `event:` and `data:` lines)
- Extracts JSON from `data:` lines
- Handles all Anthropic event types:
  - `message_start` → sets role
  - `content_block_delta` → extracts text content
  - `message_delta` → maps stop_reason to finish_reason
  - `message_stop` → marks completion
- Maps Anthropic stop reasons to OpenAI equivalents:
  - `end_turn` → `stop`
  - `max_tokens` → `length`
  - `tool_use` → `tool_calls`
- Handles `[DONE]` markers
- Backward compatible with dict format
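
A minimal sketch of the SSE-to-domain mapping described above, assuming one SSE event per chunk. The function name and structure are illustrative, not the actual implementation of `anthropic_to_domain_stream_chunk()`.

```python
import json

# Stop-reason mapping from Anthropic values to OpenAI-style finish_reason values
STOP_REASON_MAP = {"end_turn": "stop", "max_tokens": "length", "tool_use": "tool_calls"}

def anthropic_sse_to_openai_delta(sse_chunk: str) -> dict | None:
    """Translate one Anthropic SSE event (or a [DONE] marker) into an OpenAI-style chunk."""
    data_line = next(
        (ln[len("data:"):].strip() for ln in sse_chunk.splitlines() if ln.startswith("data:")),
        None,
    )
    if data_line is None or data_line == "[DONE]":
        return None  # nothing to emit, or end-of-stream marker

    event = json.loads(data_line)
    delta: dict = {}
    finish_reason = None

    if event.get("type") == "message_start":
        delta["role"] = event.get("message", {}).get("role", "assistant")
    elif event.get("type") == "content_block_delta":
        delta["content"] = event.get("delta", {}).get("text", "")
    elif event.get("type") == "message_delta":
        finish_reason = STOP_REASON_MAP.get(event.get("delta", {}).get("stop_reason"))

    return {"choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}]}

# Example: a content delta in the expected SSE format
print(anthropic_sse_to_openai_delta(
    'event: content_block_delta\n'
    'data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}'
))
```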
## Tests Created

### Translation Layer Tests (`tests/unit/core/domain/test_translation_anthropic_streaming.py`)

16 comprehensive tests covering:
- SSE content deltas
- Message start/stop events
- Stop reason mapping
- `[DONE]` marker handling
- Event line parsing
- Multi-line SSE format
- Invalid JSON handling
- Backward compatibility with dict format
- OpenAI structure preservation

### Connector Tests (`tests/unit/connectors/test_anthropic_streaming_translation.py`)

4 integration tests covering:
- End-to-end Anthropic streaming translation
- SSE format handling in the connector
- `[DONE]` marker translation
- zai-coding-plan inheritance verification

## Impact

### Fixed
- ✅ Anthropic connector now emits domain-formatted chunks
- ✅ zai-coding-plan connector inherits the fix automatically
- ✅ Cross-API translation works correctly for streaming
- ✅ Downstream processors receive a consistent OpenAI-style format

### Verified
- ✅ All 20 new tests pass
- ✅ All 15 existing translation tests still pass
- ✅ Backward compatibility maintained

## Why Tests Didn't Catch This

The existing tests mocked the translation service or didn't verify the actual format of streaming chunks. The new tests:
1. Test the actual translation function with SSE input
2. Test the connector's streaming handler end-to-end
3. Verify the output format matches the OpenAI structure
4. Ensure zai-coding-plan inherits the correct behavior

## Files Modified

1. `src/connectors/anthropic.py` - Added streaming translation
2. `src/core/domain/translation.py` - Enhanced SSE parsing
3. `tests/unit/connectors/test_anthropic_streaming_translation.py` - New connector tests
4. `tests/unit/core/domain/test_translation_anthropic_streaming.py` - New translation tests

data/test_suite_state.json

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 {
-  "test_count": 4494,
+  "test_count": 4502,
   "last_updated": "1761604243.5415785"
 }
Lines changed: 97 additions & 0 deletions
# ZAI Backend Max Tokens Implementation

## Overview

Both ZAI connectors (`zai` and `zai-coding-plan`) now enforce a 128K (131,072 tokens) maximum output limit, as specified by the ZAI API provider.

## Implementation Details

### Default Behavior
- **Default max_tokens**: 131,072 (128K)
- This is the maximum supported by ZAI's backend models
- Used when the client doesn't explicitly specify max_tokens or provides invalid values (None, 0, negative)

### Client Override Rules
Clients can override the default by explicitly setting `max_tokens` in their request (see the sketch below):

1. **Valid Range**: 1,024 to 131,072 tokens
   - Values below 1,024 are clamped to 1,024
   - Values above 131,072 are clamped to 131,072
   - Values within range are preserved as-is

2. **Invalid Values**: None, 0, or negative numbers
   - Automatically use the 128K default
   - Ensures requests never fail due to missing/invalid max_tokens
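
A minimal sketch of the default and clamping rules above. The constant and function names are illustrative, not the actual helpers used in the two connectors.

```python
ZAI_MAX_OUTPUT_TOKENS = 131_072  # 128K cap specified by the ZAI API provider
ZAI_MIN_OUTPUT_TOKENS = 1_024

def resolve_max_tokens(requested: int | None) -> int:
    """Apply the default/clamping rules described above (illustrative helper)."""
    if requested is None or requested <= 0:
        # Missing or invalid value: fall back to the 128K default
        return ZAI_MAX_OUTPUT_TOKENS
    # Explicit value: clamp into the supported range
    return max(ZAI_MIN_OUTPUT_TOKENS, min(requested, ZAI_MAX_OUTPUT_TOKENS))

assert resolve_max_tokens(None) == 131_072     # default
assert resolve_max_tokens(4096) == 4096        # preserved
assert resolve_max_tokens(512) == 1_024        # clamped to minimum
assert resolve_max_tokens(200_000) == 131_072  # clamped to maximum
```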
### Code Locations

#### ZaiCodingPlanBackend
- File: `src/connectors/zai_coding_plan.py`
- Method: `_prepare_anthropic_payload()`
- Inherits from: `AnthropicBackend`

#### ZAIConnector
- File: `src/connectors/zai.py`
- Method: `_prepare_payload()`
- Inherits from: `OpenAIConnector`

## Examples

### Example 1: No max_tokens specified
```python
request = {
    "model": "zai-coding-plan:claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}],
    # max_tokens not specified
}
# Result: max_tokens = 131072 (128K)
```

### Example 2: Explicit valid value
```python
request = {
    "model": "zai-coding-plan:claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 4096
}
# Result: max_tokens = 4096 (preserved)
```

### Example 3: Value below minimum
```python
request = {
    "model": "zai-coding-plan:claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 512
}
# Result: max_tokens = 1024 (clamped to minimum)
```

### Example 4: Value above maximum
```python
request = {
    "model": "zai-coding-plan:claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 200000
}
# Result: max_tokens = 131072 (clamped to maximum)
```

## Testing

Comprehensive test suite in `tests/unit/connectors/test_zai_max_tokens.py` covers:
- Default behavior (None, 0, negative values)
- Preservation of explicit valid values
- Minimum boundary clamping
- Maximum boundary clamping
- Exact boundary values

All tests pass successfully.

## Benefits

1. **Prevents 422 Errors**: Ensures max_tokens is always valid
2. **Maximizes Output**: Uses 128K by default for agentic coding tasks
3. **Client Control**: Allows explicit override within the valid range
4. **Robust**: Handles edge cases (None, 0, negative, out-of-range)
5. **Consistent**: Same logic across both ZAI connectors