feat(feishu): CardKit streaming cards and SSE hang fixes (#287)#292

Merged
wisdomqin merged 5 commits into dataelement:main from 39499740:codex/issue-287-feishu-streaming
Apr 5, 2026
Conversation


@39499740 commented Apr 4, 2026

Summary

Implements CardKit streaming card API for smooth typewriter-style output in Feishu (Lark), and fixes multiple SSE stream hanging issues that caused the bot to become unresponsive.

Closes #287

What's New

CardKit Streaming Card Integration

  • create_card_entity() — Create CardKit card entities for streaming
  • send_card_by_card_id() — Send cards by card_id via IM API
  • stream_card_content() — Element-level streaming content push (500ms refresh interval)
  • set_card_streaming_mode() — Enable/disable streaming mode on cards
  • update_cardkit_card() — Full card content update after streaming completes

Dual-Path Design with Graceful Degradation

  1. CardKit path (preferred): create → stream → close streaming → final update
  2. IM Patch path (fallback): send interactive card → patch updates
  3. Plain text (last resort): simple text message
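Assuming a service object that exposes the CardKit methods listed above, plus hypothetical `send_interactive_card`, `patch_card`, and `send_text` helpers for the lower tiers, the cascade can be sketched as (illustrative control flow, not the PR's actual code):

```python
import asyncio

async def deliver_reply(svc, chat_id: str, content: str) -> str:
    """Sketch of the 3-tier graceful degradation described above."""
    # 1. CardKit path: create -> stream -> close streaming -> final update
    try:
        card_id = await svc.create_card_entity(content)
        await svc.send_card_by_card_id(chat_id, card_id)
        await svc.set_card_streaming_mode(card_id, True)
        await svc.stream_card_content(card_id, "markdown_1", content)
        await svc.set_card_streaming_mode(card_id, False)
        await svc.update_cardkit_card(card_id, content)
        return "cardkit"
    except Exception:
        pass
    # 2. IM Patch path: send an interactive card, then patch it in place
    try:
        message_id = await svc.send_interactive_card(chat_id, content)
        await svc.patch_card(message_id, content)
        return "im_patch"
    except Exception:
        pass
    # 3. Last resort: plain text message
    await svc.send_text(chat_id, content)
    return "text"
```

Each tier only runs if every call in the tier above it failed, so a CardKit permission error or an IM patch failure degrades the experience rather than dropping the reply.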

Streaming Flush Control

  • asyncio.Lock-protected _flush_stream() — prevents sequence conflicts
  • _SerialPatchQueue — serializes IM patch requests to prevent out-of-order overwrites
  • Heartbeat refresh task — periodic card updates during tool execution

Bug Fixes

1. Anthropic SSE Stream Hang (Root Cause)

AnthropicClient.stream() waited indefinitely for a message_stop event that Zhipu's Anthropic-compatible API never sends. After receiving message_delta with stop_reason, the aiter_lines() loop hung forever.

Fix: Break immediately on stop_reason in message_delta, before waiting for message_stop.
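A simplified sketch of the defensive termination (field names follow Anthropic's public SSE event shapes; the real client does more than this):

```python
import json

def read_anthropic_stream(lines):
    """Stop as soon as message_delta carries a stop_reason, instead of
    waiting for a message_stop event that some Anthropic-compatible
    providers never send."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip event: lines, comments, and keep-alives
        event = json.loads(line[len("data:"):].strip())
        etype = event.get("type")
        if etype == "content_block_delta":
            parts.append(event["delta"].get("text", ""))
        elif etype == "message_delta" and event.get("delta", {}).get("stop_reason"):
            break  # the fix: don't keep reading in hope of message_stop
        elif etype == "message_stop":
            break
    return "".join(parts)
```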

2. CancelledError Not Caught in Heartbeat Cleanup

Since Python 3.8, asyncio.CancelledError inherits from BaseException, not Exception. The except Exception: pass block in the heartbeat task cleanup did not catch it, so the exception propagated and the final card update was skipped entirely.

Fix: Changed to except (Exception, asyncio.CancelledError).

3. httpx System Proxy Interference

httpx.AsyncClient auto-detects macOS system proxy settings, which can interfere with long-lived SSE connections to LLM APIs.

Fix: Added proxy=None to all httpx.AsyncClient constructors.
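The fix amounts to one constructor argument; a minimal sketch (the timeout values are illustrative, not the PR's actual settings, and `trust_env=False` is httpx's broader knob for ignoring proxy environment variables, noted here only as context):

```python
import httpx

# Explicitly disable proxying for the long-lived SSE client.
client = httpx.AsyncClient(
    proxy=None,                              # never route SSE through a detected proxy
    timeout=httpx.Timeout(30.0, read=None),  # no read timeout while streaming
)
```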

4. httpx aclose() Indefinite Blocking

When a streaming connection was terminated early (via break), httpx.AsyncClient.aclose() could hang indefinitely waiting for the server to finish sending.

Fix: Wrapped aclose() in asyncio.wait_for(..., timeout=5.0) for all client classes.
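A sketch of the bounded shutdown (helper name is illustrative; it works for any object with an async `aclose()`, httpx.AsyncClient included):

```python
import asyncio

async def close_with_timeout(client, timeout=5.0):
    """Cap aclose() so an SSE connection abandoned mid-stream cannot
    block cleanup forever. Returns False if the close timed out."""
    try:
        await asyncio.wait_for(client.aclose(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False  # give up; the transport dies with the event loop
```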

5. OpenAI/Gemini SSE Stream Termination

  • OpenAICompatibleClient: broke only on [DONE], not on finish_reason. If a provider sends finish_reason without [DONE], the stream hangs.
  • GeminiClient: treated [DONE] with continue rather than break, and recorded finishReason without ever acting on it, relying solely on the HTTP connection close to end the stream.

Fix: Added finish_reason break protection to both clients, matching the Anthropic client's defensive pattern.
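The same defensive pattern for the OpenAI-compatible chunk format, in simplified form (field names follow the public chat.completions SSE shape):

```python
import json

def read_openai_stream(lines):
    """Break on the [DONE] sentinel *or* on a non-null finish_reason,
    whichever arrives first, so the stream ends even when a provider
    omits one of the two signals."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choice = json.loads(payload)["choices"][0]
        parts.append(choice.get("delta", {}).get("content") or "")
        if choice.get("finish_reason"):
            break  # the fix: some providers never send [DONE]
    return "".join(parts)
```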

Files Changed

| File | Changes |
| --- | --- |
| backend/app/api/feishu.py | CardKit streaming integration, flush control, CancelledError fix |
| backend/app/api/websocket.py | Stream return logging |
| backend/app/services/feishu_service.py | 5 new CardKit API methods |
| backend/app/services/feishu_ws.py | WebSocket proxy bypass |
| backend/app/services/llm_client.py | SSE break protection (all 3 clients), proxy=None, aclose timeout |

Testing

| Provider | Mode | Model | Result |
| --- | --- | --- | --- |
| Zhipu | Anthropic-compatible | GLM | ✅ Streaming cards work end-to-end |
| Zhipu | OpenAI-compatible | GLM | ✅ Streaming cards work end-to-end |
| Google | Gemini native | Gemini 2.5 Flash | ✅ Streaming cards work end-to-end |

39499740 added 5 commits April 3, 2026 23:43
…ataelement#287)

Replace the im.message.patch-based streaming approach with CardKit
streaming APIs (create_card_entity, stream_card_content,
set_card_streaming_mode, update_cardkit_card) for silky smooth
typewriter-style streaming output in Feishu cards.

Key changes:
- Add CardKit API methods to FeishuService (create_card_entity,
  send_card_by_card_id, stream_card_content, set_card_streaming_mode,
  update_cardkit_card) using lark-oapi SDK
- Refactor streaming output in feishu.py to use CardKit as primary path
  with automatic fallback to IM patch when CardKit is unavailable
- Use schema 2.0 card format with streaming_mode and element_id for
  incremental content updates (typewriter animation handled by Feishu)
- Add collapsible thinking panel in final card
- Reduce streaming flush interval to 0.5s for CardKit path (vs 1.0s
  for IM patch fallback)

Refs: dataelement#287
websockets >= 13 auto-detects macOS system proxy settings. When a local
proxy is configured but unable to handle WSS upgrade, the connection
fails with 'did not receive a valid HTTP response from proxy'. Force
proxy=None to bypass this.

Refs: dataelement#287
Three root causes fixed:

1. AnthropicClient.stream() - break on stop_reason instead of waiting
   for message_stop event. Zhipu's Anthropic-compatible API may not
   send message_stop, causing aiter_lines() to hang forever.

2. _heartbeat_task cancel - CancelledError inherits BaseException since
   Python 3.8, so except Exception does not catch it. This caused the
   final card update to be skipped after LLM completion.

3. httpx client hardening - proxy=None to avoid system proxy issues
   with SSE streams, and asyncio.wait_for timeout on aclose() to
   prevent indefinite blocking when closing connections.
…eams

OpenAICompatibleClient now breaks on finish_reason in addition to [DONE],
protecting all providers (Minimax, Custom, DeepSeek, Qwen, etc.) from
hanging if [DONE] is never sent.

GeminiClient now breaks on both [DONE] and finishReason instead of
relying solely on connection close to end the SSE stream.
Reduce log noise in production by downgrading verbose SSE/streaming
diagnostic logs from info to debug level. Only warnings and errors
remain at info level.

@wisdomqin left a comment


Great work! The CardKit streaming integration is well-structured with a solid 3-tier fallback design (CardKit -> IM Patch -> plain text), and the SSE hang fixes address real root causes across all three LLM clients. Approving for merge.

We will address the following in a follow-up commit after merge:

  1. Add lark-oapi to requirements.txt
  2. Scope the websockets proxy patch (avoid global monkey-patch)
  3. Add tool call status display to the CardKit streaming path
  4. Add a size cap to _lark_clients cache to prevent unbounded growth

@wisdomqin merged commit ba9a8c3 into dataelement:main Apr 5, 2026
@39499740 deleted the codex/issue-287-feishu-streaming branch April 8, 2026 03:25


Development

Successfully merging this pull request may close these issues:

Refer to the official Feishu library openclaw-lark to refactor part of the Feishu code to achieve smooth streaming output
