feat(feishu): CardKit streaming cards and SSE hang fixes (#287)#292

Merged
wisdomqin merged 5 commits into dataelement:main from 39499740:codex/issue-287-feishu-streaming
Apr 5, 2026
Conversation


@39499740 commented Apr 4, 2026

Summary

Implements CardKit streaming card API for smooth typewriter-style output in Feishu (Lark), and fixes multiple SSE stream hanging issues that caused the bot to become unresponsive.

Closes #287

What's New

CardKit Streaming Card Integration

  • create_card_entity() — Create CardKit card entities for streaming
  • send_card_by_card_id() — Send cards by card_id via IM API
  • stream_card_content() — Element-level streaming content push (500ms refresh interval)
  • set_card_streaming_mode() — Enable/disable streaming mode on cards
  • update_cardkit_card() — Full card content update after streaming completes

Dual-Path Design with Graceful Degradation

  1. CardKit path (preferred): create → stream → close streaming → final update
  2. IM Patch path (fallback): send interactive card → patch updates
  3. Plain text (last resort): simple text message
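Assuming a service object that exposes the CardKit methods listed above, plus hypothetical `send_interactive_card`, `patch_card`, and `send_text` helpers for the lower tiers, the cascade can be sketched as (illustrative control flow, not the PR's actual code):

```python
import asyncio

async def deliver_reply(svc, chat_id: str, content: str) -> str:
    """Sketch of the 3-tier graceful degradation described above."""
    # 1. CardKit path: create -> stream -> close streaming -> final update
    try:
        card_id = await svc.create_card_entity(content)
        await svc.send_card_by_card_id(chat_id, card_id)
        await svc.set_card_streaming_mode(card_id, True)
        await svc.stream_card_content(card_id, "markdown_1", content)
        await svc.set_card_streaming_mode(card_id, False)
        await svc.update_cardkit_card(card_id, content)
        return "cardkit"
    except Exception:
        pass
    # 2. IM Patch path: send an interactive card, then patch it in place
    try:
        message_id = await svc.send_interactive_card(chat_id, content)
        await svc.patch_card(message_id, content)
        return "im_patch"
    except Exception:
        pass
    # 3. Last resort: plain text message
    await svc.send_text(chat_id, content)
    return "text"
```

Each tier only runs if every call in the tier above it failed, so a CardKit permission error or an IM patch failure degrades the experience rather than dropping the reply.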

Streaming Flush Control

  • asyncio.Lock-protected _flush_stream() — prevents sequence conflicts
  • _SerialPatchQueue — serializes IM patch requests to prevent out-of-order overwrites
  • Heartbeat refresh task — periodic card updates during tool execution

Bug Fixes

1. Anthropic SSE Stream Hang (Root Cause)

AnthropicClient.stream() waited indefinitely for a message_stop event that Zhipu's Anthropic-compatible API never sends. After receiving message_delta with stop_reason, the aiter_lines() loop hung forever.

Fix: Break immediately on stop_reason in message_delta, before waiting for message_stop.
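A simplified sketch of the defensive termination (field names follow Anthropic's public SSE event shapes; the real client does more than this):

```python
import json

def read_anthropic_stream(lines):
    """Stop as soon as message_delta carries a stop_reason, instead of
    waiting for a message_stop event that some Anthropic-compatible
    providers never send."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip event: lines, comments, and keep-alives
        event = json.loads(line[len("data:"):].strip())
        etype = event.get("type")
        if etype == "content_block_delta":
            parts.append(event["delta"].get("text", ""))
        elif etype == "message_delta" and event.get("delta", {}).get("stop_reason"):
            break  # the fix: don't keep reading in hope of message_stop
        elif etype == "message_stop":
            break
    return "".join(parts)
```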

2. CancelledError Not Caught in Heartbeat Cleanup

Since Python 3.8, asyncio.CancelledError inherits from BaseException, not Exception. The except Exception: pass block in the heartbeat task cleanup did not catch it, so the exception propagated and the final card update was skipped entirely.

Fix: Changed to except (Exception, asyncio.CancelledError).

3. httpx System Proxy Interference

httpx.AsyncClient auto-detects macOS system proxy settings, which can interfere with long-lived SSE connections to LLM APIs.

Fix: Added proxy=None to all httpx.AsyncClient constructors.
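The fix amounts to one constructor argument; a minimal sketch (the timeout values are illustrative, not the PR's actual settings, and `trust_env=False` is httpx's broader knob for ignoring proxy environment variables, noted here only as context):

```python
import httpx

# Explicitly disable proxying for the long-lived SSE client.
client = httpx.AsyncClient(
    proxy=None,                              # never route SSE through a detected proxy
    timeout=httpx.Timeout(30.0, read=None),  # no read timeout while streaming
)
```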

4. httpx aclose() Indefinite Blocking

When a streaming connection was terminated early (via break), httpx.AsyncClient.aclose() could hang indefinitely waiting for the server to finish sending.

Fix: Wrapped aclose() in asyncio.wait_for(..., timeout=5.0) for all client classes.
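A sketch of the bounded shutdown (helper name is illustrative; it works for any object with an async `aclose()`, httpx.AsyncClient included):

```python
import asyncio

async def close_with_timeout(client, timeout=5.0):
    """Cap aclose() so an SSE connection abandoned mid-stream cannot
    block cleanup forever. Returns False if the close timed out."""
    try:
        await asyncio.wait_for(client.aclose(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False  # give up; the transport dies with the event loop
```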

5. OpenAI/Gemini SSE Stream Termination

  • OpenAICompatibleClient: broke only on [DONE], not on finish_reason. If a provider sends finish_reason without [DONE], the stream hangs.
  • GeminiClient: treated [DONE] with continue rather than break, and recorded finishReason without ever acting on it, relying solely on the HTTP connection close to end the stream.

Fix: Added finish_reason break protection to both clients, matching the Anthropic client's defensive pattern.
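The same defensive pattern for the OpenAI-compatible chunk format, in simplified form (field names follow the public chat.completions SSE shape):

```python
import json

def read_openai_stream(lines):
    """Break on the [DONE] sentinel *or* on a non-null finish_reason,
    whichever arrives first, so the stream ends even when a provider
    omits one of the two signals."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choice = json.loads(payload)["choices"][0]
        parts.append(choice.get("delta", {}).get("content") or "")
        if choice.get("finish_reason"):
            break  # the fix: some providers never send [DONE]
    return "".join(parts)
```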

Files Changed

| File | Changes |
| --- | --- |
| backend/app/api/feishu.py | CardKit streaming integration, flush control, CancelledError fix |
| backend/app/api/websocket.py | Stream return logging |
| backend/app/services/feishu_service.py | 5 new CardKit API methods |
| backend/app/services/feishu_ws.py | WebSocket proxy bypass |
| backend/app/services/llm_client.py | SSE break protection (all 3 clients), proxy=None, aclose timeout |

Testing

| Provider | Mode | Model | Result |
| --- | --- | --- | --- |
| Zhipu | Anthropic-compatible | GLM | ✅ Streaming cards work end-to-end |
| Zhipu | OpenAI-compatible | GLM | ✅ Streaming cards work end-to-end |
| Google | Gemini native | Gemini 2.5 Flash | ✅ Streaming cards work end-to-end |

39499740 added 5 commits April 3, 2026 23:43
…ataelement#287)

Replace the im.message.patch-based streaming approach with CardKit
streaming APIs (create_card_entity, stream_card_content,
set_card_streaming_mode, update_cardkit_card) for silky smooth
typewriter-style streaming output in Feishu cards.

Key changes:
- Add CardKit API methods to FeishuService (create_card_entity,
  send_card_by_card_id, stream_card_content, set_card_streaming_mode,
  update_cardkit_card) using lark-oapi SDK
- Refactor streaming output in feishu.py to use CardKit as primary path
  with automatic fallback to IM patch when CardKit is unavailable
- Use schema 2.0 card format with streaming_mode and element_id for
  incremental content updates (typewriter animation handled by Feishu)
- Add collapsible thinking panel in final card
- Reduce streaming flush interval to 0.5s for CardKit path (vs 1.0s
  for IM patch fallback)

Refs: dataelement#287
websockets >= 13 auto-detects macOS system proxy settings. When a local
proxy is configured but unable to handle WSS upgrade, the connection
fails with 'did not receive a valid HTTP response from proxy'. Force
proxy=None to bypass this.

Refs: dataelement#287
Three root causes fixed:

1. AnthropicClient.stream() - break on stop_reason instead of waiting
   for message_stop event. Zhipu's Anthropic-compatible API may not
   send message_stop, causing aiter_lines() to hang forever.

2. _heartbeat_task cancel - CancelledError inherits BaseException since
   Python 3.8, so except Exception does not catch it. This caused the
   final card update to be skipped after LLM completion.

3. httpx client hardening - proxy=None to avoid system proxy issues
   with SSE streams, and asyncio.wait_for timeout on aclose() to
   prevent indefinite blocking when closing connections.
…eams

OpenAICompatibleClient now breaks on finish_reason in addition to [DONE],
protecting all providers (Minimax, Custom, DeepSeek, Qwen, etc.) from
hanging if [DONE] is never sent.

GeminiClient now breaks on both [DONE] and finishReason instead of
relying solely on connection close to end the SSE stream.
Reduce log noise in production by downgrading verbose SSE/streaming
diagnostic logs from info to debug level. Only warnings and errors
remain at info level.

@wisdomqin left a comment


Great work! The CardKit streaming integration is well-structured with a solid 3-tier fallback design (CardKit -> IM Patch -> plain text), and the SSE hang fixes address real root causes across all three LLM clients. Approving for merge.

We will address the following in a follow-up commit after merge:

  1. Add lark-oapi to requirements.txt
  2. Scope the websockets proxy patch (avoid global monkey-patch)
  3. Add tool call status display to the CardKit streaming path
  4. Add a size cap to _lark_clients cache to prevent unbounded growth

@wisdomqin merged commit ba9a8c3 into dataelement:main Apr 5, 2026
@39499740 deleted the codex/issue-287-feishu-streaming branch April 8, 2026 03:25


Development

Successfully merging this pull request may close these issues:

Refer to the official Feishu library openclaw-lark to refactor part of the Feishu code to achieve smooth streaming output
