Memory leak audit: audiobuffer growth, thread explosion, integration timeouts #4825

@beastoin

Follow-up to PR #4784 (pusher memory leak fix — reduced pod memory ~80%). A deep audit of routers/pusher.py, routers/transcribe.py, and downstream utils found three remaining leak/backpressure patterns.

Current Behavior

  • Unbounded audiobuffer growth (routers/pusher.py:316-439): audiobuffer and trigger_audiobuffer are extended on every audio chunk but are only cleared when has_audio_apps_enabled or audio_bytes_webhook_delay_seconds is truthy. Connections for users with no audio apps therefore leak ~57MB/hour each.
  • Unbounded thread spawning (utils/conversations/process_conversation.py:686-731): each conversation completion makes 7+ raw threading.Thread().start() calls with no pooling and no rate limiting. Under sustained load, hundreds of concurrent threads pile up, each retaining a full Conversation object.
  • Long external integration timeouts (utils/app_integrations.py): timeouts of 30s (external), 15s (audio bytes), and 10s (realtime) are combined with blocking thread joins. A single slow app webhook blocks the entire pipeline and causes backpressure on upstream queues.
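The first pattern can be illustrated with a minimal sketch. `AudioSession`, `has_consumer`, and `on_audio_chunk` are hypothetical names, not taken from routers/pusher.py; only `audiobuffer`, `has_audio_apps_enabled`, and `audio_bytes_webhook_delay_seconds` come from the description above. The sketch assumes the buffer is drained by the same condition that gates the existing clear, so skipping the extend when that condition is false loses nothing.

```python
from dataclasses import dataclass, field


@dataclass
class AudioSession:
    """Hypothetical stand-in for the per-connection state in routers/pusher.py."""

    has_audio_apps_enabled: bool = False
    audio_bytes_webhook_delay_seconds: int = 0  # 0 or None: no webhook configured
    audiobuffer: bytearray = field(default_factory=bytearray)

    @property
    def has_consumer(self) -> bool:
        # Same truthiness check that currently gates the buffer clear.
        return bool(self.has_audio_apps_enabled or self.audio_bytes_webhook_delay_seconds)

    def on_audio_chunk(self, chunk: bytes) -> None:
        # Guard: accumulate only when something will later consume and clear the buffer.
        if self.has_consumer:
            self.audiobuffer.extend(chunk)


# A connection with no audio apps and no webhook stays at zero bytes.
idle = AudioSession()
active = AudioSession(has_audio_apps_enabled=True)
for _ in range(100):
    idle.on_audio_chunk(b"\x00" * 320)
    active.on_audio_chunk(b"\x00" * 320)
```

For scale, ~57MB/hour works out to roughly 16 KB of retained audio per second per connection.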

Expected Behavior

Audio buffers should only accumulate when there's a consumer. Background tasks should use a bounded thread pool. External integration timeouts should be short enough to prevent pipeline stalls.
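A bounded pool for background tasks might look like the sketch below. `background_executor`, `run_background_tasks`, and `record_thread` are hypothetical names; the real task functions in process_conversation.py are not shown in this issue. The `max_workers=32` value is the one proposed under Fix 2.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

# One shared, process-wide pool; 32 is the worker count proposed in this issue.
background_executor = ThreadPoolExecutor(max_workers=32, thread_name_prefix="conv-bg")


def run_background_tasks(conversation, tasks):
    """Submit each post-processing callable to the shared pool.

    Replaces one threading.Thread(...).start() per task, so the number of
    live worker threads is capped no matter how many conversations complete.
    """
    return [background_executor.submit(task, conversation) for task in tasks]


# Demo: 200 conversations x 7 tasks each, recording which threads ran them.
seen_threads = set()
seen_lock = threading.Lock()


def record_thread(conversation):
    with seen_lock:
        seen_threads.add(threading.current_thread().name)
    return conversation


futures = []
for i in range(200):
    futures.extend(run_background_tasks(f"conv-{i}", [record_thread] * 7))
wait(futures)
```

One caveat with this design: the executor's internal work queue is unbounded, so the pool caps the thread count but not the number of queued tasks, and each queued task still holds a reference to its arguments until it runs.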

Subtasks

  • Fix 1: Guard audiobuffer — only extend when audio apps/webhook enabled (5-line fix in pusher.py)
  • Fix 2: Replace per-conversation threading.Thread() with ThreadPoolExecutor(max_workers=32) in process_conversation.py
  • Fix 3: Reduce timeouts — 30s→10s (external), 15s→5s (audio bytes), 10s→5s (realtime)
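The effect of Fix 3 can be sketched as a deadline-bounded wait. The constant names, `call_with_deadline`, and `slow_webhook` are hypothetical; utils/app_integrations.py presumably passes `timeout=` to its HTTP client directly, which this sketch does not reproduce. Only the numeric values come from the subtask above.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Proposed values from Fix 3, in seconds (constant names are hypothetical).
EXTERNAL_INTEGRATION_TIMEOUT = 10  # was 30
AUDIO_BYTES_TIMEOUT = 5            # was 15
REALTIME_TIMEOUT = 5               # was 10

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn, timeout, *args):
    """Run fn in the pool and stop waiting after `timeout` seconds.

    Returns (ok, result). The worker is not cancelled on timeout; the caller
    only stops blocking, so fn should also enforce its own I/O timeout.
    """
    future = _pool.submit(fn, *args)
    try:
        return True, future.result(timeout=timeout)
    except FutureTimeout:
        return False, None


def slow_webhook(delay):
    # Stand-in for a slow third-party app; no network involved.
    time.sleep(delay)
    return "ok"


# Short timeouts are used here only to keep the demo fast.
fast = call_with_deadline(slow_webhook, 0.5, 0.01)
slow = call_with_deadline(slow_webhook, 0.05, 0.3)
```

With the proposed values, the longest a single slow external app can hold up a caller drops from 30s to 10s.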

Files to Modify

| File | Fix | Change |
|---|---|---|
| backend/routers/pusher.py | #1 | Guard audiobuffer.extend() / trigger_audiobuffer.extend() |
| backend/utils/conversations/process_conversation.py | #2 | ThreadPoolExecutor + executor.submit() |
| backend/utils/app_integrations.py | #3 | Reduce timeout= values |

Impact

Prevents additional memory growth paths not covered by PR #4784. Reduces thread count under load and limits backpressure from slow external apps.

Found during deep audit with Codex.

Labels: maintainer (Lane: High-risk, cross-system changes), memory (Layer: Memory creation, syncing, storage), p1 (Priority: Critical (score 22-29))