Summary
OpenCode's unbounded memory growth causes catastrophic, unrecoverable system failures on a Linux VM: a single opencode process balloons to 116 GB virtual / 21 GB RSS (on a 20 GB RAM machine), triggering OOM kills, kernel soft lockups across 7 of 8 CPUs simultaneously for up to 356 seconds, and RCU subsystem starvation — rendering the entire system completely dead. No SSH, no console input, no recovery without hard power-off.
This is not a gradual degradation. It is a total system kill that escalates with each restart cycle — crash intervals accelerated from 52 hours → 8 hours → 2.5 hours over 4 days.
Environment
| Component | Detail |
|---|---|
| OS | Debian 13 (trixie) |
| Kernel | 6.12.63+deb13-amd64, PREEMPT_DYNAMIC (voluntary) |
| CPU | AMD Ryzen 7 5800X 8-Core (8 vCPUs allocated) |
| RAM | 20 GB |
| Swap | 10.2 GB partition (/dev/sda5) |
| Hypervisor | VirtualBox 7.x (KVM paravirt, kvm-clock) |
| OpenCode | v1.1.56 (Go binary at /root/.local/bin/opencode) |
Crash Timeline (Feb 8–12, 2026)
Over 4 days, the system crashed 4 times with decreasing intervals between failures:
| Boot | Time Range | Survived | Failure Mode |
|---|---|---|---|
| -5 | Feb 8 12:39–12:42 | 3 min | Immediate crash (suspected) |
| -4 | Feb 8 12:44 – Feb 9 11:30 | 23 hours | Unknown |
| -3 | Feb 9 11:57 – Feb 11 15:55 | 52 hours | 2× OOM Kill — opencode at 111GB virt / 21GB RSS |
| -2 | Feb 11 16:06 – Feb 12 00:08 | 8 hours | 7/8 CPUs soft-locked 356s, RCU starvation, kernel panic-level |
| -1 | Feb 11 23:53 – Feb 12 02:20 | 2.5 hours | CPU#4 soft lockup cascade (21s → 140s → unrecoverable) |
| 0 | Feb 12 02:20 – present | Running | Already showing 74.7 GB virtual after 10 min |
The crash interval is accelerating: 52h → 8h → 2.5h.
Detailed Kernel Evidence
Event 1: OOM Kill #1 — Single process at 116 GB virtual memory (Feb 10, 00:29)
The OOM killer was invoked by opencode itself (PID 168718), and killed a sibling opencode process (PID 146787) that had consumed more memory than the entire physical RAM:
```
opencode invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
CPU: 6 UID: 0 PID: 168718 Comm: opencode Not tainted 6.12.63+deb13-amd64 #1 Debian 6.12.63-1
```
Process table at time of OOM (5 opencode instances running):
```
[ PID ]    uid   PID     total_vm  rss      rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[   3524]    0   3524    18643893  34165    34022    24       119       1433600        52448    0             opencode
[   4768]    0   4768    18671866  17406    17215    90       101       1495040        55104    0             opencode
[  18118]    0  18118    18613474  11315    11128    125      62        1421312        59584    0             opencode
[ 146787]    0 146787    29098165  5317087  5316838  0        249       60379136       914656   0             opencode ← KILLED
[ 168718]    0 168718    18598956  50332    50115    70       147       1478656        25952    0             opencode
```
Kill verdict:
```
oom-kill:constraint=CONSTRAINT_NONE,...task=opencode,pid=146787,uid=0
Out of memory: Killed process 146787 (opencode) total-vm:116392660kB, anon-rss:21267352kB, file-rss:0kB, shmem-rss:996kB, UID:0 pgtables:58964kB oom_score_adj:0
```
Key numbers for PID 146787:
- Virtual memory: 116,392,660 KB (111 GB) — 5.5× physical RAM
- Resident (RSS): 21,267,352 KB (20.3 GB) — exceeds total 20 GB physical RAM
- Page tables: 58,964 KB (57 MB) — page table overhead alone is enormous
- Swap entries: 914,656 pages (~3.5 GB in swap)
The 4 "normal" opencode instances each consumed ~75 GB virtual memory. Even without the monster process, the baseline is absurd.
Event 2: OOM Kill #2 — 13 concurrent opencode processes (Feb 11, 12:18)
36 hours later, the same pattern repeated but with 13 opencode processes alive simultaneously:
```
[   4768]    0   4768    18706683  9129     9129     0        0         1531904        63464    0             opencode
[  18118]    0  18118    18681059  14537    14475    0        62        1458176        55300    0             opencode
[ 210316]    0 210316    18672792  25590    25577    0        13        1413120        43360    0             opencode
[ 211759]    0 211759    18622517  3438     3408     9        21        1388544        72224    0             opencode
[ 223030]    0 223030    18621687  2125     2125     0        0         1433600        70720    0             opencode
[ 223761]    0 223761    18615358  2442     2407     35       0         1478656        69728    0             opencode
[ 256649]    0 256649    18663263  9088     9088     0        0         1241088        53467    0             opencode
[ 331552]    0 331552    26126040  5513885  5513829  0        56        61468672       1612240  0             opencode ← KILLED
[ 336536]    0 336536    18596828  13919    13918    0        1         1314816        64754    0             opencode
[ 337847]    0 337847    18663382  12878    12864    14       0         1155072        54764    0             opencode
[ 337960]    0 337960    18605576  7985     7976     0        9         1445888        65344    0             opencode
[ 363065]    0 363065    18688901  16147    16147    0        0         1302528        61720    0             opencode
[ 696953]    0 696953    18568868  46722    46722    0        0         909312         1952     0             opencode
```
Kill verdict:
```
Out of memory: Killed process 331552 (opencode) total-vm:104504160kB, anon-rss:22055316kB, file-rss:0kB, shmem-rss:224kB, UID:0 pgtables:60028kB oom_score_adj:0
```
Key numbers for PID 331552:
- Virtual memory: 104,504,160 KB (99.7 GB)
- Resident (RSS): 22,055,316 KB (21 GB) — again exceeding total physical RAM
- Swap entries: 1,612,240 pages (~6.1 GB in swap)
Also present in the OOM dump: multiple chrome-headless-shell, bun, node (MainThread), python3, and git processes — all spawned by or related to opencode's tool ecosystem.
systemd-journald was forced to flush its caches due to memory pressure.
Event 3: Catastrophic multi-CPU soft lockup — 7/8 CPUs dead (Feb 11, 23:52)
This is the most severe event. 7 of 8 CPUs locked simultaneously, followed by escalation to 356-second lockups with opencode explicitly named as the offending process:
Phase 1 — Mass lockup (23:52:34):
```
watchdog: BUG: soft lockup - CPU#3 stuck for 43s! [systemd:1]
watchdog: BUG: soft lockup - CPU#5 stuck for 43s! [systemd-logind:700]
watchdog: BUG: soft lockup - CPU#2 stuck for 43s! [swapper/2:0]
watchdog: BUG: soft lockup - CPU#4 stuck for 43s! [swapper/4:0]
watchdog: BUG: soft lockup - CPU#7 stuck for 43s! [swapper/7:0]
watchdog: BUG: soft lockup - CPU#6 stuck for 43s! [Worker:103451]
watchdog: BUG: soft lockup - CPU#1 stuck for 43s! [systemd-journal:349]
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
```
Phase 2 — Escalation (00:00:25), 8 minutes later:
```
watchdog: BUG: soft lockup - CPU#7 stuck for 356s! [opencode:102278]
CPU: 7 UID: 0 PID: 102278 Comm: opencode Tainted: G L 6.12.63+deb13-amd64 #1 Debian 6.12.63-1
watchdog: BUG: soft lockup - CPU#2 stuck for 356s! [swapper/2:0]
watchdog: BUG: soft lockup - CPU#3 stuck for 356s! [swapper/3:0]
watchdog: BUG: soft lockup - CPU#4 stuck for 27s! [kworker/4:0+events]
rcu: rcu_preempt kthread starved for 95731 jiffies!
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
```
What happened:
- CPU#7 was held hostage by `opencode` PID 102278 for 356 seconds (nearly 6 minutes) without yielding
- The kernel's RCU (Read-Copy-Update) subsystem was starved for 95,731 jiffies (~958 seconds / 16 minutes) — this means the kernel could not perform basic memory reclamation, slab cache cleanup, or deferred freeing for over 16 minutes
- The kernel explicitly warned: "Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior"
- Multiple CPUs were stuck in `swapper` (the kernel idle process), meaning they couldn't even return to the idle state — a sign of deep kernel-level deadlock caused by memory pressure
- The `Tainted: G L` flag confirms the kernel was tainted by the soft lockup (`L` = SOFTLOCKUP)
Event 4: CPU#4 soft lockup cascade via tmux (Feb 12, 01:35)
The final crash before the current boot. The tmux server process (opencode's host) triggered the initial lockup:
```
watchdog: BUG: soft lockup - CPU#4 stuck for 21s! [tmux: server:1876]
watchdog: BUG: soft lockup - CPU#4 stuck for 140s! [swapper/4:0]
watchdog: BUG: soft lockup - CPU#4 stuck for 55s! [swapper/4:0]
watchdog: BUG: soft lockup - CPU#4 stuck for 62s! [swapper/4:0]   # 42 minutes later, still stuck
```
Preceding clocksource warnings (showing progressive CPU starvation):
```
clocksource: Long readout interval, skipping watchdog check: ... interval=2388442930ns (2.3s)
clocksource: Long readout interval, skipping watchdog check: ... interval=10133624930ns (10.1s)
clocksource: Long readout interval, skipping watchdog check: ... interval=20316688930ns (20.3s)
```
The clocksource readout intervals escalated from 2.3s → 10.1s → 20.3s before the lockup hit, showing the system was progressively starving for CPU time.
Current Session — The Bomb is Already Ticking
Just 10 minutes after a fresh boot, the current opencode process already shows alarming numbers:
```
PID 1812: VSZ 74,771,716 kB (74.7 GB virtual), RSS 588,896 kB (589 MB)
```
74.7 GB virtual memory after 10 minutes of runtime. Based on the observed pattern, RSS will grow unboundedly until it exceeds physical RAM, triggering the same cascade.
Cascading Failure Mechanism
The progression follows a textbook cascading failure pattern:
```
opencode memory leak (unbounded growth)
→ RSS exceeds physical RAM
→ Kernel starts heavy swapping (10.2 GB swap fills)
→ Swap I/O saturates disk, all processes stall waiting for pages
→ CPU cores stuck in page fault handlers / swap writeback
→ Kernel watchdog detects soft lockup (no scheduling for 10s+)
→ RCU grace-period kthread starved (can't run GC)
→ Memory reclamation impossible
→ More OOM pressure, more swapping, more lockups
→ Total system death (no SSH, no console, no recovery)
```
The VirtualBox hypervisor layer adds an additional timing distortion — the virtual clock drifts when CPUs are overloaded (evidenced by clocksource: Long readout interval warnings), which makes the watchdog trigger more aggressively and the system less able to self-recover.
Root Cause Analysis
The memory growth pattern is consistent with the leaks identified in PR #10913:
- `AsyncQueue` never terminates (`util/queue.ts`) — `[Symbol.asyncIterator]()` loops forever via `while (true)`, preventing GC of completed task objects and their closures
- Bash tool unbounded string concatenation (`tool/bash.ts:167-189`) — command output is accumulated without any size cap; long-running or verbose commands cause unbounded string growth
- LSP diagnostics `Map` never cleared (`lsp/client.ts:51`) — diagnostic entries accumulate indefinitely across file changes; the Map only grows, never shrinks
- Bus subscription leaks — event subscriptions are created but never unsubscribed, holding references to closures and their captured scope
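Of these, the bash-tool leak is the easiest to bound. A minimal sketch of a capped accumulation buffer — the `OutputBuffer` class and its default 10 MB cap are illustrative, not opencode's actual API:

```typescript
// Illustrative sketch: accumulate tool output with a hard size cap
// instead of unbounded string concatenation.
class OutputBuffer {
  private chunks: string[] = [];
  private size = 0;
  truncated = false;

  // cap is in UTF-16 code units here for simplicity; a real
  // implementation would likely count bytes.
  constructor(private readonly cap = 10 * 1024 * 1024) {}

  append(chunk: string): void {
    if (this.truncated) return; // drop everything past the cap
    const remaining = this.cap - this.size;
    if (chunk.length >= remaining) {
      this.chunks.push(chunk.slice(0, remaining));
      this.size = this.cap;
      this.truncated = true;
    } else {
      this.chunks.push(chunk);
      this.size += chunk.length;
    }
  }

  toString(): string {
    return this.chunks.join("") + (this.truncated ? "\n[output truncated]" : "");
  }
}
```

Collecting chunks in an array and joining once also avoids the quadratic cost of repeated string concatenation.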
Additional contributing factors observed in this environment:
- Multiple concurrent opencode processes: up to 13 instances observed simultaneously. Each "idle" instance consumes ~75 GB of virtual memory. The process spawning appears unbounded.
- Child process accumulation: OOM dumps show `chrome-headless-shell`, `bun`, `node` (MainThread), `python3`, and `git` processes — all spawned by opencode's tool/MCP ecosystem and apparently not cleaned up.
- Kernel preemption model: the kernel runs `PREEMPT_DYNAMIC` in `voluntary` mode, meaning it won't forcibly preempt a CPU-bound userspace process. A runaway opencode process can monopolize a CPU core indefinitely.
Impact
- Complete system unresponsiveness — no SSH, no console input, no `Ctrl+C`, no `SysRq`
- Requires hard power-off to recover (VirtualBox "Power Off" or host kill)
- Data loss / corruption — unclean shutdowns corrupt the systemd journal (`systemd-journald: File /var/log/journal/.../system.journal corrupted or uncleanly shut down, renaming and replacing`)
- Crash frequency accelerating — 52h → 8h → 2.5h between failures, suggesting the leak rate increases with accumulated state
- Host system impact — a VirtualBox VM lockup can degrade host system responsiveness
Reproduction Steps
- Run opencode on a Linux system with ≤ 20 GB RAM (VM or bare metal)
- Use it actively with multiple tool calls, background agents, and LSP active
- Monitor memory growth: `watch -n5 'ps -o pid,vsz,rss,comm -C opencode; echo "---"; free -h'`
- Observe:
  - Virtual memory climbs past 75 GB within minutes of startup
  - RSS grows steadily without bound during active use
  - After 2–52 hours (depending on workload intensity), RSS exceeds physical RAM
  - System becomes completely unresponsive shortly after
Related Issues
| Issue | Title | Status |
|---|---|---|
| #9743 | Memory Leak: OOM Killer During Extended Runtime | 🔴 Open |
| #3013 | Uses a huge amount of memory | 🔴 Open (6 👍, multiple duplicates) |
| #5700 | Too high memory usage | 🔴 Open (dupes: #5363, #3995, #4315) |
| #6172 | High CPU (100%+) during LLM streaming in long sessions | 🔴 Open |
| #4804 | High CPU usage (increases even when idle) | 🟢 Closed |
| #10913 | fix: multiple memory leaks in long-running sessions | 🔴 Open PR |
This issue provides the most detailed kernel-level evidence of the downstream consequences of these memory leaks, including the exact cascading failure mechanism from memory leak → OOM → soft lockup → RCU starvation → total system death.
Suggested Fixes
Immediate (stop the bleeding):
- Merge PR #10913 (fix: multiple memory leaks in long-running sessions) — addresses 4 confirmed leak sources
- Add a self-imposed RSS limit — monitor own RSS via `/proc/self/statm` and trigger graceful session compaction or restart when approaching a threshold (e.g., 4 GB)
- Cap bash tool output buffer — truncate accumulated output after a configurable limit (e.g., 10 MB)
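The self-imposed RSS limit could be implemented by periodically polling `/proc/self/statm` (Linux-only). A minimal sketch, assuming a 4 KB page size; the function names are illustrative, not an existing opencode API:

```typescript
import { readFileSync } from "node:fs";

// Typical x86-64 page size; a robust version would query it at runtime
// (e.g. via `getconf PAGE_SIZE`) instead of assuming.
const PAGE_SIZE = 4096;

// /proc/self/statm fields (all in pages): size resident shared text lib data dt
function currentRssBytes(): number {
  const fields = readFileSync("/proc/self/statm", "utf8").trim().split(/\s+/);
  return Number(fields[1]) * PAGE_SIZE; // field 1 = resident set size
}

// Caller decides what "exceeds" means: compact the session, restart, etc.
function rssExceeds(limitBytes: number): boolean {
  return currentRssBytes() > limitBytes;
}
```

For example, a timer could check `rssExceeds(4 * 1024 ** 3)` every minute and trigger session compaction before the OOM killer gets involved; `process.memoryUsage().rss` would be a portable alternative to reading `/proc` directly.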
Structural:
- Bound concurrent child processes — 13 simultaneous opencode instances is excessive; implement a process pool with a hard cap
- Implement LSP diagnostics eviction — LRU or TTL-based eviction for the diagnostics Map
- Add periodic forced GC — `global.gc()` with `--expose-gc` at regular intervals during long sessions
- Clean up child processes on session end — ensure `chrome-headless-shell`, `bun`, `node`, and `git` subprocesses are terminated when their parent session ends
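The diagnostics eviction can exploit the fact that a JavaScript `Map` preserves insertion order, which makes a small LRU straightforward. A hypothetical sketch, not opencode's actual `lsp/client.ts` code:

```typescript
// Illustrative LRU wrapper around Map: once capacity is exceeded,
// the least recently used entry (the first key in iteration order)
// is evicted, so the diagnostics store can no longer grow unboundedly.
class LruMap<K, V> {
  private map = new Map<K, V>();

  constructor(private readonly capacity: number) {}

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key); // refresh recency
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Move to the back (most recently used position).
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  get size(): number {
    return this.map.size;
  }
}
```

Keyed by file URI with a capacity of a few hundred entries, this would bound the diagnostics store to the most recently touched files; a TTL-based variant would work equally well.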
Defensive:
- Ship with a recommended systemd slice config — provide users a resource-limiting unit file that prevents opencode from killing the host system
- Add memory usage telemetry — log RSS at regular intervals so users and developers can identify leak patterns before they become catastrophic
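As a sketch of such a unit file (all limit values below are illustrative and should be tuned to the host, not recommendations from the opencode project):

```ini
# /etc/systemd/system/opencode.slice — hypothetical resource-limited slice,
# e.g. started via: systemd-run --scope --slice=opencode.slice opencode
[Slice]
MemoryMax=4G       ; hard cap: the kernel OOM-kills opencode, not the system
MemoryHigh=3G      ; throttle and reclaim before the hard cap is hit
MemorySwapMax=1G   ; bound swap thrash
CPUQuota=400%      ; at most 4 of the 8 vCPUs
TasksMax=512       ; bound runaway child-process spawning
```

With a cap like this in place, the worst case becomes a killed opencode process instead of a dead VM.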