
[Critical] Memory leak causes kernel soft lockups (356s, 7/8 CPUs), OOM kills (111GB virt/21GB RSS), and total system death on Linux #13230

@roymecat

Description

Summary

OpenCode's unbounded memory growth causes catastrophic, unrecoverable system failures on a Linux VM: a single opencode process balloons to 111 GB virtual / 21 GB RSS (on a 20 GB RAM machine), triggering OOM kills, kernel soft lockups across 7 of 8 CPUs simultaneously for up to 356 seconds, and RCU subsystem starvation — rendering the entire system completely dead. No SSH, no console input, no recovery without hard power-off.

This is not a gradual degradation. It is a total system kill that escalates with each restart cycle — crash intervals accelerated from 52 hours → 8 hours → 2.5 hours over 4 days.

Environment

| Component | Detail |
| --- | --- |
| OS | Debian 13 (trixie) |
| Kernel | 6.12.63+deb13-amd64, PREEMPT_DYNAMIC (voluntary) |
| CPU | AMD Ryzen 7 5800X 8-Core (8 vCPUs allocated) |
| RAM | 20 GB |
| Swap | 10.2 GB partition (/dev/sda5) |
| Hypervisor | VirtualBox 7.x (KVM paravirt, kvm-clock) |
| OpenCode | v1.1.56 (Go binary at /root/.local/bin/opencode) |

Crash Timeline (Feb 8–12, 2026)

Over 4 days, the system crashed 4 times with decreasing intervals between failures:

| Boot | Time Range | Survived | Failure Mode |
| --- | --- | --- | --- |
| -5 | Feb 8 12:39–12:42 | 3 min | Immediate crash (suspected) |
| -4 | Feb 8 12:44 – Feb 9 11:30 | 23 hours | Unknown |
| -3 | Feb 9 11:57 – Feb 11 15:55 | 52 hours | 2× OOM kill — opencode at 111 GB virt / 21 GB RSS |
| -2 | Feb 11 16:06 – Feb 12 00:08 | 8 hours | 7/8 CPUs soft-locked 356 s, RCU starvation, kernel panic-level |
| -1 | Feb 11 23:53 – Feb 12 02:20 | 2.5 hours | CPU#4 soft lockup cascade (21 s → 140 s → unrecoverable) |
| 0 | Feb 12 02:20 – present | Running | Already showing 74.7 GB virtual after 10 min |

The crash interval is accelerating: 52h → 8h → 2.5h.


Detailed Kernel Evidence

Event 1: OOM Kill #1 — Single process at 111 GB virtual memory (Feb 10, 00:29)

The OOM killer was invoked by opencode itself (PID 168718), and killed a sibling opencode process (PID 146787) that had consumed more memory than the entire physical RAM:

opencode invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
CPU: 6 UID: 0 PID: 168718 Comm: opencode Not tainted 6.12.63+deb13-amd64 #1  Debian 6.12.63-1

Process table at time of OOM (5 opencode instances running):

[  PID  ] uid  PID   total_vm      rss    rss_anon rss_file rss_shmem pgtables_bytes swapents  oom_score_adj name
[   3524]   0  3524  18643893    34165    34022       24       119  1433600    52448             0 opencode
[   4768]   0  4768  18671866    17406    17215       90       101  1495040    55104             0 opencode
[  18118]   0 18118  18613474    11315    11128      125        62  1421312    59584             0 opencode
[ 146787]   0 146787 29098165  5317087  5316838        0       249 60379136   914656             0 opencode  ← KILLED
[ 168718]   0 168718 18598956    50332    50115       70       147  1478656    25952             0 opencode

Kill verdict:

oom-kill:constraint=CONSTRAINT_NONE,...task=opencode,pid=146787,uid=0
Out of memory: Killed process 146787 (opencode) total-vm:116392660kB, anon-rss:21267352kB, file-rss:0kB, shmem-rss:996kB, UID:0 pgtables:58964kB oom_score_adj:0

Key numbers for PID 146787:

  • Virtual memory: 116,392,660 KB (111 GB) — 5.5× physical RAM
  • Resident (RSS): 21,267,352 KB (20.3 GB) — exceeds total 20 GB physical RAM
  • Page tables: 58,964 KB (57 MB) — page table overhead alone is enormous
  • Swap entries: 914,656 pages (~3.5 GB in swap)

The 4 "normal" opencode instances each consumed ~75 GB virtual memory. Even without the monster process, the baseline is absurd.


Event 2: OOM Kill #2 — 13 concurrent opencode processes (Feb 11, 12:18)

36 hours later, the same pattern repeated but with 13 opencode processes alive simultaneously:

[   4768]   0  4768  18706683     9129     9129        0         0  1531904    63464             0 opencode
[  18118]   0 18118  18681059    14537    14475        0        62  1458176    55300             0 opencode
[ 210316]   0 210316 18672792    25590    25577        0        13  1413120    43360             0 opencode
[ 211759]   0 211759 18622517     3438     3408        9        21  1388544    72224             0 opencode
[ 223030]   0 223030 18621687     2125     2125        0         0  1433600    70720             0 opencode
[ 223761]   0 223761 18615358     2442     2407       35         0  1478656    69728             0 opencode
[ 256649]   0 256649 18663263     9088     9088        0         0  1241088    53467             0 opencode
[ 331552]   0 331552 26126040  5513885  5513829        0        56 61468672  1612240             0 opencode  ← KILLED
[ 336536]   0 336536 18596828    13919    13918        0         1  1314816    64754             0 opencode
[ 337847]   0 337847 18663382    12878    12864       14         0  1155072    54764             0 opencode
[ 337960]   0 337960 18605576     7985     7976        0         9  1445888    65344             0 opencode
[ 363065]   0 363065 18688901    16147    16147        0         0  1302528    61720             0 opencode
[ 696953]   0 696953 18568868    46722    46722        0         0   909312     1952             0 opencode

Kill verdict:

Out of memory: Killed process 331552 (opencode) total-vm:104504160kB, anon-rss:22055316kB, file-rss:0kB, shmem-rss:224kB, UID:0 pgtables:60028kB oom_score_adj:0

Key numbers for PID 331552:

  • Virtual memory: 104,504,160 KB (99.7 GB)
  • Resident (RSS): 22,055,316 KB (21 GB) — again exceeding total physical RAM
  • Swap entries: 1,612,240 pages (~6.1 GB in swap)

Also present in the OOM dump: multiple chrome-headless-shell, bun, node (MainThread), python3, and git processes — all spawned by or related to opencode's tool ecosystem.

systemd-journald was forced to flush its caches due to memory pressure.


Event 3: Catastrophic multi-CPU soft lockup — 7/8 CPUs dead (Feb 11, 23:52)

This is the most severe event. 7 of 8 CPUs locked simultaneously, followed by escalation to 356-second lockups with opencode explicitly named as the offending process:

Phase 1 — Mass lockup (23:52:34):

watchdog: BUG: soft lockup - CPU#3 stuck for 43s! [systemd:1]
watchdog: BUG: soft lockup - CPU#5 stuck for 43s! [systemd-logind:700]
watchdog: BUG: soft lockup - CPU#2 stuck for 43s! [swapper/2:0]
watchdog: BUG: soft lockup - CPU#4 stuck for 43s! [swapper/4:0]
watchdog: BUG: soft lockup - CPU#7 stuck for 43s! [swapper/7:0]
watchdog: BUG: soft lockup - CPU#6 stuck for 43s! [Worker:103451]
watchdog: BUG: soft lockup - CPU#1 stuck for 43s! [systemd-journal:349]
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

Phase 2 — Escalation (00:00:25), 8 minutes later:

watchdog: BUG: soft lockup - CPU#7 stuck for 356s! [opencode:102278]
CPU: 7 UID: 0 PID: 102278 Comm: opencode Tainted: G             L     6.12.63+deb13-amd64 #1  Debian 6.12.63-1
watchdog: BUG: soft lockup - CPU#2 stuck for 356s! [swapper/2:0]
watchdog: BUG: soft lockup - CPU#3 stuck for 356s! [swapper/3:0]
watchdog: BUG: soft lockup - CPU#4 stuck for 27s! [kworker/4:0+events]
rcu: rcu_preempt kthread starved for 95731 jiffies!
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:

What happened:

  • CPU#7 was held hostage by opencode PID 102278 for 356 seconds (nearly 6 minutes) without yielding
  • The kernel's RCU (Read-Copy-Update) subsystem was starved for 95,731 jiffies (~958 seconds / 16 minutes) — this means the kernel could not perform basic memory reclamation, slab cache cleanup, or deferred freeing for over 16 minutes
  • The kernel explicitly warned: "Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior"
  • Multiple CPUs were stuck in swapper (kernel idle process), meaning they couldn't even return to idle state — a sign of deep kernel-level deadlock caused by memory pressure
  • The Tainted: G L flag confirms the kernel was tainted by the soft lockup (L = SOFTLOCKUP)

Event 4: CPU#4 soft lockup cascade via tmux (Feb 12, 01:35)

The final crash before the current boot. The tmux server process (opencode's host) triggered the initial lockup:

watchdog: BUG: soft lockup - CPU#4 stuck for 21s! [tmux: server:1876]
watchdog: BUG: soft lockup - CPU#4 stuck for 140s! [swapper/4:0]
watchdog: BUG: soft lockup - CPU#4 stuck for 55s! [swapper/4:0]
watchdog: BUG: soft lockup - CPU#4 stuck for 62s! [swapper/4:0]   # 42 minutes later, still stuck

Preceding clocksource warnings (showing progressive CPU starvation):

clocksource: Long readout interval, skipping watchdog check: ... interval=2388442930ns (2.3s)
clocksource: Long readout interval, skipping watchdog check: ... interval=10133624930ns (10.1s)
clocksource: Long readout interval, skipping watchdog check: ... interval=20316688930ns (20.3s)

The clocksource readout intervals escalated from 2.3s → 10.1s → 20.3s before the lockup hit, showing the system was progressively starving for CPU time.


Current Session — The Bomb is Already Ticking

Just 10 minutes after a fresh boot, the current opencode process already shows alarming numbers:

PID 1812: VSZ 74,771,716 kB (74.7 GB virtual), RSS 588,896 kB (589 MB)

74.7 GB virtual memory after 10 minutes of runtime. Based on the observed pattern, RSS will grow unboundedly until it exceeds physical RAM, triggering the same cascade.


Cascading Failure Mechanism

The progression follows a textbook cascading failure pattern:

opencode memory leak (unbounded growth)
    → RSS exceeds physical RAM
        → Kernel starts heavy swapping (10.2 GB swap fills)
            → Swap I/O saturates disk, all processes stall waiting for pages
                → CPU cores stuck in page fault handlers / swap writeback
                    → Kernel watchdog detects soft lockup (no scheduling for 10s+)
                        → RCU grace-period kthread starved (can't run GC)
                            → Memory reclamation impossible
                                → More OOM pressure, more swapping, more lockups
                                    → Total system death (no SSH, no console, no recovery)

The VirtualBox hypervisor layer adds an additional timing distortion — the virtual clock drifts when CPUs are overloaded (evidenced by clocksource: Long readout interval warnings), which makes the watchdog trigger more aggressively and the system less able to self-recover.

Root Cause Analysis

The memory growth pattern is consistent with the leaks identified in PR #10913:

  1. AsyncQueue never terminates (util/queue.ts) — [Symbol.asyncIterator]() loops forever via while (true), preventing GC of completed task objects and their closures
  2. Bash tool unbounded string concatenation (tool/bash.ts:167-189) — command output is accumulated without any size cap; long-running or verbose commands cause unbounded string growth
  3. LSP diagnostics Map never cleared (lsp/client.ts:51) — diagnostic entries accumulate indefinitely across file changes; the Map only grows, never shrinks
  4. Bus subscription leaks — event subscriptions are created but never unsubscribed, holding references to closures and their captured scope
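As a contrast to the unbounded concatenation described in item 2, a size-capped output accumulator can be sketched in a few lines. This is illustrative only — the class and constant names are mine, not OpenCode's actual tool/bash.ts code, and `length` is used as an approximate byte count for ASCII-dominated output:

```typescript
const MAX_OUTPUT_BYTES = 10 * 1024 * 1024; // e.g. a 10 MB cap

class CappedOutputBuffer {
  private chunks: string[] = []; // join once at the end instead of repeated +=
  private size = 0;
  truncated = false;

  append(chunk: string): void {
    if (this.truncated) return; // past the cap, drop further output
    const remaining = MAX_OUTPUT_BYTES - this.size;
    if (chunk.length > remaining) {
      this.chunks.push(chunk.slice(0, remaining));
      this.size = MAX_OUTPUT_BYTES;
      this.truncated = true;
    } else {
      this.chunks.push(chunk);
      this.size += chunk.length;
    }
  }

  toString(): string {
    const out = this.chunks.join("");
    return this.truncated ? out + "\n[output truncated]" : out;
  }
}
```

Collecting chunks and joining once also avoids the quadratic cost of repeated string concatenation on verbose commands.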

Additional contributing factors observed in this environment:

  • Multiple concurrent opencode processes: Up to 13 instances observed simultaneously. Each "idle" instance consumes ~75 GB virtual memory. The process spawning appears unbounded.
  • Child process accumulation: OOM dumps show chrome-headless-shell, bun, node (MainThread), python3, and git processes — all spawned by opencode's tool/MCP ecosystem and apparently not cleaned up.
  • Kernel preemption model: The kernel runs PREEMPT_DYNAMIC in voluntary mode, meaning it won't forcibly preempt a CPU-bound userspace process. A runaway opencode process can monopolize a CPU core indefinitely.

Impact

  • Complete system unresponsiveness — no SSH, no console input, no Ctrl+C, no SysRq
  • Requires hard power-off to recover (VirtualBox "Power Off" or host kill)
  • Data loss / corruption — unclean shutdowns corrupt systemd journal (systemd-journald: File /var/log/journal/.../system.journal corrupted or uncleanly shut down, renaming and replacing)
  • Crash frequency accelerating — 52h → 8h → 2.5h between failures, suggesting the leak rate increases with accumulated state
  • Host system impact — VirtualBox VM lockup can degrade host system responsiveness

Reproduction Steps

  1. Run opencode on a Linux system with ≤ 20 GB RAM (VM or bare metal)
  2. Use it actively with multiple tool calls, background agents, and LSP active
  3. Monitor memory growth:
    watch -n5 'ps -o pid,vsz,rss,comm -C opencode; echo "---"; free -h'
  4. Observe:
    • Virtual memory climbs past 75 GB within minutes of startup
    • RSS grows steadily without bound during active use
    • After 2–52 hours (depending on workload intensity), RSS exceeds physical RAM
    • System becomes completely unresponsive shortly after

Related Issues

| Issue | Title | Status |
| --- | --- | --- |
| #9743 | Memory Leak: OOM Killer During Extended Runtime | 🔴 Open |
| #3013 | Uses a huge amount of memory | 🔴 Open (6 👍, multiple duplicates) |
| #5700 | Too high memory usage | 🔴 Open (dupes: #5363, #3995, #4315) |
| #6172 | High CPU (100%+) during LLM streaming in long sessions | 🔴 Open |
| #4804 | High CPU usage (increases even when idle) | 🟢 Closed |
| #10913 | fix: multiple memory leaks in long-running sessions | 🔴 Open PR |

This issue provides the most detailed kernel-level evidence of the downstream consequences of these memory leaks, including the exact cascading failure mechanism from memory leak → OOM → soft lockup → RCU starvation → total system death.

Suggested Fixes

Immediate (stop the bleeding):

  1. Merge PR #10913 ("fix: multiple memory leaks in long-running sessions"), which addresses the 4 confirmed leak sources
  2. Add a self-imposed RSS limit — monitor own RSS via /proc/self/statm and trigger graceful session compaction or restart when approaching a threshold (e.g., 4 GB)
  3. Cap bash tool output buffer — truncate accumulated output after a configurable limit (e.g., 10 MB)
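The self-imposed RSS limit (item 2 above) could look roughly like this on Linux. The threshold, function names, and callback are assumptions for illustration, not an existing OpenCode API, and the 4096-byte page size is the common x86-64 default:

```typescript
import { readFileSync } from "node:fs";

// Sketch: watch our own RSS via /proc/self/statm (Linux-only; values are in
// pages) and react before the kernel OOM killer does.
const PAGE_SIZE = 4096;                // assumed x86-64 page size
const RSS_LIMIT_BYTES = 4 * 1024 ** 3; // 4 GB soft limit, per suggestion 2

function currentRssBytes(): number {
  // /proc/self/statm fields: size resident shared text lib data dt
  const fields = readFileSync("/proc/self/statm", "utf8").trim().split(/\s+/);
  return Number(fields[1]) * PAGE_SIZE; // field 2 is resident pages
}

function checkMemoryPressure(onPressure: () => void): void {
  // Callers would run this on a timer and trigger graceful session
  // compaction or a restart when the limit is exceeded.
  if (currentRssBytes() > RSS_LIMIT_BYTES) onPressure();
}
```

Polling /proc/self/statm is cheap enough to run every few seconds, so the check adds negligible overhead to long sessions.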

Structural:

  1. Bound concurrent child processes — 13 simultaneous opencode instances is excessive; implement a process pool with a hard cap
  2. Implement LSP diagnostics eviction — LRU or TTL-based eviction for the diagnostics Map
  3. Add periodic forced GC — call global.gc() (requires --expose-gc) at regular intervals during long sessions
  4. Clean up child processes on session end — ensure chrome-headless-shell, bun, node, git subprocesses are terminated when their parent session ends
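The eviction idea in item 2 can be sketched as a small LRU-bounded Map, relying on the fact that JavaScript Maps iterate in insertion order. Names here are hypothetical; the real structures in lsp/client.ts may differ:

```typescript
// Illustrative LRU-bounded store for per-file diagnostics. Re-inserting a key
// on every set() keeps the Map's insertion order equal to recency order, so
// the first key is always the least recently updated one.
type Entry = { diagnostics: unknown[]; touched: number };

class DiagnosticsStore {
  private map = new Map<string, Entry>();
  constructor(private maxEntries = 500) {}

  set(uri: string, diagnostics: unknown[]): void {
    this.map.delete(uri); // refresh recency by re-inserting at the end
    this.map.set(uri, { diagnostics, touched: Date.now() });
    while (this.map.size > this.maxEntries) {
      // Evict the least recently updated entry (first in iteration order)
      this.map.delete(this.map.keys().next().value!);
    }
  }

  get(uri: string): unknown[] | undefined {
    return this.map.get(uri)?.diagnostics;
  }

  get size(): number {
    return this.map.size;
  }
}
```

A TTL sweep on the `touched` timestamp could complement the size cap, but a hard entry bound alone already prevents the Map from growing without limit across file changes.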

Defensive:

  1. Ship with a recommended systemd slice config — provide users a resource-limiting unit file that prevents opencode from killing the host system
  2. Add memory usage telemetry — log RSS at regular intervals so users and developers can identify leak patterns before they become catastrophic
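As a starting point for the systemd slice in item 1, a resource-limiting unit might look like the following. This is an untested sketch — the path is hypothetical and the limits assume the 20 GB / 8 vCPU VM from this report, so they must be tuned per machine:

```ini
# Hypothetical /etc/systemd/system/opencode.slice
[Slice]
MemoryMax=8G     # hard cap: the kernel kills tasks in the slice, not the host
MemoryHigh=6G    # start throttling/reclaiming well before the hard cap
TasksMax=512     # bounds runaway child-process spawning
CPUQuota=600%    # leave ~2 of 8 vCPUs free so SSH/console stay responsive
```

A session could then be launched inside the slice with `systemd-run --scope --slice=opencode.slice opencode`, confining opencode and all of its child processes (chrome-headless-shell, bun, node, git) to the same cgroup limits.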
