Skip to content

Fix bugs with connection switching#2957

Merged
sawka merged 3 commits intomainfrom
sawka/conn-switch
Feb 27, 2026
Merged

Fix bugs with connection switching#2957
sawka merged 3 commits intomainfrom
sawka/conn-switch

Conversation

@sawka
Copy link
Member

@sawka sawka commented Feb 27, 2026

  • Add per blockId mutex to block controller resync
  • Pass initial termsize through to startJob

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 27, 2026

Walkthrough

The pull request implements per-block synchronization for the ResyncController and enhances term size configuration for durable shell jobs. It adds a global sync map of mutexes in blockcontroller.go to serialize block resync operations, updates the startNewJob method signature in durableshellcontroller.go to accept runtime options for dynamic term size configuration, and introduces a thread-safe GetOrCreate utility method to SyncMap[T] in syncmap.go for lazy initialization.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The PR title 'Fix bugs with connection switching' is vague and does not clearly summarize the actual changes, which involve adding per-block mutexes and passing terminal size options. Consider using a more specific title that reflects the actual changes, such as 'Add per-block mutex for resync and pass terminal size options' or similar.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The PR description directly relates to the changeset, listing the two main additions: per-blockId mutex for resync and passing termsize through to startJob.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sawka/conn-switch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kilo-code-bot
Copy link
Contributor

kilo-code-bot bot commented Feb 27, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (3 files)
  • pkg/blockcontroller/blockcontroller.go - Adds per-block mutex synchronization for ResyncController to prevent race conditions
  • pkg/blockcontroller/durableshellcontroller.go - Passes rtOpts through to startNewJob for proper terminal size initialization
  • pkg/util/ds/syncmap.go - Adds GetOrCreate helper method with proper thread-safety

The changes implement proper synchronization for block controller resync operations and ensure terminal size options are correctly propagated when starting new durable shell jobs. The code follows project conventions including the preferred lock.Lock(); defer lock.Unlock() pattern.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/blockcontroller/blockcontroller.go`:
- Around line 81-88: The blockResyncMutexMap created via ds.MakeSyncMap used by
getBlockResyncMutex grows without bounds because entries aren't removed when
blocks are deleted; update the implementation to avoid the leak by either (A)
replacing the dynamic map with bounded lock striping (create a fixed-size slice
of sync.Mutex and map blockId → mutex via a hash function used by
getBlockResyncMutex) or (B) add safe cleanup in DeleteBlock(): when
DeleteBlock(blockId) runs, remove or decrement a refcount for the corresponding
entry in blockResyncMutexMap (use a ref-counted wrapper stored in the map or
ds.MakeSyncMap.Remove/Release if available) so mutex entries are deleted when no
longer needed; modify getBlockResyncMutex and DeleteBlock to use the chosen
strategy and ensure concurrent access is safe.
- Around line 156-159: ResyncController currently holds the per-block mutex from
getBlockResyncMutex and calls DestroyBlockController while holding it, but other
callers (handleBlockCloseEvent and external callers in
workspaceservice.go/wshserver.go) call DestroyBlockController without that lock;
to fix, refactor DestroyBlockController by extracting the core teardown logic
into an internal helper (e.g., destroyBlockControllerUnlocked) that assumes the
caller holds the per-block mutex, then make the public DestroyBlockController
acquire getBlockResyncMutex(blockId).Lock()/Unlock() and call the helper; update
ResyncController to call the new unlocked helper directly to avoid
self-deadlock, and leave other callers using the public DestroyBlockController
so they obtain the mutex.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2f3129 and 29786e5.

📒 Files selected for processing (3)
  • pkg/blockcontroller/blockcontroller.go
  • pkg/blockcontroller/durableshellcontroller.go
  • pkg/util/ds/syncmap.go

Comment on lines +81 to +88
blockResyncMutexMap = ds.MakeSyncMap[*sync.Mutex]()
)

func getBlockResyncMutex(blockId string) *sync.Mutex {
return blockResyncMutexMap.GetOrCreate(blockId, func() *sync.Mutex {
return &sync.Mutex{}
})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Enumerate all uses of blockResyncMutexMap.
rg -n --type=go '\bblockResyncMutexMap\b' -C3

# Verify whether Delete is ever called for this map.
rg -n --type=go 'blockResyncMutexMap\.Delete\s*\(' || true

Repository: wavetermdev/waveterm

Length of output: 826


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find all uses of getBlockResyncMutex
rg -n --type=go 'getBlockResyncMutex' -C5

# Check for any cleanup, Reset, or Clear methods on blockResyncMutexMap
rg -n --type=go 'blockResyncMutexMap\.(Clear|Reset|Range)' || true

Repository: wavetermdev/waveterm

Length of output: 1683


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check for block deletion or removal patterns
rg -n --type=go 'DeleteBlock|RemoveBlock|block.*delete|block.*remove' -i | head -20

# Look for block lifecycle or creation patterns
rg -n --type=go 'type Block|CreateBlock|NewBlock' | head -20

Repository: wavetermdev/waveterm

Length of output: 3745


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check the DeleteBlock implementation
sed -n '153,200p' pkg/wcore/block.go

# Also check if blockcontroller is involved in block deletion
rg -n --type=go 'ResyncController|blockResync' | head -20

Repository: wavetermdev/waveterm

Length of output: 2489


blockResyncMutexMap has no cleanup for deleted blocks, causing unbounded map growth.

When blocks are deleted via DeleteBlock(), their corresponding mutex entries remain in blockResyncMutexMap indefinitely. Since blocks are frequently created and deleted throughout the application lifecycle, this creates a memory leak with monotonic map growth.

Consider either:

  • Bounded lock striping (hash blockId → fixed mutex array)
  • Safe cleanup in DeleteBlock() with ref-counted or weak references to lock entries
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/blockcontroller/blockcontroller.go` around lines 81 - 88, The
blockResyncMutexMap created via ds.MakeSyncMap used by getBlockResyncMutex grows
without bounds because entries aren't removed when blocks are deleted; update
the implementation to avoid the leak by either (A) replacing the dynamic map
with bounded lock striping (create a fixed-size slice of sync.Mutex and map
blockId → mutex via a hash function used by getBlockResyncMutex) or (B) add safe
cleanup in DeleteBlock(): when DeleteBlock(blockId) runs, remove or decrement a
refcount for the corresponding entry in blockResyncMutexMap (use a ref-counted
wrapper stored in the map or ds.MakeSyncMap.Remove/Release if available) so
mutex entries are deleted when no longer needed; modify getBlockResyncMutex and
DeleteBlock to use the chosen strategy and ensure concurrent access is safe.

Comment on lines +156 to +159
mu := getBlockResyncMutex(blockId)
mu.Lock()
defer mu.Unlock()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show key mutation paths for a block controller lifecycle.
rg -n --type=go '\bfunc\s+ResyncController\b|\bfunc\s+DestroyBlockController\b|\bfunc\s+handleBlockCloseEvent\b' -A35 -B5 pkg/blockcontroller/blockcontroller.go

# Show where block-scoped mutex is actually acquired.
rg -n --type=go '\bgetBlockResyncMutex\s*\(' -C2 pkg/blockcontroller/blockcontroller.go

Repository: wavetermdev/waveterm

Length of output: 3247


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find all callers of DestroyBlockController
rg -n --type=go '\bDestroyBlockController\s*\(' pkg/blockcontroller/blockcontroller.go

# Check the full ResyncController function to see all call sites
sed -n '151,200p' pkg/blockcontroller/blockcontroller.go

# Check if there are any other files calling DestroyBlockController
rg -l --type=go '\bDestroyBlockController\s*\(' pkg/

Repository: wavetermdev/waveterm

Length of output: 2129


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check callers in other files
rg -n --type=go '\bDestroyBlockController\s*\(' pkg/wshrpc/wshserver/wshserver.go pkg/service/workspaceservice/workspaceservice.go -B3 -A3

# Also check the full context around DestroyBlockController definition
sed -n '295,303p' pkg/blockcontroller/blockcontroller.go

Repository: wavetermdev/waveterm

Length of output: 1274


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check getController and deleteController implementations to understand what state they protect
sed -n '1,100p' pkg/blockcontroller/blockcontroller.go | grep -A10 'func getController\|func deleteController\|blockResyncMutexMap\|var '

# Get the full beginning of the file to see global state and any other synchronization
head -100 pkg/blockcontroller/blockcontroller.go

Repository: wavetermdev/waveterm

Length of output: 3519


Per-block serialization is incomplete without locking DestroyBlockController.

ResyncController acquires the per-block mutex (line 156-158) and calls DestroyBlockController at lines 176, 185, 216, 224, and 234 while holding it. However, handleBlockCloseEvent (line 146) launches DestroyBlockController asynchronously without this lock, and external callers in other files (workspaceservice.go, wshserver.go) also bypass it. This allows concurrent mutations to the same block's controller state.

Make DestroyBlockController acquire the per-block mutex before proceeding. To avoid self-deadlock when called from ResyncController, create an internal helper that skips lock acquisition for callers that already hold it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/blockcontroller/blockcontroller.go` around lines 156 - 159,
ResyncController currently holds the per-block mutex from getBlockResyncMutex
and calls DestroyBlockController while holding it, but other callers
(handleBlockCloseEvent and external callers in workspaceservice.go/wshserver.go)
call DestroyBlockController without that lock; to fix, refactor
DestroyBlockController by extracting the core teardown logic into an internal
helper (e.g., destroyBlockControllerUnlocked) that assumes the caller holds the
per-block mutex, then make the public DestroyBlockController acquire
getBlockResyncMutex(blockId).Lock()/Unlock() and call the helper; update
ResyncController to call the new unlocked helper directly to avoid
self-deadlock, and leave other callers using the public DestroyBlockController
so they obtain the mutex.

@sawka sawka merged commit 1ab58ea into main Feb 27, 2026
7 checks passed
@sawka sawka deleted the sawka/conn-switch branch February 27, 2026 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant