Add thread-safe locking to Gerbil's hash table implementation:
- Add __raw-table-lock spinlock protecting all read and write operations on raw tables (ref, set!, update!, delete!, for-each, copy, clear!)
- Use snapshot-under-lock for raw-table-for-each to prevent deadlock when callbacks access other tables (e.g. __specialize-method in the MOP)
- Add __gc-table-lock spinlock for gc-table operations (rehash, immediate creation, clear!)
- Fix gc-table-set!/delete! to verify gcht is still current after ##gc-hash-table-set! (another thread may have resized concurrently)

Serialize module compilation to prevent concurrent expander access:
- Wrap compile-top-module in with-driver-mutex to serialize compilation
- Make with-driver-mutex reentrant (check whether the current thread holds the mutex)
- Use +driver-mutex+ for import-mx in make.ss instead of a separate mutex, ensuring imports and compilation share the same lock
- Export +driver-mutex+ from compiler/driver

Results: P1 SMP stdlib build fully passes (339 modules). P2 improved from ~21 to ~200 modules before crash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In Gambit SMP, (declare (not interrupts-enabled)) suppresses ___POLL()
generation at compile time. ___POLL is the ONLY mechanism by which a
processor detects GC synchronization requests from other processors.
Without it, the GC barrier waits indefinitely (no timeout) for the
processor to respond, causing deadlocks:
1. Processor A holds __raw-table-lock, allocates during rehash,
triggers GC, enters barrier waiting for Processor B
2. Processor B is spinning on __raw-table-lock in (not interrupts-enabled)
code with no ___POLL checks, cannot respond to GC sync
3. Deadlock: A waits for B (barrier), B waits for A (spinlock)
The fix: remove (declare (not interrupts-enabled)) from all table
operation functions. The __lock-inline! macro already has its own
local (not interrupts-enabled) for the fast spin loop. Code between
lock/unlock now gets ___POLL checks, allowing the processor to
participate in GC sync while holding the lock. The GC properly
updates stack-referenced objects after moving them, so this is safe.
Results: P2 CORES=2 stdlib build now FULLY PASSES (316 modules).
Previously crashed at ~27 modules with SIGSEGV.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Gambit submodule: force lazy rehash (prevents non-eq? table corruption), fix rehash self-deadlock in striped spinlock array.

Gerbil runtime: remove (declare (not interrupts-enabled)) from control.ss, interface.ss, and mop.ss; these declarations prevent GC barrier synchronization in SMP builds, causing deadlocks when other VM processors need garbage collection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With --enable-smp, the default processor count creates idle VM processors that cause heartbeat storms and a GC race condition at higher processor counts (SIGSEGV in compiled Scheme code accessing stale heap pointers after GC). The crash scales with processor count: p1 and p2 are reliable, p16 crashes consistently. Force p1 during build since compilation is single-threaded anyway. Runtime programs can set their own processor count via GAMBOPT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update Gambit submodule with the fix for a bug in move_continuation() that caused SIGSEGV with multiple VM processors. The bug wrote to the wrong stack frame slot when marking the first break frame's msection as NULL (not a stack overflow), causing ___stack_overflow_undo_if_possible to read stale data and corrupt stack_break with a code label address. With this fix, Gerbil builds and runs correctly with the default SMP processor count, so remove the p1 GAMBOPT workaround from build.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match the installed Gerbil's Gambit version (v4.9.7-43-gbe71ec38).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator comment: the problems are real, but the fixes are incorrect... global locks? WTF.
Picks up RC subsystem race fixes, GC mark_rc for all processors, and atomic device refcount for multi-processor builds.
Picks up fca110bb which adds spinlock protection to the device group doubly-linked list and CAS-based thread-safe foreign object release, fixing use-after-free crashes during SMP shutdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The spinlock macro in util.ss still had (declare (not interrupts-enabled)) after d18e633 removed it from mop.ss. This caused a GC barrier deadlock: a thread spinning to acquire an inline lock could not respond to GC synchronization requests, stalling the GC and preventing the lock holder from ever releasing it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix device condvar broadcast deadlock: move btq-deq-remove to the
"no more threads" branch so stranded condvars are not lost on trylock
failure. Also fix execute-pending-compile-jobs! to use thread-send/
thread-receive instead of thread-join! to avoid SMP condvar race.
Changes:
- src/gambit: update submodule to 78105346
- configure: update default_gambit_tag to 78105346
- src/gerbil/compiler/base.ss: use thread-send/receive in execute-pending-compile-jobs!
- src/bootstrap/gerbil/compiler/base~0.scm: matching compiled update
- src/build.sh: add m8G to GAMBOPT for SMP builds
- src/build/build{0,1}.ss: add explicit exit to avoid SMP shutdown hang
- src/build/build-bach.ss, std, lang, srfi, r7rs-large, tools: same
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…removal

Compiled bootstrap counterparts for the __lock-inline! fix:
- src/bootstrap/gerbil/runtime/interface~0.scm
- src/bootstrap/gerbil/runtime/mop~0.scm
- src/bootstrap/gerbil/runtime/util~1.scm

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
This PR enables Gerbil to build and run with Gambit's SMP (Symmetric Multi-Processing) mode, where multiple VM processors (OS threads) execute Scheme threads concurrently with stop-the-world garbage collection.
Gambit has had `--enable-smp` as a configure option for a while, but actually using it with a real workload like building Gerbil's 1184-module standard library exposed a number of race conditions at both the Gambit runtime level and the Gerbil runtime level. This PR contains the Gerbil-side fixes. The corresponding Gambit fixes are in a separate PR against the Gambit repository.

With all fixes applied, Gerbil's full build pipeline (`./build.sh`: gambit, stage0, stage1, stdlib, libgerbil, lang, r7rs-large, srfi, tools) runs reliably with the default SMP processor count. Single-threaded programs like `gsc` see zero overhead from the SMP build.

What changed and why
1. Enable SMP in the Gambit configure flags (`configure`)

This is the top-level change that turns on SMP. Everything else in this PR exists because of what broke when we flipped this switch.
The Gambit submodule is also pointed to a fork (`ober/gambit`) that contains the necessary Gambit runtime fixes (GC barrier races, hash table spinlocks, `move_continuation` stack corruption, deferred processor startup, etc.).

2. Hash table locking (`src/gerbil/runtime/table.ss`)

This is the largest and most important change on the Gerbil side.
The problem: Gerbil's hash tables (`raw-table` and `gc-table`) are used pervasively throughout the runtime and compiler: the module registry, symbol tables, expander state, compilation caches, etc. In single-threaded mode, Gambit's scheduler guarantees that only one thread runs at a time, and `(declare (not interrupts-enabled))` prevents thread switches during critical sections. In SMP mode, multiple processors run truly concurrently, so these tables get hammered from multiple OS threads simultaneously with no synchronization. The result is corrupted probe sequences, lost entries, and crashes.

The fix (raw tables): A global spinlock (`__raw-table-lock`) now protects all `raw-table` operations: `ref`, `set!`, `update!`, `delete!`, `for-each`, `copy`, and `clear!`. I used a single global lock rather than per-table locks.

For `raw-table-for-each`, the lock is held only long enough to snapshot the table vector, then released before calling the user's callback. This prevents deadlocks when the callback itself accesses other tables.
ref/set/update/delmethods.The fix — gc-tables: GC-managed hash tables (
gc-table) have a different problem. Gambit's GC can relocate the underlyinggc-hash-tableobject, which means thegchtpointer you obtained before a mutation might be stale by the time the mutation returns. In single-threaded mode this didn't matter because GC couldn't fire between obtaining the pointer and using it (interrupts were disabled). In SMP mode, another processor can trigger GC at any moment.The fix adds a verify-retry pattern to
gc-table-set!andgc-table-delete!:After the mutation, we check if
gchtis still the current backing table. If another thread triggered GC and the table was relocated, we retry the operation against the new table.__gc-table-rehash!is also protected by a spinlock (__gc-table-lock) to prevent concurrent rehashes from corrupting the table. The lazy-creation of the immediate sub-table (for non-heap-allocated keys like fixnums and characters) uses double-checked locking to avoid creating duplicates.3. Remove
(declare (not interrupts-enabled))(multiple files)Files:
table.ss,control.ss,interface.ss,mop.ssThis is the change that's easiest to get wrong if you don't understand Gambit's SMP internals, so I want to explain it carefully.
In Gambit, the compiler inserts `___POLL()` calls at function entry points. `___POLL()` does two things: (1) checks if the current thread's quantum has expired, enabling cooperative thread scheduling, and (2) checks if another processor has requested a stop-the-world GC synchronization. When you write `(declare (not interrupts-enabled))`, the compiler suppresses these `___POLL()` calls entirely.

In single-threaded mode, suppressing `___POLL()` is a reasonable optimization: it prevents unnecessary thread switches in critical sections and provides a lightweight form of atomicity.

In SMP mode, suppressing `___POLL()` is catastrophic. Here's why: a processor that triggers GC raises `INTR_SYNC_OP` and waits at the barrier, with no timeout, for every other processor to reach a `___POLL()` check. A processor executing code compiled with `(not interrupts-enabled)` never reaches one (there are no `___POLL()` calls in this code path), so it never acknowledges the sync request and the barrier never completes.

This is especially bad when Processor B is spinning on a spinlock (like `__raw-table-lock`) waiting for Processor A, which just triggered GC. Classic deadlock: A waits for B (barrier), B waits for A (lock).
(declare (not interrupts-enabled))everywhere. The atomicity it was providing is now handled by explicit spinlocks, which spin in normal Scheme code that has___POLL()calls, so processors can always respond to GC barrier requests.In each case,
(declare (not safe))was kept where it was already present — that's an orthogonal optimization that skips type checks, and doesn't affect GC barrier participation.Specific locations:
table.ss:__eq-hash,__gc-table-e,gc-table-ref,gc-table-set!,gc-table-delete!,gc-table-for-each,__object->eq-hash. These were the most critical because hash table operations are hot paths during compilation.control.ss: Thedynamic-windguards inwith-lockandwith-unwind-protect. These use##vector-cas!for atomicity (which is inherently atomic), so the(not interrupts-enabled)declaration was redundant anyway.interface.ss: Thedefcastmacro's generated cast function. Interface dispatch involves type lookups that must not block GC.mop.ss:type-of,class-of, and the type dispatch lambdas in the MOP initialization. These are called on every method dispatch, so blocking GC here was a frequent source of barrier timeouts.4. Serialized compilation (
src/gerbil/compiler/driver.ss)The problem: Gerbil's parallel build system (
make.ss) spawns multiple worker threads that compile modules concurrently. Each worker callscompile-top-module, which accesses shared mutable state: the expander's module registry, the optimizer's binding tables, and the compiler's code generation tables. In single-threaded Gambit, the cooperative scheduler plus strategicwith-driver-mutexcalls aroundoptimize!was sufficient. In SMP mode, true concurrency means two workers can be insidecompile-top-modulesimultaneously, corrupting shared state.The fix: Wrap the entire
compile-top-modulebody inwith-driver-mutex, effectively serializing compilation. This might seem like it defeats the purpose of parallel builds, but it doesn't — the parallelism in Gerbil's build system is primarily about I/O overlap (one module compiles while another's output is being written to disk, while a third's dependencies are being resolved). The actual compilation of any single module must be serialized because the expander and compiler maintain global mutable state.The
The `with-driver-mutex` macro was also made reentrant: before locking, it checks whether the current thread already holds the mutex and, if so, runs the body directly. This is needed because `compile-top-module` calls `optimize!`, which was previously wrapped in its own `with-driver-mutex`. Without reentrancy, this would deadlock: the thread already holds the mutex from `compile-top-module` and then tries to acquire it again inside `optimize!`. With the reentrant check, the nested call recognizes that the current thread already owns the mutex and proceeds without locking. `+driver-mutex+` is also exported from the module so that `make.ss` can use it (see below).
5. Unified import/compile mutex (`src/std/make.ss`)

The problem: `make.ss` (the parallel build driver) had its own local `import-mx` mutex for protecting `import-module` calls, separate from the `+driver-mutex+` used by the compiler. This creates a lock ordering problem: a worker thread might hold `import-mx` while calling `import-module`, which internally triggers compilation, which tries to acquire `+driver-mutex+`. Meanwhile another worker holds `+driver-mutex+` and tries to import, trying to acquire `import-mx`. Classic ABBA deadlock.

More fundamentally, `import-module` mutates the same global `__module-registry` that `compile-top-module` reads and writes. Using different mutexes for the same shared state provides no protection at all.

The fix: Remove the local `import-mx` and use `+driver-mutex+` (imported from `driver.ss`) for import serialization. Now imports and compilations share a single lock, eliminating both the lock ordering problem and the concurrent access to `__module-registry`.
6. Build configuration (`configure`, `build.sh`)

`configure`: Added `--enable-multiple-threaded-vms --enable-smp` to the Gambit configure flags. Updated the Gambit submodule tag to point to the fork with SMP runtime fixes.

`build.sh`: Removed the `,p1` GAMBOPT setting that was temporarily limiting Gerbil builds to a single VM processor while fixes were being developed. No longer needed.

`.gitmodules`: Pointed to `ober/gambit` (the fork with SMP fixes) instead of `gambit/gambit`.

Testing

- Full `./build.sh` end-to-end
- Default SMP processor count (no `,p1` workaround)
- Single-threaded `gsc` performance
gscresult is worth calling out: with the deferred processor startup fix on the Gambit side, programs that don't create threads (likegsc, which dominates Gerbil's build time) run with zero SMP overhead. The extra VM processor threads are only spawned on the firstthread-start!call.Files changed
configure.gitmodulesober/gambitforksrc/gerbil/runtime/table.sssrc/gerbil/runtime/control.ss(not interrupts-enabled)src/gerbil/runtime/interface.ss(not interrupts-enabled)src/gerbil/runtime/mop.ss(not interrupts-enabled)src/gerbil/compiler/driver.sscompile-top-module, reentrant mutex, export+driver-mutex+src/std/make.ss+driver-mutex+for imports instead of local mutexFuture work
- The global `__raw-table-lock` works but could become a bottleneck under heavy concurrent access. If profiling shows contention, we could embed a lock in the table struct (would require a bootstrap cycle).
- `compile-top-module` is serialized. With some refactoring to isolate the mutable expander state, we could allow more overlap between compilation stages.