Enable SMP support in Gerbil#1416

Open
ober wants to merge 17 commits into mighty-gerbils:master from ober:master

Conversation


@ober ober commented Mar 4, 2026

Summary

This PR enables Gerbil to build and run with Gambit's SMP (Symmetric Multi-Processing) mode, where multiple VM processors (OS threads) execute Scheme threads concurrently with stop-the-world garbage collection.

Gambit has had --enable-smp as a configure option for a while, but actually using it with a real workload like building Gerbil's 1184-module standard library exposed a number of race conditions at both the Gambit runtime level and the Gerbil runtime level. This PR contains the Gerbil-side fixes. The corresponding Gambit fixes are in a separate PR against the Gambit repository.

With all fixes applied, Gerbil's full build pipeline (./build.sh — gambit, stage0, stage1, stdlib, libgerbil, lang, r7rs-large, srfi, tools) runs reliably with the default SMP processor count. Single-threaded programs like gsc see zero overhead from the SMP build.

What changed and why

1. Enable SMP in the Gambit configure flags (configure)

--enable-multiple-threaded-vms --enable-smp

This is the top-level change that turns on SMP. Everything else in this PR exists because of what broke when we flipped this switch.

The Gambit submodule is also pointed to a fork (ober/gambit) that contains the necessary Gambit runtime fixes (GC barrier races, hash table spinlocks, move_continuation stack corruption, deferred processor startup, etc.).

2. Hash table locking (src/gerbil/runtime/table.ss)

This is the largest and most important change on the Gerbil side.

The problem: Gerbil's hash tables (raw-table and gc-table) are used pervasively throughout the runtime and compiler — the module registry, symbol tables, expander state, compilation caches, etc. In single-threaded mode, Gambit's scheduler guarantees that only one thread runs at a time, and (declare (not interrupts-enabled)) prevents thread switches during critical sections. In SMP mode, multiple processors run truly concurrently, so these tables get hammered from multiple OS threads simultaneously with no synchronization. The result is corrupted probe sequences, lost entries, and crashes.

The fix — raw tables: A global spinlock (__raw-table-lock) now protects all raw-table operations: ref, set!, update!, delete!, for-each, copy, and clear!. I used a single global lock rather than per-table locks because:

  • Raw tables are small and operations are fast (microseconds), so contention is low in practice.
  • Gerbil's raw tables don't have a convenient place to store a per-table lock without changing the struct layout, which would break the bootstrap.
  • A global lock is simple and correct. We can optimize to per-table locks later if profiling shows contention.

For raw-table-for-each, the lock is held only long enough to snapshot the table vector, then released before calling the user's callback. This prevents deadlocks when the callback itself accesses other tables.

The same locking pattern is applied to the specialized table variants (eq-int-table, etc.) that have their own optimized ref/set/update/del methods.
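As a rough sketch of this pattern (not the PR's exact code — the lock primitives `__lock!`/`__unlock!` and the accessor/helper names are hypothetical stand-ins):

```scheme
;; Illustrative sketch of the global-lock pattern described above.
(def (raw-table-ref tab key default)
  (__lock! __raw-table-lock)
  (let (val (%unlocked-raw-table-ref tab key default)) ; hypothetical helper
    (__unlock! __raw-table-lock)
    val))

;; for-each: snapshot the entries vector under the lock, then iterate
;; with the lock released, so a callback that touches other tables
;; cannot deadlock against this lock.
(def (raw-table-for-each tab proc)
  (__lock! __raw-table-lock)
  (let (snapshot (vector-copy (&raw-table-table tab))) ; hypothetical accessor
    (__unlock! __raw-table-lock)
    (vector-for-each-entry snapshot proc)))            ; hypothetical iterator
```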

The fix — gc-tables: GC-managed hash tables (gc-table) have a different problem. Gambit's GC can relocate the underlying gc-hash-table object, which means the gcht pointer you obtained before a mutation might be stale by the time the mutation returns. In single-threaded mode this didn't matter because GC couldn't fire between obtaining the pointer and using it (interrupts were disabled). In SMP mode, another processor can trigger GC at any moment.

The fix adds a verify-retry pattern to gc-table-set! and gc-table-delete!:

(let (gcht (__gc-table-e tab))
  (when (##gc-hash-table-set! gcht key value)
    (__gc-table-rehash! tab)
    (gc-table-set! tab key value))        ;; retry after rehash
  (unless (##eq? gcht (&gc-table-gcht tab))
    (gc-table-set! tab key value)))       ;; retry if GC moved the table

After the mutation, we check if gcht is still the current backing table. If another thread triggered GC and the table was relocated, we retry the operation against the new table.

__gc-table-rehash! is also protected by a spinlock (__gc-table-lock) to prevent concurrent rehashes from corrupting the table. The lazy-creation of the immediate sub-table (for non-heap-allocated keys like fixnums and characters) uses double-checked locking to avoid creating duplicates.
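The double-checked locking for the lazy immediate sub-table can be sketched as follows (the constructor and accessor names here are hypothetical, not the PR's exact identifiers):

```scheme
;; Double-checked locking: test before locking (fast path), then re-test
;; under the lock, so two racing threads cannot both create the sub-table.
(def (__gc-table-immediate! tab)
  (or (&gc-table-immediate tab)                     ; fast path, no lock
      (begin
        (__lock! __gc-table-lock)
        (let (imm (or (&gc-table-immediate tab)     ; re-check under the lock
                      (let (new (make-immediate-table)) ; hypothetical ctor
                        (&gc-table-immediate-set! tab new)
                        new)))
          (__unlock! __gc-table-lock)
          imm))))
```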

3. Remove (declare (not interrupts-enabled)) (multiple files)

Files: table.ss, control.ss, interface.ss, mop.ss

This is the change that's easiest to get wrong if you don't understand Gambit's SMP internals, so I want to explain it carefully.

In Gambit, the compiler inserts ___POLL() calls at function entry points. ___POLL() does two things: (1) checks if the current thread's quantum has expired, enabling cooperative thread scheduling, and (2) checks if another processor has requested a stop-the-world GC synchronization. When you write (declare (not interrupts-enabled)), the compiler suppresses these ___POLL() calls entirely.

In single-threaded mode, suppressing ___POLL() is a reasonable optimization — it prevents unnecessary thread switches in critical sections and provides a lightweight form of atomicity.

In SMP mode, suppressing ___POLL() is catastrophic. Here's why:

  1. Processor A allocates memory, triggers GC, enters the stop-the-world barrier.
  2. Processor A signals all other processors via INTR_SYNC_OP and waits.
  3. Processor B is running code compiled with (not interrupts-enabled) — there are no ___POLL() calls in this code path.
  4. Processor B never checks for the GC request, never enters the barrier.
  5. Deadlock. A waits for B to arrive at the barrier. B can never see the request.

This is especially bad when Processor B is spinning on a spinlock (like __raw-table-lock) waiting for Processor A — which just triggered GC. Classic deadlock: A waits for B (barrier), B waits for A (lock).

The fix is straightforward: remove (declare (not interrupts-enabled)) everywhere. The atomicity it was providing is now handled by explicit spinlocks, which spin in normal Scheme code that has ___POLL() calls, so processors can always respond to GC barrier requests.
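To make that concrete, here is a hedged sketch of why spinning in ordinary Scheme code is safe: each loop iteration is a function entry, and Gambit emits ___POLL() at function entry, so a spinning processor still observes GC sync requests (`__try-lock!` is a hypothetical CAS primitive, not the PR's exact code):

```scheme
;; Spin loop in ordinary (interrupt-enabled) code: every iteration of
;; `loop` is a function entry, hence a ___POLL() site, so a processor
;; stuck here can still enter the stop-the-world GC barrier.
(def (__spin-lock! lock)
  (let loop ()
    (unless (__try-lock! lock)   ; hypothetical CAS attempt
      (loop))))
```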

In each case, (declare (not safe)) was kept where it was already present — that's an orthogonal optimization that skips type checks, and doesn't affect GC barrier participation.

Specific locations:

  • table.ss: __eq-hash, __gc-table-e, gc-table-ref, gc-table-set!, gc-table-delete!, gc-table-for-each, __object->eq-hash. These were the most critical because hash table operations are hot paths during compilation.
  • control.ss: The dynamic-wind guards in with-lock and with-unwind-protect. These use ##vector-cas!, which is inherently atomic, so the (not interrupts-enabled) declaration was redundant here anyway.
  • interface.ss: The defcast macro's generated cast function. Interface dispatch involves type lookups that must not block GC.
  • mop.ss: type-of, class-of, and the type dispatch lambdas in the MOP initialization. These are called on every method dispatch, so blocking GC here was a frequent source of barrier timeouts.

4. Serialized compilation (src/gerbil/compiler/driver.ss)

The problem: Gerbil's parallel build system (make.ss) spawns multiple worker threads that compile modules concurrently. Each worker calls compile-top-module, which accesses shared mutable state: the expander's module registry, the optimizer's binding tables, and the compiler's code generation tables. In single-threaded Gambit, the cooperative scheduler plus strategic with-driver-mutex calls around optimize! was sufficient. In SMP mode, true concurrency means two workers can be inside compile-top-module simultaneously, corrupting shared state.

The fix: Wrap the entire compile-top-module body in with-driver-mutex, effectively serializing compilation. This might seem like it defeats the purpose of parallel builds, but it doesn't — the parallelism in Gerbil's build system is primarily about I/O overlap (one module compiles while another's output is being written to disk, while a third's dependencies are being resolved). The actual compilation of any single module must be serialized because the expander and compiler maintain global mutable state.

The with-driver-mutex macro was also made reentrant:

(defrules with-driver-mutex ()
  ((_ expr)
   (if (eq? (mutex-state +driver-mutex+) (current-thread))
     expr ;; already holding the mutex — reentrant call
     (with-lock +driver-mutex+ (lambda () expr)))))

This is needed because compile-top-module calls optimize!, which was previously wrapped in its own with-driver-mutex. Without reentrancy, this would deadlock — the thread already holds the mutex from compile-top-module and then tries to acquire it again inside optimize!. With the reentrant check, the nested call recognizes the current thread already owns the mutex and proceeds without locking.

+driver-mutex+ is also exported from the module so that make.ss can use it (see below).

5. Unified import/compile mutex (src/std/make.ss)

The problem: make.ss (the parallel build driver) had its own local import-mx mutex for protecting import-module calls, separate from the +driver-mutex+ used by the compiler. This creates a lock ordering problem: a worker thread might hold import-mx while calling import-module, which internally triggers compilation, which tries to acquire +driver-mutex+. Meanwhile another worker holds +driver-mutex+ and tries to import, trying to acquire import-mx. Classic ABBA deadlock.

More fundamentally, import-module mutates the same global __module-registry that compile-top-module reads and writes. Using different mutexes for the same shared state provides no protection at all.

The fix: Remove the local import-mx and use +driver-mutex+ (imported from driver.ss) for import serialization. Now imports and compilations share a single lock, eliminating both the lock ordering problem and the concurrent access to __module-registry.
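Assuming the export from driver.ss, the make.ss side reduces to something like this sketch (the module path and wrapper name are illustrative; the surrounding worker code is omitted):

```scheme
;; make.ss (sketch): import the compiler's mutex instead of keeping a
;; local import-mx, so imports and compilation contend on one lock.
(import (only-in :gerbil/compiler/driver +driver-mutex+ with-driver-mutex))

(def (serialized-import! ctx)              ; hypothetical wrapper name
  (with-driver-mutex (import-module ctx)))
```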

6. Build configuration (configure, build.sh)

  • configure: Added --enable-multiple-threaded-vms --enable-smp to the Gambit configure flags. Updated the Gambit submodule tag to point to the fork with SMP runtime fixes.
  • build.sh: Removed the ,p1 GAMBOPT setting that was temporarily limiting Gerbil builds to a single VM processor while fixes were being developed. No longer needed.
  • .gitmodules: Pointed to ober/gambit (the fork with SMP fixes) instead of gambit/gambit.

Testing

| Test | Before SMP fixes | After |
| --- | --- | --- |
| 4-thread arithmetic stress test (p2) | ~20% pass rate | 50/50 pass |
| 4-thread arithmetic stress test (p16) | ~0% pass rate | 50/50 pass |
| Gerbil stdlib build (p2) | ~60% pass rate | 20/20 pass |
| Full Gerbil build pipeline (p8) | crash | 5/5 pass |
| Full ./build.sh end-to-end | N/A (used ,p1 workaround) | pass |
| Single-threaded gsc performance | 5+ hours wall time (22 min CPU) | 22 minutes (zero SMP overhead) |

The single-threaded gsc result is worth calling out: with the deferred processor startup fix on the Gambit side, programs that don't create threads (like gsc, which dominates Gerbil's build time) run with zero SMP overhead. The extra VM processor threads are only spawned on the first thread-start! call.

Files changed

| File | What | Why |
| --- | --- | --- |
| configure | SMP configure flags + submodule tag | Enable SMP mode |
| .gitmodules | Point to ober/gambit fork | Contains Gambit runtime fixes |
| src/gerbil/runtime/table.ss | Spinlock locking for raw-table and gc-table, verify-retry for concurrent rehash, double-checked locking for immediate table creation | Hash tables are accessed concurrently by multiple VM processors |
| src/gerbil/runtime/control.ss | Remove (not interrupts-enabled) | Prevents GC barrier deadlock |
| src/gerbil/runtime/interface.ss | Remove (not interrupts-enabled) | Prevents GC barrier deadlock |
| src/gerbil/runtime/mop.ss | Remove (not interrupts-enabled) | Prevents GC barrier deadlock |
| src/gerbil/compiler/driver.ss | Serialize compile-top-module, reentrant mutex, export +driver-mutex+ | Compiler state is not thread-safe |
| src/std/make.ss | Use +driver-mutex+ for imports instead of local mutex | Shared state needs a shared lock |

Future work

  • Per-table locks for raw-table: The global __raw-table-lock works but could become a bottleneck under heavy concurrent access. If profiling shows contention, we could embed a lock in the table struct (would require a bootstrap cycle).
  • Lock-free reads: Read operations on raw-tables could potentially be made lock-free with careful ordering, since the underlying vector is immutable between rehashes.
  • Finer-grained compiler locking: Currently the entire compile-top-module is serialized. With some refactoring to isolate the mutable expander state, we could allow more overlap between compilation stages.

ober and others added 11 commits March 4, 2026 08:43
Add thread-safe locking to Gerbil's hash table implementation:
- Add __raw-table-lock spinlock protecting all read and write operations
  on raw tables (ref, set!, update!, delete!, for-each, copy, clear!)
- Use snapshot-under-lock for raw-table-for-each to prevent deadlock
  when callbacks access other tables (e.g. __specialize-method in MOP)
- Add __gc-table-lock spinlock for gc-table operations (rehash, immediate
  creation, clear!)
- Fix gc-table-set!/delete! to verify gcht is still current after
  ##gc-hash-table-set! (another thread may have resized concurrently)

Serialize module compilation to prevent concurrent expander access:
- Wrap compile-top-module in with-driver-mutex to serialize compilation
- Make with-driver-mutex reentrant (check if current thread holds mutex)
- Use +driver-mutex+ for import/mx in make.ss instead of separate mutex,
  ensuring imports and compilation share the same lock
- Export +driver-mutex+ from compiler/driver

Results: P1 SMP stdlib build fully passes (339 modules).
P2 improved from ~21 to ~200 modules before crash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In Gambit SMP, (declare (not interrupts-enabled)) suppresses ___POLL()
generation at compile time. ___POLL is the ONLY mechanism by which a
processor detects GC synchronization requests from other processors.
Without it, the GC barrier waits indefinitely (no timeout) for the
processor to respond, causing deadlocks:

  1. Processor A holds __raw-table-lock, allocates during rehash,
     triggers GC, enters barrier waiting for Processor B
  2. Processor B is spinning on __raw-table-lock in (not interrupts-enabled)
     code with no ___POLL checks, cannot respond to GC sync
  3. Deadlock: A waits for B (barrier), B waits for A (spinlock)

The fix: remove (declare (not interrupts-enabled)) from all table
operation functions. The __lock-inline! macro already has its own
local (not interrupts-enabled) for the fast spin loop. Code between
lock/unlock now gets ___POLL checks, allowing the processor to
participate in GC sync while holding the lock. The GC properly
updates stack-referenced objects after moving them, so this is safe.

Results: P2 CORES=2 stdlib build now FULLY PASSES (316 modules).
Previously crashed at ~27 modules with SIGSEGV.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Gambit submodule: force lazy rehash (prevents non-eq? table corruption),
fix rehash self-deadlock in striped spinlock array.

Gerbil runtime: remove (declare (not interrupts-enabled)) from control.ss,
interface.ss, and mop.ss — these declarations prevent GC barrier
synchronization in SMP builds, causing deadlocks when other VM processors
need garbage collection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With --enable-smp, the default processor count creates idle VM
processors that cause heartbeat storms and a GC race condition at
higher processor counts (SIGSEGV in compiled Scheme code accessing
stale heap pointers after GC). The crash scales with processor count:
p1 and p2 are reliable, p16 crashes consistently.

Force p1 during build since compilation is single-threaded anyway.
Runtime programs can set their own processor count via GAMBOPT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update Gambit submodule with the fix for a bug in move_continuation()
that caused SIGSEGV with multiple VM processors. The bug wrote to the
wrong stack frame slot when marking the first break frame's msection as
NULL (not a stack overflow), causing ___stack_overflow_undo_if_possible
to read stale data and corrupt stack_break with a code label address.

With this fix, Gerbil builds and runs correctly with the default SMP
processor count, so remove the p1 GAMBOPT workaround from build.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match the installed Gerbil's Gambit version (v4.9.7-43-gbe71ec38).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
netlify bot commented Mar 4, 2026

👷 Deploy request for elastic-ritchie-8f47f9 pending review (latest commit 584d21e).


vyzo commented Mar 4, 2026

the problems are real, but the fixes are incorrect... global locks? WTF.

ober and others added 5 commits March 4, 2026 17:56
Picks up RC subsystem race fixes, GC mark_rc for all processors,
and atomic device refcount for multi-processor builds.
Picks up fca110bb which adds spinlock protection to the device group
doubly-linked list and CAS-based thread-safe foreign object release,
fixing use-after-free crashes during SMP shutdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The spinlock macro in util.ss still had (declare (not interrupts-enabled))
after d18e633 removed it from mop.ss. This caused a GC barrier deadlock:
a thread spinning to acquire an inline lock could not respond to GC
synchronization requests, stalling the GC and preventing the lock holder
from ever releasing it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix device condvar broadcast deadlock: move btq-deq-remove to the
"no more threads" branch so stranded condvars are not lost on trylock
failure. Also fix execute-pending-compile-jobs! to use thread-send/
thread-receive instead of thread-join! to avoid SMP condvar race.

Changes:
- src/gambit: update submodule to 78105346
- configure: update default_gambit_tag to 78105346
- src/gerbil/compiler/base.ss: use thread-send/receive in execute-pending-compile-jobs!
- src/bootstrap/gerbil/compiler/base~0.scm: matching compiled update
- src/build.sh: add m8G to GAMBOPT for SMP builds
- src/build/build{0,1}.ss: add explicit exit to avoid SMP shutdown hang
- src/build/build-bach.ss, std, lang, srfi, r7rs-large, tools: same

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…removal

Compiled bootstrap counterparts for the __lock-inline! fix:
- src/bootstrap/gerbil/runtime/interface~0.scm
- src/bootstrap/gerbil/runtime/mop~0.scm
- src/bootstrap/gerbil/runtime/util~1.scm

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>