Multi-core scheduler for the RP2040 #4925

Merged · 4 commits · Jun 13, 2025

2 changes: 1 addition & 1 deletion src/internal/task/task_stack_cortexm.c
@@ -1,4 +1,4 @@
- //go:build scheduler.tasks && cortexm
+ //go:build (scheduler.tasks || scheduler.cores) && cortexm
#include <stdint.h>

uintptr_t SystemStack() {
2 changes: 1 addition & 1 deletion src/internal/task/task_stack_cortexm.go
@@ -1,4 +1,4 @@
- //go:build scheduler.tasks && cortexm
+ //go:build (scheduler.tasks || scheduler.cores) && cortexm

package task

29 changes: 26 additions & 3 deletions src/runtime/gc_stack_cores.go
@@ -14,6 +14,24 @@ var gcScanState atomic.Uint32
// Start GC scan by pausing the world (all other cores) and scanning their
// stacks. It doesn't resume the world.
func gcMarkReachable() {
// If the other cores haven't started yet (for example, when a GC cycle
// happens during init()), we only need to scan the stack of the current
// core.
if !secondaryCoresStarted {
// Scan the stack(s) of the current core.
scanCurrentStack()
if !task.OnSystemStack() {
// Mark system stack.
markRoots(task.SystemStack(), stackTop)
}

// Scan globals.
findGlobals(markRoots)

// Nothing more to do: the other cores haven't started yet.
return
}

core := currentCPU()

// Interrupt all other cores.
@@ -38,7 +56,7 @@ func gcMarkReachable() {
// Busy-wait until all the other cores are ready. They certainly should be,
// after the scanning we did above.
for gcScanState.Load() != numCPU {
- spinLoopHint()
+ spinLoopWait()
}
gcScanState.Store(0)

@@ -53,7 +71,7 @@

// Busy-wait until this core finished scanning.
for gcScanState.Load() == 0 {
- spinLoopHint()
+ spinLoopWait()
}
gcScanState.Store(0)
}
@@ -81,6 +99,11 @@ func scanstack(sp uintptr) {

// Resume the world after a call to gcMarkReachable.
func gcResumeWorld() {
if !secondaryCoresStarted {
// Nothing to do: the world wasn't stopped in gcMarkReachable.
return
}

// Signal each core that they can resume.
hartID := currentCPU()
for i := uint32(0); i < numCPU; i++ {
@@ -95,7 +118,7 @@
// Busy-wait until the core acknowledges the signal (and is going to return
// from the interrupt handler).
for gcScanState.Load() != numCPU-1 {
- spinLoopHint()
+ spinLoopWait()
}
gcScanState.Store(0)
}
285 changes: 285 additions & 0 deletions src/runtime/runtime_rp2040.go
@@ -4,10 +4,17 @@ package runtime

import (
"device/arm"
"device/rp"
"internal/task"
"machine"
"machine/usb/cdc"
"runtime/interrupt"
"runtime/volatile"
"unsafe"
)

const numCPU = 2

// machineTicks is provided by package machine.
func machineTicks() uint64

@@ -43,6 +50,284 @@ func sleepTicks(d timeUnit) {
}
}

// Currently sleeping core, or 0xff.
Contributor:

$ diff src/runtime/runtime_rp2*
1c1
< //go:build rp2040
---
> //go:build rp2350
31a32,35
>       if d <= 0 {
>               return
>       }
>

Can the two files be merged, unlocking multicore on rp2350 for (approximately) free?
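For illustration only (not part of this PR): assuming no other chip-specific differences turn up, a merged file could use a combined build tag and keep the rp2350 guard, which is harmless on the rp2040. A minimal sketch:

//go:build rp2040 || rp2350

package runtime

func sleepTicks(d timeUnit) {
	// Guard taken from the rp2350 variant; a no-op on the rp2040.
	if d <= 0 {
		return
	}
	// ...the rest of the function is identical between the two files...
}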

Member Author:

Probably, but I don't have an RP2350 to test. I think there are small differences that need to be accounted for.
I plan on getting an RP2350 soon!

Member:

I will give you one next week!

// Must only be accessed with the scheduler lock held.
var sleepingCore uint8 = 0xff

// Return whether another core is sleeping.
// May only be called with the scheduler lock held.
func hasSleepingCore() bool {
return sleepingCore != 0xff
}

// Almost identical to sleepTicks, except that it will unlock/lock the scheduler
// while sleeping and is interruptible by interruptSleepTicksMulticore.
// This may only be called with the scheduler lock held.
func sleepTicksMulticore(d timeUnit) {
sleepingCore = uint8(currentCPU())

// Note: interruptSleepTicksMulticore will be able to interrupt this, since
// it executes the "sev" instruction which would make sleepTicks return
// immediately without sleeping. Even if it happens while configuring the
// sleep operation.

schedulerLock.Unlock()
sleepTicks(d)
schedulerLock.Lock()

sleepingCore = 0xff
}

// Interrupt an ongoing call to sleepTicksMulticore on another core.
func interruptSleepTicksMulticore(wakeup timeUnit) {
arm.Asm("sev")
}

// Number of cores that are currently in schedulerUnlockAndWait.
// It is possible for both cores to be sleeping, if the program is waiting for
// an interrupt (or is deadlocked).
var waitingCore uint8

// Put the scheduler to sleep, since there are no tasks to run.
// This will unlock the scheduler lock, and must be called with the scheduler
// lock held.
func schedulerUnlockAndWait() {
waitingCore++
schedulerLock.Unlock()
arm.Asm("wfe")
schedulerLock.Lock()
waitingCore--
}

// Wake another core, if one is sleeping. Must be called with the scheduler lock
// held.
func schedulerWake() {
if waitingCore != 0 {
arm.Asm("sev")
}
}

// Return the current core number: 0 or 1.
func currentCPU() uint32 {
return rp.SIO.CPUID.Get()
}

// Start the secondary cores for this chip.
// On the RP2040, there is only one other core to start.
func startSecondaryCores() {
// Start the second core of the RP2040.
// See section 2.8.2 in the datasheet.
seq := 0
for {
cmd := core1StartSequence[seq]
if cmd == 0 {
multicore_fifo_drain()
arm.Asm("sev")
}
multicore_fifo_push_blocking(cmd)
response := multicore_fifo_pop_blocking()
if cmd != response {
seq = 0
continue
}
seq = seq + 1
if seq >= len(core1StartSequence) {
break
}
}

// Enable the FIFO interrupt for the GC stop the world phase.
// We can only do this after we don't need the FIFO anymore for starting the
// second core.
intr := interrupt.New(rp.IRQ_SIO_IRQ_PROC0, func(intr interrupt.Interrupt) {
switch rp.SIO.FIFO_RD.Get() {
case 1:
gcInterruptHandler(0)
}
})
intr.Enable()
intr.SetPriority(0xff)
}

var core1StartSequence = [...]uint32{
0, 0, 1,
uint32(uintptr(unsafe.Pointer(&__isr_vector))),
uint32(uintptr(unsafe.Pointer(&stack1TopSymbol))),
uint32(exportedFuncPtr(runCore1)),
}

//go:extern __isr_vector
var __isr_vector [0]uint32

//go:extern _stack1_top
var stack1TopSymbol [0]uint32

// The function that is started on the second core.
//
//export tinygo_runCore1
func runCore1() {
// Clear sticky bit that seems to have been set while starting this core.
rp.SIO.FIFO_ST.Set(rp.SIO_FIFO_ST_ROE)

// Enable the FIFO interrupt, mainly used for the stop-the-world phase of
// the GC.
// Use the lowest possible priority (highest priority value), so that other
// interrupts can still happen while the GC is running.
intr := interrupt.New(rp.IRQ_SIO_IRQ_PROC1, func(intr interrupt.Interrupt) {
switch rp.SIO.FIFO_RD.Get() {
case 1:
gcInterruptHandler(1)
}
})
intr.Enable()
intr.SetPriority(0xff)

// Now start running the scheduler on this core.
schedulerLock.Lock()
scheduler(false)
schedulerLock.Unlock()

// The main function returned.
exit(0)
}

// The below multicore_fifo_* functions have been translated from the Raspberry
// Pi Pico SDK.

func multicore_fifo_rvalid() bool {
return rp.SIO.FIFO_ST.Get()&rp.SIO_FIFO_ST_VLD != 0
}

func multicore_fifo_wready() bool {
return rp.SIO.FIFO_ST.Get()&rp.SIO_FIFO_ST_RDY != 0
}

func multicore_fifo_drain() {
for multicore_fifo_rvalid() {
rp.SIO.FIFO_RD.Get()
}
}

func multicore_fifo_push_blocking(data uint32) {
for !multicore_fifo_wready() {
}
rp.SIO.FIFO_WR.Set(data)
arm.Asm("sev")
}

func multicore_fifo_pop_blocking() uint32 {
for !multicore_fifo_rvalid() {
arm.Asm("wfe")
}

return rp.SIO.FIFO_RD.Get()
}

// Value used to communicate between the GC core and the other (paused) cores.
var gcSignalWait volatile.Register8

// The GC interrupted this core for the stop-the-world phase.
// This function handles that, and only returns after the stop-the-world phase
// ended.
func gcInterruptHandler(hartID uint32) {
// Let the GC know we're ready.
gcScanState.Add(1)
arm.Asm("sev")

// Wait until we get a signal to start scanning.
for gcSignalWait.Get() == 0 {
arm.Asm("wfe")
}
gcSignalWait.Set(0)

// Scan the stack(s) of this core.
scanCurrentStack()
if !task.OnSystemStack() {
// Mark system stack.
markRoots(task.SystemStack(), coreStackTop(hartID))
}

// Signal we've finished scanning.
gcScanState.Store(1)
arm.Asm("sev")

// Wait until we get a signal that the stop-the-world phase has ended.
for gcSignalWait.Get() == 0 {
arm.Asm("wfe")
}
gcSignalWait.Set(0)

// Signal we received the signal and are going to exit the interrupt.
gcScanState.Add(1)
arm.Asm("sev")
}

// Pause the given core by sending it an interrupt.
func gcPauseCore(core uint32) {
rp.SIO.FIFO_WR.Set(1)
}

// Signal the given core that it can resume one step.
// This is called twice after gcPauseCore: the first time to scan the stack of
// the core, and the second time to end the stop-the-world phase.
func gcSignalCore(core uint32) {
gcSignalWait.Set(1)
arm.Asm("sev")
}

// Returns the stack top (highest address) of the system stack of the given
// core.
func coreStackTop(core uint32) uintptr {
switch core {
case 0:
return uintptr(unsafe.Pointer(&stackTopSymbol))
case 1:
return uintptr(unsafe.Pointer(&stack1TopSymbol))
default:
runtimePanic("unexpected core")
return 0
}
}

// These spinlocks are needed by the runtime.
var (
printLock = spinLock{id: 0}
schedulerLock = spinLock{id: 1}
atomicsLock = spinLock{id: 2}
futexLock = spinLock{id: 3}
)

// A hardware spinlock, one of the 32 spinlocks defined in the SIO peripheral.
type spinLock struct {
id uint8
}

// Return the spinlock register: rp.SIO.SPINLOCKx
func (l *spinLock) spinlock() *volatile.Register32 {
return (*volatile.Register32)(unsafe.Add(unsafe.Pointer(&rp.SIO.SPINLOCK0), l.id*4))
}

func (l *spinLock) Lock() {
// Wait for the lock to be available.
spinlock := l.spinlock()
for spinlock.Get() == 0 {
// TODO: use wfe and send an event when unlocking so the CPU can go to
// sleep while waiting for the lock.
// Unfortunately when doing that, time.Sleep() seems to hang somewhere.
// This needs some debugging to figure out.
Comment on lines +314 to +317
Contributor:

This could hide some other subtle error that affects not just time.Sleep. It seems risky to enable the multicore scheduler by default in v0.38.0 with so little bake time.

Member:

"the voice of reason"? 😺

This should build with multicore I think:

tinygo flash -target=pico -scheduler=cores -ldflags="--defsym=__num_stacks=2" path/to/code

Perhaps we should consider it experimental for this release?

Member:

@eliasnaur from reading the comment carefully, I interpret it to mean that the implementation in this PR could be improved by using wfe, but it does not yet due to the time.Sleep issue mentioned.

That does not change the point you are making overall. 👍

Member:

Another implication of changing the default to multicore is that default power use will be higher with the second core active. ⚡

Member Author:

I don't think power consumption will be affected much: when one of the cores has nothing to do it will just go to sleep. But I didn't measure this. (Should be easy to do by compiling with -scheduler=tasks and -scheduler=cores and measuring the difference).
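For example, the comparison could look something like this (a sketch that just reuses the flags quoted elsewhere in this thread; the path is a placeholder), flashing the same program twice and measuring the board's current draw in each case:

tinygo flash -target=pico -scheduler=tasks path/to/code
tinygo flash -target=pico -scheduler=cores -ldflags="--extldflags '--defsym=__num_stacks=2'" path/to/code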

Member Author (@aykevl, Jun 13, 2025):

So the way it works (at least as I understand it) is that both cores are always running. The second core is just asleep on reset. With multicore support, it is pulled out of this sleep state and starts the scheduler, which in many cases will put it right back to sleep waiting for a goroutine to run. So yeah, it spends a little time configuring stuff and then goes back to sleep.
...but again, I didn't measure anything so I'm only speculating how it should work.

Member:

> So yeah, it spends a little time configuring stuff and then goes back to sleep.

That is what all of the docs and references I could find are claiming about power use. So it would appear to be fine as it is from that point of view.

Member:

> We could disable multicore support for now, and leave it for people who want to test it. It's possible there's a bug I didn't discover yet.

As much as this feature excites me, I think @eliasnaur is probably correct that we should not make it the default yet.

Contributor:

@aykevl has a better idea of the robustness of the multi-core support than I; I merely pointed out the timing.

I'm also very excited about this code, and I would be comfortable making multi-core default, if the release notes spelled out instructions for switching back to the single-core scheduler. It's just a --scheduler=... argument, right? Assuming __num_stacks=2 is benign.

Member:

I had already confirmed that -scheduler=tasks does what you would expect and runs on single core.

I assume that the extra stack space is wasted in that case, so I also tested this command (note I missed the extldflags in my original comment):

tinygo flash -target nano-rp2040 -ldflags="--extldflags '--defsym=__num_stacks=1'" -stack-size 8kb -scheduler tasks /path/to/my/code

This also worked correctly, and according to -size full did save some bytes.

Based on this, I think we can merge this PR as it is, and just add docs on how to fall back to single core.

}
}

func (l *spinLock) Unlock() {
l.spinlock().Set(0)
}
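For reference, the wfe-based variant described in the TODO above might look like the sketch below. It is not part of this PR, and per the comment it made time.Sleep hang in testing, so treat it only as an outline of the idea:

func (l *spinLock) Lock() {
	spinlock := l.spinlock()
	for spinlock.Get() == 0 {
		// Sleep until another core executes "sev" (for example in Unlock).
		arm.Asm("wfe")
	}
}

func (l *spinLock) Unlock() {
	l.spinlock().Set(0)
	// Wake any core that is spinning in Lock.
	arm.Asm("sev")
}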

// Wait until a signal is received, indicating that it can resume from the
// spinloop.
func spinLoopWait() {
arm.Asm("wfe")
}

func waitForEvents() {
arm.Asm("wfe")
}
4 changes: 2 additions & 2 deletions src/runtime/runtime_tinygoriscv_qemu.go
@@ -360,7 +360,7 @@ type spinLock struct {
func (l *spinLock) Lock() {
// Try to replace 0 with 1. Once we succeed, the lock has been acquired.
for !l.Uint32.CompareAndSwap(0, 1) {
- spinLoopHint()
+ spinLoopWait()
}
}

@@ -376,7 +376,7 @@ func (l *spinLock) Unlock() {

// Hint to the CPU that this core is just waiting, and the core can go into a
// lower energy state.
- func spinLoopHint() {
+ func spinLoopWait() {
// This is a no-op in QEMU TCG (but added here for completeness):
// https://github.com/qemu/qemu/blob/v9.2.3/target/riscv/insn_trans/trans_rvi.c.inc#L856
riscv.Asm("pause")