This document describes thread signaling, a Zircon kernel mechanism used to implement thread suspend and kill operations. Thread signaling is not related to object signaling.
The target audience is kernel developers and anyone interested in understanding how suspend and kill operations work in the kernel.
Suspend and kill are operations that can be performed on threads. Both operations are asynchronous: issuing one only makes a request, and the caller must then wait for the target to complete it. Inside the kernel, these operations are implemented as instance methods on the Thread struct:
- Thread::Suspend: Request a thread to suspend its execution until it is resumed via Thread::Resume. Suspend is used to implement debuggers. Once suspended, a thread's register state can be read/written prior to resuming it. This operation is exposed to user mode via zx_task_suspend().
- Thread::Kill: Request a thread to terminate itself. This operation is not directly exposed to user mode. That is, attempting to zx_task_kill() a thread is an error. However, this operation is indirectly exposed via process destruction, both voluntary and involuntary.
Notice that both of these operations are described as requests. The caller is requesting that the target suspend or, in the case of kill, terminate its execution. The caller has no ability to forcibly suspend or terminate the target. While the target cannot refuse the request, it can delay action until the appropriate time and place. This is a key element of the design.
To understand why these operations are requests, consider the alternative of forcibly killing or suspending a thread. If a thread is forcibly killed while holding a resource (like a mutex) then it won't get the chance to free the resource before it's destroyed. You could end up with memory leaks, permanently locked locks, corrupted data structures, all sorts of bad stuff.
By modeling kill and suspend as requests that can only be performed by the target thread, we provide a way for the target to free its resources and perform any necessary cleanup before it stops executing, temporarily (in the case of suspend) or permanently (in the case of kill).
Before we cover how kill and suspend requests are issued, let's talk about the safety of thread termination.
There is one place where it's always safe for a thread to suspend or terminate its execution: the "edge" of the kernel, just before returning from the kernel back to user mode. Before returning to user mode, the thread unwinds its callstack, executing the destructors of any RAII objects. By the time it has reached the edge and is about to return to user mode, there will be nothing left on the kernel stack. It is here that a thread may safely suspend or terminate its execution.
Concretely, there are two safe points at which a thread may suspend or terminate. They are just before returning to user mode from a syscall and just before returning to user mode from an exception/fault/interrupt handler (exception handler, for short).
Note, exception handlers are not just invoked when executing in user mode. They can also be invoked when executing in kernel mode. When returning back to kernel mode it is not safe to suspend or terminate because the outer kernel mode context may still be holding a resource. In other words, an exception handler is only a safe point when it is triggered from a user mode context.
So we know that kill and suspend are merely requests and that it's up to the target thread to decide when and how to fulfill the request. We also know that the only safe places for a thread to suspend or terminate itself are at the edges of the kernel, just before returning to user mode. How do thread signals fit into all this?
Thread signals are the mechanism by which suspend and kill are requested. Each Thread object has a field containing the set of asserted signals. There's a bit for suspend, THREAD_SIGNAL_SUSPEND, and a bit for kill, THREAD_SIGNAL_KILL.
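As a rough illustration of such a signals field, the sketch below models it as an atomic bitmask. The type `SketchThread`, its method names, and the bit values are hypothetical stand-ins chosen for this example; the real kernel's layout and values may differ.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical bit values; the real kernel constants may differ.
constexpr uint32_t kSignalKill = 1u << 0;     // stands in for THREAD_SIGNAL_KILL
constexpr uint32_t kSignalSuspend = 1u << 1;  // stands in for THREAD_SIGNAL_SUSPEND

// Simplified stand-in for the kernel's Thread object.
struct SketchThread {
  std::atomic<uint32_t> signals{0};

  // Assert a signal. Returns the set of signals that were pending beforehand.
  uint32_t AssertSignal(uint32_t bit) {
    return signals.fetch_or(bit, std::memory_order_acq_rel);
  }

  // Clear a signal once it has been handled or withdrawn
  // (when that happens is an assumption of this sketch).
  void ClearSignal(uint32_t bit) {
    signals.fetch_and(~bit, std::memory_order_acq_rel);
  }

  // True if the given signal is currently asserted.
  bool IsPending(uint32_t bit) const {
    return (signals.load(std::memory_order_acquire) & bit) != 0;
  }
};
```

An atomic field is used here because the sender and the target run on different threads; the actual kernel may protect this state differently.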
Requesting a thread to suspend or terminate is achieved by setting the appropriate bit on the target Thread object and then, depending on the target's state, poking it in some way to ensure it reaches a safe point in a timely fashion. The exact type of poke depends on the target thread's state: sleep/blocked, suspended, or running. Note, there are two flavors of sleeping/blocked, interruptible and uninterruptible. We'll focus on interruptible and ignore uninterruptible.
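The sender's side of this can be sketched as "set the bit, then poke according to state". Everything below is a simplified model: `State`, `Poke`, `SketchThread`, and `RequestStop` are hypothetical names, and the choice not to poke an already suspended thread on a second suspend request is an assumption of the sketch.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t kSignalKill = 1u << 0;     // hypothetical bit value
constexpr uint32_t kSignalSuspend = 1u << 1;  // hypothetical bit value

enum class State { kBlockedInterruptible, kSuspended, kRunning };
enum class Request { kSuspend, kKill };
enum class Poke { kWakeWithIntrKilled, kWakeWithIntrRetry, kSendIpi };

// Simplified stand-in for the kernel's Thread object.
struct SketchThread {
  State state = State::kRunning;
  uint32_t signals = 0;
  std::vector<Poke> pokes;  // records pokes so the sketch can be inspected
};

// Set the signal bit, then poke the target so it reaches a safe point soon.
void RequestStop(SketchThread& target, Request req) {
  const bool kill = (req == Request::kKill);
  target.signals |= kill ? kSignalKill : kSignalSuspend;

  switch (target.state) {
    case State::kBlockedInterruptible:
      // Wake early with a special status so the thread unwinds to the edge.
      target.pokes.push_back(kill ? Poke::kWakeWithIntrKilled
                                  : Poke::kWakeWithIntrRetry);
      break;
    case State::kSuspended:
      // A suspended thread must run again before it can terminate itself.
      if (kill) target.pokes.push_back(Poke::kWakeWithIntrKilled);
      break;
    case State::kRunning:
      // The sender cannot tell user mode from kernel mode, so always IPI.
      target.pokes.push_back(Poke::kSendIpi);
      break;
  }
}
```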
If the target thread is sleeping or blocked then by definition it's not running, but it's in the kernel. Since only a running thread can check its signals, we must wake or unblock it. When a thread is unblocked or woken, it's given a zx_status_t. Usually the value is ZX_OK or ZX_ERR_TIMED_OUT. However, when waking a thread early like this we use a special zx_status_t value: ZX_ERR_INTERNAL_INTR_KILLED in the case of a kill operation and ZX_ERR_INTERNAL_INTR_RETRY in the case of a suspend operation.
When a thread is woken/unblocked, it will see the zx_status_t result and begin backing out of the kernel, unwinding its stack. In general, any kernel function returning one of the two special values will cause its caller to immediately return, propagating that value.
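The propagation pattern looks roughly like the following. This is a toy model, not kernel code: the `Status` enum stands in for zx_status_t, and `BlockOnQueue`, `DequeuePacket`, and `SysPortWaitSketch` are hypothetical functions that only mimic the shape of a blocking call chain.

```cpp
// Stand-in for zx_status_t; the enumerators mirror the statuses named above.
enum class Status { kOk, kTimedOut, kIntrKilled, kIntrRetry };

// Stand-in for blocking on a wait queue: returns whatever status the waker
// supplied when it unblocked the thread.
Status BlockOnQueue(Status wake_status) { return wake_status; }

// Middle of the call chain: on any non-OK status, stop and propagate it.
Status DequeuePacket(Status wake_status, int* packets_dequeued) {
  Status s = BlockOnQueue(wake_status);
  if (s != Status::kOk) {
    return s;  // no further work; the caller sees the same status
  }
  ++*packets_dequeued;
  return Status::kOk;
}

// Top of the call chain: same rule, so the special status reaches the edge.
Status SysPortWaitSketch(Status wake_status, int* packets_dequeued,
                         bool* copied_to_user) {
  Status s = DequeuePacket(wake_status, packets_dequeued);
  if (s != Status::kOk) {
    return s;
  }
  *copied_to_user = true;  // only reached on the normal path
  return Status::kOk;
}
```

Because every level returns as soon as it sees a non-OK status, locals are destroyed in the usual order and nothing is left held by the time the thread reaches the edge.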
Eventually, when the stack has unwound, the thread will be at the edge, a safe point. It is here, just before returning to user mode, that the thread checks its signals once more and acts on them by calling arch_iframe_process_pending_signals() or x86_syscall_process_pending_signals().
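A minimal sketch of the decision made at the edge might look like this. The function `DecideAtEdge` and the `EdgeAction` enum are hypothetical, the bit values are placeholders, and checking kill before suspend is an assumption of the sketch rather than something stated above.

```cpp
#include <cstdint>

constexpr uint32_t kSignalKill = 1u << 0;     // hypothetical bit value
constexpr uint32_t kSignalSuspend = 1u << 1;  // hypothetical bit value

enum class EdgeAction { kReturnToUser, kSuspendSelf, kTerminateSelf };

// Decide what the thread should do at the edge, given its pending signals.
// Kill is checked first (an assumption), so a thread with both bits set
// terminates instead of suspending.
EdgeAction DecideAtEdge(uint32_t pending_signals) {
  if (pending_signals & kSignalKill) {
    return EdgeAction::kTerminateSelf;
  }
  if (pending_signals & kSignalSuspend) {
    return EdgeAction::kSuspendSelf;
  }
  return EdgeAction::kReturnToUser;
}
```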
If the target thread is already suspended then, just like the sleeping/blocked case, it must resume execution in order for it to be killed. In the case of kill, the thread will be unblocked with ZX_ERR_INTERNAL_INTR_KILLED and unwind until just before returning to user mode, where it acts on the signal.
If the target thread is running, it could be running user code or kernel code. If it's running user code, then we'll need to force it to enter the kernel where it can check the signals field of its Thread struct. If it's running kernel code, then we'll have to trust that it will check for pending signals in a reasonable time frame.
The sender can't know if the target is in kernel mode or user mode, so it behaves the same in either case. The sender sends an Inter-processor Interrupt (IPI) to the CPU on which the target is currently running. Part of the interrupt handler's job is to check for and optionally process pending signals.
If the handler was invoked in a user context, that is, the CPU was in user mode at the time of the interrupt, then it's a safe point to suspend/terminate and the handler will call arch_iframe_process_pending_signals().
If, however, the handler was invoked in a kernel context, then the handler will do nothing because it can't know the state of the thread at the point it was interrupted. It's not safe to suspend/terminate here. Instead, the handler will return to the kernel context from which it was invoked and rely on this outer context to eventually notice the signal and reach a safe point.
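The handler's choice between the two cases can be modeled in a few lines. `InterruptFrame`, its `from_user` flag, and `OnInterruptExit` are hypothetical names used only for this sketch; a real handler would derive the interrupted mode from the saved register state.

```cpp
#include <cstdint>

// Hypothetical summary of the interrupted context.
struct InterruptFrame {
  bool from_user;  // true if the CPU was in user mode when interrupted
};

enum class HandlerResult { kNothingPending, kProcessedSignals, kDeferred };

// Called at the end of the interrupt handler, just before it returns.
HandlerResult OnInterruptExit(const InterruptFrame& frame,
                              uint32_t pending_signals) {
  if (pending_signals == 0) {
    return HandlerResult::kNothingPending;
  }
  if (!frame.from_user) {
    // The interrupted kernel code may hold locks or other resources, so
    // leave the signal set and let that code notice it at a safe point.
    return HandlerResult::kDeferred;
  }
  // Interrupted user mode: nothing is held in the kernel, so this is a safe
  // point to suspend or terminate.
  return HandlerResult::kProcessedSignals;
}
```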
You may be wondering if the IPI is really necessary. There are two cases where it's critical. The first is when the target thread is running in user mode and simply not entering the kernel on its own. On a lightly loaded system with no other interrupt traffic, a thread may not enter the kernel for extended periods of time, or ever in the case of an infinite loop. We need the IPI in this case to ensure the target thread observes and processes any pending signals in a timely manner. The second is when the target thread is performing a long running operation in the kernel, but not checking for pending signals. These are rare, but do exist. The best example would be the execution of a guest OS via zx_vcpu_enter(). The interrupt would cause a VMEXIT back to the host kernel where it can check for pending signals and unwind.
Let's walk through an example to see how this all works. Imagine thread A is suspending thread B, as B is performing a zx_port_wait(). Depending on exactly when the operation is performed, we can end up in one of several different scenarios. We'll examine each scenario briefly.
Scenario 1: Suspend just before syscall. Thread A issues the suspend just before thread B begins its zx_port_wait() syscall. Thread B is still in user mode and is running. Thread A sets thread B's THREAD_SIGNAL_SUSPEND bit and issues an IPI to thread B's current CPU. Thread B's CPU takes the interrupt and calls the interrupt handler. Just before returning back to user mode, thread B checks its pending signals. Seeing that THREAD_SIGNAL_SUSPEND is set, it suspends itself. Here's a sketch of thread B's callstack:
suspend_self()
interrupt_handler()
---- interrupt ----
user code
Later on, after being resumed, thread B will return back to user mode as if nothing happened.
Scenario 2: Suspend after entering the kernel, before blocking. Thread A issues the suspend after thread B has entered the kernel to perform a zx_port_wait() syscall. Thread B is executing kernel code and hasn't yet blocked. Just like Scenario 1, thread A issues an IPI, which causes thread B to check for pending signals:
interrupt_handler()
---- interrupt ----
PortDispatcher::Dequeue()
sys_port_wait()
syscall_dispatch()
---- syscall ----
vdso
zx_port_wait()
user code
However, this time the interrupt handler sees that it was invoked in kernel context rather than user context, so it does not suspend itself. Instead it returns back to the kernel context in which it was invoked. Thread B reaches the core of the zx_port_wait() operation, the point at which it will block if there are no packets available. Thread B sees there are no packets available and prepares to block:
WaitQueue::BlockEtcPreamble()
WaitQueue::BlockEtc()
PortDispatcher::Dequeue()
sys_port_wait()
syscall_dispatch()
---- syscall ----
vdso
zx_port_wait()
user code
Just before blocking, it checks for pending signals and sees that it has been asked to suspend. Instead of blocking, it returns ZX_ERR_INTERNAL_INTR_RETRY and the callstack unwinds to the edge, just prior to returning to user mode:
WaitQueue::BlockEtcPreamble()    ZX_ERR_INTERNAL_INTR_RETRY
WaitQueue::BlockEtc()                        |
PortDispatcher::Dequeue()                    |
sys_port_wait()                              |
syscall_dispatch()                           V
---- syscall ----
vdso
zx_port_wait()
user code
Here the thread checks for pending signals and suspends itself. Upon being resumed, the thread returns to user mode (to the vDSO) with the status result ZX_ERR_INTERNAL_INTR_RETRY. The vDSO has special logic for handling syscalls that return ZX_ERR_INTERNAL_INTR_RETRY: it simply reissues the syscall with the original arguments:
suspend_self()       ZX_ERR_INTERNAL_INTR_RETRY
syscall_dispatch()               |
---- syscall ----                |      A
vdso                             |______|
zx_port_wait()
user code
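The whole of this scenario, from the check before blocking to the retry in user mode, can be modeled as a small simulation. This is only a sketch under stated assumptions: all names (`SketchThread`, `BlockPreamble`, `SysPortWaitSketch`, `SyscallDispatchSketch`, `PortWaitViaVdso`) are hypothetical, blocking is modeled as a packet arriving immediately, and resuming is assumed to clear the suspend request.

```cpp
#include <cstdint>

constexpr uint32_t kSignalSuspend = 1u << 1;  // hypothetical bit value

// Stand-in for zx_status_t, reduced to the two values this scenario needs.
enum class Status { kOk, kIntrRetry };

struct SketchThread {
  uint32_t signals = 0;
  int suspend_count = 0;      // number of times the thread suspended itself
  int packets_available = 0;  // packets queued on the (modeled) port
};

// Kernel side: refuse to block if a suspend is pending.
Status BlockPreamble(const SketchThread& t) {
  return (t.signals & kSignalSuspend) ? Status::kIntrRetry : Status::kOk;
}

// Kernel side: the core of the wait operation.
Status SysPortWaitSketch(SketchThread& t) {
  if (t.packets_available == 0) {
    Status s = BlockPreamble(t);
    if (s != Status::kOk) return s;  // unwind instead of blocking
    t.packets_available = 1;         // model: block until a packet arrives
  }
  --t.packets_available;  // dequeue one packet
  return Status::kOk;
}

// Kernel edge: act on pending signals just before returning to user mode.
Status SyscallDispatchSketch(SketchThread& t) {
  Status s = SysPortWaitSketch(t);
  if (t.signals & kSignalSuspend) {
    ++t.suspend_count;             // model of suspending; returns once resumed
    t.signals &= ~kSignalSuspend;  // assumption: resume clears the request
  }
  return s;
}

// User side (vDSO stand-in): reissue the call while the kernel asks for it.
Status PortWaitViaVdso(SketchThread& t, int* attempts) {
  Status s;
  do {
    ++*attempts;
    s = SyscallDispatchSketch(t);
  } while (s == Status::kIntrRetry);
  return s;
}
```

With a suspend pending on entry, the first attempt unwinds with the retry status, the thread suspends once at the edge, and the second attempt completes normally, so user code never observes the internal status.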
Scenario 3: Suspend while blocked. Thread A issues the suspend after thread B has entered the kernel and blocked, waiting for a port packet. Thread A sees that thread B is blocked so it unblocks thread B with the value ZX_ERR_INTERNAL_INTR_RETRY. From this point on the behavior matches that of Scenario 2. The call returns to user mode where it is retried by the vDSO:
blocked                           ZX_ERR_INTERNAL_INTR_RETRY
WaitQueue::BlockEtcPostamble()                |
WaitQueue::BlockEtc()                         |
PortDispatcher::Dequeue()                     |
sys_port_wait()                               |
syscall_dispatch()                            |
---- syscall ----                             |      A
vdso                                          |______|
zx_port_wait()
user code
Scenario 4: Suspend after being unblocked. While thread B was blocked, waiting on a port packet, a packet arrived, unblocking it (with ZX_OK):
blocked                           ZX_OK
WaitQueue::BlockEtcPostamble()      |
WaitQueue::BlockEtc()               |
PortDispatcher::Dequeue()           V
sys_port_wait()
syscall_dispatch()
---- syscall ----
vdso
zx_port_wait()
user code
Thread B is now unwinding toward user mode when thread A issues a suspend. Thread A sets the bit and sees that thread B is marked as running, so it sends an IPI. Similar to the "Suspend just before syscall" case, the interrupt handler executes:
interrupt_handler()
---- interrupt ----
PortDispatcher::Dequeue()
sys_port_wait()
syscall_dispatch()
---- syscall ----
vdso
zx_port_wait()
user code
However, this time the handler does not process pending signals because it interrupted kernel context rather than user context. The handler completes and thread B continues to unwind. Eventually, thread B reaches the edge and is about to return from the syscall to user mode. Here it checks for pending signals, sees THREAD_SIGNAL_SUSPEND and suspends itself:
suspend_self()
syscall_dispatch()
---- syscall ----
vdso
zx_port_wait()
user code
Upon being resumed, it will return to user mode with the status result that unblocked it (ZX_OK):
syscall_dispatch()    ZX_OK
---- syscall ----       |
vdso                    V
zx_port_wait()
user code
The key points to take away are:
- You can't forcibly suspend or kill a thread. You can only ask it to suspend or terminate itself.
- Thread signals are the mechanism for asking a thread to suspend or terminate.
- Threads should only suspend or terminate their execution at specific points within the kernel. In particular, a thread may only suspend or terminate execution when it holds no resources (e.g. locks) and is about to return from kernel mode to user mode.
- In order to remain responsive, long running kernel operations must periodically check for pending signals and return if any are set.
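The last point can be illustrated with a loop that polls its signals at a fixed interval. This is a sketch, not kernel code: `ProcessItems`, its parameters, the `Status` enum, and the bit values are all hypothetical, and the work done per item is left as a counter increment.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint32_t kSignalKill = 1u << 0;     // hypothetical bit value
constexpr uint32_t kSignalSuspend = 1u << 1;  // hypothetical bit value

// Stand-in for zx_status_t, reduced to the values this sketch needs.
enum class Status { kOk, kIntrKilled, kIntrRetry };

// Hypothetical long running operation over `total` items. Every
// `check_interval` items it polls the signals and bails out if any are set,
// reporting how far it got through `items_done`.
Status ProcessItems(const std::atomic<uint32_t>& signals, size_t total,
                    size_t check_interval, size_t* items_done) {
  assert(check_interval > 0);
  for (size_t i = 0; i < total; ++i) {
    if (i % check_interval == 0) {
      const uint32_t pending = signals.load(std::memory_order_acquire);
      if (pending & kSignalKill) return Status::kIntrKilled;
      if (pending & kSignalSuspend) return Status::kIntrRetry;
    }
    ++*items_done;  // placeholder for the real per-item work
  }
  return Status::kOk;
}
```

Returning the retry status only makes sense if the operation can be safely reissued from the start or can record its progress; which of the two applies depends on the operation.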