Conversation
Member
|
@jaybosamiya-ms , it would be good to have you review the interface-level changes in this PR. |
|
🤖 SemverChecks 🤖 No breaking API changes detected Note: this does not mean API is unchanged, or even that there are no breaking changes; simply, none of the detections triggered. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR supports SIGINT/SIGALRM and syscall
alarm.On Linux, the workflow is the following:
prepare_to_run_guest) --> check TLS and queue signals --> process signalcheck_for_interrupt--> check TLS and queue signals --> process signalFor shim to be able to retrieve pending signals from platform, add a new
SignalProviderwith the interfacefn take_pending_signals(&self, mut f: impl FnMut(litebox::shim::Signal)).Additionally, to support
alarm, we need a mechanism to set up a timer with callback. I also added a new interfacefn schedule_interrupt(&self, thread: Option<&Self::ThreadHandle>, delay: core::time::Duration)toThreadProvider. For platform that does not implement it, the fallback solution is to check alarm just before switching back to the guest (which causes delay in delivering signals or never delivering signals at all if the program is being blocked and waiting for a signal).One tricky problem is that when a signal arrives, the running thread may be about to be blocked or blocked already (i.e., calling futex wait), and we don't interrupt a thread if it is running inside LiteBox code. In which case, we never have the chance to process that signal. It might be possible to examine the RIP of the thread in the signal handler and divert the execution to skip futex wait call it if RIP is close to it (e.g., use assembly code to build a critical region), but this may not work on Windows (where we don't call syscall instruction directly). The current solution I have is based on the observation that for all blocking operations (e.g., futex wait, sleep), they all boil down to
RawMutex::block_or_timeoutwith the per-threadWaitStateand expected valueThreadState::WAITING.WaitStatestores the thread status likeRUNNING_IN_HOST,WAITING,WOKEN, and etc. Thus, in the signal handler, besides recording the signal, we can update the thread state fromWAITINGtoWOKENand then callsfutex wake. If the thread is about to call futex wait, futex call would return immediately because of the change of the value. If the thread is being blocked, it would wake up and return. To accessWaitStatein the signal handler, I added two new interfaceson_wait_startandon_wait_endtoRawMutexso that platform call set and clearWaitStatein its TLS.The signal handlers for SIGNINT/SIGNALRM can be called on any threads. However, we don't want non-guest threads (e.g., background network worker, additional test thread spawned by
cargo test) to be called upon. For background network worker, we can block those signals. For test thread, the signal handler would check if it has our custom TLS. If not, it blocks those signals and raise them again.On Windows, the implementation is almost the same except that signal handler is called on a kernel thread (and thus cannot access litebox thread's TLS easily). I have to use a global variable to record all active threads.
I added some end-to-end C tests and Rust unit tests. To make Rust unit test work, I also introduced a new interface
fn run_test_thread<R>(f: impl FnOnce() -> R) -> RtoThreadProvider. This is because spawning a guest thread (viasys_clone) and a regular thread in test (via std::thread::spawn) have different routes and require different setup. For example, while running in test, there is no guest code, and thus we don't need to do fs/gs swap (hence no setup for one of fs/gs). To make the code work for both guest thread and non-guest thread, we can set gs == fs before running the test thread.