add a stall detector that logs stacktraces of unyielding tasks, redux #499
Commits on Jan 11, 2022
Commit 9605100
Commit f60f577: use the preempt timer duration as the stall detector threshold
We have a rudimentary stall detector that triggers if a queue doesn't yield within 5ms of being scheduled for execution. Using the preempt timer duration instead limits the false positives we log: if the user specifies a preempt timer of 100ms, then they are fine with a task queue running for that long.
Commit 849ab2f
Commit 75681e7: asynchronously record stack traces when a task queue goes over budget
Knowing that a queue is stalling the reactor is nice, but finding the exact code location where the stall occurs remains very hard, especially if a task queue hosts many concurrent fibers. To help with that, this commit introduces a stall detection mechanism that records stack traces of stalling tasks. It works as follows:

* When a task queue is scheduled for execution, we set up a timer that triggers some time after the queue is expected to yield (we add a 10% error margin to avoid false positives). A background thread collocated with the local executor waits on the timer at all times.
* When the timer fires, the thread sends a signal (SIGUSR1) to the local executor thread. Upon receiving the signal, the local executor records a complete trace of the local stack. Here we take advantage of the fact that, by default, the kernel invokes signal handlers on top of the existing stack, i.e. the frames we record are those of the problematic user code that was meant to yield. The recorded frames are pushed onto a non-blocking communication channel that links the signal handler and the local executor.
* When a task queue yields, the local executor disarms the timer and checks the communication channel for recorded frames. If there are any, we can conclude that the queue stalled, so we log them.

This code works in practice but has two major drawbacks:

* The timer dance is expensive; expect a high number of syscalls. Because of this runtime overhead, the stall detector is disabled by default. To opt in, the feature `stall-detection` must be enabled at compile time.
* We log stalls only after the queue yields. Therefore, if there is a bug in your code and your queue never yields, the stall detector will never log the code location that's at fault (even though we have probably recorded the trace by then). The reason for this is that logging from a signal handler is illegal (it is not async-signal-safe).
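The arm/disarm protocol described in the commit message above can be approximated with the standard library alone. What follows is a minimal sketch under stated assumptions, not the code from this PR: `Timer`, `StallReport`, `Watchdog`, and `stall_threshold` are hypothetical names, and the SIGUSR1 delivery plus stack capture are replaced by a plain channel message so that the example needs no `libc` bindings.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Deadline shared by the executor and the watchdog thread; `None` means disarmed.
struct Timer {
    deadline: Mutex<Option<Instant>>,
    changed: Condvar,
}

/// Hypothetical stand-in for the stack trace a real signal handler would record.
#[derive(Debug)]
struct StallReport {
    fired_at: Instant,
}

/// Expected run time plus the 10% error margin mentioned in the commit message.
fn stall_threshold(budget: Duration) -> Duration {
    budget + budget / 10
}

struct Watchdog {
    timer: Arc<Timer>,
    reports: Receiver<StallReport>,
}

impl Watchdog {
    /// Starts the background thread that waits on the timer at all times.
    fn spawn() -> Watchdog {
        let timer = Arc::new(Timer { deadline: Mutex::new(None), changed: Condvar::new() });
        let (tx, rx) = mpsc::channel();
        let shared = Arc::clone(&timer);
        thread::spawn(move || watch(shared, tx));
        Watchdog { timer, reports: rx }
    }

    /// Called when a task queue is scheduled for execution.
    fn arm(&self, budget: Duration) {
        *self.timer.deadline.lock().unwrap() = Some(Instant::now() + stall_threshold(budget));
        self.timer.changed.notify_one();
    }

    /// Called when the queue yields: disarm, then drain whatever was recorded.
    fn disarm(&self) -> Vec<StallReport> {
        *self.timer.deadline.lock().unwrap() = None;
        self.timer.changed.notify_one();
        self.reports.try_iter().collect()
    }
}

/// Watchdog loop. A real detector would signal the executor thread here;
/// this sketch pushes a report onto the channel directly instead.
fn watch(timer: Arc<Timer>, reports: Sender<StallReport>) {
    let mut deadline = timer.deadline.lock().unwrap();
    loop {
        let current = *deadline;
        match current {
            // Disarmed: sleep until the executor arms the timer.
            None => deadline = timer.changed.wait(deadline).unwrap(),
            Some(at) => {
                let now = Instant::now();
                if now < at {
                    // Armed: sleep until the deadline or until the timer changes.
                    deadline = timer.changed.wait_timeout(deadline, at - now).unwrap().0;
                } else {
                    // Deadline passed without a yield: record one report, then disarm.
                    *deadline = None;
                    if reports.send(StallReport { fired_at: now }).is_err() {
                        return; // the executor side is gone
                    }
                }
            }
        }
    }
}
```

In this sketch, arming with a 20ms budget and yielding only after 200ms should leave exactly one report for `disarm` to return, while a queue that yields within its budget leaves none. Like the mechanism in the commit message, reports surface only once the queue yields.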
Commits on Jan 13, 2022
Commit 77662c4
Commit 9ae41f7
Commit 77e005c: Add trait-based handler, to allow for customizing signal and trace collection eventually.
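To illustrate what a trait-based handler could look like, here is a small sketch. It is an assumption about the general shape only, not the trait introduced by this commit: `StallHandler`, `Stall`, `LoggingHandler`, and every method name are hypothetical. The idea is that the threshold, the signal number, and the reaction to a recorded trace each become an overridable method.

```rust
use std::time::Duration;

/// Hypothetical description of one detected stall.
struct Stall<'a> {
    queue_name: &'a str,
    overrun: Duration,
    frames: &'a [String],
}

/// Hypothetical handler trait: each knob is a method an implementor can override.
trait StallHandler {
    /// How long a queue may run before it counts as stalled; `None` disables detection.
    fn threshold(&self, budget: Duration) -> Option<Duration>;
    /// Signal number used to interrupt the executor thread.
    fn signal(&self) -> i32;
    /// Invoked outside the signal handler, after the queue has yielded.
    fn on_stall(&self, stall: &Stall<'_>) -> String;
}

/// A handler that formats the trace for logging, with a configurable margin and signal.
struct LoggingHandler {
    margin_percent: u32,
    signal: i32,
}

impl StallHandler for LoggingHandler {
    fn threshold(&self, budget: Duration) -> Option<Duration> {
        Some(budget + budget * self.margin_percent / 100)
    }

    fn signal(&self) -> i32 {
        self.signal
    }

    fn on_stall(&self, stall: &Stall<'_>) -> String {
        // A real handler would pass this to a logger; returning it keeps the sketch testable.
        format!(
            "queue {} overran its budget by {:?}\n{}",
            stall.queue_name,
            stall.overrun,
            stall.frames.join("\n")
        )
    }
}
```

A handler built as `LoggingHandler { margin_percent: 10, signal: 10 }` would reproduce the 10% margin from the earlier commit; 10 is the value of SIGUSR1 on Linux x86-64 and AArch64, and other platforms may differ.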
Commit 186a4d1
Commits on Jan 14, 2022
Commit 647ce55
Commit aee9682
Commit c24ad19
Commits on Jan 20, 2022
Commit 99237c1
Commit af66c8d: Refactor stall detector tests and add coverage for checking that incoming signals match expected executor.
Commit 8a1ca54
Commit 5c7a31b
Commit b479ab5
Commit 0721f20
Commit 0477a0b: Make `StallDetector` completely self-contained by moving `signal_id` from `LocalExecutor`.
Commit bdaf8ee
Commit 04fddd6: Rename `DefaultStallDetectionHandler` -> `LoggingStallDetectionHandler` and make all knobs configurable.
Commits on Jan 21, 2022
Commit b06a369
Commit 79bfca1
Commits on Jan 24, 2022
Commit dec6adb
Commits on Jan 25, 2022
Commit 515781b
Commit 65f305d
Commits on Jan 26, 2022
Commit 9be026d: Don't export `LocalExecutor::detect_stalls` for now; need to write tests to validate enabling/disabling at runtime.