-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add a way to measure and report IO latency #476
Conversation
c7c8e19
to
37e4f68
Compare
Is there any way we can make this something the user enable with a flag? For this, we should at least do it at executor creation time, but not unconditionally. Measuring time can be incredibly expensive. The rest of the code looks ok and ready to go. |
Fair point, I'll add an executor config to disable all that stuff! |
Add the plumbings necessary to measure the post-reactor scheduler delay, i.e., the time it takes for the scheduler to invoke the task that consumes the result of a fulfilled source. I suspect a bug in Glommio creates very high latency spikes; this commit is part of my effort to find it. There are three kinds of latency measurements I would like Glommio to have: * Pre-reactor delay: the delay between the moment an IO is scheduled and the moment it enters the ring; * IO latency: the time a source spends in the kernel; * Post-reactor delay: the delay this commit measures, as explained above.
Measure the time a source spends in the ring, including the submission queue.
We now export IO and scheduler latencies as distribution in the io stats. This commit adds distribution support to Glommio. We use "sketches" that give a statistical approximation of quantiles because we want the overhead (space and time) to be low when recording latencies. A side effect of this is that pulling the stats from the reactor now clears them (because a distribution needs to be cleared regularly to be useful). I don't expect this to be problematic because, in practice, users want rates from the stats and keep the previous values in memory to compute deltas. This should, therefore, greatly simplify this logic.
Not strictly necessary right now but consistency with the IO stats is desirable
Recording latency can be expensive so gate this feature behind a config knob that's disabled by default.
Greetings @HippoBaro! It looks like your PR added a new or changed an existing dependency, and CI has failed to validate your changes. Thank you! |
Add the plumbings necessary to measure various kinds of IO latency,
We now keep track of two kinds of IO latency:
come back at the tasks that consume the IO;
We now export them as distribution in the
IoStats
struct. And this PRadds distribution support to Glommio. We use "sketches" that give a
statistical approximation of quantiles because we want the overhead
(space and time) to be low when recording latencies.
A side effect of this is that pulling the stats from the reactor now
clears them (because a distribution needs to be cleared regularly to be
helpful). I don't expect this to be problematic because, in practice,
users want rates from the stats and keep the previous values in memory
to compute deltas. This should, therefore, greatly simplify this logic.