Mute CQEs of send/write to reduce wakeups #1264

Open
pyhd opened this issue Oct 11, 2024 · 5 comments

pyhd commented Oct 11, 2024

wait_timeout(nr) is generally a good way to reduce wakeups from the kernel, but CQEs from send/write add unnecessary "noise", especially when there are many zero-copy sends. It is hard to predict when a send/write will complete, yet their CQEs are generally not latency sensitive. So I think a possible solution is a MUTE_SUCCESS flag on the SQE: the CQE of a flagged request would not count toward waking the waiter.

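if (sq_ready) {
    /* new SQEs are queued: submit them and wait for up to nr wakeable CQEs */
    submit_and_wait_timeout(nr, 1ms);
} else if (inflight_sends) {
    /* muted CQEs never wake us, so the timeout makes sure they are still reaped, poll style */
    wait_timeout(1, 100ms);
} else {
    /* no pending send/write CQEs: block until the next wakeable CQE */
    wait(1);
}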
Owner

axboe commented Oct 11, 2024

Yep this is not a bad idea, we've bounced around ideas for this very thing in the past as well. Send is a good example - generally they complete inline (e.g. immediately), but it's not guaranteed. And while you don't need an immediate notification for them, generally you do want to see one so that you know the data it sent can be reused. Hence IOSQE_CQE_SKIP_SUCCESS isn't really useful for this case.

I think what we'd need is something like a low priority completion, in the sense that it doesn't need to wakeup the task waiting, but it should be included in the "I'm waiting for this number of events" accounting.
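
One way to read this accounting rule is as a counter that every completion bumps but only normal completions check. The sketch below is a userspace model only; `struct waiter_model`, `post_cqe` and the `low_prio` flag are hypothetical names for illustration, not kernel or liburing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a waiter; not kernel or liburing code. */
struct waiter_model {
    unsigned wait_nr; /* number of events the task asked to wait for */
    unsigned posted;  /* CQEs posted since the wait began */
};

/*
 * Record one posted CQE and report whether the waiter should be woken.
 * Assumption: a low-priority CQE is counted toward wait_nr but never
 * triggers the wakeup itself; only a normal CQE checks the threshold.
 */
bool post_cqe(struct waiter_model *w, bool low_prio)
{
    assert(w != NULL);
    w->posted++;
    if (low_prio)
        return false;
    return w->posted >= w->wait_nr;
}
```

In this model, a burst of low-priority send completions can push `posted` past `wait_nr` without waking anyone; the next normal completion then wakes the task right away.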

A quick work-around with the existing code may be to just discount the write/send in the wait_nr.
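
One possible reading of this workaround: if the application wants `wanted` latency-sensitive completions and also has `inflight_sends` sends whose CQEs will land in the same ring, it asks for the sum, so that send CQEs alone cannot satisfy the wait. The helper below is hypothetical, and the clamp to the CQ ring size is an added assumption so the request never exceeds what the ring can hold.

```c
#include <assert.h>

/*
 * Hypothetical helper: compute the wait_nr to pass to a wait call so that
 * pending send/write completions are factored out of the count.
 * Assumption: cq_entries is the size of the CQ ring.
 */
unsigned discounted_wait_nr(unsigned wanted, unsigned inflight_sends,
                            unsigned cq_entries)
{
    assert(cq_entries > 0);
    unsigned nr = wanted + inflight_sends;
    if (nr < wanted)        /* unsigned wrap-around: fall back to the cap */
        nr = cq_entries;
    return nr > cq_entries ? cq_entries : nr;
}
```

Since sends are not guaranteed to complete promptly, a wait computed this way would normally be paired with a timeout, as in the loop from the original post.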

Owner

axboe commented Oct 14, 2024

@redbaron

What if the CQ is overflowing with now-ignored CQEs and no wakeup-worthy CQE has arrived?

Owner

axboe commented Oct 15, 2024

There are several conditions that would still cause it to wake, like a short send/write (or an error), and overflow would be another one. Didn't cover the overflow case, but that will be done too. Anything but a fully successful send with a normal CQE posting would wake things up, naturally.
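
The conditions listed here can be modelled as a small predicate. This is a sketch of the rule as described in the comment, not the actual patch; `muted_cqe_wakes` and its parameters are hypothetical names.

```c
#include <stdbool.h>

/*
 * Hypothetical model: should the CQE of a "muted" send/write still wake
 * the waiter?
 *   res           - CQE result: bytes transferred, or a negative errno
 *   requested     - number of bytes the request asked to send/write
 *   cq_overflowed - the CQE could not be posted to the ring normally
 * Only a fully successful transfer with a normal CQE posting stays quiet.
 */
bool muted_cqe_wakes(int res, unsigned requested, bool cq_overflowed)
{
    if (cq_overflowed)
        return true;                 /* overflow always wakes */
    if (res < 0)
        return true;                 /* error */
    if ((unsigned)res < requested)
        return true;                 /* short send/write */
    return false;                    /* full success: no wakeup needed */
}
```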

Author

pyhd commented Oct 15, 2024

@axboe

> I think what we'd need is something like a low priority completion, in the sense that it doesn't need to wakeup the task waiting, but it should be included in the "I'm waiting for this number of events" accounting.

I suppose you want to put a backlog limit on ignorable events, but that would add a new parameter to all existing wait_cqe variants, which might be a little confusing.

> https://lore.kernel.org/io-uring/20241014205416.456078-1-axboe@kernel.dk/T/#m19db4fd576c4cf3c5a5ef3ea0b71e175a3574e15
>
> Tossed out a suggestion for handling something like this.

I am afraid handling inline completions is not enough, because the number of inline completions is the more predictable part. Async successes and zc notifications, on the other hand, are largely out of our control, especially when inflight CQEs vastly outnumber potential read/recv CQEs. Therefore, even if inline successes can be ignored, the CQ ring may still be flooded by inflight CQEs from previous rounds.

However, MUTE_SUCCESS would probably be less confusing. E.g. in a submit_wait_timeout(nr) syscall, the developer can explicitly expect nr incoming requests or errors, while any muted CQEs are just byproducts.
