Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{join,leave}_multicast_v{4,6} functions should probably be async #5674

Open
RReverser opened this issue May 7, 2023 · 9 comments
Open

{join,leave}_multicast_v{4,6} functions should probably be async #5674

RReverser opened this issue May 7, 2023 · 9 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-net Module: tokio/net

Comments

@RReverser
Copy link

Currently multicast join/leave functions on the UdpSocket are defined as synchronous.

Those functions need to send MLD messages to the multicast group, which makes them perform network I/O, which is normally offloaded and exposed via async API in tokio, so it's somewhat surprising those few aren't.

Indeed, as any I/O, they can take considerable amount of time, especially in problematic cases like when the network takes a long time to respond. This is the order of timing I'm seeing on my system when joining multicast on all available network interfaces (something I need for the app I'm working on):

DEBUG    ┝━ join_multicast_group [ 125µs | 0.13% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 28.2µs | 0.03% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 61.4ms | 63.76% ]
ERROR    │  ┕━ 🚨 [error]:  | error: An unknown, invalid, or unsupported option or level was specified in a getsockopt or setsockopt call. (os error 10042)
DEBUG    ┝━ join_multicast_group [ 25.9µs | 0.03% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 12.9µs | 0.01% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 13.3µs | 0.01% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 21.3µs | 0.02% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 16.5µs | 0.02% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┕━ join_multicast_group [ 12.9µs | 0.01% ]
DEBUG       ┕━ 🐛 [debug]:  | return: ()

Those are release timings, in debug mode they're a bit worse.

While blocking the main thread for 12-125µs can be acceptable with some stretch, the 61.4ms in the exceptional problematic case is definitely not, and, because all network interfaces on different machines are different, there's no way to predict how long the call is going to take and seems safer to always offload those to a dedicated I/O thread.

I realise changing that would be a breaking change, but given that I haven't seen previous discussion on this in the repo, I thought I'd bring them up. I think it's worth tracking this issue to change their signature (and implementation) in the next major release, and meanwhile at least add a note to the docs suggesting that those functions can be long-blocking and probably should be wrapped in the spawn_blocking API.

@RReverser RReverser added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels May 7, 2023
@Darksonn Darksonn added the M-net Module: tokio/net label May 8, 2023
@Darksonn
Copy link
Contributor

Darksonn commented May 8, 2023

If they perform IO, then I'm surprised they work at all, considering that the socket is set to non-blocking. I would expect them to fail with WouldBlock errors often.

@RReverser
Copy link
Author

That's a good point. I'm no expert, but from a quick read, apparently the idea is that OS should just put MLD messages in a queue for you when socket is in a non-blocking mode, so those functions would return w/o waiting for messages to be actually sent. At least in my case, I'm on Windows, so I wonder if its implementation differs and it still waits for the messages regardless of socket status.

@RReverser
Copy link
Author

apparently the idea is that OS should just put MLD messages in a queue for you when socket is in a non-blocking mode, so those functions would return w/o waiting for messages to be actually sent

Although I wonder if that also means that non-Windows OS don't actually report correct success/failure status from join_multicast_v6 if they don't wait for response to their MLD request. If so, that seems like another reason to make those async.

@Darksonn
Copy link
Contributor

Darksonn commented May 8, 2023

@Thomasdezeeuw Do you know any more about these methods?

@RReverser
Copy link
Author

RReverser commented May 8, 2023

Those functions need to send MLD messages to the multicast group

Self-correction: only IPv6 sends MLD messages, IPv4 uses IGMP. Not that it matters much for the overall issues as both methods still need to send messages IIUC.

@Thomasdezeeuw
Copy link
Contributor

I'm not aware any OS provides an async interface for this, I think all OSs use setsockopt(2) for this, which is synchronous. Maybe io_uring will provide an interface for setsockopt at some point, but it doesn't at this time.

@RReverser
Copy link
Author

Thanks for confirmation. I googled around and also couldn't find any async way to retrieve their status.

Given that those APIs are currently blocking and might block for a long time, do you think this suggestion

and probably should be wrapped in the spawn_blocking API

makes sense?

At least for the initial implementation, Tokio could wrap them into spawn_blocking and switch to native async API if/when it's ever implemented.

@Darksonn
Copy link
Contributor

Darksonn commented May 8, 2023

We can't really wrap them in spawn_blocking without deprecating them and providing an async version under a new name.

@RReverser
Copy link
Author

RReverser commented May 8, 2023

We can't really wrap them in spawn_blocking without deprecating them and providing an async version under a new name.

Sure, I understand that. As I mentioned in the issue description, I realise this would be a breaking change, although it seems like a worthwhile long-term.

For now either adding them as new methods as you say or even just

at least add a note to the docs suggesting that those functions can be long-blocking and probably should be wrapped in the spawn_blocking API

would be helpful I think for those who, like me, didn't realise that those functions might block for a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-net Module: tokio/net
Projects
None yet
Development

No branches or pull requests

3 participants