Describe the bug
Linux DCO (2.6 and 2.7) is losing at least peer delete messages from the kernel (and maybe key swap and float in 2.7). This was uncovered while working on #883 and realizing that the fix there does not fully resolve the issue.
In both OpenVPN 2.6 (after #883 is fixed) and OpenVPN 2.7, when Linux is running in DCO mode, received Netlink messages are processed by a callback that just sets some flags on a single dco_context_t object. If two or more messages arrive, that same callback updates (and overwrites) the flags on that DCO object. Depending on how the Netlink receive was triggered, those flags may never be processed at all: only the code in forward.c/multi.c calls dco_do_read and then handles the flags; the stats calls in dco_linux.c don't do anything with them.
Consider this sequence - the event loop from multi.c calls:
multi_process_per_second_timers -> multi_process_per_second_timers_dowork -> multi_print_status which then jumps to dco_linux.c: dco_get_peer_stats_multi -> dco_get_peer -> ovpn_nl_msg_send -> ovpn_nl_recvmsgs -> nl_recvmsgs
At that point, if multiple messages are pending, the callback is called multiple times, once per message. The issue is that the callback (ovpn_handle_msg) calls e.g. ovpn_handle_peer_del_ntf (or ovpn_handle_peer_float_ntf or ovpn_handle_key_swap_ntf), and those all just set flags on the dco_context_t object. Those flags are normally handled by multi_process_incoming_dco (via switch (dco->dco_del_peer_reason)), but if you trace the call chain above back from the callback, multi_process_incoming_dco isn't anywhere in it.
This is also an issue on the forward.c client side, but there it is more a matter of float/key swap/del notifications overwriting each other; multi.c is the bigger problem, since multiple del notifications can come in at once and overwrite each other much more easily.
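To illustrate the overwrite described above, here is a minimal, self-contained sketch. It is not OpenVPN code: `sketch_dco_ctx` is a hypothetical stand-in for dco_context_t, and every `sketch_*` function is an invented name that only mimics the shape of the callback, the receive loop, and the consumer. It assumes a single slot for the delete notification, as the description above suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for dco_context_t: a single slot for delete
 * notifications (field names are assumptions, not the real struct). */
struct sketch_dco_ctx {
    int del_peer_id;     /* last peer id reported deleted, -1 if none */
    int del_peer_reason; /* last delete reason, 0 if none */
};

/* Mimics the per-message callback: it only records the notification
 * in the single slot, overwriting whatever was there before. */
void sketch_handle_peer_del(struct sketch_dco_ctx *ctx, int peer_id, int reason)
{
    assert(ctx != NULL);
    ctx->del_peer_id = peer_id;
    ctx->del_peer_reason = reason;
}

/* Mimics one receive pass that drains every pending message and
 * invokes the callback once per message. */
void sketch_recvmsgs(struct sketch_dco_ctx *ctx, const int *peer_ids,
                     size_t n, int reason)
{
    for (size_t i = 0; i < n; i++) {
        sketch_handle_peer_del(ctx, peer_ids[i], reason);
    }
}

/* Mimics the consumer of the flags: handles at most one pending delete
 * and clears the slot. Returns 1 if something was handled, else 0. */
int sketch_process_incoming(struct sketch_dco_ctx *ctx, int *handled_peer)
{
    if (ctx->del_peer_reason == 0) {
        return 0;
    }
    *handled_peer = ctx->del_peer_id;
    ctx->del_peer_id = -1;
    ctx->del_peer_reason = 0;
    return 1;
}

/* Delivers n delete notifications in one receive pass, then counts how
 * many of them the consumer actually gets to see. */
int sketch_count_handled(const int *peer_ids, size_t n)
{
    struct sketch_dco_ctx ctx = { -1, 0 };
    int handled = 0;
    int peer = -1;

    sketch_recvmsgs(&ctx, peer_ids, n, 1);
    while (sketch_process_incoming(&ctx, &peer)) {
        handled++;
    }
    return handled;
}
```

In this sketch, delivering three delete notifications in one receive pass leaves only the last peer id in the slot, so `sketch_count_handled` reports a single handled delete. If the receive pass is triggered from a path that never runs the consumer, as in the stats call chain above, even that one is not acted on at that point.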
To Reproduce
Run a server on Linux in DCO mode. Connect multiple clients and disconnect them all at the same time. Watch the logs at verb 4 or higher and look for the SIGTERM: not all peers will receive a disconnect message. Also watch the stats output for connected devices/throughput/public IPs/etc. and see that not all devices disappear.
Expected behavior
All devices disconnect fully in user space.
Version information (please complete the following information):
- OS: Alma 9.6
- OpenVPN version: 2.6 head, 2.6 head with #883 ("DCO Netlink Message Mix-up on 2.6 (Fixed in 2.7 with OpenVPN#793)") fixed, and 2.7 head