floating IP due to "route VPN_IP net_gateway" causes 60 second "Disallow float" in openvpn 2.6 #704

@wdoekes

Describe the bug

OpenVPN unjustly blocks a source IP switch immediately after connection setup.

  • We're using a (different) VPN (main) with a default gateway;
  • we connect to the target VPN (3.3.3.3) with source IP 2.2.2.2;
  • once connected to targetVPN, targetVPN pushes its own IP 3.3.3.3 with net_gateway so we don't get VPN-in-VPN;
  • this is detected as a floating IP by openvpn.

With openvpn 2.5, this works flawlessly.

But with openvpn 2.6, it is counted as a second connection, and we get "Disallow float to an address taken by another client 1.1.1.1:sourcePort". This lasts for 60 seconds, until the log shows "client-instance restarting", after which the float is finally allowed.

During these 60 seconds, all traffic to/through targetVPN is disallowed.

To Reproduce

  • Take openvpn 2.6, set verb 5, add push "route 3.3.3.3 255.255.255.255 net_gateway".
  • Connect to a different VPN with a default gateway first (with IP 2.2.2.2).
  • Then connect to targetVPN.
  • When the net_gateway rule is installed, the apparent source IP changes from 2.2.2.2 to 1.1.1.1.
  • Watch how the following messages appear:

Connection Attempt MULTI: multi_create_instance called
2.2.2.2:34817 Re-using SSL/TLS context
...
2.2.2.2:34817 peer info: IV_VER=2.5.11
...
2.2.2.2:34817 [theCommonName] Peer Connection Initiated with [AF_INET]2.2.2.2:34817
...
SENT CONTROL [theCommonName]: 'PUSH_REPLY,route 3.3.3.3 255.255.255.255 net_gateway, ... ,peer-id 2,cipher AES-256-GCM' (status=1)
MULTI: multi_create_instance called
1.1.1.1:34817 Re-using SSL/TLS context
...
Float requested for peer 2 to 1.1.1.1:34817
Disallow float to an address taken by another client 1.1.1.1:34817
...

This continues for 60s.

...
Float requested for peer 2 to 1.1.1.1:34817
Disallow float to an address taken by another client 1.1.1.1:34817
1.1.1.1:34817 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
1.1.1.1:34817 TLS Error: TLS handshake failed
1.1.1.1:34817 SIGUSR1[soft,tls-error] received, client-instance restarting
Float requested for peer 2 to 1.1.1.1:34817
peer 2 (theCommonName) floated from 2.2.2.2:34817 to [AF_INET]1.1.1.1:34817

And then finally the VPN connection works.

Expected behavior

We shouldn't have to wait 60 seconds.

With openvpn 2.5 (server), we don't see the second "MULTI: multi_create_instance called" in the logs. I believe this means the second connection is never added to m->hash, and because it is not in m->hash, we do not end up here:

    /* make sure that we don't float to an address taken by another client */  
    struct hash_element *he = hash_lookup_fast(hash, bucket, &real, hv);
    if (he)
    {
...
        /* do not float if target address is taken by client with another cert */
        if (!cert_hash_compare(m1->locked_cert_hash_set, m2->locked_cert_hash_set))
        {
            msg(D_MULTI_LOW, "Disallow float to an address taken by another client %s",
                multi_instance_string(ex_mi, false, &gc));

(Alternatively it does end up in m->hash, but the locked_cert_hash_set is then equal.)
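To make the suspected failure mode concrete, here is a small stand-alone model of that check. It is not OpenVPN code: `toy_cert_hash_set`, `toy_instance`, `toy_cert_hash_compare` and `toy_float_disallowed` are hypothetical stand-ins, and the sketch assumes the comparison reports a mismatch when exactly one side has no locked certificate hash set yet, which is what a second instance that is still handshaking would look like.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a locked certificate hash set. */
struct toy_cert_hash_set {
    unsigned char sha256[32];
};

/* Hypothetical per-client state: only the field the check reads. */
struct toy_instance {
    const struct toy_cert_hash_set *locked_cert_hash_set; /* NULL until locked */
};

/* Assumed semantics: equal if both are unset, or both are set with the
 * same digest; a set/unset pair counts as different. */
bool toy_cert_hash_compare(const struct toy_cert_hash_set *a,
                           const struct toy_cert_hash_set *b)
{
    if (a != NULL && b != NULL)
    {
        return memcmp(a->sha256, b->sha256, sizeof(a->sha256)) == 0;
    }
    return a == NULL && b == NULL;
}

/* Returns true if the float would be refused: some instance is already
 * registered at the target address and its cert hash set differs. */
bool toy_float_disallowed(const struct toy_instance *floating,
                          const struct toy_instance *at_target)
{
    if (at_target == NULL)
    {
        return false; /* nothing registered at the target address */
    }
    return !toy_cert_hash_compare(floating->locked_cert_hash_set,
                                  at_target->locked_cert_hash_set);
}
```

In this model, an established peer (hash set present) floating onto an address held by an instance that is still handshaking (hash set `NULL`) is refused, which would correspond to the "Disallow float" branch. With no instance registered at the target address, or with an instance holding the same hash set, the float is not refused.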

I have not figured out exactly what changed between 2.5 and 2.6 that puts us in the "Disallow float" state. Perhaps a more concise connection setup? Whatever it is, it does seem to cause the problem.

Version information (please complete the following information):

  • OS: Ubuntu 24.04
  • OpenVPN version: 2.6.12-0ubuntu0.24.04.1
  • For the client, I tried both 2.5.11-0ubuntu0.22.04.1 and 2.6.12-0ubuntu0.24.04.1.

Additional context

The following patch fixes the problem and produces this log line:

peer 2 (theCommonName) floating from 2.2.2.2:34817 to [AF_INET]1.1.1.1:34817 (m2 still setting up) state=8/0

--- a/src/openvpn/multi.c
+++ b/src/openvpn/multi.c
@@ -3159,7 +3159,16 @@ multi_process_float(struct multi_context
         struct tls_multi *m2 = ex_mi->context.c2.tls_multi;
 
         /* do not float if target address is taken by client with another cert */
-        if (!cert_hash_compare(m1->locked_cert_hash_set, m2->locked_cert_hash_set))
+        if (m1->locked_cert_hash_set && !m2->locked_cert_hash_set)
+        {
+            msg(M_INFO, "peer %" PRIu32 " (%s) floating from %s to %s (m2 still setting up) state=%d/%d",
+                m1->peer_id,
+                tls_common_name(m1, false),
+                mroute_addr_print(&mi->real, &gc),
+                print_link_socket_actual(&m->top.c2.from, &gc),
+                m1->multi_state, m2->multi_state);
+        }
+        else if (!cert_hash_compare(m1->locked_cert_hash_set, m2->locked_cert_hash_set))
         {
             msg(D_MULTI_LOW, "Disallow float to an address taken by another client %s",
                 multi_instance_string(ex_mi, false, &gc));

I guess we could check for if (m1->locked_cert_hash_set && !m2->locked_cert_hash_set && m2->multi_state == CAS_NOT_CONNECTED) to be more strict.
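As a rough illustration of how the two conditions differ, here is a stand-alone sketch. The types and names (`toy_tls_multi`, `toy_multi_state`, `TOY_CAS_NOT_CONNECTED`, `TOY_CAS_OTHER`) are hypothetical stand-ins, not OpenVPN definitions, and the numeric enum values are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection state enum. */
enum toy_multi_state {
    TOY_CAS_NOT_CONNECTED, /* stand-in for CAS_NOT_CONNECTED */
    TOY_CAS_OTHER          /* any later state; the real enum has more values */
};

/* Hypothetical stand-in holding only the fields the condition reads. */
struct toy_tls_multi {
    const void *locked_cert_hash_set; /* NULL until a cert hash is locked */
    enum toy_multi_state multi_state;
};

/* Condition as in the patch: m1 has a locked cert hash set, m2 has none. */
bool toy_m2_still_setting_up(const struct toy_tls_multi *m1,
                             const struct toy_tls_multi *m2)
{
    return m1->locked_cert_hash_set != NULL
           && m2->locked_cert_hash_set == NULL;
}

/* Stricter variant: additionally require m2 to be in the not-connected state. */
bool toy_m2_still_setting_up_strict(const struct toy_tls_multi *m1,
                                    const struct toy_tls_multi *m2)
{
    return toy_m2_still_setting_up(m1, m2)
           && m2->multi_state == TOY_CAS_NOT_CONNECTED;
}
```

The two predicates only disagree when m2 has no locked cert hash set but has already left the not-connected state. Whether that combination can occur in practice is not something this sketch can answer.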

Let me know if this is the right approach.

Cheers,
Walter Doekes
OSSO B.V.
