| 1 | +.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) |
| 2 | + |
| 3 | +===================== |
| 4 | +BPF sk_lookup program |
| 5 | +===================== |
| 6 | + |
| 7 | +The BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces |
| 8 | +programmability into the socket lookup performed by the transport layer when a |
| 9 | +packet is to be delivered locally. |
| 10 | + |
| 11 | +When invoked, a BPF sk_lookup program can select the socket that will receive |
| 12 | +the incoming packet by calling the ``bpf_sk_assign()`` BPF helper function. |
| 13 | + |
| 14 | +Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP. |
| 15 | + |
| 16 | +Motivation |
| 17 | +========== |
| 18 | + |
| 19 | +The BPF sk_lookup program type was introduced to address setups where |
| 20 | +binding sockets to an address with the ``bind()`` socket call is impractical, |
| 21 | +such as: |
| 22 | + |
| 23 | +1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when |
| 24 | + binding to a wildcard address ``INADDR_ANY`` is not possible due to a port |
| 25 | + conflict, |
| 26 | +2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use |
| 27 | + case. |
| 28 | + |
| 29 | +Such setups would require creating one socket and ``bind()``'ing it for each |
| 30 | +IP address/port in the range, leading to increased resource consumption and |
| 31 | +potential latency spikes during socket lookup. |
| 32 | + |
| 33 | +Attachment |
| 34 | +========== |
| 35 | + |
| 36 | +A BPF sk_lookup program can be attached to a network namespace with the |
| 37 | +``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type and a |
| 38 | +netns FD as the attachment ``target_fd``. |
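
The attachment call described above can be sketched from user space with the raw
``bpf()`` syscall. This is a minimal sketch, not a reference implementation: the
helper name ``sk_lookup_link_create()`` is hypothetical, and it assumes the
installed ``linux/bpf.h`` is recent enough to define ``BPF_LINK_CREATE`` and
``BPF_SK_LOOKUP``. The program FD is assumed to come from an earlier
``BPF_PROG_LOAD``, which is not shown.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): create a BPF link that
 * attaches an already loaded sk_lookup program to a network namespace.
 *
 * prog_fd  - FD of a loaded BPF_PROG_TYPE_SK_LOOKUP program
 * netns_fd - FD referring to the target network namespace
 *
 * Returns the link FD on success, or -1 with errno set on failure.
 */
int sk_lookup_link_create(int prog_fd, int netns_fd)
{
	union bpf_attr attr;

	/* Unused fields of the attribute union must be zeroed. */
	memset(&attr, 0, sizeof(attr));
	attr.link_create.prog_fd = prog_fd;
	attr.link_create.target_fd = netns_fd;
	attr.link_create.attach_type = BPF_SK_LOOKUP;

	return (int)syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

A netns FD for the current namespace can typically be obtained by opening
``/proc/self/ns/net``. Closing the returned link FD is expected to detach the
program unless the link has been pinned.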
| 39 | + |
| 40 | +Multiple programs can be attached to one network namespace. Programs will be |
| 41 | +invoked in the same order as they were attached. |
| 42 | + |
| 43 | +Hooks |
| 44 | +===== |
| 45 | + |
| 46 | +The attached BPF sk_lookup programs run whenever the transport layer needs to |
| 47 | +find a listening (TCP) or an unconnected (UDP) socket for an incoming packet. |
| 48 | + |
| 49 | +Incoming traffic to established (TCP) and connected (UDP) sockets is delivered |
| 50 | +as usual without triggering the BPF sk_lookup hook. |
| 51 | + |
| 52 | +The attached BPF programs must return either the ``SK_PASS`` or the ``SK_DROP`` |
| 53 | +verdict code. As with other BPF program types that are network filters, |
| 54 | +``SK_PASS`` signifies that the socket lookup should continue on to the regular |
| 55 | +hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the |
| 56 | +packet. |
| 57 | + |
| 58 | +A BPF sk_lookup program can also select a socket to receive the packet by |
| 59 | +calling the ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a |
| 60 | +socket in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes |
| 61 | +a ``struct bpf_sock *`` to the ``bpf_sk_assign()`` helper to record the |
| 62 | +selection. Selecting a socket only takes effect if the program has terminated |
| 63 | +with the ``SK_PASS`` code. |
| 64 | + |
| 65 | +When multiple programs are attached, the end result is determined from the |
| 66 | +return codes of all the programs according to the following rules: |
| 67 | + |
| 68 | +1. If any program returned ``SK_PASS`` and selected a valid socket, the socket |
| 69 | + is used as the result of the socket lookup. |
| 70 | +2. If more than one program returned ``SK_PASS`` and selected a socket, the last |
| 71 | + selection takes effect. |
| 72 | +3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and |
| 73 | + selected a socket, socket lookup fails. |
| 74 | +4. If all programs returned ``SK_PASS`` and none of them selected a socket, |
| 75 | + socket lookup continues on. |
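
The four rules above can be modelled in plain user-space C. The sketch below is
illustrative only and is not the kernel implementation: every name in it
(``model_verdict``, ``model_prog_result``, ``model_combine()`` and so on) is
hypothetical, sockets are represented by integer IDs, and each program is
reduced to its verdict plus an optional selection. A selection made by a program
that returned the drop verdict is ignored, since a selection only takes effect
with ``SK_PASS``.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum model_verdict { MODEL_SK_DROP = 0, MODEL_SK_PASS = 1 };

enum model_outcome {
	MODEL_LOOKUP_CONTINUE,	/* fall through to the regular lookup */
	MODEL_LOOKUP_SELECTED,	/* use the socket stored in *out_sk */
	MODEL_LOOKUP_FAIL,	/* socket lookup fails */
};

#define MODEL_NO_SOCKET (-1)

struct model_prog_result {
	enum model_verdict verdict;
	int selected_sk;	/* made-up socket ID, or MODEL_NO_SOCKET */
};

/* Combine per-program results, given in attach order, into one outcome. */
enum model_outcome model_combine(const struct model_prog_result *res,
				 size_t n, int *out_sk)
{
	int selected = MODEL_NO_SOCKET;
	int any_drop = 0;
	size_t i;

	assert(res != NULL || n == 0);
	assert(out_sk != NULL);

	for (i = 0; i < n; i++) {
		if (res[i].verdict == MODEL_SK_DROP) {
			/* A selection made alongside a drop has no effect. */
			any_drop = 1;
			continue;
		}
		/* Rule 2: a later pass-and-select overrides an earlier one. */
		if (res[i].selected_sk != MODEL_NO_SOCKET)
			selected = res[i].selected_sk;
	}

	*out_sk = selected;
	if (selected != MODEL_NO_SOCKET)
		return MODEL_LOOKUP_SELECTED;	/* rules 1 and 2 */
	if (any_drop)
		return MODEL_LOOKUP_FAIL;	/* rule 3 */
	return MODEL_LOOKUP_CONTINUE;		/* rule 4 */
}
```

Note that in this model a pass-and-select by any program wins over a drop by
another program, which matches rules 1 and 3 as written.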
| 76 | + |
| 77 | +API |
| 78 | +=== |
| 79 | + |
| 80 | +In its context, an instance of ``struct bpf_sk_lookup``, a BPF sk_lookup program |
| 81 | +receives information about the packet that triggered the socket lookup, namely: |
| 82 | + |
| 83 | +* IP version (``AF_INET`` or ``AF_INET6``), |
| 84 | +* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``), |
| 85 | +* source and destination IP address, |
| 86 | +* source and destination L4 port, |
| 87 | +* the socket that has been selected with ``bpf_sk_assign()``. |
| 88 | + |
| 89 | +Refer to the ``struct bpf_sk_lookup`` declaration in the ``linux/bpf.h`` user API |
| 90 | +header, and to the `bpf-helpers(7) |
| 91 | +<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section |
| 92 | +for ``bpf_sk_assign()`` for details. |
| 93 | + |
| 94 | +Example |
| 95 | +======= |
| 96 | + |
| 97 | +See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference |
| 98 | +implementation. |