
@dylandreimerink dylandreimerink commented Nov 21, 2025

This PR introduces a deduplication mechanism for BTF types.

Most of the time BTF types originate from a single spec where a compiler or other external tool has already ensured that types are unique. In such cases we can simply rely on pointer equality to determine if two types are the same.

When dealing with manually generated BTF or merging multiple BTF specs, duplicate types are common: we end up with multiple distinct Go objects that represent the same underlying BTF type.
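To illustrate why pointer equality stops being enough, here is a minimal sketch using hypothetical `Int` and `Pointer` structs. These are simplified stand-ins, not the actual `btf` package types. Two independently constructed objects describe the same type but are not pointer-equal.

```go
package main

import "fmt"

// Int and Pointer are hypothetical, simplified stand-ins for BTF types;
// they are not the cilium/ebpf btf API.
type Int struct {
	Name string
	Size uint32
}

type Pointer struct{ Target *Int }

func main() {
	// Two independently built specs each allocate their own "u32".
	a := &Pointer{Target: &Int{Name: "u32", Size: 4}}
	b := &Pointer{Target: &Int{Name: "u32", Size: 4}}

	// Pointer equality treats them as different types...
	fmt.Println(a == b) // false
	// ...even though their contents describe the same BTF type.
	fmt.Println(*a.Target == *b.Target) // true
}
```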

It is useful to be able to deduplicate these types, both to reduce the size of the resulting BTF and to allow name-based lookups after combining multiple specs that contain duplicate types.

The deduplication algorithm is loosely based on the one used in libbpf. Unlike libbpf, this version does not perform FWD (forward declaration) type resolution, since that is only needed when combining BTF from multiple compilation units, which is typically not seen in eBPF use cases (only in pahole). libbpf also starts with string deduplication; we instead deduplicate strings during marshaling, so strings are not deduplicated in the Go representation.
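As a rough illustration of deduplicating strings at marshaling time, the sketch below uses a hypothetical `stringTable` helper (not part of the library). It hands out offsets into a NUL-terminated string section and stores each distinct string only once; offset 0 holds the empty string, as in the BTF string section. Because duplicate strings collapse at this stage, the Go-level type deduplication does not need to care about string identity.

```go
package main

import (
	"bytes"
	"fmt"
)

// stringTable is a hypothetical sketch of a BTF string section builder
// that deduplicates strings while marshaling.
type stringTable struct {
	buf     bytes.Buffer
	offsets map[string]uint32
}

func newStringTable() *stringTable {
	st := &stringTable{offsets: map[string]uint32{"": 0}}
	st.buf.WriteByte(0) // the empty string lives at offset 0
	return st
}

// Add returns the offset of s, appending it only the first time it is seen.
func (st *stringTable) Add(s string) uint32 {
	if off, ok := st.offsets[s]; ok {
		return off
	}
	off := uint32(st.buf.Len())
	st.buf.WriteString(s)
	st.buf.WriteByte(0) // strings are NUL-terminated
	st.offsets[s] = off
	return off
}

func main() {
	st := newStringTable()
	fmt.Println(st.Add("u32"), st.Add("sk_buff"), st.Add("u32")) // 1 5 1
	fmt.Printf("%q\n", st.buf.String())                         // "\x00u32\x00sk_buff\x00"
}
```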

When a type is deduplicated, we try to deduplicate not just that root type but the full subtree of types reachable from it. We traverse all types in post-order, and whenever we encounter an edge to a child type, we try to replace that child with an equal type we have already seen.
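A minimal sketch of the post-order rewrite, assuming a hypothetical, acyclic three-kind type model (`Int`, `Pointer`, `Struct`) and a naive `equal` function. None of these are the library's real types. Children are deduplicated before their parent, and each child edge is rewritten to the first equal type seen so far. The linear scan in `canonical` is deliberately naive; it is exactly the cost that hashing is meant to remove.

```go
package main

import "fmt"

// Hypothetical, simplified, acyclic type model; not the cilium/ebpf btf API.
type Type interface {
	// children returns pointers to the child edges so they can be rewritten.
	children() []*Type
}

type Int struct {
	Name string
	Size uint32
}

type Pointer struct{ Target Type }

type Struct struct {
	Name    string
	Members []Type
}

func (*Int) children() []*Type      { return nil }
func (p *Pointer) children() []*Type { return []*Type{&p.Target} }
func (s *Struct) children() []*Type {
	edges := make([]*Type, len(s.Members))
	for i := range s.Members {
		edges[i] = &s.Members[i]
	}
	return edges
}

// equal is a naive structural comparison; it assumes there are no cycles.
func equal(a, b Type) bool {
	switch a := a.(type) {
	case *Int:
		b, ok := b.(*Int)
		return ok && *a == *b
	case *Pointer:
		b, ok := b.(*Pointer)
		return ok && equal(a.Target, b.Target)
	case *Struct:
		b, ok := b.(*Struct)
		if !ok || a.Name != b.Name || len(a.Members) != len(b.Members) {
			return false
		}
		for i := range a.Members {
			if !equal(a.Members[i], b.Members[i]) {
				return false
			}
		}
		return true
	}
	return false
}

type deduper struct{ seen []Type }

// canonical returns an already-seen type equal to t, or records t as new.
func (d *deduper) canonical(t Type) Type {
	for _, s := range d.seen {
		if equal(s, t) {
			return s
		}
	}
	d.seen = append(d.seen, t)
	return t
}

// dedup visits children before their parent (post-order) and rewrites
// each child edge to point at the canonical instance.
func (d *deduper) dedup(t Type) Type {
	for _, edge := range t.children() {
		*edge = d.dedup(*edge)
	}
	return d.canonical(t)
}

func main() {
	u32a := &Int{Name: "u32", Size: 4}
	u32b := &Int{Name: "u32", Size: 4}
	root := &Struct{Name: "pair", Members: []Type{u32a, &Pointer{Target: u32b}}}

	d := &deduper{}
	d.dedup(root)
	// Both edges now share a single u32 object.
	fmt.Println(root.Members[1].(*Pointer).Target == root.Members[0]) // true
	fmt.Println(len(d.seen))                                           // 3
}
```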

Comparing every type with all those seen before would be very expensive. Instead, we compute a hash of each type. The hash approximates all properties of the type, including those of its child types, which are hashed recursively. Using this hash as a map key yields a set of candidate types that might be equal to the type we are currently deduplicating. We still need a full equality check to be sure two types are equal, both to rule out hash collisions and to compare properties that the hash does not cover (child types beyond its recursion limit). In practice, the hash narrows the search down to 0 or 1 candidate types in almost every case.
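The hashing step might look roughly like the sketch below. `hashType`, `index`, and the `maxDepth` limit of 3 are assumptions for illustration, built on FNV-1a from the Go standard library; the actual hash function and limit in the implementation may differ. Because hashing stops at the depth limit, it also terminates on self-referential types, and everything below that depth is left to the full equality check.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"io"
)

// Hypothetical, simplified type model; not the cilium/ebpf btf API.
type Type interface{ isType() }

type Int struct {
	Name string
	Size uint32
}

type Pointer struct{ Target Type }

type Struct struct {
	Name    string
	Members []Type
}

func (*Int) isType()     {}
func (*Pointer) isType() {}
func (*Struct) isType()  {}

// maxDepth is an assumed recursion limit: types nested deeper than this do
// not contribute to the hash, which also stops hashing on pointer cycles.
const maxDepth = 3

func writeType(w io.Writer, t Type, depth int) {
	if depth > maxDepth {
		return
	}
	switch t := t.(type) {
	case *Int:
		fmt.Fprintf(w, "int:%s:%d;", t.Name, t.Size)
	case *Pointer:
		fmt.Fprint(w, "ptr;")
		writeType(w, t.Target, depth+1)
	case *Struct:
		fmt.Fprintf(w, "struct:%s:%d;", t.Name, len(t.Members))
		for _, m := range t.Members {
			writeType(w, m, depth+1)
		}
	}
}

// hashType approximates a type's properties, recursing into children.
func hashType(t Type) uint64 {
	h := fnv.New64a()
	writeType(h, t, 0)
	return h.Sum64()
}

// index maps a hash to every already-seen type with that hash.
type index map[uint64][]Type

func (ix index) add(t Type) {
	k := hashType(t)
	ix[k] = append(ix[k], t)
}

// candidates returns types that might equal t; a full equality check
// is still required to confirm a match.
func (ix index) candidates(t Type) []Type { return ix[hashType(t)] }

func main() {
	// Two separately built copies of a self-referential struct.
	a := &Struct{Name: "list_head"}
	a.Members = []Type{&Pointer{Target: a}, &Pointer{Target: a}}
	b := &Struct{Name: "list_head"}
	b.Members = []Type{&Pointer{Target: b}, &Pointer{Target: b}}

	ix := index{}
	ix.add(a)
	ix.add(&Int{Name: "u32", Size: 4})

	cands := ix.candidates(b)
	fmt.Println(len(cands) > 0 && cands[0] == Type(a)) // true
}
```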

Once we have narrowed the search down to candidate types, we do a full equality check in which we walk the two types being compared together, depth-first, and bail out as soon as we find a difference. Since types can form cycles through pointers, we keep track of the type pairs already visited during the current equality check and assume such a pair is equal when we encounter it again.
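A sketch of a cycle-aware equality check, again over a hypothetical simplified type model rather than the library's real types. The `visited` set holds type pairs already reached during this comparison; revisiting a pair is treated as equal, which lets the walk terminate on pointer cycles such as a struct that points to itself. This is sound for a single top-level check because any difference aborts the whole comparison, so a fresh set is used per check.

```go
package main

import "fmt"

// Hypothetical, simplified type model; not the cilium/ebpf btf API.
type Type interface{ isType() }

type Int struct {
	Name string
	Size uint32
}

type Pointer struct{ Target Type }

type Struct struct {
	Name    string
	Members []Type
}

func (*Int) isType()     {}
func (*Pointer) isType() {}
func (*Struct) isType()  {}

type pair struct{ a, b Type }

// equal walks a and b together depth-first and bails out at the first
// difference. Pairs seen before are assumed equal, breaking cycles.
func equal(a, b Type, visited map[pair]bool) bool {
	p := pair{a, b}
	if visited[p] {
		return true
	}
	visited[p] = true

	switch a := a.(type) {
	case *Int:
		b, ok := b.(*Int)
		return ok && *a == *b
	case *Pointer:
		b, ok := b.(*Pointer)
		return ok && equal(a.Target, b.Target, visited)
	case *Struct:
		b, ok := b.(*Struct)
		if !ok || a.Name != b.Name || len(a.Members) != len(b.Members) {
			return false
		}
		for i := range a.Members {
			if !equal(a.Members[i], b.Members[i], visited) {
				return false
			}
		}
		return true
	}
	return false
}

func main() {
	// struct node { struct node *next; }, built twice as separate graphs.
	mk := func() *Struct {
		s := &Struct{Name: "node"}
		s.Members = []Type{&Pointer{Target: s}}
		return s
	}
	x, y := mk(), mk()
	fmt.Println(equal(x, y, map[pair]bool{})) // true

	z := &Struct{Name: "node", Members: []Type{&Pointer{Target: &Int{Name: "u32", Size: 4}}}}
	fmt.Println(equal(x, z, map[pair]bool{})) // false
}
```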

This deduplication mechanism can be used via a standalone function, and is also integrated into the BTF spec builder via a new method that adds and deduplicates types.

The implementation as it stands performs reasonably well for its intended application:

goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 13th Gen Intel(R) Core(TM) i7-13800H
                      │   sec/op    │
DeduplicateSKBuff-20    9.253m ± 2%
DeduplicateVMLinux-20   103.5m ± 9%

                      │     B/op      │
DeduplicateSKBuff-20    2.744Mi ±  1%
DeduplicateVMLinux-20   30.02Mi ± 16%

                      │  allocs/op   │
DeduplicateSKBuff-20     318.0 ±  1%
DeduplicateVMLinux-20   8.065k ± 18%

That is ~9 ms for struct sk_buff, which has ~5,800 types, and ~103 ms for vmlinux, which has ~93,000 types. For smaller types, such as map key and value types, the cost is considerably lower.
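Dividing the benchmark means by the approximate type counts gives a rough per-type cost. On these two data points at least, it stays in the low microseconds even as the input grows by more than an order of magnitude:

$$
\frac{9.253\ \mathrm{ms}}{5\,800\ \text{types}} \approx 1.6\ \mu\mathrm{s/type}, \qquad
\frac{103.5\ \mathrm{ms}}{93\,000\ \text{types}} \approx 1.1\ \mu\mathrm{s/type}
$$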

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink dylandreimerink marked this pull request as ready for review November 24, 2025 13:28
@dylandreimerink dylandreimerink requested review from lmb and ti-mo November 24, 2025 13:28