btf: Add type deduplication #1903

dylandreimerink · 2025-11-21T15:28:27Z

This PR introduces a deduplication mechanism for BTF types.

Most of the time BTF types originate from a single spec where a compiler or other external tool has already ensured that types are unique. In such cases we can simply rely on pointer equality to determine if two types are the same.

When dealing with manually generated BTF or merging multiple BTF specs, duplicate types are common: multiple distinct Go objects represent the same underlying BTF type.

It is useful to be able to deduplicate these types, both to reduce the size of the resulting BTF and to allow name-based lookups after combining multiple specs with duplicate types.

The deduplication algorithm is loosely based on the one used in libbpf. For example, this version does not do FWD type resolution, as that is only needed when combining BTF from multiple compilation units, which is typically not seen in eBPF use cases (only pahole). In the libbpf implementation the first step is string deduplication. We already do this step during marshaling, so we do not deduplicate strings in the Go representation.

When a type is deduplicated, we try to deduplicate not just that root type, but the full subtree of types reachable from that type. We start by traversing all types in post-order, and any time there is an edge we try to replace that child with an equal type we have already seen.
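As a rough illustration of the post-order walk described above, here is a minimal sketch. It does not use the real `btf.Type` interface: `node`, `key` and `dedup` are hypothetical names, the graph is assumed to be acyclic, and a string key built from the identities of the already canonical children stands in for the hash plus equality check used in this PR.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a BTF type, not btf.Type.
type node struct {
	kind     string
	name     string
	children []*node
}

// key identifies a node by its own properties plus the pointer identity
// of its children, which are already canonical when key is called.
func key(n *node) string {
	k := n.kind + ":" + n.name
	for _, c := range n.children {
		k += fmt.Sprintf("|%p", c)
	}
	return k
}

// dedup visits children first (post-order), rewrites every child edge to
// the canonical child, and then returns the canonical node equal to n.
// Assumption: the graph has no cycles.
func dedup(n *node, seen map[string]*node, done map[*node]*node) *node {
	if c, ok := done[n]; ok {
		return c
	}
	for i, c := range n.children {
		n.children[i] = dedup(c, seen, done)
	}
	k := key(n)
	canon, ok := seen[k]
	if !ok {
		canon = n
		seen[k] = n
	}
	done[n] = canon
	return canon
}

func main() {
	intA := &node{kind: "int", name: "int"}
	intB := &node{kind: "int", name: "int"}
	ptrA := &node{kind: "ptr", children: []*node{intA}}
	ptrB := &node{kind: "ptr", children: []*node{intB}}

	seen := map[string]*node{}
	done := map[*node]*node{}
	a := dedup(ptrA, seen, done)
	b := dedup(ptrB, seen, done)

	// Both roots collapse to ptrA, and ptrB's child edge now points at intA.
	fmt.Println(a == b, ptrB.children[0] == intA)
}
```

Because children are made canonical before their parent is looked up, two structurally equal subtrees end up with identical child pointers, which is what makes the parent lookup cheap.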

Comparing every type with all those seen before would be very expensive, so we compute a hash of each type instead. The hash is an approximation of all properties of the type, including recursively hashing child types. Using this hash as the key in a map gives us a set of candidate types which might be equal to the type we are currently deduplicating. We still need to do a full equality check to be sure two types are equal, both to rule out hash collisions and to compare properties which are not included in the hash (recursion limit). In practically every case a hash narrows the search down to 0 or 1 candidate types.
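The hash-then-compare lookup can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `node`, `hashNode`, `index`, and the `maxDepth` value of 2 are all hypothetical, and FNV-1a is simply a convenient standard library hash. The depth limit is deliberately small so that two different types land in the same bucket and the equality check has to tell them apart.

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
)

// node is a hypothetical stand-in for a BTF type, not btf.Type.
type node struct {
	kind     string
	name     string
	children []*node
}

// maxDepth is an assumed recursion limit for hashing.
const maxDepth = 2

// hashInto mixes a node's own properties and, recursively, its children
// into h, stopping once the depth budget is used up.
func hashInto(h hash.Hash64, n *node, depth int) {
	if n == nil || depth == 0 {
		return
	}
	h.Write([]byte(n.kind))
	h.Write([]byte{0})
	h.Write([]byte(n.name))
	h.Write([]byte{0})
	for _, c := range n.children {
		hashInto(h, c, depth-1)
	}
}

func hashNode(n *node) uint64 {
	h := fnv.New64a()
	hashInto(h, n, maxDepth)
	return h.Sum64()
}

// equal is a full structural comparison (acyclic graphs only in this sketch).
func equal(a, b *node) bool {
	if a == b {
		return true
	}
	if a.kind != b.kind || a.name != b.name || len(a.children) != len(b.children) {
		return false
	}
	for i := range a.children {
		if !equal(a.children[i], b.children[i]) {
			return false
		}
	}
	return true
}

// index maps a hash to the candidate types that share it.
type index map[uint64][]*node

// canonical returns an equal type seen earlier, or registers n.
func (ix index) canonical(n *node) *node {
	h := hashNode(n)
	for _, cand := range ix[h] {
		if equal(cand, n) {
			return cand
		}
	}
	ix[h] = append(ix[h], n)
	return n
}

// ptrPtr builds pointer -> pointer -> leaf.
func ptrPtr(leaf string) *node {
	return &node{kind: "ptr", children: []*node{
		{kind: "ptr", children: []*node{{kind: "int", name: leaf}}},
	}}
}

func main() {
	ix := index{}
	a, b, c := ptrPtr("int"), ptrPtr("int"), ptrPtr("long")

	fmt.Println(ix.canonical(a) == a) // first of its kind
	fmt.Println(ix.canonical(b) == a) // duplicate of a
	fmt.Println(ix.canonical(c) == c) // same hash as a, but not equal
	fmt.Println(len(ix[hashNode(a)])) // candidates sharing the bucket
}
```

With a realistic depth limit the bucket would almost always hold a single candidate, so the loop in `canonical` rarely runs more than once.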

Once we have narrowed down to candidate types, we do a full equality check in which we walk the two types being compared together in a depth-first manner, and bail out as soon as we find a difference. Since types can form cycles through pointers, we keep track of the types already visited during the current equality check, and when we encounter one of them again we assume it is equal instead of descending into it a second time.
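A minimal sketch of such a cycle-safe comparison is shown below. The names `node`, `pair` and `equal` are hypothetical, and tracking visited pairs of types (rather than single types) is an assumption of this sketch, not necessarily what the PR does.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a BTF type, not btf.Type.
type node struct {
	kind     string
	name     string
	children []*node
}

// pair records two types that are currently being compared.
type pair struct{ a, b *node }

// equal walks a and b together depth first and stops at the first
// difference. A pair that is seen again is assumed to be equal, which
// keeps the walk from looping forever on self-referential types.
func equal(a, b *node, visited map[pair]bool) bool {
	if a == b {
		return true
	}
	if a == nil || b == nil {
		return false
	}
	p := pair{a, b}
	if visited[p] {
		return true
	}
	visited[p] = true

	if a.kind != b.kind || a.name != b.name || len(a.children) != len(b.children) {
		return false
	}
	for i := range a.children {
		if !equal(a.children[i], b.children[i], visited) {
			return false
		}
	}
	return true
}

// list builds the equivalent of `struct <name> { struct <name> *next; }`.
func list(name string) *node {
	s := &node{kind: "struct", name: name}
	s.children = []*node{{kind: "ptr", children: []*node{s}}}
	return s
}

func main() {
	fmt.Println(equal(list("list"), list("list"), map[pair]bool{})) // true
	fmt.Println(equal(list("list"), list("node"), map[pair]bool{})) // false
}
```

The optimistic assumption is safe because any real difference elsewhere in the two graphs is still reached by the walk and makes the overall comparison return false.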

This deduplication mechanism can be used via a standalone function, but is also integrated into the BTF spec builder via a new method to add and deduplicate types.

The implementation as it stands performs reasonably for its intended application:

goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 13th Gen Intel(R) Core(TM) i7-13800H
                      │   sec/op    │
DeduplicateSKBuff-20    9.253m ± 2%
DeduplicateVMLinux-20   103.5m ± 9%

                      │     B/op      │
DeduplicateSKBuff-20    2.744Mi ±  1%
DeduplicateVMLinux-20   30.02Mi ± 16%

                      │  allocs/op   │
DeduplicateSKBuff-20     318.0 ±  1%
DeduplicateVMLinux-20   8.065k ± 18%

So ~9ms for struct sk_buff, which has ~5800 types, and ~103ms for vmlinux, which has ~93000 types. For smaller types, such as map key and value types, the cost is obviously much lower.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

dylandreimerink mentioned this pull request Nov 21, 2025

BPF map spec registry cilium/cilium#42427

Open

dylandreimerink force-pushed the feature/dedup-btf branch from 48f732b to 8e2a957 Compare November 21, 2025 15:37

dylandreimerink marked this pull request as ready for review November 24, 2025 13:28

dylandreimerink requested review from lmb and ti-mo November 24, 2025 13:28
