This PR introduces a deduplication mechanism for BTF types.
Most of the time BTF types originate from a single spec where a compiler or other external tool has already ensured that types are unique. In such cases we can simply rely on pointer equality to determine if two types are the same.
When dealing with manually generated BTF or merging multiple BTF specs, duplicate types are common: we end up with multiple distinct Go objects which represent the same underlying BTF type.
It is useful to be able to deduplicate these types, both to reduce the size of the resulting BTF and to allow name-based lookups after combining multiple specs with duplicate types.
The deduplication algorithm is loosely based on the one used in libbpf. For example, this version does not do FWD type resolution, as that is only needed when combining BTF from multiple compilation units, which is something typically not seen in eBPF use-cases (only pahole). In the libbpf implementation the first step is string deduplication. We already do that step during marshaling, and thus we do not deduplicate strings in the Go representation.
When a type is deduplicated, we try to deduplicate not just that root type, but the full subtree of types reachable from that type. We start by traversing all types in post-order, and any time there is an edge we try to replace that child with an equal type we have already seen.
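The post-order walk described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Type`, `Deduper` and `Dedup` are hypothetical names, the sketch assumes the type graph is acyclic, and it uses an exact string signature where the real implementation uses a hash plus an equality check.

```go
package main

import "strconv"

// Type is a hypothetical, simplified stand-in for a BTF type: a kind, a
// name, and edges to child types. It is not the library's representation.
type Type struct {
	Kind     string
	Name     string
	Children []*Type
}

// Deduper remembers the first ("canonical") instance of every distinct type.
type Deduper struct {
	canon map[string]*Type // signature -> canonical type
	ids   map[*Type]int    // canonical type -> small integer ID
}

func NewDeduper() *Deduper {
	return &Deduper{canon: map[string]*Type{}, ids: map[*Type]int{}}
}

// Dedup walks t in post-order. Children are canonicalized first and every
// edge is rewritten to point at the canonical child. Only then is t itself
// looked up. Assumption: no cycles, otherwise this recursion never ends.
func (d *Deduper) Dedup(t *Type) *Type {
	sig := t.Kind + "\x00" + t.Name
	for i, c := range t.Children {
		t.Children[i] = d.Dedup(c)
		// Children are canonical at this point, so their ID fully
		// identifies them.
		sig += "\x00" + strconv.Itoa(d.ids[t.Children[i]])
	}
	if seen, ok := d.canon[sig]; ok {
		return seen // an equal type was seen before, reuse it
	}
	d.ids[t] = len(d.ids)
	d.canon[sig] = t
	return t
}
```

Calling `Dedup` on two separately built `ptr -> int u32` trees with the same `Deduper` returns the same root pointer for both, and both roots end up sharing a single `int u32` child.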
Comparing every type with all those seen before would be very expensive, so we compute a hash of each type. The hash approximates all properties of the type, and includes recursively hashed child types. Using this hash as the key in a map gives us a set of candidate types which might be equal to the type we are currently deduplicating. We still need to do a full equality check to be sure two types are equal, both to rule out hash collisions and to compare properties which are not included in the hash (recursion limit). In practically every case a hash narrows down to 0 or 1 candidate types.
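To make the hash-then-compare idea concrete, here is a small sketch. All names (`Type`, `Set`, `hashType`, `maxHashDepth`) and the choice of FNV-1a are assumptions made for illustration and are not taken from this PR. The toy type is an acyclic chain with a single child, and the hash stops descending after a fixed depth, so two types that differ only deep down share a bucket and must be told apart by the equality check.

```go
package main

import (
	"encoding/binary"
	"hash/fnv"
)

// Type is a hypothetical, simplified type with at most one child.
type Type struct {
	Kind  string
	Name  string
	Size  uint32
	Child *Type // nil for leaf types
}

// maxHashDepth is an assumed recursion limit for hashing.
const maxHashDepth = 2

// hashType hashes the properties of t and, up to maxHashDepth, of its
// children. Anything deeper does not influence the hash.
func hashType(t *Type, depth int) uint64 {
	if t == nil || depth > maxHashDepth {
		return 0
	}
	h := fnv.New64a()
	h.Write([]byte(t.Kind))
	h.Write([]byte{0})
	h.Write([]byte(t.Name))
	h.Write([]byte{0})
	var buf [12]byte
	binary.LittleEndian.PutUint32(buf[:4], t.Size)
	binary.LittleEndian.PutUint64(buf[4:], hashType(t.Child, depth+1))
	h.Write(buf[:])
	return h.Sum64()
}

// equal is the full check: it walks both chains to the end.
// Assumption: chains are acyclic.
func equal(a, b *Type) bool {
	for a != nil && b != nil {
		if a == b {
			return true
		}
		if a.Kind != b.Kind || a.Name != b.Name || a.Size != b.Size {
			return false
		}
		a, b = a.Child, b.Child
	}
	return a == b // true only if both chains ended together
}

// Set groups previously seen types into buckets keyed by their hash.
type Set struct {
	buckets map[uint64][]*Type
}

func NewSet() *Set { return &Set{buckets: map[uint64][]*Type{}} }

// Canonical returns an equal type seen earlier, or records t and returns it.
func (s *Set) Canonical(t *Type) *Type {
	h := hashType(t, 0)
	for _, candidate := range s.buckets[h] {
		if equal(candidate, t) {
			return candidate
		}
	}
	s.buckets[h] = append(s.buckets[h], t)
	return t
}
```

The map lookup is cheap and usually yields an empty or single-element bucket, so the comparatively expensive `equal` call runs at most a handful of times per type.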
Once we have narrowed down to candidate types, we do a full equality check in which we walk the two types being compared together in a depth-first manner, and bail out as soon as we find a difference. Since types can form cycles via pointers, we keep track of the types already visited in the current equality check, and assume they are equal when we encounter them again.
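A cycle-aware equality check along these lines could look like the sketch below. The `Type` struct and the `Equal` function are hypothetical and only illustrate the technique. In particular, tracking visited pairs of types (rather than single types) is an assumption of this sketch, not necessarily what the PR does.

```go
package main

// Type is a hypothetical, simplified type whose members may point back to
// an ancestor, e.g. a struct containing a pointer to itself.
type Type struct {
	Kind    string
	Name    string
	Members []*Type
}

// pair is one (a, b) combination compared during the current check.
type pair struct{ a, b *Type }

// Equal reports whether a and b describe the same type graph.
func Equal(a, b *Type) bool {
	return equal(a, b, make(map[pair]struct{}))
}

func equal(a, b *Type, visited map[pair]struct{}) bool {
	if a == b {
		return true
	}
	if a == nil || b == nil {
		return false
	}
	p := pair{a, b}
	if _, ok := visited[p]; ok {
		// We are already comparing this pair further up the walk, so we
		// have followed a cycle. Assume equality here. Any real
		// difference is reported by the comparison still in progress.
		return true
	}
	visited[p] = struct{}{}

	// Bail out on the first difference.
	if a.Kind != b.Kind || a.Name != b.Name || len(a.Members) != len(b.Members) {
		return false
	}
	for i := range a.Members {
		if !equal(a.Members[i], b.Members[i], visited) {
			return false
		}
	}
	return true
}
```

With this, two independently built copies of a self-referential `struct node { struct node *next; }` compare as equal, and the recursion terminates because the `(node, node)` pair is only expanded once.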
This deduplication mechanism can be used via a standalone function, but is also integrated in the BTF spec builder via a new method to add and deduplicate types.
The implementation as it stands performs reasonably well for its intended application:
So ~9ms for struct sk_buff, which has ~5800 types, and ~103ms for vmlinux, which has ~93000 types. For smaller types such as map key and value types the cost is obviously far lower.