libbpf-tools: add tcpdrop to trace TCP packet drops #5329

Open: wants to merge 5 commits into base: master

Amaindex
Contributor

Added the tcpdrop tool, consisting of tcpdrop.bpf.c and tcpdrop.c, to trace TCP packets dropped by the kernel using eBPF. It supports IPv4/IPv6 filtering and network namespace filtering, and its output includes the timestamp, PID, IP addresses, ports, TCP state, and drop reason. Based on tcptop(8) from BCC.

Signed-off-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Zi Li <zi.li@linux.dev>
Signed-off-by: Amaindex <amaindex@outlook.com>
@Amaindex
Contributor Author

Hi @chenhengqi, we’ve got a C version of tcpdrop in this PR (#5329) that sticks close to the Python version’s features and options. Could you take a look when you have a moment? Would love your thoughts :)

@Amaindex
Contributor Author

Amaindex commented Jul 1, 2025

Hi @ekyooo and @chenhengqi ,

Thanks for the great feedback! I've made the following updates based on your suggestions:

  1. Switched to ksyms__load and ksyms__map_addr for symbol resolution in tcpdrop.c.
  2. Updated tcpdrop.bpf.c and tcpdrop.c to follow Linux kernel coding style.
  3. Improved IPv6 address handling with __u32 saddr_v6[4] and in6_u.u6_addr32 in both files.
  4. Removed bpf_printk debug statements from tcpdrop.bpf.c.
  5. Added /tcpdrop to .gitignore.
  6. Moved event struct to tcpdrop.h to avoid duplication.

Please take a look and let me know if there's anything else I can tweak!
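
As a rough illustration of item 3, the sketch below shows how a userspace reader might turn an IPv6 address stored as four 32-bit words into text. The struct `demo_event` and the helper `format_v6` are hypothetical names for this example only; the real event layout is the one defined in tcpdrop.h, and the words are assumed to be in network byte order, as they would be when copied from in6_u.u6_addr32.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down event layout; not the struct from tcpdrop.h. */
struct demo_event {
	uint32_t saddr_v6[4];	/* assumed network byte order */
};

/* Format the four 32-bit words as text; returns buf, or NULL on failure. */
static const char *format_v6(const struct demo_event *e, char *buf, size_t len)
{
	struct in6_addr addr;

	/* The array must cover exactly one 128-bit address. */
	assert(sizeof(addr) == sizeof(e->saddr_v6));
	memcpy(&addr, e->saddr_v6, sizeof(addr));
	return inet_ntop(AF_INET6, &addr, buf, (socklen_t)len);
}
```

A caller would pass a buffer of at least INET6_ADDRSTRLEN bytes. Copying with memcpy avoids any aliasing or alignment concerns when moving between the word array and struct in6_addr.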

@Amaindex
Contributor Author

Amaindex commented Jul 1, 2025

Hi @chenhengqi,

Regarding your suggestion to copy the reason enums from the kernel for tcpdrop: we previously used this approach in tcpdrop.py. However, recent experience shows that these enums vary across kernel versions and distros, which is easy to verify. So I think dynamic loading via parse_reason_enum is more robust. It might be good to update tcpdrop.py to match this approach for consistency. What do you think, or is there another way to handle this?

- Use ksyms__load and ksyms__map_addr for kernel symbol resolution.
- Follow Linux kernel coding style in tcpdrop.bpf.c and tcpdrop.c.
- Optimize IPv6 address handling with __u32 arrays and in6_u.u6_addr32.
- Remove bpf_printk debug statements from tcpdrop.bpf.c.
- Add /tcpdrop to .gitignore to exclude the binary.
- Define event struct in tcpdrop.h to prevent duplicate definitions.
- Check drop reason with bpf_core_field_exists in tcpdrop.bpf.c.

Signed-off-by: Zi Li <zi.li@linux.dev>
Signed-off-by: Amaindex <amaindex@outlook.com>
@chenhengqi
Collaborator

Do you have an example of these enums varying across kernel versions and distros?
We have enum skb_drop_reason in vmlinux.h.

@Amaindex
Contributor Author

Amaindex commented Jul 3, 2025

Hi @chenhengqi ,
Take NETFILTER_DROP as an example. In kernel v5.15.186, as you can see in include/linux/skbuff.h, the skb_drop_reason enum lists NETFILTER_DROP as the 7th value (index 6):

enum skb_drop_reason {
    SKB_DROP_REASON_NOT_SPECIFIED,  /* 0 */
    SKB_DROP_REASON_NO_SOCKET,      /* 1 */
    SKB_DROP_REASON_PKT_TOO_SMALL,  /* 2 */
    SKB_DROP_REASON_TCP_CSUM,       /* 3 */
    SKB_DROP_REASON_SOCKET_FILTER,  /* 4 */
    SKB_DROP_REASON_UDP_CSUM,       /* 5 */
    SKB_DROP_REASON_NETFILTER_DROP, /* 6 */
    ...
};

This is reflected in the tracepoint format file /sys/kernel/debug/tracing/events/skb/kfree_skb/format, where NETFILTER_DROP is mapped to value 6 in the __print_symbolic list.

Now, fast forward to kernel v6.15.4, and things shift in include/net/dropreason-core.h. The skb_drop_reason enum has new entries, and NETFILTER_DROP moves to index 12:

enum skb_drop_reason {
    SKB_NOT_DROPPED_YET,             /* 0 */
    SKB_CONSUMED,                    /* 1 */
    SKB_DROP_REASON_NOT_SPECIFIED,   /* 2 */
    SKB_DROP_REASON_NO_SOCKET,       /* 3 */
    SKB_DROP_REASON_SOCKET_CLOSE,    /* 4 */
    SKB_DROP_REASON_SOCKET_FILTER,   /* 5 */
    SKB_DROP_REASON_SOCKET_RCVBUFF,  /* 6 */
    SKB_DROP_REASON_UNIX_DISCONNECT, /* 7 */
    SKB_DROP_REASON_UNIX_SKIP_OOB,   /* 8 */
    SKB_DROP_REASON_PKT_TOO_SMALL,   /* 9 */
    SKB_DROP_REASON_TCP_CSUM,        /* 10 */
    SKB_DROP_REASON_UDP_CSUM,        /* 11 */
    SKB_DROP_REASON_NETFILTER_DROP,  /* 12 */
    ...
};

The tracepoint format in v6.15.4 confirms this, with NETFILTER_DROP now at value 12 in the __print_symbolic list. This isn’t just a case of appending new values at the end: new entries such as SKB_CONSUMED, SOCKET_CLOSE, and SOCKET_RCVBUFF are inserted ahead of existing ones, which shifts every index that follows them.

Considering the skb_drop_reason index changes across kernel versions, parse_reason_enum for dynamic loading feels more adaptable than hardcoding the enums.
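
To make the idea of dynamic loading concrete, here is a minimal sketch of how the value-to-name pairs could be recovered at run time from the text of the tracepoint format file. It is not the PR's parse_reason_enum; the function `parse_symbolic_pairs`, the `reason_names` table, and the size limits are all hypothetical, and the sketch assumes the pairs appear in the form `{ N, "NAME" }` inside the __print_symbolic list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_REASONS 256	/* assumed upper bound on reason values */
#define MAX_NAME 64	/* assumed upper bound on name length */

/* Hypothetical table: index is the numeric reason, value is its name. */
static char reason_names[MAX_REASONS][MAX_NAME];

/*
 * Scan fmt for `{ N, "NAME" }` pairs and store NAME at reason_names[N].
 * Returns the number of pairs stored.
 */
static int parse_symbolic_pairs(const char *fmt)
{
	const char *p = fmt;
	int count = 0;

	assert(fmt != NULL);
	while ((p = strchr(p, '{')) != NULL) {
		int id;
		char name[MAX_NAME];

		/* Whitespace in the pattern matches zero or more blanks. */
		if (sscanf(p, "{ %d , \"%63[^\"]\"", &id, name) == 2 &&
		    id >= 0 && id < MAX_REASONS) {
			snprintf(reason_names[id], MAX_NAME, "%s", name);
			count++;
		}
		p++;	/* step past this brace and keep scanning */
	}
	return count;
}
```

Fed the format text from two different kernels, the same code would place NETFILTER_DROP at whichever index that kernel reports, which is the property a hardcoded table lacks.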

@chenhengqi
Collaborator

Sounds reasonable. I am OK with this approach.

Remove print_drop_reasons function and replace its call with a warning message
in main when parse_reason_enum fails.

Signed-off-by: Zi Li <zi.li@linux.dev>
Signed-off-by: Amaindex <amaindex@outlook.com>
@Amaindex
Contributor Author

Hi @chenhengqi ,
I’ve removed print_drop_reasons and added a warning for parse failures in tcpdrop.c as you suggested, and reordered headers in tcpdrop.bpf.c to avoid compilation issues. Let me know if it looks good to go!
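
One way a tool could keep producing useful output after such a parse failure is to fall back to the raw numeric reason. The sketch below illustrates that idea only; `reason_str` and its parameters are hypothetical and are not taken from tcpdrop.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return a printable drop reason. names may be NULL (parsing failed) or
 * may lack an entry for id; in both cases the raw number is formatted
 * into buf and returned instead.
 */
static const char *reason_str(const char *const *names, int n_names, int id,
			      char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (names && id >= 0 && id < n_names && names[id] && names[id][0])
		return names[id];
	snprintf(buf, len, "%d", id);
	return buf;
}
```

With this shape, the warning is printed once at startup and each event line still carries a value the user can look up against their kernel's headers.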

@chenhengqi
Collaborator

Some comments are not resolved, please check.

…cpdrop

Move ipv4_only, ipv6_only, and netns_id to rodata section for better memory
management. Optimize tcpdrop.bpf.c by declaring variables upfront and
reordering operations for clarity. Update event struct to place stack_id
correctly. Fix missing newlines at file ends.

Signed-off-by: Zi Li <zi.li@linux.dev>
Signed-off-by: Amaindex <amaindex@outlook.com>
@Amaindex
Contributor Author

Hi @chenhengqi ,
My apologies, I just saw these comments and have pushed the corresponding fixes.
Thank you for the detailed feedback. I learned a lot from your suggestions, and the patch is much better for it.

		event->drop_reason = -1;
	}

	if (bpf_ringbuf_query(&events, BPF_RB_AVAIL_DATA) >= 511) {
Collaborator

What's the purpose of bpf_ringbuf_query here ?

Comment on lines 94 to 106
	protocol = args->protocol;
	if (protocol != ETH_P_IP && protocol != ETH_P_IPV6) {
		bpf_ringbuf_discard(event, 0);
		return 0;
	}
	if (ipv4_only && protocol != ETH_P_IP) {
		bpf_ringbuf_discard(event, 0);
		return 0;
	}
	if (ipv6_only && protocol != ETH_P_IPV6) {
		bpf_ringbuf_discard(event, 0);
		return 0;
	}
Collaborator

Check these before bpf_ringbuf_reserve() so that we don't have to use many bpf_ringbuf_discard() in each branch.

Defer the BPF ring buffer event allocation in tcpdrop.bpf.c until all
preliminary checks are passed, reducing unnecessary discards and
improving performance. This ensures the event is only reserved when the
skb meets all processing conditions, minimizing resource waste.

Signed-off-by: Zi Li <zi.li@linux.dev>
Signed-off-by: Amaindex <amaindex@outlook.com>
@Amaindex
Contributor Author

Hi @chenhengqi,
Thanks for the feedback. I’ve delayed the event allocation until all checks have passed, which cuts down on unnecessary discards. I also removed the ring buffer capacity check: it was a leftover from an earlier version that I had overlooked, and it’s no longer needed.
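
The reordering discussed above can be pictured as a single predicate evaluated before any ring buffer space is reserved. The following is a plain userspace sketch of that predicate, not code from tcpdrop.bpf.c: the name `want_event` and the `DEMO_` constants are hypothetical, and it assumes the two "only" flags are never set together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EtherType values, defined locally so the sketch builds without kernel headers. */
#define DEMO_ETH_P_IP   0x0800
#define DEMO_ETH_P_IPV6 0x86DD

/*
 * Decide whether a dropped skb is worth reporting. In a BPF program a
 * check like this would run before bpf_ringbuf_reserve(), so a rejected
 * packet costs no reservation and needs no bpf_ringbuf_discard().
 */
static bool want_event(uint16_t protocol, bool ipv4_only, bool ipv6_only)
{
	/* Assumption: the caller never enables both filters at once. */
	assert(!(ipv4_only && ipv6_only));

	if (protocol != DEMO_ETH_P_IP && protocol != DEMO_ETH_P_IPV6)
		return false;
	if (ipv4_only && protocol != DEMO_ETH_P_IP)
		return false;
	if (ipv6_only && protocol != DEMO_ETH_P_IPV6)
		return false;
	return true;
}
```

Keeping every early exit ahead of the reservation means there is exactly one place where an event is reserved and one where it is submitted, which is easier to audit than a discard call in each branch.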

3 participants