Use raw tracepoints for net_dev_queue and netif_receive_skb #26651

usamasaqib · 2024-06-12T13:25:59Z

What does this PR do?

This PR adds support for using raw tracepoints for the net_dev_queue and netif_receive_skb probes.

Motivation

The recursion protection for kprobes and tracepoints is fairly heavy-handed. It works by preventing eBPF programs from nesting on the same CPU, using a per-CPU variable, bpf_prog_active.

For the two probes in question this is problematic: they run in IRQ context, so they can nest on top of another eBPF program that is running in user context. When that happens, the probe is skipped. Depending on the number of attached eBPF programs and the load on the system, these misses can be very high.

Raw tracepoints, on the other hand, have more precise nesting control: an eBPF program attached to a raw tracepoint is skipped only if it would nest on a running instance of the same program. Since each of these probes runs from only a single context, converting them to raw tracepoints can completely eliminate misses due to nesting.
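The difference between the two policies can be sketched as a small userspace model in C. This is a simplification under stated assumptions, not kernel code: a single CPU is modeled, `bpf_prog_active` stands in for the shared per-CPU counter, and a per-program `active` field stands in for the raw tracepoint nesting check. The names `struct prog`, `enter_shared`, `enter_per_prog`, and `nested_event` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; not kernel code. */

/* Shared by every kprobe/tracepoint program on this CPU
 * (stands in for the per-CPU bpf_prog_active variable). */
static int bpf_prog_active;

struct prog {
    int active;           /* per-program nesting counter (raw tracepoint model) */
    unsigned long runs;   /* times the program body actually ran */
    unsigned long misses; /* times the program was skipped due to nesting */
};

/* kprobe/tracepoint policy: skip if ANY program is already active on this CPU. */
static bool enter_shared(struct prog *p) {
    if (++bpf_prog_active != 1) {
        bpf_prog_active--;
        p->misses++;
        return false;
    }
    p->runs++;
    return true;
}

static void exit_shared(void) { bpf_prog_active--; }

/* Raw tracepoint policy: skip only if THIS program is already active. */
static bool enter_per_prog(struct prog *p) {
    if (++p->active != 1) {
        p->active--;
        p->misses++;
        return false;
    }
    p->runs++;
    return true;
}

static void exit_per_prog(struct prog *p) { p->active--; }

/* One event: a kprobe program (`user`) is running in user context when an
 * interrupt fires the network probe (`net`) on the same CPU. */
static void nested_event(struct prog *user, struct prog *net, bool net_uses_raw_tp) {
    if (!enter_shared(user))
        return;
    if (net_uses_raw_tp) {
        if (enter_per_prog(net))
            exit_per_prog(net);
    } else {
        if (enter_shared(net))
            exit_shared();
    }
    exit_shared();
    assert(bpf_prog_active == 0); /* nesting state is balanced after each event */
}
```

Driving `nested_event` repeatedly with a kprobe-style network probe records a miss on every call, while the raw-tracepoint-style probe runs every time; the kprobe program it interrupts is unaffected in both cases.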

Additional Notes

Miss statistics for kprobes and raw tracepoints are exposed as recursion counts. These statistics are available from kernel 6.7 onward.

To observe these misses, we can use KMT to run a Fedora 40 VM (kernel 6.8), run system-probe, generate load, and then use the eBPF check to get recursion counts for all attached programs.

Possible Drawbacks / Trade-offs

This change will cause a CPU usage regression in system-probe: avoiding misses on netif_receive_skb results in more frequent flushes, and because the flushes use perf_event_output, they can increase CPU usage.

This may be mitigated by upcoming work allowing us to use the perf buffer to directly perform batching.

Describe how to test/QA your changes

pr-commenter · 2024-06-12T13:36:53Z

Test changes on VM

Use this command from test-infra-definitions to manually test this PR changes on a VM:

inv create-vm --pipeline-id=45516736 --os-family=ubuntu

Note: This applies to commit 5cc3e71

pr-commenter · 2024-06-12T14:05:24Z

Regression Detector

Regression Detector Results

Run ID: 3c594d1b-e28c-4a08-9342-48dd0a23bd37

Baseline: 772a730
Comparison: 5cc3e71

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	basic_py_check	% cpu utilization	+1.19	[-1.44, +3.82]	1	Logs
➖	otel_to_otel_logs	ingress throughput	+0.30	[-0.51, +1.10]	1	Logs
➖	pycheck_lots_of_tags	% cpu utilization	+0.18	[-2.34, +2.69]	1	Logs
➖	idle	memory utilization	+0.11	[+0.07, +0.16]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	+0.00	[-0.09, +0.09]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.00	[-0.01, +0.01]	1	Logs
➖	file_tree	memory utilization	-0.28	[-0.37, -0.19]	1	Logs
➖	tcp_syslog_to_blackhole	ingress throughput	-0.30	[-0.35, -0.24]	1	Logs
➖	idle_all_features	memory utilization	-0.43	[-0.49, -0.36]	1	Logs
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	-0.95	[-1.68, -0.22]	1	Logs

Bounds Checks

perf	experiment	bounds_check_name	replicates_passed
✅	idle	memory_usage	10/10

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
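The three criteria above are combined with a logical AND. A minimal C sketch of that decision rule follows; the `struct experiment` layout and the `is_regression` name are hypothetical, and the 5.00% tolerance is taken from this report.

```c
#include <assert.h>
#include <stdbool.h>

/* Effect size tolerance from the report: |delta mean %| >= 5.00 */
#define EFFECT_SIZE_TOLERANCE_PCT 5.0

/* Hypothetical per-experiment summary; field names are illustrative. */
struct experiment {
    double delta_mean_pct; /* estimated delta mean % */
    double ci_low_pct;     /* lower bound of the 90% "delta mean % CI" */
    double ci_high_pct;    /* upper bound of the 90% "delta mean % CI" */
    bool erratic;          /* configuration marks the experiment "erratic" */
};

/* A change is flagged as a regression only if ALL criteria hold. */
static bool is_regression(const struct experiment *e) {
    double magnitude = e->delta_mean_pct < 0 ? -e->delta_mean_pct : e->delta_mean_pct;
    bool big_enough = magnitude >= EFFECT_SIZE_TOLERANCE_PCT;
    /* The interval excludes zero when it lies entirely above or below it. */
    bool ci_excludes_zero = e->ci_low_pct > 0.0 || e->ci_high_pct < 0.0;
    return big_enough && ci_excludes_zero && !e->erratic;
}
```

For example, `idle` (+0.11, CI [+0.07, +0.16]) has a confidence interval that excludes zero, but its effect size is far below 5.00%, so it is not flagged; `basic_py_check` (+1.19, CI [-1.44, +3.82]) fails both tests. A hypothetical experiment at -6.0% with CI [-8.0, -4.0] would be flagged unless its configuration marked it erratic.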

brycekahle · 2024-06-12T20:28:27Z

pkg/network/tracer/connection/kprobe/manager.go

@@ -93,6 +93,11 @@ func initManager(mgr *ddebpf.Manager, connCloseEventHandler ddebpf.EventHandler,
 		mgr.Probes = append(mgr.Probes, p)
 	}

+	// add Probe for net_dev_queue attached via raw tracepoint
+	mgr.Probes = append(mgr.Probes,
+		&manager.Probe{ProbeIdentificationPair: manager.ProbeIdentificationPair{EBPFFuncName: "raw_tracepoint__net__net_dev_queue", UID: probeUID}, TracepointName: "net_dev_queue", TracepointCategory: "net"},


It looks like TracepointCategory is not necessary for raw tracepoints? https://github.com/DataDog/ebpf-manager/blob/f1cd7d97ecbabc5ead106fcb1c2397125c7fc388/raw_tp.go#L12-L15

Should we add parsing in ebpf-manager for raw tracepoints to extract the TracepointName similar to how tracepoints are doing?
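One possible shape for that parsing, sketched in C for consistency with the other examples here (ebpf-manager itself is written in Go). It assumes raw tracepoint programs live in sections named `raw_tracepoint/<name>` or `raw_tp/<name>`, the prefixes libbpf recognizes; the helper `raw_tp_name_from_section` is hypothetical and not part of ebpf-manager. Because a raw tracepoint is identified by name alone, there is no category to extract.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the tracepoint name inside a raw
 * tracepoint section name, or NULL if the section is not a raw tracepoint.
 * Accepts both "raw_tracepoint/<name>" and "raw_tp/<name>". */
static const char *raw_tp_name_from_section(const char *section) {
    static const char *const prefixes[] = {"raw_tracepoint/", "raw_tp/"};
    for (size_t i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
        size_t len = strlen(prefixes[i]);
        if (strncmp(section, prefixes[i], len) == 0) {
            const char *name = section + len;
            /* No category component: the remainder must be a bare, non-empty name. */
            if (*name == '\0' || strchr(name, '/') != NULL)
                return NULL;
            return name;
        }
    }
    return NULL;
}
```

With a helper like this, a probe definition could omit both TracepointName and TracepointCategory and let the name be derived from the section.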

Sounds good. I'll make a PR for that.

pkg/network/ebpf/probes/probes.go

guyarb

Since you're stating that this is likely to create a CPU regression, please run a load test and dogfood it.
We need to assess whether there is a regression, and whether it is acceptable.

guyarb · 2024-06-13T04:18:25Z

pkg/network/ebpf/c/prebuilt/usm.c

+int raw_tracepoint__net__netif_receive_skb(void *ctx) {
+    CHECK_BPF_PROGRAM_BYPASSED()
+    log_debug("tracepoint/net/netif_receive_skb");
+    // flush batch to userspace
+    // because perf events can't be sent from socket filter programs
+    http_batch_flush(ctx);
+    http2_batch_flush(ctx);
+    terminated_http2_batch_flush(ctx);
+    kafka_batch_flush(ctx);
+    postgres_batch_flush(ctx);
+    return 0;


Since that's just a duplication of tracepoint__net__netif_receive_skb__pre_4_17_0, we should create a helper function to be used by both, instead of manually ensuring that both implementations match.

Same comment for runtime.c.
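A sketch of the suggested refactor, written as userspace C so it compiles on its own: the five flush helpers are stubbed out with a counter, and the `CHECK_BPF_PROGRAM_BYPASSED()` and `log_debug` lines are omitted. The helper name `flush_protocol_batches` is hypothetical; in the eBPF sources it would be declared `static __always_inline`, and each entry point would also carry its section attribute.

```c
#include <assert.h>

/* Userspace stand-ins for the USM flush helpers so the sketch compiles;
 * in the eBPF code each one flushes a protocol's batch to userspace. */
static int flushes;
static void http_batch_flush(void *ctx) { (void)ctx; flushes++; }
static void http2_batch_flush(void *ctx) { (void)ctx; flushes++; }
static void terminated_http2_batch_flush(void *ctx) { (void)ctx; flushes++; }
static void kafka_batch_flush(void *ctx) { (void)ctx; flushes++; }
static void postgres_batch_flush(void *ctx) { (void)ctx; flushes++; }

/* Shared body (hypothetical name): the list of protocols to flush lives in
 * exactly one place, so the two entry points cannot drift apart. */
static inline void flush_protocol_batches(void *ctx) {
    // flush batch to userspace
    // because perf events can't be sent from socket filter programs
    http_batch_flush(ctx);
    http2_batch_flush(ctx);
    terminated_http2_batch_flush(ctx);
    kafka_batch_flush(ctx);
    postgres_batch_flush(ctx);
}

/* Raw tracepoint entry point. */
int raw_tracepoint__net__netif_receive_skb(void *ctx) {
    flush_protocol_batches(ctx);
    return 0;
}

/* Fallback tracepoint entry point for older kernels. */
int tracepoint__net__netif_receive_skb__pre_4_17_0(void *ctx) {
    flush_protocol_batches(ctx);
    return 0;
}
```

With this structure, adding a new protocol means editing one function, and both entry points pick it up automatically.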

usamasaqib · 2024-06-14T16:16:40Z

/trigger-ci --variable RUN_ALL_BUILDS=true --variable RUN_KITCHEN_TESTS=true --variable RUN_E2E_TESTS=on --variable RUN_UNIT_TESTS=on --variable RUN_KMT_TESTS=on

dd-devflow · 2024-06-14T16:17:15Z

🚂 Gitlab pipeline started

Started pipeline #36803767

brycekahle · 2024-06-20T17:30:58Z

pkg/network/ebpf/c/tracer.c

@@ -1097,7 +1097,7 @@ static __always_inline struct sock *sk_buff_sk(struct sk_buff *skb) {
    return sk;
 }

-static __always_inline int handle_net_dev_queue(struct sk_buff *skb) {
+static __always_inline int handle_net_dev_queue(struct sk_buff* skb) {


nit: kernel formatting prefers the * next to the variable name:

When declaring pointer data or a function that returns a pointer type, the preferred use of * is adjacent to the data name or function name and not adjacent to the type name

brycekahle · 2024-06-20T17:31:25Z

pkg/network/ebpf/c/protocols/events.h

@@ -54,9 +55,9 @@
                }                                                                                       \
                                                                                                        \
                if (use_ring_buffer) {                                                                  \
-                    perf_ret = bpf_ringbuf_output(&name##_batch_events, batch, sizeof(batch_data_t), 0);\
+                    perf_ret = bpf_ringbuf_output_with_telemetry(&name##_batch_events, batch, sizeof(batch_data_t), 0);\


Let's make sure USM is OK with this.

Added for debugging purposes; will remove.

agent-platform-auto-pr · 2024-06-21T11:04:56Z

[Fast Unit Tests Report]

On pipeline 45516736 (CI Visibility). The following jobs did not run any unit tests:

Jobs:

tests_deb-arm64-py3
tests_deb-x64-py3
tests_flavor_dogstatsd_deb-x64
tests_flavor_heroku_deb-x64
tests_flavor_iot_deb-x64
tests_rpm-arm64-py3
tests_rpm-x64-py3
tests_windows-x64

If you modified Go files and expected unit tests to run in these jobs, please double-check the job logs. If you think tests should have been executed, reach out to #agent-devx-help.

use raw tracepoints when available

e31b53e

usamasaqib added the changelog/no-changelog, team/ebpf-platform, and qa/done (Skip QA week as QA was done before merge and regressions are covered by tests) labels Jun 12, 2024

usamasaqib added this to the 7.56.0 milestone Jun 12, 2024

github-actions bot added the component/system-probe label Jun 12, 2024

use common function

a5e959b

usamasaqib marked this pull request as ready for review June 12, 2024 13:37

usamasaqib requested review from a team as code owners June 12, 2024 13:37

lint

456fced

usamasaqib added 2 commits June 12, 2024 16:30

inline helper

07cfd4b

rename struct

4024bba

hmahmood reviewed Jun 12, 2024

pkg/network/ebpf/c/tracer.c (outdated)

hmahmood approved these changes Jun 12, 2024

fix build error

374b4b8

nplanel reviewed Jun 12, 2024

pkg/network/tracer/connection/kprobe/config.go (outdated)

usamasaqib added 2 commits June 12, 2024 17:11

fix variable name

9c2fb1f

define bpf_raw_tracepoint_args for runtime compilation

2d47838

brycekahle requested changes Jun 12, 2024

guyarb requested changes Jun 13, 2024

usamasaqib added 7 commits June 13, 2024 11:16

address comments

fb6afbc

fix probe function names

d5d1bfa

fix runtime build error

920e17e

only define bpf_raw_tracepoint_args for kernel versions where unavailable

156fee4

add section attribute

4d95e8d

use feature detection for deciding program type

e428a35

remove unused var

e3845cc

usamasaqib added 5 commits June 13, 2024 16:03

add section attribute in prebuilt files

6764a4f

exclude unused probe

d243b80

do not use BPF_PROG for tracepoint

4fe1328

use BPF_PROG for net_dev_queue raw_tp

4af8af1

Merge branch 'main' into usama.saqib/use-raw-tp

b43928c

brycekahle approved these changes Jun 14, 2024

usamasaqib added 3 commits June 18, 2024 14:44

Merge branch 'main' into usama.saqib/use-raw-tp

0559f09

Merge branch 'main' into usama.saqib/use-raw-tp

7531372

collect telemetry for buffers in flush events function

387455c

usamasaqib requested a review from a team as a code owner June 20, 2024 10:53

brycekahle reviewed Jun 20, 2024

brycekahle approved these changes Jun 20, 2024

Revert "collect telemetry for buffers in flush events function"

6e7da64

This reverts commit 387455c.

Pythyu modified the milestones: 7.56.0, 7.57.0 Jul 5, 2024

usamasaqib modified the milestones: 7.57.0, Triage Jul 15, 2024

brycekahle assigned guyarb Jul 24, 2024

gjulianm modified the milestones: Triage, 7.58.0 Aug 9, 2024

brycekahle modified the milestones: 7.58.0, 7.59.0 Sep 6, 2024

Merge branch 'main' into usama.saqib/use-raw-tp

5cc3e71

usamasaqib modified the milestones: 7.59.0, Triage Oct 1, 2024
