-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use raw tracepoints for net_dev_queue and netif_receive_skb #26651
base: main
Are you sure you want to change the base?
Conversation
Test changes on VMUse this command from test-infra-definitions to manually test this PR changes on a VM: inv create-vm --pipeline-id=45516736 --os-family=ubuntu Note: This applies to commit 5cc3e71 |
Regression DetectorRegression Detector ResultsRun ID: 3c594d1b-e28c-4a08-9342-48dd0a23bd37 Metrics dashboard Target profiles Baseline: 772a730 Performance changes are noted in the perf column of each table:
No significant changes in experiment optimization goalsConfidence level: 90.00% There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
|
perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
---|---|---|---|---|---|---|
➖ | basic_py_check | % cpu utilization | +1.19 | [-1.44, +3.82] | 1 | Logs |
➖ | otel_to_otel_logs | ingress throughput | +0.30 | [-0.51, +1.10] | 1 | Logs |
➖ | pycheck_lots_of_tags | % cpu utilization | +0.18 | [-2.34, +2.69] | 1 | Logs |
➖ | idle | memory utilization | +0.11 | [+0.07, +0.16] | 1 | Logs |
➖ | uds_dogstatsd_to_api | ingress throughput | +0.00 | [-0.09, +0.09] | 1 | Logs |
➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.00 | [-0.01, +0.01] | 1 | Logs |
➖ | file_tree | memory utilization | -0.28 | [-0.37, -0.19] | 1 | Logs |
➖ | tcp_syslog_to_blackhole | ingress throughput | -0.30 | [-0.35, -0.24] | 1 | Logs |
➖ | idle_all_features | memory utilization | -0.43 | [-0.49, -0.36] | 1 | Logs |
➖ | uds_dogstatsd_to_api_cpu | % cpu utilization | -0.95 | [-1.68, -0.22] | 1 | Logs |
Bounds Checks
perf | experiment | bounds_check_name | replicates_passed |
---|---|---|---|
✅ | idle | memory_usage | 10/10 |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
@@ -93,6 +93,11 @@ func initManager(mgr *ddebpf.Manager, connCloseEventHandler ddebpf.EventHandler, | |||
mgr.Probes = append(mgr.Probes, p) | |||
} | |||
|
|||
// add Probe for net_dev_queue attached via raw tracepoint | |||
mgr.Probes = append(mgr.Probes, | |||
&manager.Probe{ProbeIdentificationPair: manager.ProbeIdentificationPair{EBPFFuncName: "raw_tracepoint__net__net_dev_queue", UID: probeUID}, TracepointName: "net_dev_queue", TracepointCategory: "net"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like TracepointCategory
is not necessary for raw tracepoints? https://github.com/DataDog/ebpf-manager/blob/f1cd7d97ecbabc5ead106fcb1c2397125c7fc388/raw_tp.go#L12-L15
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add parsing in ebpf-manager for raw tracepoints to extract the TracepointName
similar to how tracepoints are doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Ill make a PR for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're stating this is likely to create a cpu regression then please run load test and dogfooding
We need to asses if there is a regression, and if it is acceptable
pkg/network/ebpf/c/prebuilt/usm.c
Outdated
int raw_tracepoint__net__netif_receive_skb(void *ctx) { | ||
CHECK_BPF_PROGRAM_BYPASSED() | ||
log_debug("tracepoint/net/netif_receive_skb"); | ||
// flush batch to userspace | ||
// because perf events can't be sent from socket filter programs | ||
http_batch_flush(ctx); | ||
http2_batch_flush(ctx); | ||
terminated_http2_batch_flush(ctx); | ||
kafka_batch_flush(ctx); | ||
postgres_batch_flush(ctx); | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since that's just a duplication of tracepoint__net__netif_receive_skb__pre_4_17_0 - we should create an helper function to be used instead of manually ensuring both implementations match
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment for runtime,c
/trigger-ci --variable RUN_ALL_BUILDS=true --variable RUN_KITCHEN_TESTS=true --variable RUN_E2E_TESTS=on --variable RUN_UNIT_TESTS=on --variable RUN_KMT_TESTS=on |
🚂 Gitlab pipeline started Started pipeline #36803767 |
@@ -1097,7 +1097,7 @@ static __always_inline struct sock *sk_buff_sk(struct sk_buff *skb) { | |||
return sk; | |||
} | |||
|
|||
static __always_inline int handle_net_dev_queue(struct sk_buff *skb) { | |||
static __always_inline int handle_net_dev_queue(struct sk_buff* skb) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: kernel formatting prefers the *
next to the variable name:
When declaring pointer data or a function that returns a pointer type, the preferred use of * is adjacent to the data name or function name and not adjacent to the type name
@@ -54,9 +55,9 @@ | |||
} \ | |||
\ | |||
if (use_ring_buffer) { \ | |||
perf_ret = bpf_ringbuf_output(&name##_batch_events, batch, sizeof(batch_data_t), 0);\ | |||
perf_ret = bpf_ringbuf_output_with_telemetry(&name##_batch_events, batch, sizeof(batch_data_t), 0);\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure USM is OK with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added for debugging purposes. Will remove
This reverts commit 387455c.
[Fast Unit Tests Report] On pipeline 45516736 (CI Visibility). The following jobs did not run any unit tests: Jobs:
If you modified Go files and expected unit tests to run in these jobs, please double check the job logs. If you think tests should have been executed reach out to #agent-devx-help |
What does this PR do?
This PR adds support to use
raw_tracepoints
fornet_dev_queue
andnetif_receive_skb
probes.Motivation
The recursion protection for kprobes and tracepoints is fairly heavy handed. It works by disallowing eBPF programs to nest on the same CPU, using a per-cpu variable bpf_prog_active.
For the two probes in question this is problematic because they run from within IRQ context, which means they can nest on to another eBPF program running from within a user context. This causes these probes to be skipped in such cases. Depending on the number of eBPF programs attached and the load on a system these misses can be very high.
Raw tracepoints on the other hand have more precise nesting control. An eBPF program attached to a raw tracepoint is skipped only if it is nesting a running instance of the same program. Converting these probes to use raw tracepoints instead can therefore completely eliminate misses due to nesting, since these only run from a single context.
Additional Notes
Statistics on misses of kprobes and raw tracepoints are exposed via recursion counts. These stats were added from kernel versions 6.7+.
To observe these misses happening, we can use KMT to run a fedora 40 VM (running kernel 6.8), run system-probe, generate load, and then use the eBPF check to get recursion counts for all attached programs.
Possible Drawbacks / Trade-offs
This change will cause regression in CPU usage for system-probe because, avoiding misses on
netif_receive_skb
results in more frequent flushes to happen. The flushes useperf_event_output
which can increase CPU usage.This may be mitigated by upcoming work allowing us to use the perf buffer to directly perform batching.
Describe how to test/QA your changes