Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add TLS tracing source debugging mode and associated feature flag (#1652
) Summary: Add TLS tracing source debugging mode and associated feature flag This debugging feature flag is intended for adhoc investigations for determining why a particular TLS source (libpython, statically linked open/boringssl, OpenSSL v1.x, etc) was traced for a given protocol. BoringSSL tracing was recently rolled out and it appeared this functionality was covering every protocol that Pixie supports. This was unexpected based on the open source / popular projects we believe should be covered by this tracing. As I tested this change, I found a bug in how a TLS source is attributed. The libpython TLS source was identified as statically linked (see Test Plan below for details). I believe this is likely the discrepancy causing the statically linked source to show up for every supported protocol. That bug will be addressed in a follow up change. When reviewing the Test Plan, please note that the changes in 5a23004 were needed to produce the prometheus output and to verify that the TLS source misattribution is present. Relevant Issues: #692 Type of change: /kind feature Test Plan: Verified the following - [x] Simulating the mismatched fd case still causes the `DCHECK` to occur <details><summary>openssl_trace_bpf_test with mismatched fd error</summary> ``` # Introduce change to simulate mismatched fd case. The openssl_trace_bpf_test cases are known to require calling send/read multiple times for the same fd. This change will cause the DCHECK to occur on the second read/write call -- $ git diff diff --git a/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c b/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c index ba5df955dd..e1c64a1a7d 100644 --- a/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c +++ b/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c @@ -185,7 +185,7 @@ static __inline void propagate_fd_to_user_space_call(uint64_t pid_tgid, int fd) int current_fd = nested_syscall_fd_ptr->fd; if (current_fd == kInvalidFD) { nested_syscall_fd_ptr->fd = fd; - } else if (current_fd != fd) { + } else /*if (current_fd != fd)*/ { // Found two different fds during a single SSL_write/SSL_read call. This invalidates // our tls tracing assumptions and must be recorded. nested_syscall_fd_ptr->mismatched_fds = true; # Verify that the mismatched fd case still causes a crash from the DCHECK $ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test INFO: Invocation ID: 7c4390fd-e585-49fe-ac2a-22e17122567e INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/7c4390fd-e585-49fe-ac2a-22e17122567e INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (0 packages loaded, 0 targets configured). INFO: Found 1 target... Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date: bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test INFO: Elapsed time: 3.826s, Critical Path: 3.39s INFO: 4 processes: 1 internal, 3 linux-sandbox. INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/7c4390fd-e585-49fe-ac2a-22e17122567e INFO: Build completed successfully, 4 total actions exec ${PAGER:-/usr/bin/less} "$0" \|\| exit 1 Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test ----------------------------------------------------------------------------- I20230804 14:55:53.269637 3497837 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test [==========] Running 18 tests from 6 test suites. [----------] Global test environment set-up. [----------] 3 tests from OpenSSLTraceTest/0, where TypeParam = px::stirling::NginxOpenSSL_1_1_0_ContainerWrapper [ RUN ] OpenSSLTraceTest/0.ssl_capture_curl_client I20230804 14:55:53.423983 3497837 container_runner.cc:36] Loaded image: localhost/bazel/src/stirling/source_connectors/socket_tracer/testing/containers:nginx_openssl_1_1_0_image I20230804 14:55:53.424044 3497837 container_runner.cc:114] podman run --timeout=3600 --rm -q --pid=host --name=nginx_6373812169417785 localhost/bazel/src/stirling/source_connectors/socket_tracer/testing/containers:nginx_openssl_1_1_0_image I20230804 14:55:53.664819 3497837 container_runner.cc:144] Container nginx_6373812169417785 status: running I20230804 14:55:53.698279 3497837 container_runner.cc:175] Container nginx_6373812169417785 process PID: 3497990 I20230804 14:55:53.698315 3497837 container_runner.cc:177] Container nginx_6373812169417785 waiting for log message: I20230804 14:55:53.732992 3497837 container_runner.cc:189] Container nginx_6373812169417785 status: running I20230804 14:55:53.733016 3497837 container_runner.cc:225] Container nginx_6373812169417785 is ready. I20230804 14:55:54.733592 3497837 linux_headers.cc:211] Found Linux kernel version using .note section. I20230804 14:55:54.733633 3497837 source_connector.cc:35] Initializing source connector: socket_trace_connector I20230804 14:55:54.733664 3497837 linux_headers.cc:94] Obtained Linux version string from `uname`: 5.19.0-1022-gcp I20230804 14:55:54.733670 3497837 linux_headers.cc:642] Detected kernel release (uname -r): 5.19.0-1022-gcp I20230804 14:55:54.733700 3497837 bcc_wrapper.cc:121] Using linux headers found at /lib/modules/5.19.0-1022-gcp/build for BCC runtime. I20230804 14:55:54.733760 3497837 bcc_wrapper.cc:170] Initializing BPF program ... I20230804 14:56:04.312705 3497837 scoped_timer.h:48] Timer(init_bpf_program) : 9.58 s I20230804 14:56:05.196256 3497837 socket_trace_connector.cc:437] Number of kprobes deployed = 40 I20230804 14:56:05.196297 3497837 socket_trace_connector.cc:438] Probes successfully deployed. I20230804 14:56:05.196377 3497837 socket_trace_connector.cc:373] Initializing perf buffers with ncpus=96 and scaling_factor=0.0865385 I20230804 14:56:05.196416 3497837 socket_trace_connector.cc:362] Total perf buffer usage for kData buffers across all cpus: 268038720 I20230804 14:56:05.196425 3497837 socket_trace_connector.cc:362] Total perf buffer usage for kControl buffers across all cpus: 13353216 I20230804 14:56:05.304798 3497837 socket_trace_connector.cc:442] Number of perf buffers opened = 8 W20230804 14:56:12.303573 3498584 uprobe_manager.cc:852] Cannot analyze binary /proc/3498580/root/usr/bin/python3.10 for uprobe deployment. If file is under /var/lib, container may have terminated. Message = Can't find or process ELF file /proc/3498580/root/usr/bin/python3.10 I20230804 14:56:14.672582 3498584 uprobe_manager.cc:1000] Number of uprobes deployed = 11866 F20230804 14:56:14.748164 3498588 socket_trace_connector.cc:701] Check failed: error_code == kOpenSSLTraceOk (1 vs. 0) *** Check failure stack trace: *** ``` </details> - [x] Enabling tls source debugging on 72075b6 shows the `kLibPythonSource` tls source incorrectly attributed as `kStaticallyLinkedSource` <details><summary>openssl_trace_bpf_test prometheus metrics output</summary> ``` $ git show commit 2def9a3 (HEAD, ddelnano/ddelnano/add-tls-tracing-debug-feature-flag) Author: Dom Del Nano <ddelnano@pixielabs.ai> Date: Thu Aug 3 15:35:52 2023 +0000 Add TLS tracing source debugging mode Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai> $ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test --test_arg='--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client' INFO: Invocation ID: 566f27c6-208f-4f86-83bb-b5e2a214f81b INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/566f27c6-208f-4f86-83bb-b5e2a214f81b INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (0 packages loaded, 0 targets configured). INFO: Found 1 target... Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date: bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test INFO: Elapsed time: 4.238s, Critical Path: 3.57s INFO: 3 processes: 1 remote cache hit, 1 internal, 1 linux-sandbox. INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test '--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client' INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/566f27c6-208f-4f86-83bb-b5e2a214f81b INFO: Build completed successfully, 3 total actions exec ${PAGER:-/usr/bin/less} "$0" \|\| exit 1 Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test ----------------------------------------------------------------------------- I20230804 14:29:07.070752 3476002 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test Note: Google Test filter = OpenSSLTraceTest/3.ssl_capture_curl_client [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from OpenSSLTraceTest/3, where TypeParam = px::stirling::Python310ContainerWrapper [ RUN ] OpenSSLTraceTest/3.ssl_capture_curl_client [ ... ] # HELP openssl_tls_source_debug Records the number of times a protocol was traced along with additional debugging information # TYPE openssl_tls_source_debug counter openssl_tls_source_debug{exe="python3",name="openssl_tls_source_debug",protocol="kProtocolHTTP",ssl_source="kStaticallyLinkedSource"} 1 # HELP java_proc_crashed_during_attach Count of Java process crashes during symbolization agent attach. # TYPE java_proc_crashed_during_attach counter java_proc_crashed_during_attach{name="java_proc_crashed_during_attach"} 0 # HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed. # TYPE data_loss_bytes counter data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0 data_loss_bytes{protocol="kProtocolHTTP",tls_source="kStaticallyLinkedSource"} 2813 data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 90 data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 0 # HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol. # TYPE conn_stats_bytes counter conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 11156 conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kStaticallyLinkedSource"} 3650 conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0 conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 132148 ``` </details> - [x] Enabling tls source debugging on [e0b792d](e0b792d) shows the `kLibPythonSource` tls source correctly attributed. This will be fixed in a follow up PR but proves that this debug information has tracked down the issue. <details><summary>openssl_trace_bpf_test prometheus metrics output</summary> ``` $ git show -- commit 1c9200932f89e6bbf530f02fd607d5008aac1c44 (HEAD -> ddelnano/add-tls-tracing-debug-feature-flag) Author: Dom Del Nano <ddelnano@pixielabs.ai> Date: Thu Aug 3 23:23:11 2023 +0000 Fix bug where non statically linked binaries were attributed as statically link Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai> diff --git a/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc b/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc index 76850c250e..50ba85fc5b 100644 --- a/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc +++ b/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc @@ -654,10 +654,10 @@ int UProbeManager::DeployOpenSSLUProbes(const absl::flat_hash_set<md::UPID>& pid // before the BPF map is updated. This value is cleaned up when the upid is // terminated, so if attachment fails it will be deleted prior to the pid being // reused. - openssl_source_map_->UpdateValue(pid.pid(), kStaticallyLinkedSource); count_or = AttachOpenSSLUProbesOnStaticBinary(pid.pid()); - if (count_or.ok()) { + if (count_or.ok() && count_or.ValueOrDie() > 0) { + openssl_source_map_->UpdateValue(pid.pid(), kStaticallyLinkedSource); uprobe_count += count_or.ValueOrDie(); VLOG(1) << absl::Substitute( $ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test --test_arg='--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client' Starting local Bazel server and connecting to it... INFO: Invocation ID: 2d279cd6-45f7-4939-92d2-9af76a262954 INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/2d279cd6-45f7-4939-92d2-9af76a262954 INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (368 packages loaded, 38224 targets configured). INFO: Found 1 target... Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date: bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test INFO: Elapsed time: 8.733s, Critical Path: 0.90s INFO: 1 process: 1 internal. INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test '--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client' INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/2d279cd6-45f7-4939-92d2-9af76a262954 INFO: Build completed successfully, 1 total action exec ${PAGER:-/usr/bin/less} "$0" \|\| exit 1 Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test ----------------------------------------------------------------------------- I20230804 14:26:54.002213 3474370 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test Note: Google Test filter = OpenSSLTraceTest/3.ssl_capture_curl_client [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from OpenSSLTraceTest/3, where TypeParam = px::stirling::Python310ContainerWrapper [ RUN ] OpenSSLTraceTest/3.ssl_capture_curl_client I20230804 14:26:54.321714 3474370 container_runner.cc:36] Loaded image: localhost/bazel/src/stirling/source_connectors/socket_tracer/testing/containers/ssl:python_min_310_https_server [ ... ] # HELP openssl_tls_source_debug Records the number of times a protocol was traced along with additional debugging information # TYPE openssl_tls_source_debug counter openssl_tls_source_debug{exe="python3",name="openssl_tls_source_debug",protocol="kProtocolHTTP",ssl_source="kLibPythonSource"} 1 # HELP java_proc_crashed_during_attach Count of Java process crashes during symbolization agent attach. # TYPE java_proc_crashed_during_attach counter java_proc_crashed_during_attach{name="java_proc_crashed_during_attach"} 0 # HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed. # TYPE data_loss_bytes counter data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 2813 data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0 data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 90 data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 0 # HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol. # TYPE conn_stats_bytes counter conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650 conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 11156 conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0 conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 272395 I20230804 14:27:16.048719 3474370 container_runner.cc:53] podman rm -f curl_6372094307402945 &>/dev/null I20230804 14:27:23.091778 3474370 container_runner.cc:53] podman rm -f python_min_310_https_server_6372073067154288 &>/dev/null [ OK ] OpenSSLTraceTest/3.ssl_capture_curl_client (29434 ms) [----------] 1 test from OpenSSLTraceTest/3 (29434 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test suite ran. (29434 ms total) [ PASSED ] 1 test. I20230804 14:27:23.436729 3474370 env.cc:51] Shutting down ``` </details> --------- Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
- Loading branch information