Add protocol tracing support for applications using BoringSSL #692
Labels
area/datacollector
Issues related to Stirling (datacollector)
Comments
I'm in the process of drafting a design document for this work. Once the first draft is finished, I'll share the document here.
aimichelle pushed a commit that referenced this issue on Feb 23, 2023
…intext metrics (#903)

Summary: This updates the `SocketTracerMetrics` class to differentiate between plaintext and TLS metrics. Since we are working to instrument BoringSSL-based applications more broadly (#692), this will allow us to experiment with the underlying TLS tracing implementation and verify that no protocol parsing issues are introduced (which would show up as more data loss).

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Updated the `mux_trace_bpf_test` and `netty_tls_trace_bpf_test` with the following diff to verify that they increment the correct counter ([P317](https://phab.corp.pixielabs.ai/P317))
- [x] Inspect the counters' values and dimensions despite end to end test mentioned above
- [x] Verified that this data was not used for any previous purposes (a new dimension would likely cause breakage)

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
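To illustrate what splitting metrics by a TLS dimension buys, here is a minimal Python sketch of a labeled byte counter and a per-label data loss ratio. The names `LabeledCounter` and `data_loss_ratio`, and the `tls="true"`/`tls="false"` label values, are hypothetical; the real `SocketTracerMetrics` class is C++ and exports Prometheus metrics, and this is not its implementation.

```python
from collections import defaultdict


class LabeledCounter:
    """Hypothetical stand-in for a Prometheus counter with labels."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self.values = defaultdict(int)

    def _key(self, labels):
        # Key each series by its label values in a fixed order.
        return tuple(labels[n] for n in self.label_names)

    def inc(self, amount=1, **labels):
        self.values[self._key(labels)] += amount

    def get(self, **labels):
        return self.values[self._key(labels)]


def data_loss_ratio(loss, total, **labels):
    """Fraction of traced bytes that failed to parse for one label set."""
    seen = total.get(**labels)
    return loss.get(**labels) / seen if seen else 0.0


# Keeping plaintext and TLS apart means a regression in the TLS tracing path
# shows up as extra data loss only in the tls="true" series.
data_loss = LabeledCounter("data_loss_bytes", ["protocol", "tls"])
conn_stats = LabeledCounter("conn_stats_bytes", ["protocol", "tls"])

conn_stats.inc(1000, protocol="kProtocolHTTP", tls="true")
data_loss.inc(50, protocol="kProtocolHTTP", tls="true")
print(data_loss_ratio(data_loss, conn_stats, protocol="kProtocolHTTP", tls="true"))  # 0.05
```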
JamesMBartlett pushed a commit that referenced this issue on Feb 28, 2023
…he existing stats interface to it (#907)

Summary: Expose conn tracker creation through prometheus metrics and migrate the existing stats interface to it

As part of an upcoming change to expand Pixie's TLS support for BoringSSL (#692), we want to track the conn tracker lifecycle as a proxy measurement for whether a connection's socket file descriptor is identified correctly. Our TLS tracing requires this, since the fd is fundamental to a connection's identity. When our TLS tracing implementation is changed, we expect the `conn_tracker_created` metric to stay at its existing baseline. Any increase in this metric (assuming our protocol tracing support remains constant) would indicate that the file descriptors are inferred incorrectly.

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Existing conn_tracker tests, which rely on our counters being correct, pass
- [x] Verify that metrics are visible in testing

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
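A toy model of why `conn_tracker_created` can serve as a proxy: if trackers are keyed by something that includes the socket fd, a wrongly inferred fd tends to create a tracker that would not otherwise exist. The sketch below assumes trackers are keyed by `(pid, fd)` and uses hypothetical names (`ConnTrackerRegistry`, `conn_tracker_regression`, a 5% tolerance); it is not Pixie's `ConnTracker` code.

```python
class ConnTrackerRegistry:
    """Toy model: one tracker per (pid, fd) pair, counting creations."""

    def __init__(self):
        self.trackers = {}
        self.created = 0  # plays the role of a conn_tracker_created counter

    def record(self, pid, fd, data):
        key = (pid, fd)
        if key not in self.trackers:
            self.trackers[key] = []
            self.created += 1
        self.trackers[key].append(data)


def conn_tracker_regression(baseline, candidate, tolerance=0.05):
    """True if candidate creations exceed the baseline by more than `tolerance` (a fraction)."""
    if baseline == 0:
        return candidate > 0
    return (candidate - baseline) / baseline > tolerance


# The same TLS connection (pid 100, fd 5) traced twice: once with the fd inferred
# correctly, and once where the response is attributed to the wrong fd.
correct, faulty = ConnTrackerRegistry(), ConnTrackerRegistry()
for data in (b"GET / HTTP/1.1", b"HTTP/1.1 200 OK"):
    correct.record(100, 5, data)
faulty.record(100, 5, b"GET / HTTP/1.1")
faulty.record(100, 7, b"HTTP/1.1 200 OK")  # mis-inferred fd

print(correct.created, faulty.created)  # 1 2
print(conn_tracker_regression(correct.created, faulty.created))  # True
```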
RagalahariP
pushed a commit
to RagalahariP/pixie
that referenced
this issue
Mar 23, 2023
…intext metrics (pixie-io#903)
RagalahariP
pushed a commit
to RagalahariP/pixie
that referenced
this issue
Mar 23, 2023
…he existing stats interface to it (pixie-io#907)
aimichelle
pushed a commit
that referenced
this issue
Mar 24, 2023
…#1089)

Summary: Add nginx container image with OpenSSL v3 for future tls tracing test

The TLS tracing method developed for #692 will support dynamically linked OpenSSL v3 in addition to BoringSSL. In order to validate that the new tracing method works (in addition to its feature flag), I wanted to add a test case that would prove the new implementation is in use. The mirrored upstream nginx image added in this PR has a different `/index.html` compared to our existing images. This change adds a container layer containing the expected `index.html` so that the future test assertions can remain the same.

Relevant Issues: #692

Type of change: /kind test-infra

Test Plan: Verified that the resulting nginx container returns the same html file as our [existing](https://github.com/pixie-io/pixie/blob/86dfb11dcbf605fbec1a317192886e52416fa4aa/src/stirling/source_connectors/socket_tracer/testing/containers/BUILD.bazel#L70) nginx [containers](https://github.com/pixie-io/pixie/blob/86dfb11dcbf605fbec1a317192886e52416fa4aa/src/stirling/source_connectors/socket_tracer/testing/containers/BUILD.bazel#L79)

```
# Run the new nginx image and an existing one built from bazel
ddelnano@turing:~/code/pixie$ docker run -d -p 80:80 bazel/src/stirling/source_connectors/socket_tracer/testing/containers:nginx_alpine_openssl_3_0_7_image
ddelnano@turing:~/code/pixie$ docker run -d -p 81:80 bazel/src/stirling/source_connectors/socket_tracer/testing/containers:nginx_openssl_1_1_1_image
48277f0656b2327de29917e063e80705c4f14e95d89081a0c8b9b4177846ca29

# Run the mirrored upstream image (used later to validate that the index.html is different)
ddelnano@turing:~/code/pixie$ docker run -d -p 82:80 gcr.io/pixie-oss/pixie-dev-public/docker-deps/library/nginx@sha256:3eb380b81387e9f2a49cb6e5e18db016e33d62c37ea0e9be2339e9f0b3e26170

# Verify that requests to `/` result in the same response
ddelnano@turing:~/code/pixie$ curl -s localhost:80 | sha256sum
38ffd4972ae513a0c79a8be4573403edcd709f0f572105362b08ff50cf6de521  -
ddelnano@turing:~/code/pixie$ curl -s localhost:81 | sha256sum
38ffd4972ae513a0c79a8be4573403edcd709f0f572105362b08ff50cf6de521  -

# Verify that the stock nginx 1.23 image returns a different index.html
ddelnano@turing:~/code/pixie$ curl -s localhost:82 | sha256sum
fb47468a2cd3953c7131431991afcc6a2703f14640520102eea0a685a7e8d6de  -
```

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
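The `curl -s ... | sha256sum` comparison in this Test Plan can be reproduced with the Python standard library. This is a sketch only: the helper names are hypothetical, and it assumes the three containers are already running on ports 80, 81 and 82 as in the transcript. `sha256sum` prints the digest followed by `  -` when reading stdin, which explains the trailing `-` in that output; the sketch returns just the hex digest.

```python
import hashlib
import urllib.request


def body_digest(body: bytes) -> str:
    """Hex SHA-256 of a response body (what `sha256sum` prints, minus the ' -')."""
    return hashlib.sha256(body).hexdigest()


def fetch_digest(url: str, timeout: float = 5.0) -> str:
    """GET `url` and hash the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return body_digest(resp.read())


def same_index(urls) -> bool:
    """True if every URL serves a byte-identical body."""
    return len({fetch_digest(u) for u in urls}) == 1
```

For example, `same_index(["http://localhost:80/", "http://localhost:81/"])` compares the two Pixie-built images, while adding `"http://localhost:82/"` to the list brings in the stock image, which the transcript shows serving a different `index.html`.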
This was referenced Mar 28, 2023
JamesMBartlett
pushed a commit
that referenced
this issue
Apr 6, 2023
…tls tracing method (#1161)

Summary: Instrument OpenSSL tracing to detect validity of assumptions for new tls tracing method

After discussing #1123 with @oazizi000 and @etep, we decided to invest in stronger validation that the assumptions for that PR and our future TLS tracing are valid. Instead of relying on struct offsets of user space data structures, the new tracing method will access a connection's socket fd from the underlying socket syscalls during `SSL_write` / `SSL_read` calls. This technique should only be compatible with "BIO native" OpenSSL use cases, which are the OpenSSL use cases Pixie supports today. BIO native means that a compatible application uses a [BIO](https://wiki.openssl.org/index.php/BIO) provided by OpenSSL (via [SSL_set_fd](https://www.openssl.org/docs/manmaster/man3/SSL_set_fd.html)), which results in OpenSSL issuing read/write syscalls on the application's behalf for the underlying socket.

There are two situations we wanted to understand prior to proceeding with #1123: how custom BIO implementations behave (netty, nodejs, etc.) and whether unrelated (non-socket) syscalls occur while `SSL_write` and `SSL_read` are on the stack. The former was verified with an experiment based on this change and is described in the Test Plan section below. We believe the latter should not occur, but the changes in this PR instrument this situation in order to detect it if it does occur. Assuming this condition isn't encountered, we will proceed with #1132 (with the minor changes learned from this experiment).

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following:
- [x] The `DCHECK` added to `socket_trace_connector` does not detect mismatched file descriptors for `openssl_trace_bpf_test` and `netty_tls_trace_bpf_test` ([P345](https://phab.corp.pixielabs.ai/P345) -- `openssl_trace_bpf_test` is still disabled, see #699)
- [x] Verified with a BPF histogram that the number of active syscalls within `SSL_write`/`SSL_read` is nonzero for BIO native use cases and zero for non-BIO native cases (nodejs, netty). See more details in #1160

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
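The bookkeeping behind both checks can be modeled in a few lines: remember which threads are inside `SSL_write`/`SSL_read`, collect the fds of socket syscalls issued while that call is on the stack, and on return compare them with the fd obtained the old way (from struct offsets). This is a user-space Python simulation under those assumptions, not the BPF programs from this PR; `SSLSyscallTracker` and its methods are hypothetical stand-ins for the probe handlers.

```python
from collections import Counter


class SSLSyscallTracker:
    """User-space simulation of the checks; not Pixie's BPF code."""

    def __init__(self):
        self.active = {}  # tid -> fds of socket syscalls seen during the current SSL call
        self.syscall_histogram = Counter()  # syscalls per SSL call -> number of calls
        self.mismatched_fds = 0

    def ssl_call_enter(self, tid):
        # Stand-in for a probe on SSL_write/SSL_read entry.
        self.active[tid] = []

    def socket_syscall(self, tid, fd):
        # Stand-in for a read/write syscall probe; only counts while an SSL call is on the stack.
        if tid in self.active:
            self.active[tid].append(fd)

    def ssl_call_exit(self, tid, struct_fd=None):
        """Record the call; `struct_fd` is the fd recovered via struct offsets, if any.

        Returns the fd inferred from syscalls, or None if no syscall was seen.
        """
        fds = self.active.pop(tid, [])
        self.syscall_histogram[len(fds)] += 1
        if struct_fd is not None and any(fd != struct_fd for fd in fds):
            self.mismatched_fds += 1
        return fds[0] if fds else None


tracker = SSLSyscallTracker()

# BIO native (e.g. nginx): OpenSSL issues the write syscall itself.
tracker.ssl_call_enter(tid=1)
tracker.socket_syscall(tid=1, fd=9)
print(tracker.ssl_call_exit(tid=1, struct_fd=9))  # 9

# Custom BIO (e.g. netty, nodejs): no socket syscall while SSL_write is on the stack.
tracker.ssl_call_enter(tid=2)
print(tracker.ssl_call_exit(tid=2))  # None

print(dict(tracker.syscall_histogram), tracker.mismatched_fds)  # {1: 1, 0: 1} 0
```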
aimichelle
pushed a commit
that referenced
this issue
Apr 6, 2023
…anager (#1188)

Summary: Add `access-tls-socket-fd-via-syscall` feature flag to cloud config manager

This feature flag will be used to opt internal clusters into the new TLS tracing implementation developed in #1120. We may want a more sophisticated toggle for gradually opting in more clusters, but this will suffice for the initial set of testing (validating some hand-picked clusters).

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following:
- [x] Deploying a new Pixie install without LaunchDarkly credentials results in the default `false` value ([P354](https://phab.corp.pixielabs.ai/P354))
- [x] Deploying a new Pixie install with a cloud that has LaunchDarkly credentials results in a `true` value ([P355](https://phab.corp.pixielabs.ai/P355))

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
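The behavior checked in this Test Plan (no LaunchDarkly credentials means the default `false`) amounts to a lookup with a fallback. A minimal sketch follows; `resolve_flag`, `DEFAULT_FLAGS` and the `lookup` callable are hypothetical and do not reflect the actual API of the cloud config manager or the LaunchDarkly SDK.

```python
from typing import Callable, Optional

# Built-in defaults used when no flag service is configured (hypothetical table).
DEFAULT_FLAGS = {"access-tls-socket-fd-via-syscall": False}


def resolve_flag(
    name: str, lookup: Optional[Callable[[str], Optional[bool]]] = None
) -> bool:
    """Return the flag's value from `lookup`, falling back to the built-in default.

    `lookup` stands in for a flag service client: None models a deployment
    without credentials, and a lookup returning None models an unknown flag.
    """
    default = DEFAULT_FLAGS[name]
    if lookup is None:
        return default
    value = lookup(name)
    return default if value is None else value


print(resolve_flag("access-tls-socket-fd-via-syscall"))  # False
print(resolve_flag("access-tls-socket-fd-via-syscall", lambda _: True))  # True
```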
ddelnano
added a commit
to ddelnano/pixie
that referenced
this issue
May 1, 2023
…tls tracing method (pixie-io#1161)
ddelnano
added a commit
to ddelnano/pixie
that referenced
this issue
May 1, 2023
…anager (pixie-io#1188)
JamesMBartlett
pushed a commit
that referenced
this issue
May 1, 2023
…ggle) (#1123)

Summary: Implement new tls tracing method behind stirling cli flag (feature toggle)

The TLS tracing method added in this PR determines a connection's identity (socket fd) through a different mechanism from our existing tracing. Instead of relying on struct offsets of user space data structures, it accesses the socket fd via the underlying socket syscalls while `SSL_write` / `SSL_read` calls occur. This is a prerequisite to supporting BoringSSL, because its rolling release style makes the previous method of user space offsets untenable.

This has the added benefit of reducing the maintenance cost of our existing OpenSSL tracing. Assuming future versions of OpenSSL maintain the same contract, we will not require any code changes to support them -- Pixie will gain OpenSSL v3 support once this new tracing is the default (as mentioned in the Test Plan below).

Our assumption is that the applications that explicitly set the socket fd on the SSL struct (the applications supported by our previous tracing technique), such as [nginx](https://github.com/nginx/nginx/blob/dfe70f74a3558f05142fb552cea239add123d414/src/event/ngx_event_openssl.c#L1696) and [python](https://github.com/python/cpython/blob/e375bff03736f809fbc234010c087ef9d7e0d384/Modules/_ssl.c#L836), all use OpenSSL with its native BIO interface. In order to verify that assumption, this change will be feature flagged and monitored carefully as it's enabled on internal clusters. The existing [conn_stats_bytes metric](https://github.com/pixie-io/pixie/blob/f45ced1803e6e44406f20f1171c15a24f4d5a17a/src/stirling/source_connectors/socket_tracer/metrics.cc#L62) will be monitored for the volume of TLS traffic traced, to verify there is no loss of instrumentation coverage between the new and old methods.

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following
- [x] New test passes with the `enable_openssl_v3_testing` bool flag enabled ([P336](https://phab.corp.pixielabs.ai/P336)). This verifies that the new tracing technique and the feature toggle are functional, since our previous tracing does not support OpenSSL v3
- [x] Existing `openssl_trace_bpf_test` and `netty_tls_trace_bpf_test` pass ([P335](https://phab.corp.pixielabs.ai/P335)). This is explicitly mentioned since `openssl_trace_bpf_test` is still disabled (#699)
- [x] Verified the `bool_flag` added from PR feedback conditionally enables the OpenSSL v3 testing ([P337](https://phab.corp.pixielabs.ai/P337))
- [x] Validate metrics from #1161 to verify our assumptions are correct

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
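Monitoring `conn_stats_bytes` for lost coverage boils down to comparing the TLS byte volume traced per protocol under the old and new methods. The sketch below assumes both snapshots cover comparable traffic windows (real traffic is noisy, so any threshold is a judgment call); the function name, the 10% threshold and the sample numbers are all hypothetical.

```python
def tls_coverage_regressions(old, new, max_drop=0.10):
    """Return {protocol: fractional drop} where TLS bytes traced fell by more than max_drop.

    `old` and `new` map protocol name -> TLS bytes from conn_stats_bytes, collected
    over comparable windows with the old and new tracing methods respectively.
    """
    regressions = {}
    for protocol, old_bytes in old.items():
        if old_bytes == 0:
            continue  # nothing to compare against
        drop = (old_bytes - new.get(protocol, 0)) / old_bytes
        if drop > max_drop:
            regressions[protocol] = drop
    return regressions


# Hypothetical snapshots: HTTP volume is roughly unchanged, Mux traffic vanished.
old_snapshot = {"kProtocolHTTP": 10_000, "kProtocolMux": 2_000}
new_snapshot = {"kProtocolHTTP": 9_800}
print(tls_coverage_regressions(old_snapshot, new_snapshot))  # {'kProtocolMux': 1.0}
```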
JamesMBartlett
pushed a commit
that referenced
this issue
May 3, 2023
…ng assumptions (mismatched fds) (#1270)

Summary: Add dimensional metrics to record applications that violate tls tracing assumptions (mismatched fds)

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified that the metrics added do not clash with the non-dimensional metrics. Please see e83443a for the code that was used to produce this output ([P363](https://phab.corp.pixielabs.ai/P363))

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm
pushed a commit
that referenced
this issue
May 8, 2023
Summary: Remove update to BPF map not used in status quo tls tracing

In order to vet the new style of TLS tracing, we introduced a mechanism for detecting mismatched fds (#1161). This instrumented all of our TLS tracing when it was first developed. When the new method of TLS tracing was introduced, we removed the mismatched fd detection from the status quo TLS tracing (#1123); however, this BPF map update was missed in that refactor.

Relevant Issues: #692

Type of change: /kind bug

Test Plan: Existing tests pass

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
aimichelle
pushed a commit
that referenced
this issue
May 15, 2023
…v3 (#1337)

Summary: Enable TLS tracing for applications using dynamically linked OpenSSL v3

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Existing tests provide the necessary coverage

Changelog Message:
```release-note
TLS tracing now supports applications using OpenSSL v3
```

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
This was referenced Jun 6, 2023
vihangm
pushed a commit
that referenced
this issue
Jun 8, 2023
…1449)

Summary: Refactor Go SDK label templating to support future boringcrypto SDK

This PR adds the scaffolding needed to add a boringcrypto Go SDK. This SDK will be used in a future change to add TLS tracing tests for binaries using boringcrypto, which addresses #597. It wasn't known at the time that boringcrypto was supported, but we should still validate that it is functional.

`rules_go` does not support Go SDKs that use the same version with different `GOEXPERIMENT`s enabled (will be following up to create a GitHub issue on the project). This is an issue because boringcrypto is enabled by setting `GOEXPERIMENT=boringcrypto`, as mentioned [here](https://go.googlesource.com/go/+/refs/heads/dev.boringcrypto/README.boringcrypto.md). Until `rules_go` supports this, the proposed plan is to maintain a previous patch version of our latest supported Go version as the "boringcrypto Go SDK". The description below should explain the process:

```
# rules_go doesn't support using multiple SDKs with the same version and differing
# GOEXPERIMENTs. Until this is addressed, go_sdk_boringcrypto is meant to be 1 bug fix
# version behind our latest go release. In the event our primary toolchain is upgraded
# to the first release of a new major version (i.e. 1.20.0) an rc suffixed build should
# be used for go_sdk_boringcrypto (1.20rcX) until the first minor release is available (1.20.1).
```

Relevant Issues: #597 #692

Type of change: /kind test-infra

Test Plan: Existing tests pass, and verified this supports the boringcrypto tests on a branch with the full set of changes

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
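The versioning rule in that BUILD comment can be stated as a small function. This is a Python sketch assuming primary versions are written as `MAJOR.MINOR.PATCH`; `boringcrypto_sdk_version` is a hypothetical helper, and the release-candidate number `rc` is a free parameter because the comment leaves it as `X`.

```python
import re


def boringcrypto_sdk_version(primary: str, rc: int = 1) -> str:
    """Return the go_sdk_boringcrypto version for a primary Go toolchain version.

    Rule from the BUILD comment: stay one bug-fix release behind the primary
    toolchain, and if the primary is the first release of a new version
    (x.y.0), use a release candidate (x.yrcN) instead.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", primary)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {primary!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if patch == 0:
        return f"{major}.{minor}rc{rc}"
    return f"{major}.{minor}.{patch - 1}"


print(boringcrypto_sdk_version("1.20.3"))  # 1.20.2
print(boringcrypto_sdk_version("1.20.0", rc=3))  # 1.20rc3
```

Whichever version this yields would then be built with `GOEXPERIMENT=boringcrypto` set, which is why it cannot share a version number with the primary SDK under `rules_go`.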
This was referenced Jun 16, 2023
vihangm
pushed a commit
that referenced
this issue
Jun 21, 2023
…s metrics (#1518)

Summary: Record TLS library source in conn tracker and mismatched fd prometheus metrics

This will allow us to identify which TLS library is in use for a given ConnTracker. When TLS tracing probes are added for statically linked binaries (BoringSSL), this will allow us to discern whether future mismatched fd cases are known cases or result from expanding our coverage. This replaces the existing `tls` dimension and models the plaintext case with `kSSLNone` rather than maintaining two sources of the same information.

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following to ensure the appropriate metrics are populated. The following [revert](713c532) shows what was needed to print out these metrics.

- [x] `netty_tls_trace_bpf_test` increments the netty-specific counter

<details>
<summary>netty_tls_trace_bpf_test output</summary>

```
$ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:netty_tls_trace_bpf_test
[ ... ]
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolMux",tls_source="kLibNettyTcnativeSource"} 220
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolMux",tls_source="kLibNettyTcnativeSource"} 618
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 598970
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 131291
I20230616 16:34:10.592751 51190 container_runner.cc:53] podman rm -f thriftmux_server_2146055842888549 &>/dev/null
[ OK ] NettyTLSTraceTest/0.mtls_thriftmux_client (55081 ms)
[----------] 1 test from NettyTLSTraceTest/0 (55081 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (55081 ms total)
[ PASSED ] 1 test.
I20230616 16:34:11.655771 51190 env.cc:51] Shutting down
```

</details>

- [x] `openssl_trace_bpf_test` increments the counters for OpenSSL v1.1, v3, NodeJS and libpython

<details>
<summary>openssl_trace_bpf_test output</summary>

```
ddelnano@vigenere:~/code/pixie (ddelnano/trace-boringssl-linked-applications) $ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test --test_arg='--gtest_filter=OpenSSLTraceTest*.ssl_capture_curl_client' --
INFO: Invocation ID: d9486478-0df6-4bc1-9566-d452cb99d3d0
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/d9486478-0df6-4bc1-9566-d452cb99d3d0
INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date:
  bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
INFO: Elapsed time: 10.987s, Critical Path: 10.46s
INFO: 3 processes: 1 internal, 2 linux-sandbox.
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test '--gtest_filter=OpenSSLTraceTest*.ssl_capture_curl_client'
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/d9486478-0df6-4bc1-9566-d452cb99d3d0
INFO: Build completed successfully, 3 total actions
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test
-----------------------------------------------------------------------------
I20230616 16:54:05.120615 70678 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
Note: Google Test filter = OpenSSLTraceTest*.ssl_capture_curl_client
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from OpenSSLTraceTest/0, where TypeParam = px::stirling::NginxOpenSSL_1_1_0_ContainerWrapper
[ RUN ] OpenSSLTraceTest/0.ssl_capture_curl_client
[ ... ]
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 2254
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 3191
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 1223780
I20230616 16:54:38.904987 70678 container_runner.cc:53] podman rm -f curl_2147337100716919 &>/dev/null
I20230616 16:54:39.552585 70991 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:54:39.609521 70991 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/70948/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/70948/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:54:39.626680 70991 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:54:52.648070 70678 container_runner.cc:53] podman rm -f nginx_2147304013395620 &>/dev/null
[ OK ] OpenSSLTraceTest/0.ssl_capture_curl_client (47937 ms)
[----------] 1 test from OpenSSLTraceTest/0 (47937 ms total)
[----------] 1 test from OpenSSLTraceTest/1, where TypeParam = px::stirling::NginxOpenSSL_1_1_1_ContainerWrapper
[ RUN ] OpenSSLTraceTest/1.ssl_capture_curl_client
[ ... ]
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 4508
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 133916
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 596
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6382
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 2472737
I20230616 16:55:27.096170 70678 container_runner.cc:53] podman rm -f curl_2147385287171660 &>/dev/null
I20230616 16:55:27.743304 71611 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:55:27.800357 71611 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/71567/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/71567/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:55:27.817342 71611 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:55:42.472515 70678 container_runner.cc:53] podman rm -f nginx_2147351974323900 &>/dev/null
[ OK ] OpenSSLTraceTest/1.ssl_capture_curl_client (49829 ms)
[----------] 1 test from OpenSSLTraceTest/1 (49829 ms total)
[----------] 1 test from OpenSSLTraceTest/2, where TypeParam = px::stirling::NginxOpenSSL_3_0_8_ContainerWrapper
[ RUN ] OpenSSLTraceTest/2.ssl_capture_curl_client
[ ... ]
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 2254
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 4015
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 875776
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 4508
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 167458
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1582307
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 2497556
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 7036
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6382
I20230616 16:56:17.009613 70678 container_runner.cc:53] podman rm -f curl_2147435183916507 &>/dev/null
I20230616 16:56:17.652369 72745 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:56:17.709861 72745 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/72701/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/72701/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:56:17.727325 72745 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:56:31.172713 70678 container_runner.cc:53] podman rm -f nginx_2147401729646825 &>/dev/null
[ OK ] OpenSSLTraceTest/2.ssl_capture_curl_client (48670 ms)
[----------] 1 test from OpenSSLTraceTest/2 (48670 ms total)
[----------] 1 test from OpenSSLTraceTest/3, where TypeParam = px::stirling::Python310ContainerWrapper
[ RUN ] OpenSSLTraceTest/3.ssl_capture_curl_client
[ ... ]
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 2813
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 2254
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 4015
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 875776
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 4508
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 167458
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1582307
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 2501517
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 7343
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6382
I20230616 16:57:06.748605 70678 container_runner.cc:53] podman rm -f curl_2147484954706117 &>/dev/null
I20230616 16:57:07.425081 73342 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:57:07.481707 73342 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/73297/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/73297/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:57:07.498756 73342 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:57:20.024674 70678 container_runner.cc:53] podman rm -f python_min_310_https_server_2147450620127316 &>/dev/null
[ OK ] OpenSSLTraceTest/3.ssl_capture_curl_client (48825 ms)
[----------] 1 test from OpenSSLTraceTest/3 (48825 ms total)
[----------] 1 test from OpenSSLTraceTest/4, where TypeParam = px::stirling::Node12_3_1ContainerWrapper
[ RUN ] OpenSSLTraceTest/4.ssl_capture_curl_client
[ ... ]
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 517
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 2813
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 2254
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 4015
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 875776
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 4508
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 1360
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 167458
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1713598
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 2505532
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 7343
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6382
I20230616 16:57:54.744580 70678 container_runner.cc:53] podman rm -f curl_2147532942349787 &>/dev/null
I20230616 16:57:55.404923 73893 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:57:55.462363 73893 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/73846/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/73846/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:57:55.480280 73893 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:58:08.744537 70678 container_runner.cc:53] podman rm -f node_server_2147499313108620 &>/dev/null
[ OK ] OpenSSLTraceTest/4.ssl_capture_curl_client (48721 ms)
[----------] 1 test from OpenSSLTraceTest/4 (48722 ms total)
[----------] 1 test from OpenSSLTraceTest/5, where TypeParam = px::stirling::Node14_18_1AlpineContainerWrapper
[ RUN ] OpenSSLTraceTest/5.ssl_capture_curl_client
[ ... ]
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 1034
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 2813
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 2254
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 4015
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 875776
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 4508
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 2743
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 167458
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1713598
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 3746554
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 7343
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6382
I20230616 16:58:43.613726 70678 container_runner.cc:53] podman rm -f curl_2147581809225276 &>/dev/null
I20230616 16:58:44.279163 74434 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:58:44.335726 74434 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/74389/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/74389/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:58:44.353601 74434 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616
16:58:57.204933 70678 container_runner.cc:53] podman rm -f node_server_2147548012141854 &>/dev/null [ OK ] OpenSSLTraceTest/5.ssl_capture_curl_client (48439 ms) [----------] 1 test from OpenSSLTraceTest/5 (48439 ms total) [----------] Global test environment tear-down [==========] 6 tests from 6 test suites ran. (292423 ms total) [ PASSED ] 6 tests. I20230616 16:58:57.544296 70678 env.cc:51] Shutting down ``` </details> - [x] `DCHECK` is triggered during a test run if the `ssl_source_t` is not found for the given ssl library matcher. ``` [ .. ] I20230616 16:10:34.742772 8830 socket_trace_connector.cc:427] Number of perf buffers opened = 8 W20230616 16:10:34.903129 9366 uprobe_symaddrs.cc:621] Unable to find openssl symbol 'OpenSSL_version_num' using dlopen/dlsym. Attempting to find address manually for pid 7416 F20230616 16:10:34.974184 9366 uprobe_manager.cc:336] Check failed: false Unable to find matching ssl_source_t for library matcher libnetty_tcnative_linux_x86 ``` --------- Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
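The `DCHECK` above fires when a library matcher has no corresponding `ssl_source_t`. The following C++ sketch illustrates that lookup-with-failure pattern. It is not Pixie's code: the `LookupSSLSource` helper is hypothetical, the `SSLSource` enum is a trimmed-down stand-in whose values are taken from the prometheus output above, and the matcher strings are made up. Where Pixie fails a `DCHECK`, this sketch returns `std::nullopt` to the caller.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <unordered_map>

// Subset of TLS source values that appear in the prometheus output above.
// Illustrative only; this is not Pixie's ssl_source_t definition.
enum class SSLSource {
  kSSLNone,
  kLibSSL_1_1_Source,
  kLibSSL_3_Source,
  kLibPythonSource,
  kNodeJSSource,
  kStaticallyLinkedSource,
};

// Hypothetical lookup from a library matcher to its TLS source.
// The matcher strings below are made up for illustration.
std::optional<SSLSource> LookupSSLSource(std::string_view matcher) {
  static const std::unordered_map<std::string_view, SSLSource> kMatcherToSource = {
      {"libssl.so.1.1", SSLSource::kLibSSL_1_1_Source},
      {"libssl.so.3", SSLSource::kLibSSL_3_Source},
      {"libpython3", SSLSource::kLibPythonSource},
  };
  auto it = kMatcherToSource.find(matcher);
  if (it == kMatcherToSource.end()) {
    // A matcher without a mapping is a programming error. Pixie fails a DCHECK
    // at this point; the sketch reports the miss to the caller instead.
    return std::nullopt;
  }
  return it->second;
}
```

In this sketch, a matcher such as `libnetty_tcnative_linux_x86` has no entry, which mirrors the failure shown in the log.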
vihangm pushed a commit that referenced this issue on Jun 22, 2023
Summary: Remove legacy TLS tracing feature toggle and transitional code

The new style of TLS tracing was rolled out on April 11th (Vizier release v0.12.19). We believe that it is performing well, and it has already allowed Pixie's TLS tracing to cover more libraries (OpenSSL v3), with BoringSSL coming soon.

This includes changes from #1518 and must be rebased once that is merged.

Relevant Issues: #692

Type of change: /kind cleanup

Test Plan: Existing test coverage

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm pushed a commit that referenced this issue on Jun 23, 2023
Summary: Trace statically linked OpenSSL compatible TLS libraries

This adds a feature toggle for enabling TLS tracing of applications statically linked with an OpenSSL compatible library. The plan is to enable this for internal clusters for 1 week before enabling it for all users (and making it the default).

We originally intended to add these probes for BoringSSL applications; however, detecting BoringSSL is more difficult than anticipated. One of the reliable indicators, checking for BoringSSL's [magic tag](google/boringssl@89386ac), is only available if an application uses a BoringSSL version from Oct 2021 or later, and checking the following dependencies (different Envoy distributions, ClickHouse, the Mono runtime, [cloudflare/boring](https://github.com/cloudflare/boring) applications and Go binaries with `boringcrypto` enabled) showed that this wasn't the case. In addition, checking for the magic tag could cause performance issues.

Since our latest TLS tracing implementation provides broad coverage, we opted to trace any application that contains one of the OpenSSL compatible symbols necessary for tracing (`SSL_write`). For BIO native applications, we will successfully trace the traffic. For non-BIO-native cases (Envoy), the uprobes will trigger but won't capture any data. This was deemed an acceptable trade-off, since detecting BoringSSL was challenging and any indicator would likely involve a long tail of upstream adoption.
Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified this change through the following:
- [x] `boringssl_trace_bpf_test` verifies tracing is successful
- [x] Verified that the new TLS source (`kStaticallyLinkedSource`) is identified during `boringssl_trace_bpf_test` (when prometheus metrics are logged out)

```
$ ./scripts/sudo_bazel_run.sh src/stirling/source_connectors/socket_tracer:boringssl_trace_bpf_test
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kStaticallyLinkedSource"} 1244
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 90
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kStaticallyLinkedSource"} 1638
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 76521
```

Changelog Message:
```release-note
Add support for tracing encrypted traffic for statically linked OpenSSL/BoringSSL applications. This functionality is currently disabled but will be enabled by default in an upcoming release.
```

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
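The detection heuristic described above (attach the static TLS probes to any binary exporting an OpenSSL compatible symbol such as `SSL_write`, rather than looking for BoringSSL's magic tag) boils down to a small predicate. The sketch below is a hypothetical helper, not Pixie's implementation. It assumes the binary's symbol names have already been read into a set, and its `already_traced_dynamically` parameter is an assumption reflecting that a process already covered by probes on a dynamically linked library should not also receive the static probes.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical predicate for the heuristic in the commit message: attach the
// statically linked TLS probes to any binary that exports an OpenSSL
// compatible symbol needed for tracing (SSL_write), instead of trying to
// identify BoringSSL itself (e.g. via its magic tag).
bool ShouldAttachStaticTLSProbes(const std::unordered_set<std::string>& symbols,
                                 bool already_traced_dynamically) {
  // Assumption for this sketch: a process already covered by probes on a
  // dynamically linked library should not also get the static probes.
  if (already_traced_dynamically) {
    return false;
  }
  return symbols.count("SSL_write") > 0;
}
```

Note that an Envoy-like binary exporting `SSL_write` passes this check, which matches the trade-off described above: its probes attach but capture no data.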
vihangm pushed a commit that referenced this issue on Jun 26, 2023
Summary: Fix misspelling of `PX_TRACE_STATIC_TLS_BINARIES` PEM flag

Relevant Issues: #692

Type of change: /kind bug

Test Plan: Grepped to make sure the flag is consistent with Stirling's flag

```
ddelnano@vigenere:~/code/pixie (ddelnano/attempt-to-reproduce-amqproxy-issue) $ git grep PX_TRACE_STATIC_TLS_BINARIES
src/stirling/source_connectors/socket_tracer/socket_trace_connector.cc: stirling_trace_static_tls_binaries, gflags::BoolFromEnv("PX_TRACE_STATIC_TLS_BINARIES", false),
```

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
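Because the Stirling flag takes its default from an environment variable (via `gflags::BoolFromEnv("PX_TRACE_STATIC_TLS_BINARIES", false)`), a misspelled variable name fails silently: the flag simply keeps its default. The sketch below illustrates this with a hypothetical `BoolFromEnvSketch` helper that is only loosely modeled on `gflags::BoolFromEnv`. The accepted spellings are an assumption, and unrecognized values fall back to the default.

```cpp
#include <cassert>
#include <cstdlib>
#include <initializer_list>
#include <strings.h>  // strcasecmp (POSIX)

// Hypothetical helper, loosely modeled on gflags::BoolFromEnv(name, default).
// An unset variable yields the default; this sketch also falls back to the
// default for values it does not recognize.
bool BoolFromEnvSketch(const char* name, bool default_value) {
  const char* raw = std::getenv(name);
  if (raw == nullptr) {
    // Unset (or misspelled) variable: nothing to parse, keep the default.
    return default_value;
  }
  for (const char* t : {"1", "t", "true", "y", "yes"}) {
    if (strcasecmp(raw, t) == 0) return true;
  }
  for (const char* f : {"0", "f", "false", "n", "no"}) {
    if (strcasecmp(raw, f) == 0) return false;
  }
  return default_value;
}
```

With `PX_TRACE_STATIC_TLS_BINARIES=true` exported, looking up that exact name returns `true`. Looking up any other spelling (for example, a made-up `PX_TRACE_STATIC_TLS_BINARYS`) returns the `false` default, which is the failure mode this fix addresses.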
etep pushed a commit to etep/pixie that referenced this issue on Jul 25, 2023
… by default (pixie-io#1625)

Summary: Opt statically linked OpenSSL/BoringSSL applications into TLS tracing by default

Relevant Issues: pixie-io#692

Type of change: /kind feature

Test Plan: The feature flag was used on Pixie-owned clusters.

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
aimichelle pushed a commit that referenced this issue on Aug 4, 2023
) Summary: Add TLS tracing source debugging mode and associated feature flag

This debugging feature flag is intended for ad hoc investigations into why a particular TLS source (libpython, statically linked OpenSSL/BoringSSL, OpenSSL v1.x, etc.) was traced for a given protocol.

BoringSSL tracing was recently rolled out, and it appeared to be covering every protocol that Pixie supports. This was unexpected, given the open source and popular projects we believe this tracing should cover.

While testing this change, I found a bug in how a TLS source is attributed: the libpython TLS source was identified as statically linked (see the Test Plan below for details). I believe this misattribution is the likely cause of the statically linked source showing up for every supported protocol. That bug will be addressed in a follow-up change.

When reviewing the Test Plan, please note that the changes in 5a23004 were needed to produce the prometheus output and to verify that the TLS source misattribution is present.

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following
- [x] Simulating the mismatched fd case still causes the `DCHECK` to occur
<details><summary>openssl_trace_bpf_test with mismatched fd error</summary>

```
# Introduce change to simulate mismatched fd case. The openssl_trace_bpf_test cases are known to require calling send/read multiple times for the same fd. This change will cause the DCHECK to occur on the second read/write call --
$ git diff
diff --git a/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c b/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c
index ba5df955dd..e1c64a1a7d 100644
--- a/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c
+++ b/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c
@@ -185,7 +185,7 @@ static __inline void propagate_fd_to_user_space_call(uint64_t pid_tgid, int fd)
 int current_fd = nested_syscall_fd_ptr->fd;
 if (current_fd == kInvalidFD) {
 nested_syscall_fd_ptr->fd = fd;
- } else if (current_fd != fd) {
+ } else /*if (current_fd != fd)*/ {
 // Found two different fds during a single SSL_write/SSL_read call. This invalidates
 // our tls tracing assumptions and must be recorded.
 nested_syscall_fd_ptr->mismatched_fds = true;

# Verify that the mismatched fd case still causes a crash from the DCHECK
$ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test
INFO: Invocation ID: 7c4390fd-e585-49fe-ac2a-22e17122567e
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/7c4390fd-e585-49fe-ac2a-22e17122567e
INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date:
  bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
INFO: Elapsed time: 3.826s, Critical Path: 3.39s
INFO: 4 processes: 1 internal, 3 linux-sandbox.
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/7c4390fd-e585-49fe-ac2a-22e17122567e
INFO: Build completed successfully, 4 total actions
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test
-----------------------------------------------------------------------------
I20230804 14:55:53.269637 3497837 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
[==========] Running 18 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 3 tests from OpenSSLTraceTest/0, where TypeParam = px::stirling::NginxOpenSSL_1_1_0_ContainerWrapper
[ RUN ] OpenSSLTraceTest/0.ssl_capture_curl_client
I20230804 14:55:53.423983 3497837 container_runner.cc:36] Loaded image: localhost/bazel/src/stirling/source_connectors/socket_tracer/testing/containers:nginx_openssl_1_1_0_image
I20230804 14:55:53.424044 3497837 container_runner.cc:114] podman run --timeout=3600 --rm -q --pid=host --name=nginx_6373812169417785 localhost/bazel/src/stirling/source_connectors/socket_tracer/testing/containers:nginx_openssl_1_1_0_image
I20230804 14:55:53.664819 3497837 container_runner.cc:144] Container nginx_6373812169417785 status: running
I20230804 14:55:53.698279 3497837 container_runner.cc:175] Container nginx_6373812169417785 process PID: 3497990
I20230804 14:55:53.698315 3497837 container_runner.cc:177] Container nginx_6373812169417785 waiting for log message:
I20230804 14:55:53.732992 3497837 container_runner.cc:189] Container nginx_6373812169417785 status: running
I20230804 14:55:53.733016 3497837 container_runner.cc:225] Container nginx_6373812169417785 is ready.
I20230804 14:55:54.733592 3497837 linux_headers.cc:211] Found Linux kernel version using .note section.
I20230804 14:55:54.733633 3497837 source_connector.cc:35] Initializing source connector: socket_trace_connector
I20230804 14:55:54.733664 3497837 linux_headers.cc:94] Obtained Linux version string from `uname`: 5.19.0-1022-gcp
I20230804 14:55:54.733670 3497837 linux_headers.cc:642] Detected kernel release (uname -r): 5.19.0-1022-gcp
I20230804 14:55:54.733700 3497837 bcc_wrapper.cc:121] Using linux headers found at /lib/modules/5.19.0-1022-gcp/build for BCC runtime.
I20230804 14:55:54.733760 3497837 bcc_wrapper.cc:170] Initializing BPF program ...
I20230804 14:56:04.312705 3497837 scoped_timer.h:48] Timer(init_bpf_program) : 9.58 s
I20230804 14:56:05.196256 3497837 socket_trace_connector.cc:437] Number of kprobes deployed = 40
I20230804 14:56:05.196297 3497837 socket_trace_connector.cc:438] Probes successfully deployed.
I20230804 14:56:05.196377 3497837 socket_trace_connector.cc:373] Initializing perf buffers with ncpus=96 and scaling_factor=0.0865385
I20230804 14:56:05.196416 3497837 socket_trace_connector.cc:362] Total perf buffer usage for kData buffers across all cpus: 268038720
I20230804 14:56:05.196425 3497837 socket_trace_connector.cc:362] Total perf buffer usage for kControl buffers across all cpus: 13353216
I20230804 14:56:05.304798 3497837 socket_trace_connector.cc:442] Number of perf buffers opened = 8
W20230804 14:56:12.303573 3498584 uprobe_manager.cc:852] Cannot analyze binary /proc/3498580/root/usr/bin/python3.10 for uprobe deployment. If file is under /var/lib, container may have terminated. Message = Can't find or process ELF file /proc/3498580/root/usr/bin/python3.10
I20230804 14:56:14.672582 3498584 uprobe_manager.cc:1000] Number of uprobes deployed = 11866
F20230804 14:56:14.748164 3498588 socket_trace_connector.cc:701] Check failed: error_code == kOpenSSLTraceOk (1 vs. 0)
*** Check failure stack trace: ***
```
</details>

- [x] Enabling TLS source debugging on 72075b6 shows the `kLibPythonSource` TLS source incorrectly attributed as `kStaticallyLinkedSource`
<details><summary>openssl_trace_bpf_test prometheus metrics output</summary>

```
$ git show
commit 2def9a3 (HEAD, ddelnano/ddelnano/add-tls-tracing-debug-feature-flag)
Author: Dom Del Nano <ddelnano@pixielabs.ai>
Date: Thu Aug 3 15:35:52 2023 +0000

    Add TLS tracing source debugging mode

    Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>

$ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test --test_arg='--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client'
INFO: Invocation ID: 566f27c6-208f-4f86-83bb-b5e2a214f81b
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/566f27c6-208f-4f86-83bb-b5e2a214f81b
INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date:
  bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
INFO: Elapsed time: 4.238s, Critical Path: 3.57s
INFO: 3 processes: 1 remote cache hit, 1 internal, 1 linux-sandbox.
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test '--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client'
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/566f27c6-208f-4f86-83bb-b5e2a214f81b
INFO: Build completed successfully, 3 total actions
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test
-----------------------------------------------------------------------------
I20230804 14:29:07.070752 3476002 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
Note: Google Test filter = OpenSSLTraceTest/3.ssl_capture_curl_client
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from OpenSSLTraceTest/3, where TypeParam = px::stirling::Python310ContainerWrapper
[ RUN ] OpenSSLTraceTest/3.ssl_capture_curl_client
[ ... ]
# HELP openssl_tls_source_debug Records the number of times a protocol was traced along with additional debugging information
# TYPE openssl_tls_source_debug counter
openssl_tls_source_debug{exe="python3",name="openssl_tls_source_debug",protocol="kProtocolHTTP",ssl_source="kStaticallyLinkedSource"} 1
# HELP java_proc_crashed_during_attach Count of Java process crashes during symbolization agent attach.
# TYPE java_proc_crashed_during_attach counter
java_proc_crashed_during_attach{name="java_proc_crashed_during_attach"} 0
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kStaticallyLinkedSource"} 2813
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 90
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 11156
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kStaticallyLinkedSource"} 3650
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 132148
```
</details>

- [x] Enabling TLS source debugging on e0b792d shows the `kLibPythonSource` TLS source correctly attributed. The bug will be fixed in a follow-up PR, but this proves that the debug information has tracked down the issue.
<details><summary>openssl_trace_bpf_test prometheus metrics output</summary>

```
$ git show --
commit 1c9200932f89e6bbf530f02fd607d5008aac1c44 (HEAD -> ddelnano/add-tls-tracing-debug-feature-flag)
Author: Dom Del Nano <ddelnano@pixielabs.ai>
Date: Thu Aug 3 23:23:11 2023 +0000

    Fix bug where non statically linked binaries were attributed as statically link

    Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>

diff --git a/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc b/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc
index 76850c250e..50ba85fc5b 100644
--- a/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc
+++ b/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc
@@ -654,10 +654,10 @@ int UProbeManager::DeployOpenSSLUProbes(const absl::flat_hash_set<md::UPID>& pid
 // before the BPF map is updated. This value is cleaned up when the upid is
 // terminated, so if attachment fails it will be deleted prior to the pid being
 // reused.
- openssl_source_map_->UpdateValue(pid.pid(), kStaticallyLinkedSource);
 count_or = AttachOpenSSLUProbesOnStaticBinary(pid.pid());
- if (count_or.ok()) {
+ if (count_or.ok() && count_or.ValueOrDie() > 0) {
+ openssl_source_map_->UpdateValue(pid.pid(), kStaticallyLinkedSource);
 uprobe_count += count_or.ValueOrDie();
 VLOG(1) << absl::Substitute(

$ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test --test_arg='--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client'
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 2d279cd6-45f7-4939-92d2-9af76a262954
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/2d279cd6-45f7-4939-92d2-9af76a262954
INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (368 packages loaded, 38224 targets configured).
INFO: Found 1 target...
Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date:
  bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
INFO: Elapsed time: 8.733s, Critical Path: 0.90s
INFO: 1 process: 1 internal.
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test '--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client'
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/2d279cd6-45f7-4939-92d2-9af76a262954
INFO: Build completed successfully, 1 total action
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test
-----------------------------------------------------------------------------
I20230804 14:26:54.002213 3474370 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
Note: Google Test filter = OpenSSLTraceTest/3.ssl_capture_curl_client
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from OpenSSLTraceTest/3, where TypeParam = px::stirling::Python310ContainerWrapper
[ RUN ] OpenSSLTraceTest/3.ssl_capture_curl_client
I20230804 14:26:54.321714 3474370 container_runner.cc:36] Loaded image: localhost/bazel/src/stirling/source_connectors/socket_tracer/testing/containers/ssl:python_min_310_https_server
[ ... ]
# HELP openssl_tls_source_debug Records the number of times a protocol was traced along with additional debugging information
# TYPE openssl_tls_source_debug counter
openssl_tls_source_debug{exe="python3",name="openssl_tls_source_debug",protocol="kProtocolHTTP",ssl_source="kLibPythonSource"} 1
# HELP java_proc_crashed_during_attach Count of Java process crashes during symbolization agent attach.
# TYPE java_proc_crashed_during_attach counter
java_proc_crashed_during_attach{name="java_proc_crashed_during_attach"} 0
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 2813
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 90
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 11156
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 272395
I20230804 14:27:16.048719 3474370 container_runner.cc:53] podman rm -f curl_6372094307402945 &>/dev/null
I20230804 14:27:23.091778 3474370 container_runner.cc:53] podman rm -f python_min_310_https_server_6372073067154288 &>/dev/null
[ OK ] OpenSSLTraceTest/3.ssl_capture_curl_client (29434 ms)
[----------] 1 test from OpenSSLTraceTest/3 (29434 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (29434 ms total)
[ PASSED ] 1 test.
I20230804 14:27:23.436729 3474370 env.cc:51] Shutting down
```
</details>

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
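The `openssl_tls_source_debug` counter in the output above records how often a protocol was traced, keyed by label values such as `exe`, `protocol` and `ssl_source`. To make that concrete, here is a rough sketch of such a counter: an in-memory map from label tuples to counts, rendered in the prometheus text exposition format. The `TLSSourceDebugCounter` class is hypothetical and does not use any real metrics library. It also omits the extra `name` label seen above and performs no label-value escaping.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical stand-in for a labeled counter such as openssl_tls_source_debug:
// one count per (exe, protocol, ssl_source) combination.
class TLSSourceDebugCounter {
 public:
  void Increment(const std::string& exe, const std::string& protocol,
                 const std::string& ssl_source) {
    ++counts_[std::make_tuple(exe, protocol, ssl_source)];
  }

  // Renders the counts in the prometheus text exposition format. Label values
  // are not escaped, so they must not contain quotes, backslashes or newlines.
  std::string Render() const {
    std::ostringstream out;
    out << "# TYPE openssl_tls_source_debug counter\n";
    for (const auto& [labels, value] : counts_) {
      const auto& [exe, protocol, ssl_source] = labels;
      out << "openssl_tls_source_debug{exe=\"" << exe << "\",protocol=\"" << protocol
          << "\",ssl_source=\"" << ssl_source << "\"} " << value << "\n";
    }
    return out.str();
  }

 private:
  std::map<std::tuple<std::string, std::string, std::string>, long> counts_;
};
```

Keying on the executable name alongside the TLS source is what made the misattribution visible: a `python3` process showing up under `kStaticallyLinkedSource` stands out immediately.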
vihangm pushed a commit that referenced this issue on Aug 7, 2023
…g probes are added (#1655)

Summary: Ensure `kStaticallyLinkedSource` is attributed when static TLS tracing probes are added

When developing #1652, we noticed situations where our prometheus metrics are annotated with `kStaticallyLinkedSource` instead of the correct TLS source. This is due to the following [conditional logic](https://github.com/pixie-io/pixie/blob/44a8338a60e74564d94d474a7ed723717d2ebaaf/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc#L646) using a mutable variable, which was returning true even when uprobes were already attached.

When tracing a dynamically linked process, the uprobe count is set to a non-zero value when the dynamic probes are [attached](https://github.com/pixie-io/pixie/blob/44a8338a60e74564d94d474a7ed723717d2ebaaf/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc#L615). The NodeJS uprobes in [AttachNodeJsOpenSSLUprobes](https://github.com/pixie-io/pixie/blob/44a8338a60e74564d94d474a7ed723717d2ebaaf/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc#L629C16-L629C42) then reset the `count_or` integer value back to 0 (a dynamically linked application cannot be NodeJS, since NodeJS is statically linked). This causes us to erroneously attempt (and fail) to attach the statically linked probes, in addition to setting the TLS source to `kStaticallyLinkedSource`.

I attempted to refactor the uprobe counting logic and fix this bug in a single change, but doing so would increase the scope of this change. The OpenSSL probe specs are shared amongst the dynamic, statically linked and NodeJS cases, which complicates counting the number of attached probes (rather than returning the potential count). While we believe this bug is the likely source of the confusing BoringSSL tracing results, it would be beneficial to have that validation sooner. I will follow up this change by refactoring this code, but would prefer to proceed with this fix to unblock the BoringSSL tracing validation.

Relevant Issues: #692

Type of change: /kind bug

Test Plan: Printed the prometheus `conn_stats_bytes` metrics during `openssl_trace_bpf_test` and verified that they match each test case

<details><summary> openssl_trace_bpf_test metric output</summary>

```
$ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test --test_arg='--gtest_filter=OpenSSLTraceTest/*.ssl_capture_curl_client' 2>&1 | grep '^conn_stats_bytes\|TypeParam' --
[----------] 1 test from OpenSSLTraceTest/0, where TypeParam = px::stirling::NginxOpenSSL_1_1_0_ContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 3191
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 3591
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 416
[----------] 1 test from OpenSSLTraceTest/1, where TypeParam = px::stirling::NginxOpenSSL_1_1_1_ContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6375
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 6989
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 832
[----------] 1 test from OpenSSLTraceTest/2, where TypeParam = px::stirling::NginxOpenSSL_3_0_8_ContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6375
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 10484
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 307
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1248
[----------] 1 test from OpenSSLTraceTest/3, where TypeParam = px::stirling::Python310ContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6375
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 14350
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 307
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1664
[----------] 1 test from OpenSSLTraceTest/4, where TypeParam = px::stirling::Node12_3_1ContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 1360
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6375
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 18367
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 307
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 133152
[----------] 1 test from OpenSSLTraceTest/5, where TypeParam = px::stirling::Node14_18_1AlpineContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 2743
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6375
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 22319
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 307
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 155820
```
</details>

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
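The #1655 summary above describes a single mutable count being reset to 0 by the NodeJS attach step after dynamic probes had already attached. That reset triggered a failed static-probe attempt and a bogus `kStaticallyLinkedSource` attribution. One way to structure the flow so this cannot happen is shown in the sketch below: accumulate counts across steps, and attribute the static source only after the static attempt actually attached probes (the same `count > 0` guard as in the earlier diff). The `AttributeTLSSource` function, the `SSLSource` subset, and passing the static attach step as a callback are all hypothetical. This is not Pixie's `UProbeManager` code.

```cpp
#include <cassert>
#include <functional>

// Illustrative subset of TLS sources; not Pixie's ssl_source_t definition.
enum class SSLSource {
  kSSLNone,
  kLibSSL_1_1_Source,
  kLibSSL_3_Source,
  kNodeJSSource,
  kStaticallyLinkedSource,
};

struct Attribution {
  SSLSource source = SSLSource::kSSLNone;
  int total_probes = 0;
  bool static_attempted = false;
};

// Hypothetical attribution flow for one process. dynamic_count and nodejs_count
// are the probes attached by the earlier steps; attach_static performs the
// statically linked attempt and returns how many probes it attached.
Attribution AttributeTLSSource(int dynamic_count, SSLSource dynamic_source, int nodejs_count,
                               const std::function<int()>& attach_static) {
  Attribution result;
  if (dynamic_count > 0) {
    result.source = dynamic_source;
    result.total_probes += dynamic_count;
  }
  if (nodejs_count > 0) {
    result.source = SSLSource::kNodeJSSource;
    result.total_probes += nodejs_count;
  }
  // Accumulate rather than overwrite: a later step that attaches nothing must
  // not hide probes attached by an earlier step.
  if (result.total_probes == 0) {
    result.static_attempted = true;
    const int static_count = attach_static();
    // Attribute the static source only once probes actually attached.
    if (static_count > 0) {
      result.source = SSLSource::kStaticallyLinkedSource;
      result.total_probes += static_count;
    }
  }
  return result;
}
```

With this shape, a dynamically linked libssl process keeps its dynamic source and never reaches the static attempt, whatever the NodeJS step reports.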
vihangm pushed a commit that referenced this issue on Aug 17, 2023
…d announce BoringSSL support (#1678)

Summary: Update static TLS (BoringSSL) tracing feature flag default to true and announce BoringSSL support

Relevant Issues: #692

Type of change: /kind feature

Test Plan: This TLS tracing has been enabled since v0.14.3's release last month (#1625), and the metrics for internal clusters have been validated.

Changelog Message:
```release-notes
Enhance TLS tracing to support statically linked OpenSSL and BoringSSL (OpenSSL API compatible libraries).
```

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
This is complete as of the v0.14.3 Vizier release!
Pixie currently supports tracing protocol traffic encrypted with certain TLS libraries: dynamically linked OpenSSL 1.1.0 or 1.1.1, and Go TLS when a binary has debug info. This gives broad coverage, but other popular TLS libraries, BoringSSL among them, are not yet supported. The recent work to trace Netty TLS traffic (#407) uncovered some common challenges to supporting BoringSSL more broadly.
This issue will track the work to enhance Pixie's TLS protocol tracing to include applications that use BoringSSL.