
Add protocol tracing support for applications using BoringSSL #692

Closed
ddelnano opened this issue Jan 17, 2023 · 2 comments
Assignees
Labels
area/datacollector Issues related to Stirling (datacollector)

Comments

@ddelnano
Member

Pixie currently supports tracing protocol traffic encrypted with certain TLS libraries: dynamically linked OpenSSL (version 1.1.0 or 1.1.1) and Go TLS (when a binary has debug info). This gives broad coverage, but other popular TLS libraries, BoringSSL among them, are not supported today. The recent work to trace Netty TLS traffic (#407) uncovered some common challenges to supporting BoringSSL more broadly.

This issue will track the work to enhance Pixie's TLS protocol tracing to include applications that use BoringSSL.

@ddelnano ddelnano added the area/datacollector Issues related to Stirling (datacollector) label Jan 17, 2023
@ddelnano ddelnano self-assigned this Jan 17, 2023
@ddelnano
Member Author

I'm in the process of drafting a design document for this work. Once the first draft is finished I'll be sharing the document here.

aimichelle pushed a commit that referenced this issue Feb 23, 2023
…intext metrics (#903)

Summary: This updates the `SocketTracerMetrics` class to differentiate
between plaintext and tls metrics. Since we are working to instrument
BoringSSL based applications more broadly (#692), this will allow us to
experiment with the underlying tls tracing implementation and verify
that there aren't protocol parsing issues introduced (indicated by more
data loss).
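As a rough illustration of why a separate TLS dimension helps, the sketch below keeps byte counters keyed by (protocol, is_tls) and compares the data-loss ratio of plaintext and TLS traffic. The class and method names are hypothetical and are not the `SocketTracerMetrics` API; only the idea of splitting counters by a TLS dimension comes from this PR.

```python
from collections import defaultdict


class DimensionedByteCounters:
    """Hypothetical stand-in for byte counters labeled by protocol and TLS-ness."""

    def __init__(self):
        # (protocol, is_tls) -> total bytes seen / bytes that failed to parse
        self.seen = defaultdict(int)
        self.lost = defaultdict(int)

    def record(self, protocol, is_tls, seen_bytes, lost_bytes):
        key = (protocol, is_tls)
        self.seen[key] += seen_bytes
        self.lost[key] += lost_bytes

    def loss_ratio(self, protocol, is_tls):
        # Fraction of traced bytes that were not successfully parsed.
        key = (protocol, is_tls)
        seen = self.seen.get(key, 0)
        if seen == 0:
            return 0.0
        return self.lost.get(key, 0) / seen
```

If the TLS ratio for a protocol rises after a tracing change while the plaintext ratio stays flat, the regression most likely lies in the TLS tracing rather than in the protocol parser.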

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Updated the `mux_trace_bpf_test` and
`netty_tls_trace_bpf_test` with the following diff to verify that they
increment the correct counter
([P317](https://phab.corp.pixielabs.ai/P317))
- [x] Inspected the counters' values and dimensions in addition to the end-to-end tests
mentioned above
- [x] Verified that this data was not used for any previous purposes
(new dimension would likely cause breakage)

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
JamesMBartlett pushed a commit that referenced this issue Feb 28, 2023
…he existing stats interface to it (#907)

Summary: Expose conn tracker creation through prometheus metrics and
migrate the existing stats interface to it

As part of an upcoming change to expand pixie's tls support for
BoringSSL (#692), we want to track the conn tracker lifecycle as a
proxy measurement for whether a connection's socket file descriptor is
identified correctly. Our TLS tracing requires this since the fd is
fundamental to a connection's identity. When our tls tracing
implementation is changed, we expect the `conn_tracker_created` metric
to stay at its existing baseline. Any increase in this metric
(assuming our protocol tracing support remains constant) would indicate
that file descriptors are being inferred incorrectly.
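A minimal sketch of how that proxy could be consumed, assuming `conn_tracker_created` is sampled as a rate and compared against a pre-change baseline. The function name and the 5% tolerance are hypothetical choices for illustration.

```python
def conn_tracker_regression(baseline_per_min, observed_per_min, tolerance=0.05):
    """Return True if conn tracker creation rose meaningfully above baseline.

    With protocol coverage held constant, extra ConnTrackers suggest that
    socket fds are inferred incorrectly (e.g., one connection split across
    several trackers).
    """
    if baseline_per_min <= 0:
        # No baseline traffic: any creation at all is unexpected.
        return observed_per_min > 0
    relative_increase = (observed_per_min - baseline_per_min) / baseline_per_min
    return relative_increase > tolerance
```

A relative threshold keeps the check usable across clusters with very different connection rates.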

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Existing conn_tracker tests pass which rely on our counters
being correct
- [x] Verify that metrics are visible in testing

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
RagalahariP pushed a commit to RagalahariP/pixie that referenced this issue Mar 23, 2023
…intext metrics (pixie-io#903)

RagalahariP pushed a commit to RagalahariP/pixie that referenced this issue Mar 23, 2023
…he existing stats interface to it (pixie-io#907)

aimichelle pushed a commit that referenced this issue Mar 24, 2023
…#1089)

Summary: Add nginx container image with OpenSSL v3 for future tls
tracing test

The tls tracing method developed for #692 will support dynamically
linked OpenSSL v3 in addition to BoringSSL. In order to validate that
the new tracing method works (in addition to its feature flag), I wanted
a test case that would prove the new implementation is in use.

The mirrored upstream nginx image added in this PR has a different
`/index.html` compared to our existing images. This change adds a
container layer containing the expected `index.html` so that the future
test assertions can remain the same.

Relevant Issues: #692

Type of change: /kind test-infra

Test Plan: Verified that the resulting nginx container returns the same
html file as our
[existing](https://github.com/pixie-io/pixie/blob/86dfb11dcbf605fbec1a317192886e52416fa4aa/src/stirling/source_connectors/socket_tracer/testing/containers/BUILD.bazel#L70)
nginx
[containers](https://github.com/pixie-io/pixie/blob/86dfb11dcbf605fbec1a317192886e52416fa4aa/src/stirling/source_connectors/socket_tracer/testing/containers/BUILD.bazel#L79)
```
# Run the new nginx image and an existing one built from bazel
ddelnano@turing:~/code/pixie$ docker run -d -p 80:80 bazel/src/stirling/source_connectors/socket_tracer/testing/containers:nginx_alpine_openssl_3_0_7_image
ddelnano@turing:~/code/pixie$ docker run -d -p 81:80 bazel/src/stirling/source_connectors/socket_tracer/testing/containers:nginx_openssl_1_1_1_image
48277f0656b2327de29917e063e80705c4f14e95d89081a0c8b9b4177846ca29

# Run the mirrored upstream image (used later to validate that the index.html is different)
ddelnano@turing:~/code/pixie$ docker run -d -p 82:80 gcr.io/pixie-oss/pixie-dev-public/docker-deps/library/nginx@sha256:3eb380b81387e9f2a49cb6e5e18db016e33d62c37ea0e9be2339e9f0b3e26170


# Verify that requests to `/` result in the same response
ddelnano@turing:~/code/pixie$ curl -s localhost:80 | sha256sum
38ffd4972ae513a0c79a8be4573403edcd709f0f572105362b08ff50cf6de521  -

ddelnano@turing:~/code/pixie$ curl -s localhost:81 | sha256sum
38ffd4972ae513a0c79a8be4573403edcd709f0f572105362b08ff50cf6de521  -

# Verify that the stock nginx 1.23 image returns a different index.html
ddelnano@turing:~/code/pixie$ curl -s localhost:82 | sha256sum
fb47468a2cd3953c7131431991afcc6a2703f14640520102eea0a685a7e8d6de  -
```

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
JamesMBartlett pushed a commit that referenced this issue Apr 6, 2023
…tls tracing method (#1161)

Summary: Instrument OpenSSL tracing to detect validity of assumptions
for new tls tracing method

After discussing #1123 with @oazizi000 and @etep, we decided to invest
in stronger validation of the assumptions behind that PR and our future
tls tracing. Instead of relying on struct offsets of user
space data structures, the new tracing method will access a connection's
socket fd from the underlying socket syscalls made during `SSL_write` /
`SSL_read` calls.

This technique is expected to work only with "BIO native" OpenSSL use
cases, which are the OpenSSL use cases Pixie supports today. BIO native
means that an application uses a
[BIO](https://wiki.openssl.org/index.php/BIO) provided by OpenSSL (via
[SSL_set_fd](https://www.openssl.org/docs/manmaster/man3/SSL_set_fd.html)),
which results in OpenSSL issuing read/write syscalls on the application's
behalf for the underlying socket.
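To make the "BIO native" distinction concrete, the sketch below models the idea in plain Python: per thread, remember the fd of any socket read/write syscall issued between an `SSL_write`/`SSL_read` entry and its return. The event methods are hypothetical; the real mechanism is implemented with BPF probes, not with code like this.

```python
from dataclasses import dataclass, field


@dataclass
class FdInference:
    # tid -> fd of the last socket syscall seen during the active SSL call
    active: dict = field(default_factory=dict)

    def ssl_enter(self, tid):
        # SSL_write/SSL_read entered on this thread: start watching syscalls.
        self.active[tid] = None

    def syscall(self, tid, fd):
        # Syscalls only count while an SSL call is active on the same thread.
        if tid in self.active:
            self.active[tid] = fd

    def ssl_exit(self, tid):
        # Returns the inferred fd, or None when no syscall was observed.
        return self.active.pop(tid, None)
```

For a BIO native application (e.g., one that calls `SSL_set_fd`), `ssl_exit` yields the socket fd. For custom BIO users such as netty or NodeJS, where the application moves the ciphertext itself, no syscall happens while the SSL call is on the stack and the result is `None`.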

There are two situations we wanted to understand before proceeding
with #1123: how custom BIO implementations (netty, nodejs, etc.) behave,
and whether unrelated (non-socket) syscalls occur while `SSL_write` and
`SSL_read` are on the stack. The former was verified with an experiment
based on this change and is described in the Test Plan section below. We
believe the latter should not occur, but this PR instruments that
situation in order to detect it if it does. Assuming
this condition isn't encountered, we will proceed with #1132 (with the
minor changes learned from this experiment).
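The two situations above can be sketched as a classification of the syscalls observed during a single `SSL_write`/`SSL_read` call. This is an illustrative assumption, not the instrumentation in this PR: the set of syscall names treated as socket I/O is hypothetical, and a real check would also confirm that each fd actually refers to a socket.

```python
def classify_ssl_call(syscalls):
    """Classify the syscalls observed while one SSL_write/SSL_read was active.

    `syscalls` is a list of (name, fd) tuples.
    """
    # Assumed set of read/write style syscalls; fd type is not checked here.
    socket_io = {"read", "write", "recvfrom", "sendto", "readv", "writev"}
    if not syscalls:
        return "no_syscall"  # custom BIO (netty, NodeJS): fd not inferable
    if any(name not in socket_io for name, _ in syscalls):
        return "unrelated_syscall"  # tracing assumption violated
    fds = {fd for _, fd in syscalls}
    return "single_fd" if len(fds) == 1 else "mismatched_fds"
```

Counting how often each category appears per application is one way to turn the assumption into a measurable signal before switching tracing methods.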

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following:
- [x] `DCHECK` added to `socket_trace_connector` does not detect
mismatched file descriptors for `openssl_trace_bpf_test` and
`netty_tls_trace_bpf_test` ([P345](https://phab.corp.pixielabs.ai/P345)
-- openssl_trace_bpf_test is still disabled - #699)
- [x] Verified with a BPF histogram that the number of active syscalls
within SSL_write/SSL_read is nonzero for BIO native use cases and zero
for non-BIO-native cases (nodejs, netty). See more details in #1160

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
aimichelle pushed a commit that referenced this issue Apr 6, 2023
…anager (#1188)

Summary: Add `access-tls-socket-fd-via-syscall` feature flag to cloud
config manager

This feature flag will be used to opt internal clusters into the new tls
tracing implementation developed in #1120. We may want a more
sophisticated toggle for gradually opting in more clusters, but this
will suffice for the initial set of testing (validating some hand-picked
clusters).

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following:
- [x] Deploying a new pixie install without LaunchDarkly credentials
results in the default, `false`, value
([P354](https://phab.corp.pixielabs.ai/P354))
- [x] Deploying a new pixie install against a cloud with LaunchDarkly
credentials results in a `true` value
([P355](https://phab.corp.pixielabs.ai/P355))
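The default-off behavior verified above can be sketched as follows, assuming flag values arrive as a plain mapping when LaunchDarkly credentials are configured and as `None` otherwise. Only the flag name comes from this PR; the function and the data shape are hypothetical.

```python
FLAG_NAME = "access-tls-socket-fd-via-syscall"


def resolve_flag(remote_flags, name=FLAG_NAME, default=False):
    """Return the flag value, falling back to `default` when no remote flag
    service is configured (remote_flags is None) or the flag is absent."""
    if remote_flags is None:
        return default
    return bool(remote_flags.get(name, default))
```

Defaulting to `False` keeps every cluster on the existing tracing unless it is explicitly opted in.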

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
ddelnano added a commit to ddelnano/pixie that referenced this issue May 1, 2023
…tls tracing method (pixie-io#1161)

ddelnano added a commit to ddelnano/pixie that referenced this issue May 1, 2023
…anager (pixie-io#1188)

JamesMBartlett pushed a commit that referenced this issue May 1, 2023
…ggle) (#1123)

Summary: Implement new tls tracing method behind stirling cli flag
(feature toggle)

The tls tracing method added in this PR determines a connection's
identity (socket fd) through a different mechanism from our existing
tracing. Instead of relying on struct offsets of user space data
structures, it accesses the socket fd via the underlying socket syscalls
made while `SSL_write` / `SSL_read` calls are in progress. This is a
prerequisite for supporting BoringSSL, whose rolling release style makes
the previous method of user space offsets untenable. It also reduces the
maintenance cost of our existing OpenSSL tracing. Assuming
future versions of OpenSSL maintain the same contract, we will not
require any code changes to support them; Pixie will gain OpenSSL v3
support once this new tracing is the default (as shown in the Test Plan
below).

Our assumption is that the applications that explicitly set the socket
fd on the SSL struct (the applications supported by our previous tracing
technique), such as
[nginx](https://github.com/nginx/nginx/blob/dfe70f74a3558f05142fb552cea239add123d414/src/event/ngx_event_openssl.c#L1696) and
[python](https://github.com/python/cpython/blob/e375bff03736f809fbc234010c087ef9d7e0d384/Modules/_ssl.c#L836),
all use OpenSSL with its native BIO interface. In order to verify that
assumption, this change will be feature flagged and monitored carefully
as it's enabled on internal clusters. The existing [conn_stats_bytes
metric](https://github.com/pixie-io/pixie/blob/f45ced1803e6e44406f20f1171c15a24f4d5a17a/src/stirling/source_connectors/socket_tracer/metrics.cc#L62)
will be monitored for the volume of tls traffic traced to verify there is no
loss of instrumentation coverage between the new and old methods.
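One way to act on that monitoring is sketched below, assuming TLS-only `conn_stats_bytes` totals can be grouped per application for comparable windows under the old and new methods. The per-application grouping, the function name, and the 0.9 threshold are hypothetical; the metric itself is labeled by protocol and TLS dimension, not by application.

```python
def apps_with_lost_coverage(old_bytes, new_bytes, min_ratio=0.9):
    """Return apps whose traced TLS byte volume dropped under the new method.

    `old_bytes` / `new_bytes` map an application name to TLS bytes traced
    under the old and new tracing methods over comparable traffic windows.
    """
    lost = []
    for app, old in sorted(old_bytes.items()):
        new = new_bytes.get(app, 0)
        if old > 0 and new / old < min_ratio:
            lost.append(app)
    return lost
```

An application that disappears entirely under the new method (for example, a custom BIO user) shows up with a ratio of zero.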

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following
- [x] New test passes with `enable_openssl_v3_testing` bool flag enabled
([P336](https://phab.corp.pixielabs.ai/P336)). This verifies that the
new tracing technique and the feature toggle are functional since our
previous tracing does not support OpenSSL v3
- [x] Existing `openssl_trace_bpf_test` and `netty_tls_trace_bpf_test`
pass ([P335](https://phab.corp.pixielabs.ai/P335)). This is explicitly
mentioned since `openssl_trace_bpf_test` is still disabled (#699)
- [x] Verified the `bool_flag` added from PR feedback conditionally
enables the OpenSSL v3 testing
([P337](https://phab.corp.pixielabs.ai/P337))
- [x] Validate metrics from #1161 to verify our assumptions are correct

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
JamesMBartlett pushed a commit that referenced this issue May 3, 2023
…ng assumptions (mismatched fds) (#1270)

Summary: Add dimensional metrics to record applications that violate tls
tracing assumptions (mismatched fds)

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified that the metrics added do not clash with the non
dimensional metrics. Please see
e83443a
for the code that was used to produce this output
([P363](https://phab.corp.pixielabs.ai/P363))

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm pushed a commit that referenced this issue May 8, 2023
Summary: Remove update to BPF map not used in status quo tls tracing

In order to vet the new style of tls tracing, we introduced a mechanism
for detecting mismatched fds (#1161) that instrumented all of our tls
tracing when it was first developed. When the new method of tls tracing
was introduced (#1123), we removed the mismatched fd detection from the
status quo tls tracing; however, this BPF map update was missed in that
refactor.

Relevant Issues: #692

Type of change: /kind bug

Test Plan: Existing tests pass

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
aimichelle pushed a commit that referenced this issue May 15, 2023
…v3 (#1337)

Summary: Enable TLS tracing for applications using dynamically linked
OpenSSL v3

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Existing tests provide the necessary coverage

Changelog Message:
```release-note
TLS tracing now supports applications using OpenSSL v3
```

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm pushed a commit that referenced this issue Jun 8, 2023
…1449)

Summary: Refactor Go SDK label templating to support future boringcrypto
SDK

This PR adds the scaffolding needed to add a boringcrypto go SDK. This
SDK will be used in a future change to add TLS tracing tests for
binaries using boringcrypto, which addresses #597. It wasn't known that
boringcrypto was supported at the time, but we should still validate
that it is functional.

`rules_go` does not support go SDKs that use the same version with
different `GOEXPERIMENT`s enabled (will be following up to create a
GitHub issue on the project). This is an issue because boringcrypto is
enabled by setting `GOEXPERIMENT=boringcrypto` as mentioned
[here](https://go.googlesource.com/go/+/refs/heads/dev.boringcrypto/README.boringcrypto.md).
Until `rules_go` supports this, the proposed plan is to maintain a
previous patch version of our latest supported version of go as the
"boringcrypto go SDK". The description below should explain the process:

```
# rules_go doesn't support using multiple SDKs with the same version and differing
# GOEXPERIMENTs. Until this is addressed, go_sdk_boringcrypto is meant to be 1 bug fix
# version behind our latest go release. In the event our primary toolchain is upgraded
# to the first release of a new major version (i.e. 1.20.0) an rc suffixed build should
# be used for go_sdk_boringcrypto (1.20rcX) until the first minor release is available (1.20.1).
```
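The version rule in that comment can be sketched as a small function. It assumes the three-part `major.minor.patch` form used in the comment (e.g. `1.20.0`); the function name is hypothetical and the specific `rcX` build still has to be picked by hand.

```python
def boringcrypto_sdk_version(primary):
    """Pick the go_sdk_boringcrypto version from the primary Go SDK version.

    Stay one bug fix release behind the primary SDK; if the primary SDK is
    the first release of a new version (patch 0), fall back to a release
    candidate of that version.
    """
    major, minor, patch = (int(p) for p in primary.split("."))
    if patch > 0:
        return f"{major}.{minor}.{patch - 1}"
    # Placeholder: the concrete rcX build is chosen manually.
    return f"{major}.{minor}rc"
```

For example, a primary toolchain of `1.20.3` maps to `1.20.2`, while `1.20.0` maps to a `1.20rc` build.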

Relevant Issues: #597 #692

Type of change: /kind test-infra

Test Plan: Existing tests pass and verified this supports the
boringcrypto tests on a branch with the full set of changes

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm pushed a commit that referenced this issue Jun 21, 2023
…s metrics (#1518)

Summary: Record TLS library source in conn tracker and mismatched fd
prometheus metrics

This will allow us to identify which TLS library is in use for a given
ConnTracker. When TLS tracing probes are added for statically linked
binaries (BoringSSL), this will allow us to discern if future mismatched
fd cases are known cases or from expanding our coverage. This replaces
the existing `tls` dimension and models the plaintext case with
`kSSLNone` rather than maintaining two sources of the information.
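Because these metrics are exported in Prometheus text format (as in the test output below), a small parser is enough to aggregate them per `tls_source`. This sketch assumes the label order `protocol`, then `tls_source`, as in that output; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 2254
SAMPLE_RE = re.compile(
    r'^(?P<name>\w+)\{protocol="(?P<protocol>\w+)",'
    r'tls_source="(?P<tls_source>\w+)"\} (?P<value>\d+)$'
)


def bytes_by_tls_source(exposition_text, metric):
    """Sum one metric's samples per tls_source label, skipping comment lines."""
    totals = defaultdict(int)
    for line in exposition_text.splitlines():
        m = SAMPLE_RE.match(line.strip())
        if m and m.group("name") == metric:
            totals[m.group("tls_source")] += int(m.group("value"))
    return dict(totals)


def is_tls(tls_source):
    # Plaintext is modeled as kSSLNone instead of a separate `tls` dimension.
    return tls_source != "kSSLNone"
```

Applied to the `netty_tls_trace_bpf_test` output below, `conn_stats_bytes` sums to 618 bytes for `kLibNettyTcnativeSource` and 730261 bytes for `kSSLNone`.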

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following to ensure the appropriate metrics are
populated. The following
[revert](713c532)
shows what was needed to print out these metrics.
- [x] `netty_tls_trace_bpf_test` increments the netty specific counter
<details>
<summary>netty_tls_trace_bpf_test output</summary>

```
$ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:netty_tls_trace_bpf_test
[ ... ]
 
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolMux",tls_source="kLibNettyTcnativeSource"} 220
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolMux",tls_source="kLibNettyTcnativeSource"} 618
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 598970
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 131291
I20230616 16:34:10.592751 51190 container_runner.cc:53] podman rm -f thriftmux_server_2146055842888549 &>/dev/null
[       OK ] NettyTLSTraceTest/0.mtls_thriftmux_client (55081 ms)
[----------] 1 test from NettyTLSTraceTest/0 (55081 ms total)
 
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (55081 ms total)
[  PASSED  ] 1 test.
I20230616 16:34:11.655771 51190 env.cc:51] Shutting down
```

</details>

- [x] `openssl_trace_bpf_test` increments the counters for OpenSSL v1.1,
v3, NodeJS and libpython
<details>
<summary>openssl_trace_bpf_test output</summary>

```
ddelnano@vigenere:~/code/pixie (ddelnano/trace-boringssl-linked-applications) $ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test --test_arg='--gtest_filter=OpenSSLTraceTest*.ssl_capture_curl_client'
--
INFO: Invocation ID: d9486478-0df6-4bc1-9566-d452cb99d3d0
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/d9486478-0df6-4bc1-9566-d452cb99d3d0
INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date:
bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
INFO: Elapsed time: 10.987s, Critical Path: 10.46s
INFO: 3 processes: 1 internal, 2 linux-sandbox.
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test '--gtest_filter=OpenSSLTraceTest*.ssl_capture_curl_client'
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/d9486478-0df6-4bc1-9566-d452cb99d3d0
INFO: Build completed successfully, 3 total actions
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test
-----------------------------------------------------------------------------
I20230616 16:54:05.120615 70678 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
Note: Google Test filter = OpenSSLTraceTest*.ssl_capture_curl_client
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from OpenSSLTraceTest/0, where TypeParam = px::stirling::NginxOpenSSL_1_1_0_ContainerWrapper
[ RUN      ] OpenSSLTraceTest/0.ssl_capture_curl_client
 
[ ... ]
 
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 2254
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 3191
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 1223780
I20230616 16:54:38.904987 70678 container_runner.cc:53] podman rm -f curl_2147337100716919 &>/dev/null
I20230616 16:54:39.552585 70991 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:54:39.609521 70991 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/70948/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/70948/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:54:39.626680 70991 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:54:52.648070 70678 container_runner.cc:53] podman rm -f nginx_2147304013395620 &>/dev/null
[       OK ] OpenSSLTraceTest/0.ssl_capture_curl_client (47937 ms)
[----------] 1 test from OpenSSLTraceTest/0 (47937 ms total)
 
[----------] 1 test from OpenSSLTraceTest/1, where TypeParam = px::stirling::NginxOpenSSL_1_1_1_ContainerWrapper
[ RUN      ] OpenSSLTraceTest/1.ssl_capture_curl_client
 
[ ... ]
 
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 4508
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 133916
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 596
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6382
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 2472737
I20230616 16:55:27.096170 70678 container_runner.cc:53] podman rm -f curl_2147385287171660 &>/dev/null
I20230616 16:55:27.743304 71611 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:55:27.800357 71611 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/71567/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/71567/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:55:27.817342 71611 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:55:42.472515 70678 container_runner.cc:53] podman rm -f nginx_2147351974323900 &>/dev/null
[       OK ] OpenSSLTraceTest/1.ssl_capture_curl_client (49829 ms)
[----------] 1 test from OpenSSLTraceTest/1 (49829 ms total)
 
[----------] 1 test from OpenSSLTraceTest/2, where TypeParam = px::stirling::NginxOpenSSL_3_0_8_ContainerWrapper
[ RUN      ] OpenSSLTraceTest/2.ssl_capture_curl_client
 
[ ... ]
 
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 2254
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 4015
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 875776
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 4508
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 167458
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1582307
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 2497556
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 7036
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6382
I20230616 16:56:17.009613 70678 container_runner.cc:53] podman rm -f curl_2147435183916507 &>/dev/null
I20230616 16:56:17.652369 72745 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:56:17.709861 72745 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/72701/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/72701/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:56:17.727325 72745 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:56:31.172713 70678 container_runner.cc:53] podman rm -f nginx_2147401729646825 &>/dev/null
[       OK ] OpenSSLTraceTest/2.ssl_capture_curl_client (48670 ms)
[----------] 1 test from OpenSSLTraceTest/2 (48670 ms total)
 
[----------] 1 test from OpenSSLTraceTest/3, where TypeParam = px::stirling::Python310ContainerWrapper
[ RUN      ] OpenSSLTraceTest/3.ssl_capture_curl_client
 
[ ... ]
 
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 2813
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 2254
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 4015
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 875776
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 4508
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 167458
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1582307
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 2501517
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 7343
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6382
I20230616 16:57:06.748605 70678 container_runner.cc:53] podman rm -f curl_2147484954706117 &>/dev/null
I20230616 16:57:07.425081 73342 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:57:07.481707 73342 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/73297/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/73297/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:57:07.498756 73342 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:57:20.024674 70678 container_runner.cc:53] podman rm -f python_min_310_https_server_2147450620127316 &>/dev/null
[       OK ] OpenSSLTraceTest/3.ssl_capture_curl_client (48825 ms)
[----------] 1 test from OpenSSLTraceTest/3 (48825 ms total)
 
[----------] 1 test from OpenSSLTraceTest/4, where TypeParam = px::stirling::Node12_3_1ContainerWrapper
[ RUN      ] OpenSSLTraceTest/4.ssl_capture_curl_client
 
[ ... ]
 
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 517
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 2813
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 2254
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 4015
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 875776
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 4508
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 1360
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 167458
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1713598
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 2505532
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 7343
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6382
I20230616 16:57:54.744580 70678 container_runner.cc:53] podman rm -f curl_2147532942349787 &>/dev/null
I20230616 16:57:55.404923 73893 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:57:55.462363 73893 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/73846/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/73846/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:57:55.480280 73893 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:58:08.744537 70678 container_runner.cc:53] podman rm -f node_server_2147499313108620 &>/dev/null
[       OK ] OpenSSLTraceTest/4.ssl_capture_curl_client (48721 ms)
[----------] 1 test from OpenSSLTraceTest/4 (48722 ms total)
 
[----------] 1 test from OpenSSLTraceTest/5, where TypeParam = px::stirling::Node14_18_1AlpineContainerWrapper
[ RUN      ] OpenSSLTraceTest/5.ssl_capture_curl_client
 
[ ... ]
 
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 1034
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 2813
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 2254
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 4015
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 875776
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 4508
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 2743
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLUnspecified"} 167458
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1713598
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 3746554
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 7343
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6382
I20230616 16:58:43.613726 70678 container_runner.cc:53] podman rm -f curl_2147581809225276 &>/dev/null
I20230616 16:58:44.279163 74434 go_syms.cc:66] Falling back to the runtime.buildVersion symbol for go version detection
W20230616 16:58:44.335726 74434 uprobe_manager.cc:870] Failed to attach HTTP2 Uprobes to /proc/74389/root/usr/bin/podman: Internal : Unable to find offset for binary /proc/74389/root/usr/bin/podman symbol github.com/containers/podman/vendor/google.golang.org/grpc/internal/transport.(*http2Client).operateHeaders address 0
I20230616 16:58:44.353601 74434 uprobe_manager.cc:965] Number of uprobes deployed = 9
I20230616 16:58:57.204933 70678 container_runner.cc:53] podman rm -f node_server_2147548012141854 &>/dev/null
[       OK ] OpenSSLTraceTest/5.ssl_capture_curl_client (48439 ms)
[----------] 1 test from OpenSSLTraceTest/5 (48439 ms total)
 
[----------] Global test environment tear-down
[==========] 6 tests from 6 test suites ran. (292423 ms total)
[  PASSED  ] 6 tests.
I20230616 16:58:57.544296 70678 env.cc:51] Shutting down
```

</details>

- [x] A `DCHECK` is triggered during a test run if no `ssl_source_t` is
found for the given SSL library matcher.
```
[ .. ]
I20230616 16:10:34.742772  8830 socket_trace_connector.cc:427] Number of perf buffers opened = 8
W20230616 16:10:34.903129  9366 uprobe_symaddrs.cc:621] Unable to find openssl symbol 'OpenSSL_version_num' using dlopen/dlsym. Attempting to find address manually for pid 7416
F20230616 16:10:34.974184  9366 uprobe_manager.cc:336] Check failed: false Unable to find matching ssl_source_t for library matcher libnetty_tcnative_linux_x86
```

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm pushed a commit that referenced this issue Jun 22, 2023
Summary: Remove legacy TLS tracing feature toggle and transitional code

The new style of TLS tracing has been rolled out since April 11th
(vizier release v0.12.19). We believe it is performing well, and it has
already allowed Pixie's TLS tracing to cover more libraries (OpenSSL
v3), with BoringSSL coming soon.

This includes changes from #1518 and must be rebased once that is
merged.

Relevant Issues: #692

Type of change: /kind cleanup

Test Plan: Existing test coverage

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm pushed a commit that referenced this issue Jun 23, 2023
Summary: Trace statically linked OpenSSL compatible TLS libraries

This adds a feature toggle for enabling TLS tracing of applications
statically linked with an OpenSSL-compatible library. The plan is to
enable this for internal clusters for one week before enabling it for
all users (and making it the default).

We originally intended to add these probes for BoringSSL applications;
however, detecting BoringSSL is more difficult than anticipated. One
reliable indicator, BoringSSL's [magic tag](google/boringssl@89386ac),
is only present in applications built with a BoringSSL version from
October 2021 or later. After checking the following dependencies
(different Envoy distributions, ClickHouse, the Mono runtime,
[cloudflare/boring](https://github.com/cloudflare/boring) applications
and Go binaries with `boringcrypto` enabled), we found that this wasn't
the case for them. In addition, checking for the magic tag could cause
performance issues.
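To illustrate why a magic tag check can be costly, here is a minimal C++ sketch that looks for a fixed byte sequence anywhere in a binary. The tag bytes are a hypothetical placeholder (not BoringSSL's real tag), and `ReadBinary`, `ContainsTag` and `BinaryHasTag` are illustrative helpers, not Pixie code. Because a naive check does not know where the tag lives, it has to read and scan the whole image, so its cost grows with binary size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical placeholder tag; these are NOT BoringSSL's real tag bytes.
const std::vector<std::uint8_t> kPlaceholderTag = {0xde, 0xad, 0xbe, 0xef, 0x42};

// Reads an entire binary into memory. A naive tag check needs the full
// image, so memory and I/O cost grow with the binary's size.
std::vector<std::uint8_t> ReadBinary(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                   std::istreambuf_iterator<char>());
}

// Linear search for `tag` anywhere in `bytes`.
bool ContainsTag(const std::vector<std::uint8_t>& bytes,
                 const std::vector<std::uint8_t>& tag) {
  if (tag.empty()) return true;
  return std::search(bytes.begin(), bytes.end(), tag.begin(), tag.end()) !=
         bytes.end();
}

// Returns true if the binary at `path` embeds `tag`. An unreadable path
// yields an empty buffer and therefore false.
bool BinaryHasTag(const std::string& path, const std::vector<std::uint8_t>& tag) {
  return ContainsTag(ReadBinary(path), tag);
}
```

Even with a perfect tag, this only helps for binaries built against a new enough BoringSSL, which is the adoption problem described above.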

Since our latest TLS tracing implementation provides broad coverage, we
opted to trace any application that contains one of the
OpenSSL-compatible symbols necessary for tracing (`SSL_write`). For BIO
native applications, the traffic will be traced successfully. For
non-BIO-native cases (Envoy), the uprobes will trigger but won't capture
any data. This was deemed an acceptable trade-off, since detecting
BoringSSL was challenging and any indicator would likely involve a long
tail of upstream adoption.
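The selection rule above can be sketched as follows. This is a hypothetical helper, not Pixie's implementation: it assumes the binary's symbol names have already been read from its ELF symbol table, and it checks only for `SSL_write`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: `symbols` stands in for the symbol names read from a
// binary's ELF symbol table (the ELF parsing itself is omitted here).
bool ShouldAttachStaticTLSProbes(const std::vector<std::string>& symbols) {
  // Any binary containing an OpenSSL-compatible SSL_write is a candidate,
  // whether the library behind it is OpenSSL or BoringSSL.
  return std::find(symbols.begin(), symbols.end(), "SSL_write") != symbols.end();
}
```

The check cannot distinguish BIO native applications from non-BIO-native ones such as Envoy; both match, which is the trade-off accepted above.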

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified this change through the following:
- [x] `boringssl_trace_bpf_test` verifies tracing is successful
- [x] Verified that the new TLS source (`kStaticallyLinkedSource`) is
identified during `boringssl_trace_bpf_test` (visible when the Prometheus
metrics are logged)
```
$ ./scripts/sudo_bazel_run.sh src/stirling/source_connectors/socket_tracer:boringssl_trace_bpf_test
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kStaticallyLinkedSource"} 1244
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 90
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kStaticallyLinkedSource"} 1638
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 76521
```

Changelog Message:
```release-note
Add support for tracing encrypted traffic for statically linked OpenSSL/BoringSSL applications. This functionality is currently disabled but will be enabled by default in an upcoming release.
```

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm pushed a commit that referenced this issue Jun 26, 2023
Summary: Fix misspelling of `PX_TRACE_STATIC_TLS_BINARIES` PEM flag

Relevant Issues: #692

Type of change: /kind bug

Test Plan: Ran `git grep` to make sure the flag name is consistent with
Stirling's flag
```
ddelnano@vigenere:~/code/pixie (ddelnano/attempt-to-reproduce-amqproxy-issue) $ git grep PX_TRACE_STATIC_TLS_BINARIES
src/stirling/source_connectors/socket_tracer/socket_trace_connector.cc:    stirling_trace_static_tls_binaries, gflags::BoolFromEnv("PX_TRACE_STATIC_TLS_BINARIES", false),
```
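A misspelled variable name is easy to miss because an environment-variable-backed default typically falls back silently when the variable is absent. The sketch below is a simplified, hypothetical stand-in for the `gflags::BoolFromEnv("PX_TRACE_STATIC_TLS_BINARIES", false)` call shown above; `BoolFromEnvSketch` and its accepted spellings are assumptions and may differ from gflags' own parsing.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for an env-var-backed boolean default. If the
// variable is unset (for example because the name was misspelled where it
// is set), the default is returned without any warning.
bool BoolFromEnvSketch(const char* name, bool default_value) {
  const char* raw = std::getenv(name);
  if (raw == nullptr) return default_value;
  std::string value(raw);
  std::transform(value.begin(), value.end(), value.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  if (value == "1" || value == "true" || value == "yes") return true;
  if (value == "0" || value == "false" || value == "no") return false;
  return default_value;  // Unrecognized values also fall back to the default.
}
```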

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
aimichelle pushed a commit that referenced this issue Jul 20, 2023
… by default (#1625)

Summary: Opt statically linked OpenSSL/BoringSSL applications into TLS
tracing by default

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Feature flag was used for Pixie owned clusters.

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
etep pushed a commit to etep/pixie that referenced this issue Jul 25, 2023
… by default (pixie-io#1625)

Summary: Opt statically linked OpenSSL/BoringSSL applications into TLS
tracing by default

Relevant Issues: pixie-io#692

Type of change: /kind feature

Test Plan: Feature flag was used for Pixie owned clusters.

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
aimichelle pushed a commit that referenced this issue Aug 4, 2023
)

Summary: Add TLS tracing source debugging mode and associated feature
flag

This debugging feature flag is intended for ad hoc investigations into
why a particular TLS source (libpython, statically linked
OpenSSL/BoringSSL, OpenSSL v1.x, etc.) was traced for a given protocol.
BoringSSL tracing was recently rolled out, and it appeared to be
covering every protocol that Pixie supports. This was unexpected, given
the popular open source projects we believe this tracing should cover.
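Conceptually, the debug mode counts how often each (executable, protocol, TLS source) combination is traced, which is what the `openssl_tls_source_debug{exe=...,protocol=...,ssl_source=...}` metric in the Test Plan exposes. A minimal in-memory C++ sketch of that bookkeeping might look like the following; `TLSSourceDebugCounter` is hypothetical and does not use a Prometheus client library.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical label key: (exe, protocol, ssl_source).
using DebugKey = std::tuple<std::string, std::string, std::string>;

// In-memory stand-in for the openssl_tls_source_debug counter.
class TLSSourceDebugCounter {
 public:
  // Increments the count for one traced (exe, protocol, ssl_source) tuple.
  void Record(const std::string& exe, const std::string& protocol,
              const std::string& ssl_source) {
    ++counts_[DebugKey{exe, protocol, ssl_source}];
  }

  // Returns the count for a label combination, or 0 if never recorded.
  int Get(const std::string& exe, const std::string& protocol,
          const std::string& ssl_source) const {
    auto it = counts_.find(DebugKey{exe, protocol, ssl_source});
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  std::map<DebugKey, int> counts_;
};
```

Keying on the executable name is what makes a misattribution visible: a `python3` entry under `kStaticallyLinkedSource` stands out immediately.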

While testing this change, I found a bug in how TLS sources are
attributed: the libpython TLS source was identified as statically linked
(see the Test Plan below for details). I believe this is likely the
discrepancy causing the statically linked source to show up for every
supported protocol. That bug will be addressed in a follow-up change.

When reviewing the Test Plan, please note that the changes in
5a23004 were needed to produce the
Prometheus output and to verify that the TLS source misattribution is
present.

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following
- [x] Simulating the mismatched fd case still causes the `DCHECK` to
occur
<details><summary>openssl_trace_bpf_test with mismatched fd
error</summary>

```
# Introduce change to simulate mismatched fd case.
The openssl_trace_bpf_test cases are known to require calling send/read multiple times for the same fd. This change will cause the DCHECK to occur on the second read/write call
--
$ git diff
diff --git a/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c b/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c
index ba5df955dd..e1c64a1a7d 100644
--- a/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c
+++ b/src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c
@@ -185,7 +185,7 @@ static __inline void propagate_fd_to_user_space_call(uint64_t pid_tgid, int fd)
     int current_fd = nested_syscall_fd_ptr->fd;
     if (current_fd == kInvalidFD) {
       nested_syscall_fd_ptr->fd = fd;
-    } else if (current_fd != fd) {
+    } else /*if (current_fd != fd)*/ {
       // Found two different fds during a single SSL_write/SSL_read call. This invalidates
       // our tls tracing assumptions and must be recorded.
       nested_syscall_fd_ptr->mismatched_fds = true;
 
# Verify that the mismatched fd case still causes a crash from the DCHECK
 
$ ./scripts/sudo_bazel_run.sh -c dbg  src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test
INFO: Invocation ID: 7c4390fd-e585-49fe-ac2a-22e17122567e
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/7c4390fd-e585-49fe-ac2a-22e17122567e
INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date:
bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
INFO: Elapsed time: 3.826s, Critical Path: 3.39s
INFO: 4 processes: 1 internal, 3 linux-sandbox.
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/7c4390fd-e585-49fe-ac2a-22e17122567e
INFO: Build completed successfully, 4 total actions
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test
-----------------------------------------------------------------------------
I20230804 14:55:53.269637 3497837 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
[==========] Running 18 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 3 tests from OpenSSLTraceTest/0, where TypeParam = px::stirling::NginxOpenSSL_1_1_0_ContainerWrapper
[ RUN      ] OpenSSLTraceTest/0.ssl_capture_curl_client
I20230804 14:55:53.423983 3497837 container_runner.cc:36] Loaded image: localhost/bazel/src/stirling/source_connectors/socket_tracer/testing/containers:nginx_openssl_1_1_0_image
I20230804 14:55:53.424044 3497837 container_runner.cc:114] podman run --timeout=3600 --rm -q --pid=host --name=nginx_6373812169417785 localhost/bazel/src/stirling/source_connectors/socket_tracer/testing/containers:nginx_openssl_1_1_0_image
I20230804 14:55:53.664819 3497837 container_runner.cc:144] Container nginx_6373812169417785 status: running
I20230804 14:55:53.698279 3497837 container_runner.cc:175] Container nginx_6373812169417785 process PID: 3497990
I20230804 14:55:53.698315 3497837 container_runner.cc:177] Container nginx_6373812169417785 waiting for log message:
I20230804 14:55:53.732992 3497837 container_runner.cc:189] Container nginx_6373812169417785 status: running
I20230804 14:55:53.733016 3497837 container_runner.cc:225] Container nginx_6373812169417785 is ready.
I20230804 14:55:54.733592 3497837 linux_headers.cc:211] Found Linux kernel version using .note section.
I20230804 14:55:54.733633 3497837 source_connector.cc:35] Initializing source connector: socket_trace_connector
I20230804 14:55:54.733664 3497837 linux_headers.cc:94] Obtained Linux version string from `uname`: 5.19.0-1022-gcp
I20230804 14:55:54.733670 3497837 linux_headers.cc:642] Detected kernel release (uname -r): 5.19.0-1022-gcp
I20230804 14:55:54.733700 3497837 bcc_wrapper.cc:121] Using linux headers found at /lib/modules/5.19.0-1022-gcp/build for BCC runtime.
I20230804 14:55:54.733760 3497837 bcc_wrapper.cc:170] Initializing BPF program ...
I20230804 14:56:04.312705 3497837 scoped_timer.h:48] Timer(init_bpf_program) : 9.58 s
I20230804 14:56:05.196256 3497837 socket_trace_connector.cc:437] Number of kprobes deployed = 40
I20230804 14:56:05.196297 3497837 socket_trace_connector.cc:438] Probes successfully deployed.
I20230804 14:56:05.196377 3497837 socket_trace_connector.cc:373] Initializing perf buffers with ncpus=96 and scaling_factor=0.0865385
I20230804 14:56:05.196416 3497837 socket_trace_connector.cc:362] Total perf buffer usage for kData buffers across all cpus: 268038720
I20230804 14:56:05.196425 3497837 socket_trace_connector.cc:362] Total perf buffer usage for kControl buffers across all cpus: 13353216
I20230804 14:56:05.304798 3497837 socket_trace_connector.cc:442] Number of perf buffers opened = 8
W20230804 14:56:12.303573 3498584 uprobe_manager.cc:852] Cannot analyze binary /proc/3498580/root/usr/bin/python3.10 for uprobe deployment. If file is under /var/lib, container may have terminated. Message = Can't find or process ELF file /proc/3498580/root/usr/bin/python3.10
I20230804 14:56:14.672582 3498584 uprobe_manager.cc:1000] Number of uprobes deployed = 11866
F20230804 14:56:14.748164 3498588 socket_trace_connector.cc:701] Check failed: error_code == kOpenSSLTraceOk (1 vs. 0)
*** Check failure stack trace: ***
```

</details>

- [x] Enabling TLS source debugging on
72075b6 shows the `kLibPythonSource`
TLS source incorrectly attributed as `kStaticallyLinkedSource`
<details><summary>openssl_trace_bpf_test prometheus metrics
output</summary>

```
$ git show
commit 2def9a3 (HEAD, ddelnano/ddelnano/add-tls-tracing-debug-feature-flag)
Author: Dom Del Nano <ddelnano@pixielabs.ai>
Date:   Thu Aug 3 15:35:52 2023 +0000
 
Add TLS tracing source debugging mode
 
Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
 
$ ./scripts/sudo_bazel_run.sh -c dbg  src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test --test_arg='--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client'
INFO: Invocation ID: 566f27c6-208f-4f86-83bb-b5e2a214f81b
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/566f27c6-208f-4f86-83bb-b5e2a214f81b
INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date:
bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
INFO: Elapsed time: 4.238s, Critical Path: 3.57s
INFO: 3 processes: 1 remote cache hit, 1 internal, 1 linux-sandbox.
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test '--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client'
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/566f27c6-208f-4f86-83bb-b5e2a214f81b
INFO: Build completed successfully, 3 total actions
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test
-----------------------------------------------------------------------------
I20230804 14:29:07.070752 3476002 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
Note: Google Test filter = OpenSSLTraceTest/3.ssl_capture_curl_client
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from OpenSSLTraceTest/3, where TypeParam = px::stirling::Python310ContainerWrapper
[ RUN      ] OpenSSLTraceTest/3.ssl_capture_curl_client
 
[ ... ]
 
# HELP openssl_tls_source_debug Records the number of times a protocol was traced along with additional debugging information
# TYPE openssl_tls_source_debug counter
openssl_tls_source_debug{exe="python3",name="openssl_tls_source_debug",protocol="kProtocolHTTP",ssl_source="kStaticallyLinkedSource"} 1
# HELP java_proc_crashed_during_attach Count of Java process crashes during symbolization agent attach.
# TYPE java_proc_crashed_during_attach counter
java_proc_crashed_during_attach{name="java_proc_crashed_during_attach"} 0
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kStaticallyLinkedSource"} 2813
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 90
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 11156
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kStaticallyLinkedSource"} 3650
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 132148
```

</details>

- [x] Enabling TLS source debugging on
[e0b792d](e0b792d)
shows the `kLibPythonSource` TLS source correctly attributed. The bug will
be fixed in a follow-up PR, but this proves that the debug information
tracked down the issue.
<details><summary>openssl_trace_bpf_test prometheus metrics
output</summary>

```
$ git show
--
commit 1c9200932f89e6bbf530f02fd607d5008aac1c44 (HEAD -> ddelnano/add-tls-tracing-debug-feature-flag)
Author: Dom Del Nano <ddelnano@pixielabs.ai>
Date:   Thu Aug 3 23:23:11 2023 +0000
 
Fix bug where non statically linked binaries were attributed as statically link
 
Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
 
diff --git a/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc b/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc
index 76850c250e..50ba85fc5b 100644
--- a/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc
+++ b/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc
@@ -654,10 +654,10 @@ int UProbeManager::DeployOpenSSLUProbes(const absl::flat_hash_set<md::UPID>& pid
       // before the BPF map is updated. This value is cleaned up when the upid is
       // terminated, so if attachment fails it will be deleted prior to the pid being
       // reused.
-      openssl_source_map_->UpdateValue(pid.pid(), kStaticallyLinkedSource);
       count_or = AttachOpenSSLUProbesOnStaticBinary(pid.pid());
 
-      if (count_or.ok()) {
+      if (count_or.ok() && count_or.ValueOrDie() > 0) {
+        openssl_source_map_->UpdateValue(pid.pid(), kStaticallyLinkedSource);
         uprobe_count += count_or.ValueOrDie();
 
         VLOG(1) << absl::Substitute(
 
$ ./scripts/sudo_bazel_run.sh -c dbg  src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test --test_arg='--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client'
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 2d279cd6-45f7-4939-92d2-9af76a262954
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/2d279cd6-45f7-4939-92d2-9af76a262954
INFO: Analyzed target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test (368 packages loaded, 38224 targets configured).
INFO: Found 1 target...
Target //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test up-to-date:
bazel-bin/src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
INFO: Elapsed time: 8.733s, Critical Path: 0.90s
INFO: 1 process: 1 internal.
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh /bin/bash -c '"$@"' /bin/bash sudo src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test '--gtest_filter=OpenSSLTraceTest/3.ssl_capture_curl_client'
INFO: Streaming build results to: https://bb.corp.pixielabs.ai/invocation/2d279cd6-45f7-4939-92d2-9af76a262954
INFO: Build completed successfully, 1 total action
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test
-----------------------------------------------------------------------------
I20230804 14:26:54.002213 3474370 env.cc:47] Started: src/stirling/source_connectors/socket_tracer/openssl_trace_bpf_test
Note: Google Test filter = OpenSSLTraceTest/3.ssl_capture_curl_client
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from OpenSSLTraceTest/3, where TypeParam = px::stirling::Python310ContainerWrapper
[ RUN      ] OpenSSLTraceTest/3.ssl_capture_curl_client
I20230804 14:26:54.321714 3474370 container_runner.cc:36] Loaded image: localhost/bazel/src/stirling/source_connectors/socket_tracer/testing/containers/ssl:python_min_310_https_server
 
[ ... ]
 
# HELP openssl_tls_source_debug Records the number of times a protocol was traced along with additional debugging information
# TYPE openssl_tls_source_debug counter
openssl_tls_source_debug{exe="python3",name="openssl_tls_source_debug",protocol="kProtocolHTTP",ssl_source="kLibPythonSource"} 1
# HELP java_proc_crashed_during_attach Count of Java process crashes during symbolization agent attach.
# TYPE java_proc_crashed_during_attach counter
java_proc_crashed_during_attach{name="java_proc_crashed_during_attach"} 0
# HELP data_loss_bytes Total bytes of data loss for this protocol. Measured by bytes that weren't successfully parsed.
# TYPE data_loss_bytes counter
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 2813
data_loss_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 0
data_loss_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 90
data_loss_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 0
# HELP conn_stats_bytes Total bytes of data tracked by conn stats for this protocol.
# TYPE conn_stats_bytes counter
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 11156
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 272395
 
I20230804 14:27:16.048719 3474370 container_runner.cc:53] podman rm -f curl_6372094307402945 &>/dev/null
I20230804 14:27:23.091778 3474370 container_runner.cc:53] podman rm -f python_min_310_https_server_6372073067154288 &>/dev/null
[       OK ] OpenSSLTraceTest/3.ssl_capture_curl_client (29434 ms)
[----------] 1 test from OpenSSLTraceTest/3 (29434 ms total)
 
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (29434 ms total)
[  PASSED  ] 1 test.
I20230804 14:27:23.436729 3474370 env.cc:51] Shutting down
```

</details>
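For reference, the mismatched fd check exercised in the first Test Plan item boils down to the following user-space C++ model of `propagate_fd_to_user_space_call`. The struct and the `kInvalidFD` value of -1 are assumptions for illustration; the real logic runs in BPF against a per-call map entry.

```cpp
#include <cassert>

constexpr int kInvalidFD = -1;  // Assumed sentinel value for this sketch.

// Model of the per-call state kept while an SSL_read/SSL_write is in flight.
struct NestedSyscallFD {
  int fd = kInvalidFD;
  bool mismatched_fds = false;
};

// The first fd seen by a nested syscall is recorded; a later, different fd
// marks the call as mismatched, since one SSL call should map to one socket.
void PropagateFD(NestedSyscallFD* state, int fd) {
  if (state->fd == kInvalidFD) {
    state->fd = fd;
  } else if (state->fd != fd) {
    state->mismatched_fds = true;
  }
}
```

Dropping the `state->fd != fd` condition, as the simulated change does, makes the second call on the same fd set `mismatched_fds`, which is why the check fires in that test run.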

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm pushed a commit that referenced this issue Aug 7, 2023
…g probes are added (#1655)

Summary: Ensure `kStaticallyLinkedSource` is attributed when static TLS
tracing probes are added

When developing #1652, we noticed situations where our Prometheus
metrics are annotated with `kStaticallyLinkedSource` instead of the
correct TLS source. This is due to the following [conditional
logic](https://github.com/pixie-io/pixie/blob/44a8338a60e74564d94d474a7ed723717d2ebaaf/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc#L646),
which relies on a mutable variable and evaluated to true even when
uprobes were already attached. When tracing a dynamically linked
process, the uprobe count is set to a non-zero value when the dynamic
probes are
[attached](https://github.com/pixie-io/pixie/blob/44a8338a60e74564d94d474a7ed723717d2ebaaf/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc#L615).
The NodeJS uprobes in
[AttachNodeJsOpenSSLUprobes](https://github.com/pixie-io/pixie/blob/44a8338a60e74564d94d474a7ed723717d2ebaaf/src/stirling/source_connectors/socket_tracer/uprobe_manager.cc#L629C16-L629C42)
then reset the `count_or` value back to 0 (a dynamically linked
application cannot be NodeJS, since NodeJS links OpenSSL statically).
This causes us to erroneously attempt (and fail) to attach the
statically linked probes, and to set the TLS source to
`kStaticallyLinkedSource`.
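The control flow described above can be reduced to the following C++ sketch. `Process`, the `Attach*` functions, the fixed probe count of 4 and the source strings (with `kLibSSL_1_1_Source` as an example dynamic source) are hypothetical simplifications, not Pixie's code. `AttributeFixed` shows one way to avoid the problem, keeping each step's result separate and only attributing `kStaticallyLinkedSource` when the static attach itself added probes; it is not necessarily the exact change in this PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of a traced process.
struct Process {
  bool has_dynamic_libssl;      // Dynamically links libssl.
  bool is_nodejs;               // NodeJS binary (OpenSSL linked in statically).
  bool has_static_ssl_symbols;  // Other binary containing SSL_write.
};

// Hypothetical attach steps; each returns how many uprobes it attached.
int AttachDynamic(const Process& p) { return p.has_dynamic_libssl ? 4 : 0; }
int AttachNodeJs(const Process& p) { return p.is_nodejs ? 4 : 0; }
int AttachStatic(const Process& p) { return p.has_static_ssl_symbols ? 4 : 0; }

// Buggy shape: a single mutable count is overwritten by each step.
std::string AttributeBuggy(const Process& p) {
  std::string source = "kSSLNone";
  int count = AttachDynamic(p);
  if (count > 0) source = "kLibSSL_1_1_Source";
  count = AttachNodeJs(p);  // Resets count to 0 for any non-NodeJS process.
  if (count > 0) source = "kNodeJSSource";
  if (count == 0) {
    // Reached even when dynamic probes are already attached, and the source
    // is overwritten whether or not the static attach succeeds.
    source = "kStaticallyLinkedSource";
    count = AttachStatic(p);
  }
  return source;
}

// One possible fix: check each step's own result and only attribute
// kStaticallyLinkedSource when the static attach actually added probes.
std::string AttributeFixed(const Process& p) {
  if (AttachDynamic(p) > 0) return "kLibSSL_1_1_Source";
  if (AttachNodeJs(p) > 0) return "kNodeJSSource";
  if (AttachStatic(p) > 0) return "kStaticallyLinkedSource";
  return "kSSLNone";
}
```

With the buggy shape, a dynamically linked process such as the Python test server ends up labeled `kStaticallyLinkedSource`, matching the misattribution seen in the metrics.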

I attempted to refactor the uprobe counting logic and fix this bug in a
single change, but I believe that would increase the scope of this
change considerably. The OpenSSL probe specs are shared among the
dynamic, statically linked and NodeJS cases, which complicates counting
the number of probes actually attached (rather than returning the
potential count). While we believe this bug is the likely source of the
confusing BoringSSL tracing results, it would be beneficial to validate
that sooner.

I will follow up this change by refactoring this code, but would prefer
to land this fix first to unblock the BoringSSL tracing validation.

Relevant Issues: #692

Type of change: /kind bug

Test Plan: Printed out the Prometheus `conn_stats_bytes` metrics during
`openssl_trace_bpf_test` and verified that they match the given test case
<details><summary> openssl_trace_bpf_test metric output</summary>

```
$ ./scripts/sudo_bazel_run.sh -c dbg src/stirling/source_connectors/socket_tracer:openssl_trace_bpf_test --test_arg='--gtest_filter=OpenSSLTraceTest/*.ssl_capture_curl_client' 2>&1 | grep '^conn_stats_bytes\|TypeParam'
--
[----------] 1 test from OpenSSLTraceTest/0, where TypeParam = px::stirling::NginxOpenSSL_1_1_0_ContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 3191
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 3591
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 416
[----------] 1 test from OpenSSLTraceTest/1, where TypeParam = px::stirling::NginxOpenSSL_1_1_1_ContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6375
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 6989
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 0
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 832
[----------] 1 test from OpenSSLTraceTest/2, where TypeParam = px::stirling::NginxOpenSSL_3_0_8_ContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6375
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 10484
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 307
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1248
[----------] 1 test from OpenSSLTraceTest/3, where TypeParam = px::stirling::Python310ContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6375
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 14350
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 307
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 1664
[----------] 1 test from OpenSSLTraceTest/4, where TypeParam = px::stirling::Node12_3_1ContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 1360
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6375
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 18367
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 307
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 133152
[----------] 1 test from OpenSSLTraceTest/5, where TypeParam = px::stirling::Node14_18_1AlpineContainerWrapper
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kNodeJSSource"} 2743
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibPythonSource"} 3650
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_3_Source"} 3184
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kLibSSL_1_1_Source"} 6375
conn_stats_bytes{protocol="kProtocolUnknown",tls_source="kSSLNone"} 22319
conn_stats_bytes{protocol="kProtocolDNS",tls_source="kSSLNone"} 307
conn_stats_bytes{protocol="kProtocolHTTP",tls_source="kSSLNone"} 155820
```
</details>
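The point of the new `tls_source` label is to separate TLS from plaintext traffic. The shell sketch below post-processes output like the one above into those two totals. It assumes that `tls_source="kSSLNone"` marks plaintext and that every other value is a TLS source, inferred from the label names rather than confirmed by the source. It also assumes the counters are cumulative across the test cases in one binary (the values above only grow from one test to the next), so it keeps only the last value seen for each label set. The function name `tls_split` is hypothetical.

```shell
# Hypothetical helper: split conn_stats_bytes into TLS vs plaintext totals.
# Reads Prometheus text lines on stdin; non-metric lines are ignored.
# Assumption: tls_source="kSSLNone" is plaintext, any other value is TLS.
# Assumption: counters are cumulative, so the last value per label set wins.
tls_split() {
  awk '
    /^conn_stats_bytes[{]/ {
      latest[$1] = $2            # $1 is the metric name plus label set
    }
    END {
      for (k in latest) {
        if (k ~ /tls_source="kSSLNone"/) plain += latest[k]
        else tls += latest[k]
      }
      printf "tls=%d plaintext=%d\n", tls, plain
    }'
}
```

Applied to the output shown above, the final test block is what counts, giving `tls=15952 plaintext=178446`. Keeping the last value per label set, rather than summing every line, avoids double counting the cumulative counters.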

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm pushed a commit that referenced this issue Aug 17, 2023
…d announce BoringSSL support (#1678)

Summary: Update the static TLS (BoringSSL) tracing feature flag default to
true and announce BoringSSL support

Relevant Issues: #692

Type of change: /kind feature

Test Plan: This TLS tracing has been enabled since the v0.14.3 release
last month (#1625), and the metrics for internal clusters have been
validated.

Changelog Message:
```release-notes
Enhance TLS tracing to support statically linked OpenSSL and BoringSSL (OpenSSL API compatible libraries).
```

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
@ddelnano
Member Author

This is complete as of the v0.14.3 Vizier release!
