Skip to content

Conversation

@benkilimnik
Copy link
Member

@benkilimnik benkilimnik commented Dec 21, 2023

Summary: Capture local address from socket if present for accept syscalls. This will support standalone pem entity relationships.

Note that for syscalls without socket information (e.g. connect), we are currently unable to trace the local IP from bpf. For cases where we fail to trace connect/accept calls, we do try to parse the local address from socket information via InferConnInfo.

Type of change: /kind feature

Test Plan: Ran standalone pem on dev cluster and tracked local address inference from bpf. Ran PxL script for http events.

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
@benkilimnik benkilimnik requested a review from a team December 21, 2023 21:34
@etep etep self-requested a review December 22, 2023 20:10
@etep

This comment was marked as outdated.

@benkilimnik

This comment was marked as outdated.

Comment on lines +384 to 388
if (socket != NULL) {
read_sockaddr_kernel(&conn_info, socket);
} else if (addr != NULL) {
conn_info.raddr = *((union sockaddr_t*)addr);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this reordering indicate that there was a bug with our previous implementation? I believe I understand that this wouldn't work with the new logic, but it seems like our previous code would be susceptible as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benkilimnik any thoughts on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ddelnano @JamesMBartlett thanks for the review and apologies for the delayed response!

For accept syscalls via

process_syscall_accept -> submit_new_conn(args->addr, args->sock_alloc_socket...)

our previous logic only looked at the kernel vars when the IP of the connecting client addr is NULL. Since we only cared about finding the remote address, there was no need to go looking in the socket if we already got it from the args.

The reordering in this PR has us look at the network socket to find the local address even when the remote address is provided.

Note that for connect syscalls via

process_syscall_connect -> submit_new_conn(args->addr, /*socket*/ NULL...)

we never set the local address from kernel vars because socket is always NULL

@oazizi000
Copy link
Contributor

@benkilimnik Do you have any stats on what percentage of connections have local address resolved in eBPF, what percentage require inference, and what percentage are never discovered?

@benkilimnik
Copy link
Member Author

benkilimnik commented Jan 17, 2024

Thanks for the comment @oazizi000
I've run a quick test to approximate those percentages for local address discovery:

  • Running the standalone pem on a dev cluster with the px-sock-shop and px-kafka demos for ~20min shows an overall eBPF resolve rate of ~38%, with 0.01% successfully inferred from the socket FD via InferConnInfo, and ~62% not discovered.

I believe that most of the connections with unresolved local addresses come from connect syscalls which don’t have a socket for us to parse a local address from.

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
@benkilimnik benkilimnik changed the title [Add Local Addr & Port 2/3] Capture local addr from socket for accept syscalls [Add Local Addr & Port 1/3] Capture local addr from socket for accept syscalls Jan 18, 2024
@benkilimnik benkilimnik marked this pull request as ready for review January 18, 2024 02:52
@oazizi000
Copy link
Contributor

Thanks for the comment @oazizi000 I've run a quick test to approximate those percentages for local address discovery:

* Running the standalone pem on a dev cluster with the `px-sock-shop` and `px-kafka` demos for ~20min shows an overall eBPF resolve rate of ~38%, with 0.01% successfully inferred from the socket FD via `InferConnInfo`, and ~62% not discovered.

I believe that most of the connections with unresolved local addresses come from connect syscalls which don’t have a socket for us to parse a local address from.

Interesting. In that case, it might be interesting to break down the stats by trace_role. We primarily do server-side tracing, so depending on the traffic pattern, connect calls being the majority is a little surprising.

Other than those curiosities, code looks good.

@benkilimnik
Copy link
Member Author

benkilimnik commented Jan 23, 2024

Here the stats broken down by trace_role for a cluster running px-kafka and px-sock-shop.

kRoleClient: 0% resolved
kRoleServer: 99.76% resolved

Note that for this test I disabled parsing local addresses from socket information via InferConnInfo. With this enabled, the resolve rates increase marginally. This behavior is consistent with the hypothesis that client-side connect syscalls are the primary source of unresolved local addresses.

These percentages were computed in two ways:
(1) Adding logs to ConnTracker::AddConnOpenEvent and
(2) Modifying the http_data and kafka_data PxL scripts to aggregate by the new local_addr column and trace_role

df = df.groupby(['local_addr', 'trace_role']).agg(count=('local_addr', px.count))

… to find local_addr if accept/connect not traced

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
@oazizi000
Copy link
Contributor

Thanks for the data. Looks pretty reliable for the server-side tracing capabilities.

As we discussed in the meeting, the main concern is around expectations for Pixie users, especially since there's an alternate way to get the local IP from the pid when in a Kubernetes context. But given the reliability of this approach for server-side tracing, I think this is good to move forward.

As a follow-up, we should investigate if there's a way to collect the local address in BPF space with client-side tracing. I will create an issue for that.

Will merge this.

@oazizi000 oazizi000 merged commit 0ec2c3a into pixie-io:main Jan 24, 2024
@oazizi000
Copy link
Contributor

Follow-up issue filed here: #1829

oazizi000 pushed a commit that referenced this pull request Jan 25, 2024
Summary: Adds data columns for local IP address and port to the socket
tracer, which are populated by #1808 and #1809. This will support
standalone pem entity relationships.

Type of change: /kind feature

Test Plan: Existing targets

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants