Stirling self eBPF probes fail on certain instances due to VMA mapping #1630
Comments
We were able to pinpoint this problematic behavior with a simpler test (without running stirling). The issue occurs when a process's binary is not the first entry in the After investigating more, we realized that running
Now that we understand what leads to this unexpected mapping, we will be able to address the issue more confidently and create a test for it.
…mmon virtual memory mapping (#1637)

Summary: Fix virtual to binary addr conversion for processes that have an uncommon virtual memory mapping

Our previous virtual to binary address conversion logic assumed that the first offset within `/proc/$PID/maps` was the correct offset to apply for PIE binaries. There are certain cases, such as when an unlimited stack ulimit is applied, where this assumption doesn't hold true (see the linked issue below for more details). This change adjusts our conversion logic to take into account the correct `/proc/$PID/maps` entry so address conversion works in all known cases.

Relevant Issues: #1630

Type of change: /kind bug

Test Plan: Verified the following:
- [x] New test verifies the status quo case as well as the situation reported in #1630
- [x] Verified `perf_profiler_bpf_test` passes when the perf profiler uses the ELF symbolizer

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>

This will be fixed in the next release (v0.14.4).
We received a report of a 150 node cluster that is experiencing the following crash on all of its PEMs. The error is originating from stirling's inability to set a BPF probe on its ConnInfoMapCleanupTrigger. Since this probe performs garbage collection and prevents a slow memory leak, stirling intentionally exits if this attachment fails.
The 0x414c9215e9f0 address above means our address converter utility believes the symbol exists at that position in stirling's binary. This cannot be the case, since the stirling binary is much smaller than the ~65 TiB (0x414c9215e9f0 / 1024^4) that the value corresponds to.
To debug this, we collected the `/proc/$pid/maps` entries from the working and failing cases:

After comparing the memory maps between the two cases, it appears that `ElfAddressConverter::VirtualAddrToBinaryAddr` should be identifying the offset as 0x55c583d2a000 instead of 0x152d04eb8000. This is occurring because the address translation function always returns the first virtual memory map found in `/proc/$pid/maps`. This seems to indicate that this assumption (copied below) is not always valid and the binary path must be used to find the correct map entry. It appears the dynamic loader is mapping the stirling executable segments to very different VMAs compared to other Pixie users.

The following patch was tested against the end user's cluster and got their PEM working. I would have expected this to be a product of a custom environment (custom kernel, etc.); however, that doesn't seem to be the case. This address translation logic has been in place since we supported PIE binaries and has been released for ~7 months, so it's surprising this hasn't surfaced sooner.
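The difference between the two lookup strategies can be sketched as follows. This is a simplified Python illustration, not the actual `ElfAddressConverter` code: the helper names, the binary path, the library path, and the sample maps text are all hypothetical, and only the two start addresses are taken from the discussion above. It also ignores ELF segment offsets, which a real converter would need to account for.

```python
# Hypothetical /proc/$pid/maps content where the binary is NOT the first entry.
SAMPLE_MAPS = """\
152d04eb8000-152d04eba000 r--p 00000000 08:01 1234 /usr/lib/example-loader.so
55c583d2a000-55c583d40000 r--p 00000000 08:01 5678 /app/example_binary
55c583d40000-55c584000000 r-xp 00016000 08:01 5678 /app/example_binary
"""


def parse_maps(text):
    """Parse maps lines into (start_addr, path) tuples; path may be empty."""
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start = int(fields[0].split("-")[0], 16)
        path = fields[5].strip() if len(fields) == 6 else ""
        entries.append((start, path))
    return entries


def first_entry_base(entries):
    """Old assumption: the first map entry belongs to the binary."""
    return entries[0][0]


def base_for_binary(entries, binary_path):
    """Adjusted approach: pick the first entry whose path is the binary."""
    for start, path in entries:
        if path == binary_path:
            return start
    raise LookupError(f"no map entry for {binary_path}")


def virtual_to_binary(vaddr, base):
    """Simplified translation: subtract the mapping base from the address."""
    return vaddr - base


entries = parse_maps(SAMPLE_MAPS)
wrong_base = first_entry_base(entries)
right_base = base_for_binary(entries, "/app/example_binary")
```

With this sample, a virtual address inside the binary's mapping translates to a small offset when `right_base` is used, and to an implausibly large value when `wrong_base` is used, which matches the symptom described in this issue.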
App information:
Additional context
The full details of this debugging can be found on this community slack thread.