Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stirling self eBPF probes fail on certain instances due to VMA mapping #1630

Closed
ddelnano opened this issue Jul 20, 2023 · 2 comments
Closed
Assignees
Labels
area/datacollector Issues related to Stirling (datacollector) kind/bug Something isn't working

Comments

@ddelnano
Copy link
Member

ddelnano commented Jul 20, 2023

We received a report of a 150 node cluster that is experiencing the following crash on all of its PEMs. The error is originating from stirling's inability to set a BPF probe on its ConnInfoMapCleanupTrigger. Since this probe performs garbage collection and prevents a slow memory leak, stirling intentionally exits if this attachment fails.

Defaulted container "pem" out of: pem, qb-wait (init)                                                                                                                                                                                       
I20230712 21:41:44.971129 2689937 env.cc:47] Started: /app/src/vizier/services/agent/pem/pem                                                                                                                                                
Started external stacktrace collection signal processor thread                                                        
I20230712 21:41:44.971361 2689937 pem_main.cc:57] Pixie PEM. Version: v0.13.6+Distribution.11536bf.202305312046.1.RELEASE.jenkins, id: 10c6ae49-c72d-4bd7-abd5-2245ec106f99                                                                 I20230712 21:41:44.971405 2689937 stirling.cc:958] Creating Stirling, registered sources: [process_stats, network_stats, jvm_stats, socket_tracer, perf_profiler, proc_exit_tracer, stirling_error]                                         
I20230712 21:41:44.971427 2689937 system_info.cc:42] Location of proc: /proc                                          
I20230712 21:41:44.971432 2689937 system_info.cc:43] Location of sysfs: /sys/fs                                       
I20230712 21:41:44.971437 2689937 system_info.cc:44] Number of CPUs: 48                                               
I20230712 21:41:44.971472 2689937 system_info.cc:35] /proc/version:                                                   
Linux version 5.15.0-1037-aws (buildd@lcy02-amd64-019) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #41~20.04.1-Ubuntu SMP Mon May 22 18:18:00 UTC 2023                                               
I20230712 21:41:44.971575 2689937 system_info.cc:35] /host/etc/os-release:                                            
NAME="Ubuntu"                                                                                                         
VERSION="20.04.6 LTS (Focal Fossa)"                                                                                   
ID=ubuntu                                                                                                             
ID_LIKE=debian                                                                                                        
PRETTY_NAME="Ubuntu 20.04.6 LTS"                                                                                      
VERSION_ID="20.04"                                                                                                    
HOME_URL="https://www.ubuntu.com/"                                                                                    
SUPPORT_URL="https://help.ubuntu.com/"                                                                                
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"                                                                   
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"                                   
VERSION_CODENAME=focal                                                                                                
UBUNTU_CODENAME=focal                                 
I20230712 21:41:44.971611 2689937 system_info.cc:35] /host/etc/lsb-release:                                           
DISTRIB_ID=Ubuntu         
DISTRIB_RELEASE=20.04                                                                                                 
DISTRIB_CODENAME=focal    
DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"                                                                                                                                                                                                    
I20230712 21:41:44.971642 2689937 probe_cleaner.cc:102] Cleaning probes from /sys/kernel/debug/tracing/kprobe_events with the following marker: __pixie__                                                                                   
I20230712 21:41:46.317469 2689937 probe_cleaner.cc:117] All Stirling probes removed (count=40)                        
I20230712 21:41:46.317507 2689937 probe_cleaner.cc:102] Cleaning probes from /sys/kernel/debug/tracing/uprobe_events with the following marker: __pixie__                                                                                   
I20230712 21:41:46.317525 2689937 probe_cleaner.cc:117] All Stirling probes removed (count=0)                         
I20230712 21:41:46.317533 2689937 source_connector.cc:35] Initializing source connector: process_stats                
I20230712 21:41:46.317538 2689937 stirling.cc:438] Adding info class: [process_stats/process_stats]
I20230712 21:41:46.317545 2689937 source_connector.cc:35] Initializing source connector: network_stats                
I20230712 21:41:46.317550 2689937 stirling.cc:438] Adding info class: [network_stats/network_stats]                   
I20230712 21:41:46.317556 2689937 source_connector.cc:35] Initializing source connector: jvm_stats                    
I20230712 21:41:46.317561 2689937 stirling.cc:438] Adding info class: [jvm_stats/jvm_stats]                           
I20230712 21:41:46.317723 2689937 linux_headers.cc:211] Found Linux kernel version using .note section.               
I20230712 21:41:46.317742 2689937 source_connector.cc:35] Initializing source connector: socket_tracer
I20230712 21:41:46.317755 2689937 linux_headers.cc:94] Obtained Linux version string from `uname`: 5.15.0-1037-aws    
I20230712 21:41:46.317761 2689937 linux_headers.cc:642] Detected kernel release (uname -r): 5.15.0-1037-aws
I20230712 21:41:46.317909 2689937 linux_headers.cc:448] Looking for host mounted headers at /host/lib/modules/5.15.0-1037-aws                                                                                                               
I20230712 21:41:46.317952 2689937 linux_headers.cc:493] Linked linux headers found at /host/usr/src/linux-headers-5.15.0-1037-aws to symlink at /lib/modules/5.15.0-1037-aws/build                                                          
I20230712 21:41:46.317962 2689937 bcc_wrapper.cc:121] Using linux headers found at /lib/modules/5.15.0-1037-aws/build for BCC runtime.                                                                                                      
I20230712 21:41:46.317992 2689937 bcc_wrapper.cc:170] Initializing BPF program ...
I20230712 21:41:56.260188 2689937 scoped_timer.h:48] Timer(init_bpf_program) : 9.94 s                                 
cannot create /var/tmp/bcc                                                                                            
WARNING: cannot get prog tag, ignore saving source with program tag                                                   
cannot create /var/tmp/bcc                                                                                            
WARNING: cannot get prog tag, ignore saving source with program tag
…
WARNING: cannot get prog tag, ignore saving source with program tag
cannot create /var/tmp/bcc
WARNING: cannot get prog tag, ignore saving source with program tag
cannot create /var/tmp/bcc
WARNING: cannot get prog tag, ignore saving source with program tag
I20230712 21:38:28.807008 3744569 socket_trace_connector.cc:427] Number of kprobes deployed = 40
I20230712 21:38:28.807057 3744569 socket_trace_connector.cc:428] Probes successfully deployed.
I20230712 21:38:28.807106 3744569 socket_trace_connector.cc:362] Initializing perf buffers with ncpus=16 and scaling_factor=0.375
I20230712 21:38:28.807139 3744569 socket_trace_connector.cc:351] Total perf buffer usage for kData buffers across all cpus: 267721600
I20230712 21:38:28.807169 3744569 socket_trace_connector.cc:351] Total perf buffer usage for kControl buffers across all cpus: 13212048
I20230712 21:38:28.876703 3744569 socket_trace_connector.cc:432] Number of perf buffers opened = 8
F20230712 21:38:28.917058 3744569 socket_trace_bpf_tables.cc:77] Check failed: _s.ok() Bad Status: Internal : Unable to find offset for binary /app/src/vizier/services/agent/pem/pem.runfiles/px/src/vizier/services/agent/pem/pem symbol  address 414c9215e9f0

The 0x414c9215e9f0 address above means our address converter utility believes the symbol exists at that position in stirling's binary. This cannot be the case since the stirling is much smaller than the ~65TiB (0x414c9215e9f0 / 1024^4) that the value corresponds to.

To debug this, we collected the /proc/$pid/maps entries from the working and failing cases:

After comparing the memory maps between the two cases, it appears that the ElfAddressConverter::VirtualAddrToBinaryAddr should be identifying the offset as 0x55c583d2a000 instead of 0x152d04eb8000. This is occurring because the address translation function always returns the first virtual memory map found in /proc/$pid/maps. This seems to indicate that this assumption (copied below) is not always valid and the binary path must be used to find the correct map entry. It appears the dynamic loader is mapping the stirling executable segments to very different VMAs compared to other Pixie users.

   * at the first loadable segment in the ELF file and compare it to the first entry in the
   *  /proc/PID/maps file to see how the loader changed the virtual address. This works because the
   * loader guarantees that the relative offsets of the different segments remain the same, regardless
   * of where in virtual address space it ends up putting the segment.```

The following patch was tested against the end user's cluster and got their PEM working. I would have expected this to be a product of a custom environment (custom kernel, etc), however, that doesn't seem to be the case. This address translation logic has been in place since we supported PIE binaries and has been released for ~7 months, so it's surprising this hasn't surface sooner.

App information:

  • Pixie version -- v0.13.6
  • K8s cluster version -- 1.24.15
  • Node Kernel version -- 5.15.0-1039-aws
  • Container runtime -- containerd 1.6.18
  • CNI plugin -- calico 3.26.1 running in eBPF mode
  • OS -- Ubuntu focal

Additional context
The full details of this debugging can be found on this community slack thread.

@ddelnano ddelnano added kind/bug Something isn't working area/datacollector Issues related to Stirling (datacollector) labels Jul 20, 2023
@ddelnano ddelnano self-assigned this Jul 20, 2023
@ddelnano ddelnano changed the title Stirling self eBPF probes fail on certain instances due to process memory mapping Stirling self eBPF probes fail on certain instances due to VMA mapping Jul 20, 2023
@ddelnano
Copy link
Member Author

We were able to pinpoint this problematic behavior with a more simplistic test (without running stirling). The issue occurs when a process's binary is not the first entry in the /proc/$pid/maps file, therefore running cat /proc/self/maps can indicate if the bug will occur or not.

After investigating more, we realized that running cat /proc/self/maps outside of a container on an impacted node resulted in the expected VMAs, while running it inside k8s experienced the bug. We narrowed down this behavior difference to systemd unit configuration that changed containerd's stack size ulimit:

# Verify that VMA mapping is as expected with no configuration changes
$ sudo docker run -it --rm busybox cat /proc/self/maps
55bc64ccc000-55bc64cda000 r--p 00000000 00:6a 1831547                    /bin/cat
55bc64cda000-55bc64d98000 r-xp 0000e000 00:6a 1831547                    /bin/cat
55bc64d98000-55bc64dc1000 r--p 000cc000 00:6a 1831547                    /bin/cat
55bc64dc1000-55bc64dc5000 r--p 000f4000 00:6a 1831547                    /bin/cat
55bc64dc5000-55bc64dc6000 rw-p 000f8000 00:6a 1831547                    /bin/cat
55bc64dc6000-55bc64dc7000 rw-p 00000000 00:00 0
7fc066a95000-7fc066a97000 rw-p 00000000 00:00 0
7fc066a97000-7fc066abd000 r--p 00000000 00:6a 4387477                    /lib/libc.so.6
7fc066abd000-7fc066c12000 r-xp 00026000 00:6a 4387477                    /lib/libc.so.6
7fc066c12000-7fc066c65000 r--p 0017b000 00:6a 4387477                    /lib/libc.so.6
7fc066c65000-7fc066c69000 r--p 001ce000 00:6a 4387477                    /lib/libc.so.6
7fc066c69000-7fc066c6b000 rw-p 001d2000 00:6a 4387477                    /lib/libc.so.6
7fc066c6b000-7fc066c78000 rw-p 00000000 00:00 0
7fc066c78000-7fc066c7b000 r--p 00000000 00:6a 4387500                    /lib/libresolv.so.2
7fc066c7b000-7fc066c83000 r-xp 00003000 00:6a 4387500                    /lib/libresolv.so.2
7fc066c83000-7fc066c85000 r--p 0000b000 00:6a 4387500                    /lib/libresolv.so.2
7fc066c85000-7fc066c86000 r--p 0000d000 00:6a 4387500                    /lib/libresolv.so.2
7fc066c86000-7fc066c87000 rw-p 0000e000 00:6a 4387500                    /lib/libresolv.so.2
7fc066c87000-7fc066c89000 rw-p 00000000 00:00 0
7fc066c89000-7fc066c99000 r--p 00000000 00:6a 4387478                    /lib/libm.so.6
7fc066c99000-7fc066d0c000 r-xp 00010000 00:6a 4387478                    /lib/libm.so.6
7fc066d0c000-7fc066d66000 r--p 00083000 00:6a 4387478                    /lib/libm.so.6
7fc066d66000-7fc066d67000 r--p 000dc000 00:6a 4387478                    /lib/libm.so.6
7fc066d67000-7fc066d68000 rw-p 000dd000 00:6a 4387478                    /lib/libm.so.6
7fc066d68000-7fc066d6a000 rw-p 00000000 00:00 0
7fc066d6a000-7fc066d6b000 r--p 00000000 00:6a 4387472                    /lib/ld-linux-x86-64.so.2
7fc066d6b000-7fc066d90000 r-xp 00001000 00:6a 4387472                    /lib/ld-linux-x86-64.so.2
7fc066d90000-7fc066d9a000 r--p 00026000 00:6a 4387472                    /lib/ld-linux-x86-64.so.2
7fc066d9a000-7fc066d9c000 r--p 00030000 00:6a 4387472                    /lib/ld-linux-x86-64.so.2
7fc066d9c000-7fc066d9e000 rw-p 00032000 00:6a 4387472                    /lib/ld-linux-x86-64.so.2
7ffdd78d5000-7ffdd78f6000 rw-p 00000000 00:00 0                          [stack]
7ffdd798e000-7ffdd7992000 r--p 00000000 00:00 0                          [vvar]
7ffdd7992000-7ffdd7994000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]

# Update unit file and verify that LimitSTACK=infinity is set
$ sudo vim /lib/systemd/system/containerd.service
$ sudo grep STACK /lib/systemd/system/containerd.service
LimitSTACK=infinity

$ sudo systemctl daemon-reload
$ sudo systemctl restart containerd

# Verify that the unexpected VMA occurs after the restart
$ sudo docker run -it --rm busybox cat /proc/self/maps
14aa1aa8e000-14aa1aa90000 rw-p 00000000 00:00 0
14aa1aa90000-14aa1aab6000 r--p 00000000 00:6a 4387477                    /lib/libc.so.6
14aa1aab6000-14aa1ac0b000 r-xp 00026000 00:6a 4387477                    /lib/libc.so.6
14aa1ac0b000-14aa1ac5e000 r--p 0017b000 00:6a 4387477                    /lib/libc.so.6
14aa1ac5e000-14aa1ac62000 r--p 001ce000 00:6a 4387477                    /lib/libc.so.6
14aa1ac62000-14aa1ac64000 rw-p 001d2000 00:6a 4387477                    /lib/libc.so.6
14aa1ac64000-14aa1ac71000 rw-p 00000000 00:00 0
14aa1ac71000-14aa1ac74000 r--p 00000000 00:6a 4387500                    /lib/libresolv.so.2
14aa1ac74000-14aa1ac7c000 r-xp 00003000 00:6a 4387500                    /lib/libresolv.so.2
14aa1ac7c000-14aa1ac7e000 r--p 0000b000 00:6a 4387500                    /lib/libresolv.so.2
14aa1ac7e000-14aa1ac7f000 r--p 0000d000 00:6a 4387500                    /lib/libresolv.so.2
14aa1ac7f000-14aa1ac80000 rw-p 0000e000 00:6a 4387500                    /lib/libresolv.so.2
14aa1ac80000-14aa1ac82000 rw-p 00000000 00:00 0
14aa1ac82000-14aa1ac92000 r--p 00000000 00:6a 4387478                    /lib/libm.so.6
14aa1ac92000-14aa1ad05000 r-xp 00010000 00:6a 4387478                    /lib/libm.so.6
14aa1ad05000-14aa1ad5f000 r--p 00083000 00:6a 4387478                    /lib/libm.so.6
14aa1ad5f000-14aa1ad60000 r--p 000dc000 00:6a 4387478                    /lib/libm.so.6
14aa1ad60000-14aa1ad61000 rw-p 000dd000 00:6a 4387478                    /lib/libm.so.6
14aa1ad61000-14aa1ad63000 rw-p 00000000 00:00 0
14aa1ad63000-14aa1ad64000 r--p 00000000 00:6a 4387472                    /lib/ld-linux-x86-64.so.2
14aa1ad64000-14aa1ad89000 r-xp 00001000 00:6a 4387472                    /lib/ld-linux-x86-64.so.2
14aa1ad89000-14aa1ad93000 r--p 00026000 00:6a 4387472                    /lib/ld-linux-x86-64.so.2
14aa1ad93000-14aa1ad95000 r--p 00030000 00:6a 4387472                    /lib/ld-linux-x86-64.so.2
14aa1ad95000-14aa1ad97000 rw-p 00032000 00:6a 4387472                    /lib/ld-linux-x86-64.so.2
55f06d927000-55f06d935000 r--p 00000000 00:6a 1831547                    /bin/cat
55f06d935000-55f06d9f3000 r-xp 0000e000 00:6a 1831547                    /bin/cat
55f06d9f3000-55f06da1c000 r--p 000cc000 00:6a 1831547                    /bin/cat
55f06da1c000-55f06da20000 r--p 000f4000 00:6a 1831547                    /bin/cat
55f06da20000-55f06da21000 rw-p 000f8000 00:6a 1831547                    /bin/cat
55f06da21000-55f06da22000 rw-p 00000000 00:00 0
7ffe36ee2000-7ffe36f03000 rw-p 00000000 00:00 0                          [stack]
7ffe36ff5000-7ffe36ff9000 r--p 00000000 00:00 0                          [vvar]
7ffe36ff9000-7ffe36ffb000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]

Now that we understand what leads to this unexpected mapping, we will be able to more confidently address the issue in addition to creating a test for it.

vihangm pushed a commit that referenced this issue Aug 1, 2023
…mmon virtual memory mapping (#1637)

Summary: Fix virtual to binary addr conversion for processes that have
an uncommon virtual memory mapping

Our previous virtual to binary address conversion logic assumed that the
first offset within `/proc/$PID/maps` was the correct offset to apply
for PIE binaries. There are certain cases, such as when an unlimited
stack ulimit is applied, where this assumption doesn't hold true (see
the linked issue before for more details). This change adjusts our
conversion logic to take into account the correct `/proc/$PID/maps`
entry so address conversion works in all known cases.

Relevant Issues: #1630

Type of change: /kind bug

Test Plan: Verified the following:
- [x] New test verifies the status quo case as well as the situation
reported in #1630
- [x] Verified `perf_profiler_bpf_test` passes when the perf profiler
uses the ELF symbolizer

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
@ddelnano
Copy link
Member Author

ddelnano commented Aug 1, 2023

This will be fixed in the next release (v0.14.4).

@ddelnano ddelnano closed this as completed Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datacollector Issues related to Stirling (datacollector) kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant