
parca-agent triggers kernel bug because it calls bpf_probe_read_user() in the perf_event IRQ #1675

Closed
luisgerhorst opened this issue May 19, 2023 · 5 comments
Labels: area/eBPF Something involving eBPF

luisgerhorst commented May 19, 2023

Describe the bug

Unfortunately, on some systems parca-agent seems to trigger a rare upstream kernel BUG because it calls bpf_probe_read_user() inside the perf_event IRQ. With some kernel configs (i.e., CONFIG_HARDENED_USERCOPY), bpf_probe_read_user() calls copy_from_user_nofault() > access_ok() > ... > find_vmap_area(), which attempts to acquire vmap_area_lock. If the interrupt occurred while the lock was already held (e.g., during alloc_vmap_area() in the clone() syscall), find_vmap_area() never returns. As a result, the lock held by clone() is never released, and every other CPU that tries to acquire it spins forever. Eventually this happens on all CPUs and the whole machine locks up.
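For reference, the pattern that triggers this looks roughly like the following minimal libbpf-style sketch. This is not parca-agent's actual profiler; the single stack-word read is just illustrative (the program name profile_cpu is taken from the trace below, the x86_64 target is an assumption):

// Minimal sketch only (NOT parca-agent's real unwinder): a sampling
// perf_event program that calls bpf_probe_read_user() from the sampling
// interrupt. On CONFIG_HARDENED_USERCOPY kernels this is the call that
// reaches find_vmap_area() via copy_from_user_nofault().
// Build (assumption): clang -O2 -g -target bpf -D__TARGET_ARCH_x86 -c prog.c
#include <linux/bpf.h>
#include <linux/bpf_perf_event.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("perf_event")
int profile_cpu(struct bpf_perf_event_data *ctx)
{
	__u64 stack_word = 0;
	void *user_sp = (void *)PT_REGS_SP(&ctx->regs);

	/* Runs in IRQ context: this helper call is what can end up spinning
	 * on vmap_area_lock if the interrupted task already holds it. */
	bpf_probe_read_user(&stack_word, sizeof(stack_word), user_sp);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";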

To Reproduce

Start a machine running the affected upstream kernel code (tested with v6.1, but I believe the bug is also present in most other kernels). To reproduce it, you can for example use an AWS EC2 c6a.16xlarge instance (64 vCPUs) with the AMI al2023-ami-2023.0.20230503.0-kernel-6.1-x86_64. Having more CPUs allows the bug to be triggered more quickly.

$ curl -sL https://github.com/parca-dev/parca-agent/releases/download/v0.19.0/parca-agent_0.19.0_`uname -s`_`uname -m`.tar.gz | tar xvfz -
$ sudo ./parca-agent --node=test --remote-store-address=localhost:7070 --remote-store-insecure

To trigger the bug quickly, execute some code that will also use vmap_area_lock. For example, the clone() syscall:

$ while true ; do
ls -al > /dev/null # do not use true, which is a shell builtin and does not fork
done

Within 10 minutes, the CPU soft lockup messages should appear on the serial console.

Expected behavior

The machine does not lock up. BPF should never be able to lock up the machine, but because of the kernel bug it happens anyway.

Logs

Here's an annotated log from the serial console. Other traces are also printed (from the other CPUs attempting to acquire the lock); however, I believe this one shows the root cause:

[253905.544838] Sending NMI from CPU 27 to CPUs 55:
[253905.545371] NMI backtrace for cpu 55
[253905.545375] CPU: 55 PID: 3316 Comm: spawn Tainted: G             L     6.1.25-37.47.amzn2023.x86_64 #1
[253905.545377] Hardware name: Amazon EC2 c6a.16xlarge/, BIOS 0 10/16/2017
[253905.545378] RIP: 0010:native_queued_spin_lock_slowpath+0x32/0x2c0
[253905.545384] Code: 54 55 48 89 fd 53 66 90 ba 01 00 00 00 8b 45 00 85 c0 75 14 f0 0f b1 55 00 85 c0 75 f0 5b 5d 41 5c 41 5d c3 cc cc cc cc f3 90 <eb> e1 81 fe 00 01 00 00 74 50 40 30 f6 85 f6 75 73 f0 0f ba 6d 00
[253905.545385] RSP: 0018:ffffc3edc6e68bc0 EFLAGS: 00000002
[253905.545387] RAX: 0000000000000001 RBX: ffffffffa1777ccc RCX: 0000000000000010
[253905.545388] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffa1777ccc
[253905.545388] RBP: ffffffffa1777ccc R08: 0000000000000001 R09: 000004c6af4181a9
[253905.545389] R10: 0000000000000000 R11: ffffc3edc6e68ff8 R12: 0000000000000008
[253905.545390] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000080000000
[253905.545393] FS:  00007fd4a28d8600(0000) GS:ffffa057e99c0000(0000) knlGS:0000000000000000
[253905.545394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[253905.545395] CR2: 00000000004040b0 CR3: 00000002461a8001 CR4: 00000000003706e0
[253905.545398] Call Trace:
[253905.545399]  <IRQ>
#
#
# https://elixir.bootlin.com/linux/latest/source/mm/vmalloc.c#L1861
#
[253905.545401]  _raw_spin_lock+0x30/0x40
[253905.545403]  find_vmap_area+0x17/0x60
#
#
# Likely requires https://elixir.bootlin.com/linux/v6.1.28/K/ident/CONFIG_HARDENED_USERCOPY
#
[253905.545407]  check_heap_object+0xd4/0x150
[253905.545409]  __check_object_size.part.0+0x47/0xd0
#
#
# This does pagefault_disable() (like perf_callchain_user()), which should make the actual copy IRQ-safe.
#
# But it calls access_ok() before pagefault_disable(), which is apparently not IRQ-safe.
# https://elixir.bootlin.com/linux/v6.1.28/source/arch/x86/include/asm/uaccess.h#L41
#
[253905.545411]  copy_from_user_nofault+0x65/0x90
[253905.545413]  bpf_probe_read_user+0x18/0x50
[253905.545416]  bpf_prog_2448819a7219e528_profile_cpu+0x354/0x9fd
[253905.545421]  bpf_overflow_handler+0xad/0x170
[253905.545424]  __perf_event_overflow+0x102/0x1e0
[253905.545426]  ? __perf_event_overflow+0x1e0/0x1e0
[253905.545427]  perf_swevent_hrtimer+0x12b/0x140
[253905.545430]  ? update_load_avg+0x7e/0x740
[253905.545433]  ? enqueue_entity+0x1b2/0x520
[253905.545435]  __hrtimer_run_queues+0x112/0x2b0
[253905.545439]  hrtimer_interrupt+0x106/0x220
[253905.545442]  __sysvec_apic_timer_interrupt+0x7f/0x170
[253905.545445]  sysvec_apic_timer_interrupt+0x9d/0xd0
[253905.545448]  </IRQ>
[253905.545449]  <TASK>
[253905.545449]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[253905.545452] RIP: 0010:insert_vmap_area.constprop.0+0x34/0x120
[253905.545453] Code: 4b 03 41 55 41 54 55 53 48 89 fb 48 85 c0 0f 84 d3 00 00 00 4c 8b 4f 08 eb 10 48 8b 48 10 48 8d 50 10 48 85 c9 74 29 48 8b 02 <48> 8b 48 f0 49 39 c9 76 e7 48 8b 33 40 f8 4c 39 c6 0f 82 88
[253905.545454] RSP: 0018:ffffc3ede3b23bf8 EFLAGS: 00000282
[253905.545455] RAX: ffffa039ec903d10 RBX: ffffa039ec9030c0 RCX: ffffa039ec903d10
[253905.545456] RDX: ffffa0492d825520 RSI: ffffc3edf0f08000 RDI: ffffa039ec9030c0
[253905.545456] RBP: ffffa048c77e8400 R08: ffffc3edf0efd000 R09: ffffc3edf0f0d000
[253905.545457] R10: ffffc3edf0f05000 R11: 0000000000036b00 R12: 0000000000005000
[253905.545458] R13: 0000000000003fff R14: ffffa039ec9030c0 R15: ffffc3edc0000000
#
#
# https://elixir.bootlin.com/linux/latest/source/mm/vmalloc.c#L1634
#
[253905.545460]  alloc_vmap_area+0x330/0x820
[253905.545463]  __get_vm_area_node+0xb8/0x170
[253905.545464]  __vmalloc_node_range+0xa6/0x220
[253905.545466]  ? dup_task_struct+0x57/0x1a0
[253905.545470]  alloc_thread_stack_node+0xcd/0x130
[253905.545472]  ? dup_task_struct+0x57/0x1a0
[253905.545474]  dup_task_struct+0x57/0x1a0
[253905.545476]  copy_process+0x1bd/0x15c0
[253905.545479]  kernel_clone+0x9b/0x3b0
[253905.545482]  __do_sys_clone+0x66/0x90
[253905.545485]  do_syscall_64+0x3b/0x90
[253905.545487]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[253905.545489] RIP: 0033:0x7fd4a2718a27
[253905.545490] Code: 00 00 00 f3 0f 1e fa 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 39 41 89 c0 85 c0 75 2a 64 48 8b 04 25 10 00
[253905.545491] RSP: 002b:00007ffc648c1158 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[253905.545492] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd4a2718a27
[253905.545493] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[253905.545494] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[253905.545494] R10: 00007fd4a28d88d0 R11: 0000000000000246 R12: 0000000000000000
[253905.545495] R13: 00000000004010b0 R14: 0000000000403e00 R15: 00007fd4a2914000
[253905.545497]  </TASK>
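
To make the interleaving in the trace easier to follow, here is a heavily simplified paraphrase of the two v6.1 code paths involved. This is not verbatim kernel code (see the elixir links above for the real sources):

/* Task context on CPU 55: clone() allocating the new thread's vmalloc'ed stack. */
static void alloc_vmap_area_paraphrase(void)
{
	spin_lock(&vmap_area_lock);
	insert_vmap_area(va, &vmap_area_root, &vmap_area_list); /* <-- perf_event IRQ fires here */
	spin_unlock(&vmap_area_lock);                            /* never reached */
}

/* perf_event IRQ on the same CPU, reached via bpf_probe_read_user(). */
long copy_from_user_nofault_paraphrase(void *dst, const void __user *src, size_t size)
{
	long ret = -EFAULT;

	if (access_ok(src, size)) {
		pagefault_disable();
		/*
		 * With CONFIG_HARDENED_USERCOPY:
		 *   __copy_from_user_inatomic()
		 *     -> check_object_size() -> __check_object_size()
		 *     -> check_heap_object() -> find_vmap_area()
		 *     -> spin_lock(&vmap_area_lock)
		 * spins forever, because the lock owner is the very task
		 * this IRQ interrupted and it can never run again.
		 */
		ret = __copy_from_user_inatomic(dst, src, size);
		pagefault_enable();
	}
	return ret ? -EFAULT : 0;
}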

Software (please complete the following information):

  • Parca Agent Version: v0.19.0, also tested git tree from last week
  • Parca Server Version (if applicable): NA

Workload (please complete the following information):

  • Runtime (if applicable):
  • Compiler (if applicable):

Environment (please complete the following information):

  • Linux Distribution (tested on the following, others are likely also affected):
$ cat /etc/*-release
Amazon Linux release 2023 (Amazon Linux)
NAME="Amazon Linux"
VERSION="2023"
ID="amzn"
ID_LIKE="fedora"
VERSION_ID="2023"
PLATFORM_ID="platform:al2023"
PRETTY_NAME="Amazon Linux 2023"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023"
HOME_URL="https://aws.amazon.com/linux/"
BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023"
SUPPORT_END="2028-03-01"
Amazon Linux release 2023 (Amazon Linux)
  • Linux Version: 6.1.25-37.47.amzn2023.x86_64
  • Arch: x86_64
  • Kubernetes Version (if applicable): NA
  • Container Runtime (if applicable): NA

Additional context

I believe this is neither a bug in Amazon Linux nor in Parca, but an upstream kernel bug. I have not reported it upstream yet (you are free to do so yourself; it would be great if you CC gerhorst@amazon.de and linux-kernel@luisgerhorst.de if you do). I was not able to find an existing report on LKML. I am reporting this here because parca-agent is affected, and you will likely want to change your BPF program even if the bug is fixed upstream (as it will take time for the fix to propagate).

The best fix for you is likely to stop using the BPF helper for now. Alternatively, you could detect the specific conditions that trigger the bug and avoid calling the helper only when they are present, for example as sketched below.
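
A purely hypothetical sketch of that second option (none of these names exist in parca-agent today): the loader could detect an affected kernel/config combination at startup and flip a constant in the BPF program's read-only data, so that the unsafe helper call is skipped (and can even be dead-code-eliminated by the verifier):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical mitigation sketch, not existing parca-agent code. The
 * userspace loader sets this .rodata constant before load when it decides
 * the running kernel is safe (e.g., not in an affected version range, or
 * CONFIG_HARDENED_USERCOPY disabled). */
const volatile __u8 user_reads_allowed = 0;

static __always_inline long read_user_word(__u64 *dst, const void *unsafe_user_ptr)
{
	if (!user_reads_allowed)
		return -1; /* skip unwinding rather than risk the vmap_area_lock deadlock */
	return bpf_probe_read_user(dst, sizeof(*dst), unsafe_user_ptr);
}

Whether the trigger conditions can be detected reliably from userspace is an open question; the conservative variant is simply to refuse to use the helper on suspect kernels.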

To fix the kernel bug, it may be possible to disable IRQs during alloc_vmap_area() and similar functions, or to make access_ok() IRQ-safe.

@luisgerhorst luisgerhorst changed the title parca-agent triggers kernel BUG because it calls bpf_probe_read_user() in the perf_event IRQ parca-agent triggers kernel bug because it calls bpf_probe_read_user() in the perf_event IRQ May 19, 2023
@kakkoyun kakkoyun added the area/eBPF Something involving eBPF label May 22, 2023

kakkoyun commented May 22, 2023

@luisgerhorst Thanks for reporting 👍 Let us discuss our options, and we will update here.

@javierhonduco

Thanks for the detailed bug report! I agree with you that this issue lies in the kernel. BPF execution should always be safe, so it should never lead to kernel panics / oops.

We can't stop using bpf_probe_read_user as it's at the heart of what we need to do -- reading memory locations so we can unwind different runtimes.

Found a recent patch (https://lore.kernel.org/bpf/202301190848.D0543F7CE@keescook/T/#mf4a2a97bb0a4cdc13eff7a1f8f5d25ea594263c2) that mentions exactly the issue we are seeing:

  • __copy_from_user_inatomic() under CONFIG_HARDENED_USERCOPY is calling
    check_object_size()->__check_object_size()->check_heap_object()->find_vmap_area()->spin_lock()
    which is not safe to do from BPF, [ke]probe and perf due to potential deadlock.

Let us know if you can give it a try!


javierhonduco commented May 31, 2023

Will leave this issue open to track backports of the fix / bugs opened in different distros:

javierhonduco added a commit that referenced this issue Sep 28, 2023
Releases >=5.19 && <6.1 have a pretty bad kernel bug that can result in
whole system lock-ups that can only be fixed with a reboot
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033398).

The fix got backported to -stable
(https://www.spinics.net/lists/stable/msg662452.html, and
https://www.spinics.net/lists/stable/msg662218.html for 6.1 and 6.3
respectively).

Let's not run the Agent in these kernels, but provide a flag to bypass
this check. Note that running a buggy kernel can result in your machine
going down.

Related issue: #1675

Test Plan
=========

Tested locally + added unit tests

szuecs commented Oct 11, 2023

I am pretty sure we hit the same bug. I could not get console output from AWS, but we run c6g.8xlarge instances in a test and get a machine freeze every 10-30 minutes.

Kernel: 6.2.0-1009-aws

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

@javierhonduco

Closing, as we now have a check that prevents running on these kernels by default. Thanks a lot!
