@jfarthing84

No description provided.

@jfarthing84

Looks like GitHub pull requests aren't accepted.

@jfarthing84 jfarthing84 closed this Nov 6, 2016
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Apr 12, 2017
Peter,

I hope Thomas didn't lend you his frozen shark thrower. ;-)

Let me explain the issue we have. A while back, I asked Paul if he
could add a special RCU case, where the quiescent states are if a task
goes into user land (it's fine if it's there), is sleeping (off the run
queue) or voluntarily schedules. He implemented this for me: after
calling synchronize_rcu_tasks(), every task has been in a quiescent state
(user space or not on a run queue) or has done a voluntary schedule
(calling __schedule(false)).

The one issue is that this ignores the idle task, as the idle task is
never in user space and is always on the run queue. The only thing we
could hope for is if it calls __schedule(false). Thus
synchronize_rcu_tasks() ignores it.

The reason I need this is to be able to have ftrace hooks that are
created dynamically (like what perf does when it uses function
tracing), to be able to use the optimized dynamic trampoline. What that
means is, if there's a single callback for a function hook, a
trampoline is created dynamically and that hook (fentry) calls the
dynamic trampoline directly. That dynamic trampoline calls the function
callback directly without caring about any other function hook that may
be registered to other functions.

If perf does function tracing and there's no other code doing any
function tracing, then all the functions it traces will call this
dynamically allocated trampoline which will call the perf function
tracer directly. Otherwise it will call the default trampoline that
calls a loop function to iterate over any registered function callbacks
with ftrace.

The issue occurs when perf is done tracing. Now we need to free this
dynamically created trampoline. But in order to do that, we need to
make sure no task is still executing on it. One thing ftrace does
before freeing the trampoline after detaching the hook from the
functions, is to schedule on every CPU, which will make sure everything
is out of a preempt section. This is more severe than
synchronize_sched() because ftrace records in locations where RCU is
not active (like going to idle), and synchronize_sched() is not good
enough. But even scheduling on all CPUs is not good enough when the
kernel itself is preemptive. In that case we use the
synchronize_rcu_tasks() that covers those cases where a task has been
preempted. Except for one!

This brings us back to synchronize_rcu_tasks() ignoring the idle case.
Where I trigger this:

 BUG: unable to handle kernel paging request at ffffffffa0230077
 IP: 0xffffffffa0230077
 PGD 2414067
 PUD 2415063
 PMD c463c067
 PTE 0

 Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
 Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core x86_pkg_temp_thermal kvm_intel i915 snd_seq kvm snd_seq_device snd_pcm i2c_algo_bit snd_timer drm_kms_helper irqbypass syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_i801 snd soundcore wmi i2c_core video e1000e ptp pps_core
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc3-test+ #356
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff8800cebbd0c0 task.stack: ffff8800cebd8000
 RIP: 0010:0xffffffffa0230077
 RSP: 0018:ffff8800cebdfd80 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff8800cebbd0c0 RCX: ffffffff8121391e
 RDX: dffffc0000000000 RSI: ffffffff81ce7c78 RDI: ffff8800cebbf298
 RBP: ffff8800cebdfe28 R08: 0000000000000003 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800cebbd0c0
 R13: ffffffff828fbe90 R14: ffff8800d392c480 R15: ffffffff826cc380
 FS:  0000000000000000(0000) GS:ffff8800d3900000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffa0230077 CR3: 0000000002413000 CR4: 00000000001406e0
 Call Trace:
  ? debug_smp_processor_id+0x17/0x20
  ? sched_ttwu_pending+0x79/0x190
  ? schedule+0x5/0xe0
  ? trace_hardirqs_on_caller+0x182/0x280
  schedule+0x5/0xe0
  schedule_preempt_disabled+0x18/0x30
  ? schedule+0x5/0xe0
  ? schedule_preempt_disabled+0x18/0x30
  do_idle+0x172/0x220

That RIP of ffffffffa0230077 is on the trampoline. The return address
is schedule+0x5 which happens to be from a call to fentry. Notice the
Comm: swapper/2.

I can trigger this with a few runs of my test case, pretty
consistently. When I looked at the code, I don't see any reason for
idle to ever enable preemption. It currently calls
schedule_preempt_disabled(), which enables preemption, calls schedule()
and then disables preemption.

The schedule() calls sched_submit_work() which immediately returns
because of the check if the task is running, and idle is always
running. Then it does a loop of disabling preemption calling
__schedule() then enabling preemption again, and checking
need_resched().

The above bug happens in schedule_preempt_disabled, where preemption is
enabled, then schedule() is called, which is traced by ftrace. But then
an interrupt came in while the idle task was on the ftrace trampoline,
and when the interrupt returned, it scheduled via the interrupt preempt
scheduling and not the call to schedule itself, which I feel is
inefficient anyway, as idle is going to call schedule again anyway.

The trampoline is freed (because none of the synchronizations were able
to detect idle was preempted on a trampoline), and when idle gets to
run again, it crashes (it's executing code that no longer exists).

The solution I'm proposing here is to create a schedule_idle() call,
local to kernel/sched/, which do_idle() calls instead. It does
basically the same thing that schedule() does, except that
it doesn't enable preemption. It calls __schedule() in a loop that
checks for need_resched(). Not only does this solve the bug I'm
triggering with trying to free dynamically created ftrace trampolines,
it also makes the idle code a bit more efficient without having these
spurious schedule calls when interrupts occur in this small window when
preemption is enabled.
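
The difference between the two paths can be sketched in user space. This is only a model of the description above, not the kernel code: every model_* function and the three counters are stand-ins invented for illustration, and preemption is reduced to a simple counter.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; these are stand-ins, not kernel symbols. */
static int preempt_count = 1;     /* idle normally runs with preemption off */
static int resched_pending;       /* number of pending reschedule requests */
static int preemptible_windows;   /* times preempt_count dropped to zero */
static int schedule_calls;        /* times the core scheduler was entered */

static bool model_need_resched(void) { return resched_pending > 0; }

static void model_preempt_disable(void) { preempt_count++; }

static void model_preempt_enable(void)
{
    assert(preempt_count > 0);
    if (--preempt_count == 0)
        preemptible_windows++;    /* an interrupt could preempt idle here */
}

/* Stand-in for __schedule(false): consumes one reschedule request. */
static void model_core_schedule(void)
{
    schedule_calls++;
    if (resched_pending > 0)
        resched_pending--;
}

/* Models schedule(): toggles preemption around each core call. */
static void model_schedule(void)
{
    do {
        model_preempt_disable();
        model_core_schedule();
        model_preempt_enable();
    } while (model_need_resched());
}

/* Models schedule_preempt_disabled(): enable, schedule(), disable. */
static void model_schedule_preempt_disabled(void)
{
    model_preempt_enable();
    model_schedule();
    model_preempt_disable();
}

/* Models the proposed schedule_idle(): never re-enables preemption. */
static void model_schedule_idle(void)
{
    do {
        model_core_schedule();
    } while (model_need_resched());
}
```

In this model, model_schedule_preempt_disabled() opens a preemptible window before and after the core call, while model_schedule_idle() opens none, which is the property the fix relies on.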

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Apr 12, 2017
I finally got around to creating trampolines for dynamically allocated
ftrace_ops using synchronize_rcu_tasks(). For users of the ftrace
function hook callbacks, like perf, that allocate the ftrace_ops
descriptor via kmalloc() and friends, ftrace was not able to optimize
the functions being traced to use a trampoline because they would also
need to be allocated dynamically. The problem is that they cannot be
freed when CONFIG_PREEMPT is set, as there's no way to tell if a task
was preempted on the trampoline. That was before Paul McKenney
implemented synchronize_rcu_tasks() that would make sure all tasks
(except idle) have scheduled out or have entered user space.

While testing this, I triggered this bug:

 BUG: unable to handle kernel paging request at ffffffffa0230077
 IP: 0xffffffffa0230077
 PGD 2414067
 PUD 2415063
 PMD c463c067
 PTE 0

 Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
 Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core x86_pkg_temp_thermal kvm_intel i915 snd_seq kvm snd_seq_device snd_pcm i2c_algo_bit snd_timer drm_kms_helper irqbypass syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_i801 snd soundcore wmi i2c_core video e1000e ptp pps_core
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc3-test+ #356
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff8800cebbd0c0 task.stack: ffff8800cebd8000
 RIP: 0010:0xffffffffa0230077
 RSP: 0018:ffff8800cebdfd80 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff8800cebbd0c0 RCX: ffffffff8121391e
 RDX: dffffc0000000000 RSI: ffffffff81ce7c78 RDI: ffff8800cebbf298
 RBP: ffff8800cebdfe28 R08: 0000000000000003 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800cebbd0c0
 R13: ffffffff828fbe90 R14: ffff8800d392c480 R15: ffffffff826cc380
 FS:  0000000000000000(0000) GS:ffff8800d3900000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffa0230077 CR3: 0000000002413000 CR4: 00000000001406e0
 Call Trace:
  ? debug_smp_processor_id+0x17/0x20
  ? sched_ttwu_pending+0x79/0x190
  ? schedule+0x5/0xe0
  ? trace_hardirqs_on_caller+0x182/0x280
  schedule+0x5/0xe0
  schedule_preempt_disabled+0x18/0x30
  ? schedule+0x5/0xe0
  ? schedule_preempt_disabled+0x18/0x30
  do_idle+0x172/0x220

What happened was that the idle task was preempted on the trampoline.
As synchronize_rcu_tasks() ignores the idle thread, there's nothing
that lets ftrace know that the idle task was preempted on a trampoline.

The idle task shouldn't need to ever enable preemption. The idle task
is simply a loop that calls schedule or places the cpu into idle mode.
In fact, having preemption enabled is inefficient, because it can
happen when idle is just about to call schedule anyway, which would
cause schedule to be called twice. Once for when the interrupt came in
and was returning back to normal context, and then again in the normal
path that the idle loop is running in, which would be pointless, as it
had already scheduled.

Adding a new function local to kernel/sched/ that allows idle to call
the scheduler without enabling preemption fixes the
synchronize_rcu_tasks() issue, and also removes the pointless spurious
schedule calls caused by interrupts happening in the brief window where
preemption is enabled just before it calls schedule.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Apr 13, 2017
I finally got around to creating trampolines for dynamically allocated
ftrace_ops using synchronize_rcu_tasks(). For users of the ftrace
function hook callbacks, like perf, that allocate the ftrace_ops
descriptor via kmalloc() and friends, ftrace was not able to optimize
the functions being traced to use a trampoline because they would also
need to be allocated dynamically. The problem is that they cannot be
freed when CONFIG_PREEMPT is set, as there's no way to tell if a task
was preempted on the trampoline. That was before Paul McKenney
implemented synchronize_rcu_tasks() that would make sure all tasks
(except idle) have scheduled out or have entered user space.

While testing this, I triggered this bug:

 BUG: unable to handle kernel paging request at ffffffffa0230077
 IP: 0xffffffffa0230077
 PGD 2414067
 PUD 2415063
 PMD c463c067
 PTE 0

 Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
 Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core x86_pkg_temp_thermal kvm_intel i915 snd_seq kvm snd_seq_device snd_pcm i2c_algo_bit snd_timer drm_kms_helper irqbypass syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_i801 snd soundcore wmi i2c_core video e1000e ptp pps_core
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc3-test+ #356
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff8800cebbd0c0 task.stack: ffff8800cebd8000
 RIP: 0010:0xffffffffa0230077
 RSP: 0018:ffff8800cebdfd80 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff8800cebbd0c0 RCX: ffffffff8121391e
 RDX: dffffc0000000000 RSI: ffffffff81ce7c78 RDI: ffff8800cebbf298
 RBP: ffff8800cebdfe28 R08: 0000000000000003 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800cebbd0c0
 R13: ffffffff828fbe90 R14: ffff8800d392c480 R15: ffffffff826cc380
 FS:  0000000000000000(0000) GS:ffff8800d3900000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffa0230077 CR3: 0000000002413000 CR4: 00000000001406e0
 Call Trace:
  ? debug_smp_processor_id+0x17/0x20
  ? sched_ttwu_pending+0x79/0x190
  ? schedule+0x5/0xe0
  ? trace_hardirqs_on_caller+0x182/0x280
  schedule+0x5/0xe0
  schedule_preempt_disabled+0x18/0x30
  ? schedule+0x5/0xe0
  ? schedule_preempt_disabled+0x18/0x30
  do_idle+0x172/0x220

What happened was that the idle task was preempted on the trampoline.
As synchronize_rcu_tasks() ignores the idle thread, there's nothing
that lets ftrace know that the idle task was preempted on a trampoline.

The idle task shouldn't need to ever enable preemption. The idle task
is simply a loop that calls schedule or places the cpu into idle mode.
In fact, having preemption enabled is inefficient, because it can
happen when idle is just about to call schedule anyway, which would
cause schedule to be called twice. Once for when the interrupt came in
and was returning back to normal context, and then again in the normal
path that the idle loop is running in, which would be pointless, as it
had already scheduled.

Adding a new function local to kernel/sched/ that allows idle to call
the scheduler without enabling preemption fixes the
synchronize_rcu_tasks() issue, and also removes the pointless spurious
schedule calls caused by interrupts happening in the brief window where
preemption is enabled just before it calls schedule.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
nathanchance referenced this pull request in ClangBuiltLinux/linux Dec 21, 2018
xfrm6_policy_check() might have re-allocated skb->head, we need
to reload ipv6 header pointer.
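
The stale-pointer pattern described here can be modelled in user space. This is a sketch under stated assumptions, not the kernel patch: struct model_buf stands in for the skb, model_policy_check() stands in for a check that may reallocate the head, and both names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: only the head pointer matters here. */
struct model_buf {
    unsigned char *head;
    size_t len;
};

/*
 * Stand-in for a check that may reallocate the buffer head. In this
 * model it always moves the data to a new allocation and frees the old.
 */
static int model_policy_check(struct model_buf *b)
{
    size_t new_len = b->len * 2;
    unsigned char *n = malloc(new_len);

    if (!n)
        return -1;
    memcpy(n, b->head, b->len);
    memset(n + b->len, 0, new_len - b->len);
    free(b->head);                /* the old head is now invalid */
    b->head = n;
    b->len = new_len;
    return 0;
}

/* Returns the first header byte, or -1 on allocation failure. */
static int model_rcv(struct model_buf *b)
{
    const unsigned char *hdr;

    assert(b && b->head && b->len > 0);
    hdr = b->head;                /* pointer cached before the check */
    if (hdr[0] == 0)
        return 0;                 /* early use of the cached pointer is fine */

    if (model_policy_check(b) < 0)
        return -1;

    hdr = b->head;                /* reload: the cached pointer may dangle */
    return hdr[0];
}
```

Without the reload after model_policy_check(), the final read would go through freed memory, which is the kind of access the KASAN report below flags.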

syzbot reported:

BUG: KASAN: use-after-free in __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
Read of size 4 at addr ffff888191b8cb70 by task syz-executor2/1304

CPU: 0 PID: 1304 Comm: syz-executor2 Not tainted 4.20.0-rc7+ #356
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x244/0x39d lib/dump_stack.c:113
 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
 ipv6_addr_type include/net/ipv6.h:403 [inline]
 ip6_tnl_get_cap+0x27/0x190 net/ipv6/ip6_tunnel.c:727
 ip6_tnl_rcv_ctl+0xdb/0x2a0 net/ipv6/ip6_tunnel.c:757
 vti6_rcv+0x336/0x8f3 net/ipv6/ip6_vti.c:321
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
IPVS: ftp: loaded support on port[0] = 21
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027
 </IRQ>
 do_softirq.part.14+0x126/0x160 kernel/softirq.c:337
 do_softirq+0x19/0x20 kernel/softirq.c:340
 netif_rx_ni+0x521/0x860 net/core/dev.c:4569
 dev_loopback_xmit+0x287/0x8c0 net/core/dev.c:3576
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_finish_output2+0x193a/0x2930 net/ipv6/ip6_output.c:84
 ip6_fragment+0x2b06/0x3850 net/ipv6/ip6_output.c:727
 ip6_finish_output+0x6b7/0xc50 net/ipv6/ip6_output.c:152
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171
 dst_output include/net/dst.h:444 [inline]
 ip6_local_out+0xc5/0x1b0 net/ipv6/output_core.c:176
 ip6_send_skb+0xbc/0x340 net/ipv6/ip6_output.c:1727
 ip6_push_pending_frames+0xc5/0xf0 net/ipv6/ip6_output.c:1747
 rawv6_push_pending_frames net/ipv6/raw.c:615 [inline]
 rawv6_sendmsg+0x3a3e/0x4b40 net/ipv6/raw.c:945
kobject: 'queues' (0000000089e6eea2): kobject_add_internal: parent: 'tunl0', set: '<NULL>'
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env: filter function caused the event to drop!
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 sock_write_iter+0x35e/0x5c0 net/socket.c:900
 call_write_iter include/linux/fs.h:1857 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
kobject: 'rx-0' (00000000e2d902d9): kobject_add_internal: parent: 'queues', set: 'queues'
kobject: 'rx-0' (00000000e2d902d9): kobject_uevent_env
 vfs_write+0x1fc/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
kobject: 'rx-0' (00000000e2d902d9): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/rx-0'
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
kobject: 'tx-0' (00000000443b70ac): kobject_add_internal: parent: 'queues', set: 'queues'
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457669
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f9bd200bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457669
RDX: 000000000000058f RSI: 00000000200033c0 RDI: 0000000000000003
kobject: 'tx-0' (00000000443b70ac): kobject_uevent_env
RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9bd200c6d4
R13: 00000000004c2dcc R14: 00000000004da398 R15: 00000000ffffffff

Allocated by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
 __do_kmalloc_node mm/slab.c:3684 [inline]
 __kmalloc_node_track_caller+0x50/0x70 mm/slab.c:3698
 __kmalloc_reserve.isra.41+0x41/0xe0 net/core/skbuff.c:140
 __alloc_skb+0x155/0x760 net/core/skbuff.c:208
kobject: 'tx-0' (00000000443b70ac): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/tx-0'
 alloc_skb include/linux/skbuff.h:1011 [inline]
 __ip6_append_data.isra.49+0x2f1a/0x3f50 net/ipv6/ip6_output.c:1450
 ip6_append_data+0x1bc/0x2d0 net/ipv6/ip6_output.c:1619
 rawv6_sendmsg+0x15ab/0x4b40 net/ipv6/raw.c:938
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2116
 __sys_sendmsg+0x11d/0x280 net/socket.c:2154
 __do_sys_sendmsg net/socket.c:2163 [inline]
 __se_sys_sendmsg net/socket.c:2161 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
kobject: 'gre0' (00000000cb1b2d7b): kobject_add_internal: parent: 'net', set: 'devices'

Freed by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xcf/0x230 mm/slab.c:3817
 skb_free_head+0x93/0xb0 net/core/skbuff.c:553
 pskb_expand_head+0x3b2/0x10d0 net/core/skbuff.c:1498
 __pskb_pull_tail+0x156/0x18a0 net/core/skbuff.c:1896
 pskb_may_pull include/linux/skbuff.h:2188 [inline]
 _decode_session6+0xd11/0x14d0 net/ipv6/xfrm6_policy.c:150
 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:3272
kobject: 'gre0' (00000000cb1b2d7b): kobject_uevent_env
 __xfrm_policy_check+0x380/0x2c40 net/xfrm/xfrm_policy.c:3322
 __xfrm_policy_check2 include/net/xfrm.h:1170 [inline]
 xfrm_policy_check include/net/xfrm.h:1175 [inline]
 xfrm6_policy_check include/net/xfrm.h:1185 [inline]
 vti6_rcv+0x4bd/0x8f3 net/ipv6/ip6_vti.c:316
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
kobject: 'gre0' (00000000cb1b2d7b): fill_kobj_path: path = '/devices/virtual/net/gre0'
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292

The buggy address belongs to the object at ffff888191b8cac0
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 176 bytes inside of
 512-byte region [ffff888191b8cac0, ffff888191b8ccc0)
The buggy address belongs to the page:
page:ffffea000646e300 count:1 mapcount:0 mapping:ffff8881da800940 index:0x0
flags: 0x2fffc0000000200(slab)
raw: 02fffc0000000200 ffffea0006eaaa48 ffffea00065356c8 ffff8881da800940
raw: 0000000000000000 ffff888191b8c0c0 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected
kobject: 'queues' (000000005fd6226e): kobject_add_internal: parent: 'gre0', set: '<NULL>'

Memory state around the buggy address:
 ffff888191b8ca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888191b8ca80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>ffff888191b8cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff888191b8cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888191b8cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 0d3c703 ("ipv6: Cleanup IPv6 tunnel receive path")
Fixes: ed1efb2 ("ipv6: Add support for IPsec virtual tunnel interfaces")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noltari pushed a commit to Noltari/linux that referenced this pull request Jan 9, 2019
[ Upstream commit cbb4969 ]

xfrm6_policy_check() might have re-allocated skb->head, we need
to reload ipv6 header pointer.

syzbot reported:

BUG: KASAN: use-after-free in __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
Read of size 4 at addr ffff888191b8cb70 by task syz-executor2/1304

CPU: 0 PID: 1304 Comm: syz-executor2 Not tainted 4.20.0-rc7+ #356
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x244/0x39d lib/dump_stack.c:113
 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
 ipv6_addr_type include/net/ipv6.h:403 [inline]
 ip6_tnl_get_cap+0x27/0x190 net/ipv6/ip6_tunnel.c:727
 ip6_tnl_rcv_ctl+0xdb/0x2a0 net/ipv6/ip6_tunnel.c:757
 vti6_rcv+0x336/0x8f3 net/ipv6/ip6_vti.c:321
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
IPVS: ftp: loaded support on port[0] = 21
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027
 </IRQ>
 do_softirq.part.14+0x126/0x160 kernel/softirq.c:337
 do_softirq+0x19/0x20 kernel/softirq.c:340
 netif_rx_ni+0x521/0x860 net/core/dev.c:4569
 dev_loopback_xmit+0x287/0x8c0 net/core/dev.c:3576
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_finish_output2+0x193a/0x2930 net/ipv6/ip6_output.c:84
 ip6_fragment+0x2b06/0x3850 net/ipv6/ip6_output.c:727
 ip6_finish_output+0x6b7/0xc50 net/ipv6/ip6_output.c:152
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171
 dst_output include/net/dst.h:444 [inline]
 ip6_local_out+0xc5/0x1b0 net/ipv6/output_core.c:176
 ip6_send_skb+0xbc/0x340 net/ipv6/ip6_output.c:1727
 ip6_push_pending_frames+0xc5/0xf0 net/ipv6/ip6_output.c:1747
 rawv6_push_pending_frames net/ipv6/raw.c:615 [inline]
 rawv6_sendmsg+0x3a3e/0x4b40 net/ipv6/raw.c:945
kobject: 'queues' (0000000089e6eea2): kobject_add_internal: parent: 'tunl0', set: '<NULL>'
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env: filter function caused the event to drop!
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 sock_write_iter+0x35e/0x5c0 net/socket.c:900
 call_write_iter include/linux/fs.h:1857 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
kobject: 'rx-0' (00000000e2d902d9): kobject_add_internal: parent: 'queues', set: 'queues'
kobject: 'rx-0' (00000000e2d902d9): kobject_uevent_env
 vfs_write+0x1fc/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
kobject: 'rx-0' (00000000e2d902d9): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/rx-0'
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
kobject: 'tx-0' (00000000443b70ac): kobject_add_internal: parent: 'queues', set: 'queues'
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457669
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f9bd200bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457669
RDX: 000000000000058f RSI: 00000000200033c0 RDI: 0000000000000003
kobject: 'tx-0' (00000000443b70ac): kobject_uevent_env
RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9bd200c6d4
R13: 00000000004c2dcc R14: 00000000004da398 R15: 00000000ffffffff

Allocated by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
 __do_kmalloc_node mm/slab.c:3684 [inline]
 __kmalloc_node_track_caller+0x50/0x70 mm/slab.c:3698
 __kmalloc_reserve.isra.41+0x41/0xe0 net/core/skbuff.c:140
 __alloc_skb+0x155/0x760 net/core/skbuff.c:208
kobject: 'tx-0' (00000000443b70ac): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/tx-0'
 alloc_skb include/linux/skbuff.h:1011 [inline]
 __ip6_append_data.isra.49+0x2f1a/0x3f50 net/ipv6/ip6_output.c:1450
 ip6_append_data+0x1bc/0x2d0 net/ipv6/ip6_output.c:1619
 rawv6_sendmsg+0x15ab/0x4b40 net/ipv6/raw.c:938
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2116
 __sys_sendmsg+0x11d/0x280 net/socket.c:2154
 __do_sys_sendmsg net/socket.c:2163 [inline]
 __se_sys_sendmsg net/socket.c:2161 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
kobject: 'gre0' (00000000cb1b2d7b): kobject_add_internal: parent: 'net', set: 'devices'

Freed by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xcf/0x230 mm/slab.c:3817
 skb_free_head+0x93/0xb0 net/core/skbuff.c:553
 pskb_expand_head+0x3b2/0x10d0 net/core/skbuff.c:1498
 __pskb_pull_tail+0x156/0x18a0 net/core/skbuff.c:1896
 pskb_may_pull include/linux/skbuff.h:2188 [inline]
 _decode_session6+0xd11/0x14d0 net/ipv6/xfrm6_policy.c:150
 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:3272
kobject: 'gre0' (00000000cb1b2d7b): kobject_uevent_env
 __xfrm_policy_check+0x380/0x2c40 net/xfrm/xfrm_policy.c:3322
 __xfrm_policy_check2 include/net/xfrm.h:1170 [inline]
 xfrm_policy_check include/net/xfrm.h:1175 [inline]
 xfrm6_policy_check include/net/xfrm.h:1185 [inline]
 vti6_rcv+0x4bd/0x8f3 net/ipv6/ip6_vti.c:316
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
kobject: 'gre0' (00000000cb1b2d7b): fill_kobj_path: path = '/devices/virtual/net/gre0'
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292

The buggy address belongs to the object at ffff888191b8cac0
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 176 bytes inside of
 512-byte region [ffff888191b8cac0, ffff888191b8ccc0)
The buggy address belongs to the page:
page:ffffea000646e300 count:1 mapcount:0 mapping:ffff8881da800940 index:0x0
flags: 0x2fffc0000000200(slab)
raw: 02fffc0000000200 ffffea0006eaaa48 ffffea00065356c8 ffff8881da800940
raw: 0000000000000000 ffff888191b8c0c0 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected
kobject: 'queues' (000000005fd6226e): kobject_add_internal: parent: 'gre0', set: '<NULL>'

Memory state around the buggy address:
 ffff888191b8ca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888191b8ca80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>ffff888191b8cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff888191b8cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888191b8cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 0d3c703 ("ipv6: Cleanup IPv6 tunnel receive path")
Fixes: ed1efb2 ("ipv6: Add support for IPsec virtual tunnel interfaces")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dm0- referenced this pull request in coreos/linux Jan 9, 2019
[ Upstream commit cbb4969 ]

xfrm6_policy_check() might have re-allocated skb->head, we need
to reload ipv6 header pointer.
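
The hazard described above can be sketched in userspace C. This is only an analogy, not the kernel code: `toy_buf`, `toy_hdr`, `toy_expand_head` and `toy_rcv` are hypothetical stand-ins for an skb, its IPv6 header, a helper that may reallocate the head, and the receive path. The point is that a header pointer taken before such a call has to be recomputed afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: only the fields this sketch needs. */
struct toy_buf {
    uint8_t *head;  /* start of the linear data area */
    size_t   len;   /* bytes currently valid at head */
};

/* Hypothetical stand-in for a header stored at the start of the buffer. */
struct toy_hdr {
    uint32_t saddr;
    uint32_t daddr;
};

/* Recompute the header pointer from the buffer's current head. */
static struct toy_hdr *toy_hdr(const struct toy_buf *b)
{
    return (struct toy_hdr *)b->head;
}

/* Allocate a buffer holding one header. Returns 0 on success. */
static int toy_buf_init(struct toy_buf *b, uint32_t saddr, uint32_t daddr)
{
    b->head = malloc(sizeof(struct toy_hdr));
    if (!b->head)
        return -1;
    b->len = sizeof(struct toy_hdr);
    toy_hdr(b)->saddr = saddr;
    toy_hdr(b)->daddr = daddr;
    return 0;
}

/* Always moves the data to a fresh allocation and frees the old one,
 * modelling the worst case of a helper that may reallocate the head. */
static int toy_expand_head(struct toy_buf *b, size_t extra)
{
    uint8_t *fresh = malloc(b->len + extra);
    if (!fresh)
        return -1;
    memcpy(fresh, b->head, b->len);
    memset(fresh + b->len, 0, extra);
    free(b->head);  /* pointers derived from the old head are now stale */
    b->head = fresh;
    b->len += extra;
    return 0;
}

/* Correct pattern: reload the header pointer after the reallocating call. */
static uint32_t toy_rcv(struct toy_buf *b)
{
    struct toy_hdr *hdr = toy_hdr(b);
    uint32_t before = hdr->saddr;

    if (toy_expand_head(b, 64) != 0)
        return 0;

    hdr = toy_hdr(b);  /* reload; the old value of hdr must not be used */
    assert(hdr->saddr == before);
    return hdr->daddr;
}
```

Dropping the reload line in `toy_rcv()` would leave `hdr` pointing into freed memory, which is the kind of access the report below flags as a use-after-free.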

syzbot reported:

BUG: KASAN: use-after-free in __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
Read of size 4 at addr ffff888191b8cb70 by task syz-executor2/1304

CPU: 0 PID: 1304 Comm: syz-executor2 Not tainted 4.20.0-rc7+ #356
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x244/0x39d lib/dump_stack.c:113
 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
 ipv6_addr_type include/net/ipv6.h:403 [inline]
 ip6_tnl_get_cap+0x27/0x190 net/ipv6/ip6_tunnel.c:727
 ip6_tnl_rcv_ctl+0xdb/0x2a0 net/ipv6/ip6_tunnel.c:757
 vti6_rcv+0x336/0x8f3 net/ipv6/ip6_vti.c:321
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
IPVS: ftp: loaded support on port[0] = 21
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027
 </IRQ>
 do_softirq.part.14+0x126/0x160 kernel/softirq.c:337
 do_softirq+0x19/0x20 kernel/softirq.c:340
 netif_rx_ni+0x521/0x860 net/core/dev.c:4569
 dev_loopback_xmit+0x287/0x8c0 net/core/dev.c:3576
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_finish_output2+0x193a/0x2930 net/ipv6/ip6_output.c:84
 ip6_fragment+0x2b06/0x3850 net/ipv6/ip6_output.c:727
 ip6_finish_output+0x6b7/0xc50 net/ipv6/ip6_output.c:152
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171
 dst_output include/net/dst.h:444 [inline]
 ip6_local_out+0xc5/0x1b0 net/ipv6/output_core.c:176
 ip6_send_skb+0xbc/0x340 net/ipv6/ip6_output.c:1727
 ip6_push_pending_frames+0xc5/0xf0 net/ipv6/ip6_output.c:1747
 rawv6_push_pending_frames net/ipv6/raw.c:615 [inline]
 rawv6_sendmsg+0x3a3e/0x4b40 net/ipv6/raw.c:945
kobject: 'queues' (0000000089e6eea2): kobject_add_internal: parent: 'tunl0', set: '<NULL>'
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env: filter function caused the event to drop!
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 sock_write_iter+0x35e/0x5c0 net/socket.c:900
 call_write_iter include/linux/fs.h:1857 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
kobject: 'rx-0' (00000000e2d902d9): kobject_add_internal: parent: 'queues', set: 'queues'
kobject: 'rx-0' (00000000e2d902d9): kobject_uevent_env
 vfs_write+0x1fc/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
kobject: 'rx-0' (00000000e2d902d9): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/rx-0'
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
kobject: 'tx-0' (00000000443b70ac): kobject_add_internal: parent: 'queues', set: 'queues'
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457669
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f9bd200bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457669
RDX: 000000000000058f RSI: 00000000200033c0 RDI: 0000000000000003
kobject: 'tx-0' (00000000443b70ac): kobject_uevent_env
RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9bd200c6d4
R13: 00000000004c2dcc R14: 00000000004da398 R15: 00000000ffffffff

Allocated by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
 __do_kmalloc_node mm/slab.c:3684 [inline]
 __kmalloc_node_track_caller+0x50/0x70 mm/slab.c:3698
 __kmalloc_reserve.isra.41+0x41/0xe0 net/core/skbuff.c:140
 __alloc_skb+0x155/0x760 net/core/skbuff.c:208
kobject: 'tx-0' (00000000443b70ac): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/tx-0'
 alloc_skb include/linux/skbuff.h:1011 [inline]
 __ip6_append_data.isra.49+0x2f1a/0x3f50 net/ipv6/ip6_output.c:1450
 ip6_append_data+0x1bc/0x2d0 net/ipv6/ip6_output.c:1619
 rawv6_sendmsg+0x15ab/0x4b40 net/ipv6/raw.c:938
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2116
 __sys_sendmsg+0x11d/0x280 net/socket.c:2154
 __do_sys_sendmsg net/socket.c:2163 [inline]
 __se_sys_sendmsg net/socket.c:2161 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
kobject: 'gre0' (00000000cb1b2d7b): kobject_add_internal: parent: 'net', set: 'devices'

Freed by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xcf/0x230 mm/slab.c:3817
 skb_free_head+0x93/0xb0 net/core/skbuff.c:553
 pskb_expand_head+0x3b2/0x10d0 net/core/skbuff.c:1498
 __pskb_pull_tail+0x156/0x18a0 net/core/skbuff.c:1896
 pskb_may_pull include/linux/skbuff.h:2188 [inline]
 _decode_session6+0xd11/0x14d0 net/ipv6/xfrm6_policy.c:150
 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:3272
kobject: 'gre0' (00000000cb1b2d7b): kobject_uevent_env
 __xfrm_policy_check+0x380/0x2c40 net/xfrm/xfrm_policy.c:3322
 __xfrm_policy_check2 include/net/xfrm.h:1170 [inline]
 xfrm_policy_check include/net/xfrm.h:1175 [inline]
 xfrm6_policy_check include/net/xfrm.h:1185 [inline]
 vti6_rcv+0x4bd/0x8f3 net/ipv6/ip6_vti.c:316
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
kobject: 'gre0' (00000000cb1b2d7b): fill_kobj_path: path = '/devices/virtual/net/gre0'
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292

The buggy address belongs to the object at ffff888191b8cac0
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 176 bytes inside of
 512-byte region [ffff888191b8cac0, ffff888191b8ccc0)
The buggy address belongs to the page:
page:ffffea000646e300 count:1 mapcount:0 mapping:ffff8881da800940 index:0x0
flags: 0x2fffc0000000200(slab)
raw: 02fffc0000000200 ffffea0006eaaa48 ffffea00065356c8 ffff8881da800940
raw: 0000000000000000 ffff888191b8c0c0 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected
kobject: 'queues' (000000005fd6226e): kobject_add_internal: parent: 'gre0', set: '<NULL>'

Memory state around the buggy address:
 ffff888191b8ca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888191b8ca80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>ffff888191b8cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff888191b8cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888191b8cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
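
The numbers in this report can be cross-checked with a little arithmetic. The sketch below assumes generic KASAN's granule of 8 bytes per shadow byte and the 16 shadow bytes per dump row seen above; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN maps 8 bytes of memory to one shadow byte. */
#define SHADOW_GRANULE 8u
/* Each row of the "Memory state" dump above shows 16 shadow bytes. */
#define SHADOW_BYTES_PER_ROW 16u

/* Offset of the faulting address inside its slab object. */
static uint64_t offset_in_object(uint64_t addr, uint64_t obj_start)
{
    return addr - obj_start;
}

/* Address printed at the start of the dump row that covers addr. */
static uint64_t dump_row_base(uint64_t addr)
{
    uint64_t row_span = (uint64_t)SHADOW_GRANULE * SHADOW_BYTES_PER_ROW;
    return addr & ~(row_span - 1);
}

/* Zero-based column of the shadow byte that the '^' marker points at. */
static unsigned shadow_column(uint64_t addr)
{
    return (unsigned)((addr - dump_row_base(addr)) / SHADOW_GRANULE);
}
```

For the access at `ffff888191b8cb70` in the object starting at `ffff888191b8cac0`, this gives an offset of 176 bytes, a row base of `ffff888191b8cb00` and column 14, matching the marked row. The `fb` bytes there mark freed object memory and `fc` marks slab redzone.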

Fixes: 0d3c703 ("ipv6: Cleanup IPv6 tunnel receive path")
Fixes: ed1efb2 ("ipv6: Add support for IPsec virtual tunnel interfaces")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Jan 10, 2019
[ Upstream commit cbb4969 ]

xfrm6_policy_check() might have re-allocated skb->head, we need
to reload ipv6 header pointer.

syzbot reported:

BUG: KASAN: use-after-free in __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
Read of size 4 at addr ffff888191b8cb70 by task syz-executor2/1304

CPU: 0 PID: 1304 Comm: syz-executor2 Not tainted 4.20.0-rc7+ #356
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x244/0x39d lib/dump_stack.c:113
 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
 ipv6_addr_type include/net/ipv6.h:403 [inline]
 ip6_tnl_get_cap+0x27/0x190 net/ipv6/ip6_tunnel.c:727
 ip6_tnl_rcv_ctl+0xdb/0x2a0 net/ipv6/ip6_tunnel.c:757
 vti6_rcv+0x336/0x8f3 net/ipv6/ip6_vti.c:321
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
IPVS: ftp: loaded support on port[0] = 21
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027
 </IRQ>
 do_softirq.part.14+0x126/0x160 kernel/softirq.c:337
 do_softirq+0x19/0x20 kernel/softirq.c:340
 netif_rx_ni+0x521/0x860 net/core/dev.c:4569
 dev_loopback_xmit+0x287/0x8c0 net/core/dev.c:3576
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_finish_output2+0x193a/0x2930 net/ipv6/ip6_output.c:84
 ip6_fragment+0x2b06/0x3850 net/ipv6/ip6_output.c:727
 ip6_finish_output+0x6b7/0xc50 net/ipv6/ip6_output.c:152
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171
 dst_output include/net/dst.h:444 [inline]
 ip6_local_out+0xc5/0x1b0 net/ipv6/output_core.c:176
 ip6_send_skb+0xbc/0x340 net/ipv6/ip6_output.c:1727
 ip6_push_pending_frames+0xc5/0xf0 net/ipv6/ip6_output.c:1747
 rawv6_push_pending_frames net/ipv6/raw.c:615 [inline]
 rawv6_sendmsg+0x3a3e/0x4b40 net/ipv6/raw.c:945
kobject: 'queues' (0000000089e6eea2): kobject_add_internal: parent: 'tunl0', set: '<NULL>'
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env: filter function caused the event to drop!
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 sock_write_iter+0x35e/0x5c0 net/socket.c:900
 call_write_iter include/linux/fs.h:1857 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
kobject: 'rx-0' (00000000e2d902d9): kobject_add_internal: parent: 'queues', set: 'queues'
kobject: 'rx-0' (00000000e2d902d9): kobject_uevent_env
 vfs_write+0x1fc/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
kobject: 'rx-0' (00000000e2d902d9): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/rx-0'
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
kobject: 'tx-0' (00000000443b70ac): kobject_add_internal: parent: 'queues', set: 'queues'
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457669
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f9bd200bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457669
RDX: 000000000000058f RSI: 00000000200033c0 RDI: 0000000000000003
kobject: 'tx-0' (00000000443b70ac): kobject_uevent_env
RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9bd200c6d4
R13: 00000000004c2dcc R14: 00000000004da398 R15: 00000000ffffffff

Allocated by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
 __do_kmalloc_node mm/slab.c:3684 [inline]
 __kmalloc_node_track_caller+0x50/0x70 mm/slab.c:3698
 __kmalloc_reserve.isra.41+0x41/0xe0 net/core/skbuff.c:140
 __alloc_skb+0x155/0x760 net/core/skbuff.c:208
kobject: 'tx-0' (00000000443b70ac): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/tx-0'
 alloc_skb include/linux/skbuff.h:1011 [inline]
 __ip6_append_data.isra.49+0x2f1a/0x3f50 net/ipv6/ip6_output.c:1450
 ip6_append_data+0x1bc/0x2d0 net/ipv6/ip6_output.c:1619
 rawv6_sendmsg+0x15ab/0x4b40 net/ipv6/raw.c:938
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2116
 __sys_sendmsg+0x11d/0x280 net/socket.c:2154
 __do_sys_sendmsg net/socket.c:2163 [inline]
 __se_sys_sendmsg net/socket.c:2161 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
kobject: 'gre0' (00000000cb1b2d7b): kobject_add_internal: parent: 'net', set: 'devices'

Freed by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xcf/0x230 mm/slab.c:3817
 skb_free_head+0x93/0xb0 net/core/skbuff.c:553
 pskb_expand_head+0x3b2/0x10d0 net/core/skbuff.c:1498
 __pskb_pull_tail+0x156/0x18a0 net/core/skbuff.c:1896
 pskb_may_pull include/linux/skbuff.h:2188 [inline]
 _decode_session6+0xd11/0x14d0 net/ipv6/xfrm6_policy.c:150
 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:3272
kobject: 'gre0' (00000000cb1b2d7b): kobject_uevent_env
 __xfrm_policy_check+0x380/0x2c40 net/xfrm/xfrm_policy.c:3322
 __xfrm_policy_check2 include/net/xfrm.h:1170 [inline]
 xfrm_policy_check include/net/xfrm.h:1175 [inline]
 xfrm6_policy_check include/net/xfrm.h:1185 [inline]
 vti6_rcv+0x4bd/0x8f3 net/ipv6/ip6_vti.c:316
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
kobject: 'gre0' (00000000cb1b2d7b): fill_kobj_path: path = '/devices/virtual/net/gre0'
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292

The buggy address belongs to the object at ffff888191b8cac0
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 176 bytes inside of
 512-byte region [ffff888191b8cac0, ffff888191b8ccc0)
The buggy address belongs to the page:
page:ffffea000646e300 count:1 mapcount:0 mapping:ffff8881da800940 index:0x0
flags: 0x2fffc0000000200(slab)
raw: 02fffc0000000200 ffffea0006eaaa48 ffffea00065356c8 ffff8881da800940
raw: 0000000000000000 ffff888191b8c0c0 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected
kobject: 'queues' (000000005fd6226e): kobject_add_internal: parent: 'gre0', set: '<NULL>'

Memory state around the buggy address:
 ffff888191b8ca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888191b8ca80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>ffff888191b8cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff888191b8cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888191b8cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 0d3c703 ("ipv6: Cleanup IPv6 tunnel receive path")
Fixes: ed1efb2 ("ipv6: Add support for IPsec virtual tunnel interfaces")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mic92 pushed a commit to Mic92/linux that referenced this pull request Feb 4, 2019
lkl: add support for LKL_INSTALL_ADDITIONAL_HEADERS environment variable
Noltari pushed a commit to Noltari/linux that referenced this pull request Feb 11, 2019
commit cbb4969 upstream.

xfrm6_policy_check() might have re-allocated skb->head, we need
to reload ipv6 header pointer.

syzbot reported:

BUG: KASAN: use-after-free in __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
Read of size 4 at addr ffff888191b8cb70 by task syz-executor2/1304

CPU: 0 PID: 1304 Comm: syz-executor2 Not tainted 4.20.0-rc7+ #356
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x244/0x39d lib/dump_stack.c:113
 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
 ipv6_addr_type include/net/ipv6.h:403 [inline]
 ip6_tnl_get_cap+0x27/0x190 net/ipv6/ip6_tunnel.c:727
 ip6_tnl_rcv_ctl+0xdb/0x2a0 net/ipv6/ip6_tunnel.c:757
 vti6_rcv+0x336/0x8f3 net/ipv6/ip6_vti.c:321
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
IPVS: ftp: loaded support on port[0] = 21
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027
 </IRQ>
 do_softirq.part.14+0x126/0x160 kernel/softirq.c:337
 do_softirq+0x19/0x20 kernel/softirq.c:340
 netif_rx_ni+0x521/0x860 net/core/dev.c:4569
 dev_loopback_xmit+0x287/0x8c0 net/core/dev.c:3576
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_finish_output2+0x193a/0x2930 net/ipv6/ip6_output.c:84
 ip6_fragment+0x2b06/0x3850 net/ipv6/ip6_output.c:727
 ip6_finish_output+0x6b7/0xc50 net/ipv6/ip6_output.c:152
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171
 dst_output include/net/dst.h:444 [inline]
 ip6_local_out+0xc5/0x1b0 net/ipv6/output_core.c:176
 ip6_send_skb+0xbc/0x340 net/ipv6/ip6_output.c:1727
 ip6_push_pending_frames+0xc5/0xf0 net/ipv6/ip6_output.c:1747
 rawv6_push_pending_frames net/ipv6/raw.c:615 [inline]
 rawv6_sendmsg+0x3a3e/0x4b40 net/ipv6/raw.c:945
kobject: 'queues' (0000000089e6eea2): kobject_add_internal: parent: 'tunl0', set: '<NULL>'
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env: filter function caused the event to drop!
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 sock_write_iter+0x35e/0x5c0 net/socket.c:900
 call_write_iter include/linux/fs.h:1857 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
kobject: 'rx-0' (00000000e2d902d9): kobject_add_internal: parent: 'queues', set: 'queues'
kobject: 'rx-0' (00000000e2d902d9): kobject_uevent_env
 vfs_write+0x1fc/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
kobject: 'rx-0' (00000000e2d902d9): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/rx-0'
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
kobject: 'tx-0' (00000000443b70ac): kobject_add_internal: parent: 'queues', set: 'queues'
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457669
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f9bd200bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457669
RDX: 000000000000058f RSI: 00000000200033c0 RDI: 0000000000000003
kobject: 'tx-0' (00000000443b70ac): kobject_uevent_env
RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9bd200c6d4
R13: 00000000004c2dcc R14: 00000000004da398 R15: 00000000ffffffff

Allocated by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
 __do_kmalloc_node mm/slab.c:3684 [inline]
 __kmalloc_node_track_caller+0x50/0x70 mm/slab.c:3698
 __kmalloc_reserve.isra.41+0x41/0xe0 net/core/skbuff.c:140
 __alloc_skb+0x155/0x760 net/core/skbuff.c:208
kobject: 'tx-0' (00000000443b70ac): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/tx-0'
 alloc_skb include/linux/skbuff.h:1011 [inline]
 __ip6_append_data.isra.49+0x2f1a/0x3f50 net/ipv6/ip6_output.c:1450
 ip6_append_data+0x1bc/0x2d0 net/ipv6/ip6_output.c:1619
 rawv6_sendmsg+0x15ab/0x4b40 net/ipv6/raw.c:938
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2116
 __sys_sendmsg+0x11d/0x280 net/socket.c:2154
 __do_sys_sendmsg net/socket.c:2163 [inline]
 __se_sys_sendmsg net/socket.c:2161 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
kobject: 'gre0' (00000000cb1b2d7b): kobject_add_internal: parent: 'net', set: 'devices'

Freed by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xcf/0x230 mm/slab.c:3817
 skb_free_head+0x93/0xb0 net/core/skbuff.c:553
 pskb_expand_head+0x3b2/0x10d0 net/core/skbuff.c:1498
 __pskb_pull_tail+0x156/0x18a0 net/core/skbuff.c:1896
 pskb_may_pull include/linux/skbuff.h:2188 [inline]
 _decode_session6+0xd11/0x14d0 net/ipv6/xfrm6_policy.c:150
 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:3272
kobject: 'gre0' (00000000cb1b2d7b): kobject_uevent_env
 __xfrm_policy_check+0x380/0x2c40 net/xfrm/xfrm_policy.c:3322
 __xfrm_policy_check2 include/net/xfrm.h:1170 [inline]
 xfrm_policy_check include/net/xfrm.h:1175 [inline]
 xfrm6_policy_check include/net/xfrm.h:1185 [inline]
 vti6_rcv+0x4bd/0x8f3 net/ipv6/ip6_vti.c:316
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
kobject: 'gre0' (00000000cb1b2d7b): fill_kobj_path: path = '/devices/virtual/net/gre0'
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292

The buggy address belongs to the object at ffff888191b8cac0
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 176 bytes inside of
 512-byte region [ffff888191b8cac0, ffff888191b8ccc0)
The buggy address belongs to the page:
page:ffffea000646e300 count:1 mapcount:0 mapping:ffff8881da800940 index:0x0
flags: 0x2fffc0000000200(slab)
raw: 02fffc0000000200 ffffea0006eaaa48 ffffea00065356c8 ffff8881da800940
raw: 0000000000000000 ffff888191b8c0c0 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected
kobject: 'queues' (000000005fd6226e): kobject_add_internal: parent: 'gre0', set: '<NULL>'

Memory state around the buggy address:
 ffff888191b8ca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888191b8ca80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>ffff888191b8cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff888191b8cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888191b8cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 0d3c703 ("ipv6: Cleanup IPv6 tunnel receive path")
Fixes: ed1efb2 ("ipv6: Add support for IPsec virtual tunnel interfaces")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: Drop change in ipxip6_rcv()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Jul 8, 2020
Tests showed this BUG:
[572555.252867] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
[572555.252876] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 131031, name: smcapp
[572555.252879] INFO: lockdep is turned off.
[572555.252883] CPU: 1 PID: 131031 Comm: smcapp Tainted: G           O      5.7.0-rc3uschi+ #356
[572555.252885] Hardware name: IBM 3906 M03 703 (LPAR)
[572555.252887] Call Trace:
[572555.252896]  [<00000000ac364554>] show_stack+0x94/0xe8
[572555.252901]  [<00000000aca1f400>] dump_stack+0xa0/0xe0
[572555.252906]  [<00000000ac3c8c10>] ___might_sleep+0x260/0x280
[572555.252910]  [<00000000acdc0c98>] __mutex_lock+0x48/0x940
[572555.252912]  [<00000000acdc15c2>] mutex_lock_nested+0x32/0x40
[572555.252975]  [<000003ff801762d0>] mlx5_lag_get_roce_netdev+0x30/0xc0 [mlx5_core]
[572555.252996]  [<000003ff801fb3aa>] mlx5_ib_get_netdev+0x3a/0xe0 [mlx5_ib]
[572555.253007]  [<000003ff80063848>] smc_pnet_find_roce_resource+0x1d8/0x310 [smc]
[572555.253011]  [<000003ff800602f0>] __smc_connect+0x1f0/0x3e0 [smc]
[572555.253015]  [<000003ff80060634>] smc_connect+0x154/0x190 [smc]
[572555.253022]  [<00000000acbed8d4>] __sys_connect+0x94/0xd0
[572555.253025]  [<00000000acbef620>] __s390x_sys_socketcall+0x170/0x360
[572555.253028]  [<00000000acdc6800>] system_call+0x298/0x2b8
[572555.253030] INFO: lockdep is turned off.

Function smc_pnet_find_rdma_dev() might be called from
smc_pnet_find_roce_resource(). It holds the smc_ib_devices list
spinlock while calling infiniband op get_netdev(). At least for mlx5
the get_netdev operation wants mutex serialization, which conflicts
with the smc_ib_devices spinlock.
This patch switches the smc_ib_devices spinlock into a mutex to
allow sleeping when calling get_netdev().
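
The conflict can be illustrated with a toy model of atomic-context tracking. Nothing here is kernel code: `atomic_depth`, `toy_might_sleep()` and the other `toy_*` names are hypothetical, and the "locks" only do the bookkeeping needed to show when a sleeping call happens under a spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of atomic-context tracking; all names are hypothetical. */
static int atomic_depth;      /* > 0 while a toy spinlock is held */
static int sleep_violations;  /* sleeping calls made in atomic context */

static void toy_spin_lock(void)   { atomic_depth++; }
static void toy_spin_unlock(void) { atomic_depth--; }

/* Models the check behind "sleeping function called from invalid context". */
static void toy_might_sleep(void)
{
    if (atomic_depth > 0)
        sleep_violations++;
}

/* A sleeping lock: it may block, so it checks the context first.
 * Holding it does not make the context atomic. */
static void toy_mutex_lock(void)   { toy_might_sleep(); }
static void toy_mutex_unlock(void) { }

/* Stand-in for a driver op such as get_netdev() that takes a mutex. */
static int toy_get_netdev(int dev_id)
{
    toy_mutex_lock();
    int ifindex = dev_id + 100;  /* arbitrary mapping for the sketch */
    toy_mutex_unlock();
    return ifindex;
}

/* Walk the device list under the list lock, calling the driver op per entry. */
static int toy_find_dev(const int *devs, size_t n, int want_ifindex,
                        bool list_lock_is_mutex)
{
    int found = -1;

    if (list_lock_is_mutex)
        toy_mutex_lock();
    else
        toy_spin_lock();

    for (size_t i = 0; i < n; i++) {
        if (toy_get_netdev(devs[i]) == want_ifindex) {
            found = devs[i];
            break;
        }
    }

    if (list_lock_is_mutex)
        toy_mutex_unlock();
    else
        toy_spin_unlock();
    return found;
}
```

With the list protected by the toy spinlock, every `toy_get_netdev()` call inside the loop is counted as a violation. With the list protected by the toy mutex, none are, which mirrors the reasoning behind the switch.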

Fixes: a4cf044 ("smc: introduce SMC as an IB-client")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
ruscur pushed a commit to ruscur/linux that referenced this pull request Jul 9, 2020
Tests showed this BUG:
[572555.252867] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
[572555.252876] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 131031, name: smcapp
[572555.252879] INFO: lockdep is turned off.
[572555.252883] CPU: 1 PID: 131031 Comm: smcapp Tainted: G           O      5.7.0-rc3uschi+ #356
[572555.252885] Hardware name: IBM 3906 M03 703 (LPAR)
[572555.252887] Call Trace:
[572555.252896]  [<00000000ac364554>] show_stack+0x94/0xe8
[572555.252901]  [<00000000aca1f400>] dump_stack+0xa0/0xe0
[572555.252906]  [<00000000ac3c8c10>] ___might_sleep+0x260/0x280
[572555.252910]  [<00000000acdc0c98>] __mutex_lock+0x48/0x940
[572555.252912]  [<00000000acdc15c2>] mutex_lock_nested+0x32/0x40
[572555.252975]  [<000003ff801762d0>] mlx5_lag_get_roce_netdev+0x30/0xc0 [mlx5_core]
[572555.252996]  [<000003ff801fb3aa>] mlx5_ib_get_netdev+0x3a/0xe0 [mlx5_ib]
[572555.253007]  [<000003ff80063848>] smc_pnet_find_roce_resource+0x1d8/0x310 [smc]
[572555.253011]  [<000003ff800602f0>] __smc_connect+0x1f0/0x3e0 [smc]
[572555.253015]  [<000003ff80060634>] smc_connect+0x154/0x190 [smc]
[572555.253022]  [<00000000acbed8d4>] __sys_connect+0x94/0xd0
[572555.253025]  [<00000000acbef620>] __s390x_sys_socketcall+0x170/0x360
[572555.253028]  [<00000000acdc6800>] system_call+0x298/0x2b8
[572555.253030] INFO: lockdep is turned off.

Function smc_pnet_find_rdma_dev() might be called from
smc_pnet_find_roce_resource(). It holds the smc_ib_devices list
spinlock while calling infiniband op get_netdev(). At least for mlx5
the get_netdev operation wants mutex serialization, which conflicts
with the smc_ib_devices spinlock.
This patch switches the smc_ib_devices spinlock into a mutex to
allow sleeping when calling get_netdev().

Fixes: a4cf044 ("smc: introduce SMC as an IB-client")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
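
The locking conflict described in this message can be illustrated with a toy
user-space model. This is only a sketch: every toy_* name and both find_dev_*
functions are hypothetical stand-ins, not the kernel's or the SMC code's real
definitions. A counter models atomic context, and the mutex path performs a
might_sleep-style check like the one that fired in the trace.

```c
#include <assert.h>

/* Toy user-space model; none of these are the kernel's real definitions. */
static int toy_atomic_depth;     /* > 0 models in_atomic() returning 1 */
static int toy_sleep_bugs;       /* counts "sleeping function called from invalid context" */

/* Holding a spinlock puts the caller in atomic context. */
void toy_spin_lock(void)   { toy_atomic_depth++; }
void toy_spin_unlock(void) { toy_atomic_depth--; }

/* Taking a mutex may sleep, so it is checked against atomic context. */
void toy_might_sleep(void)
{
    if (toy_atomic_depth > 0)
        toy_sleep_bugs++;
}

void toy_mutex_lock(void)   { toy_might_sleep(); }
void toy_mutex_unlock(void) { }

/* Stand-in for a driver get_netdev() op that takes a mutex internally. */
void toy_get_netdev(void)
{
    toy_mutex_lock();
    toy_mutex_unlock();
}

/* Before the patch: the device list is guarded by a spinlock. */
int find_dev_with_spinlock(void)
{
    toy_sleep_bugs = 0;
    toy_spin_lock();
    toy_get_netdev();            /* mutex taken in atomic context */
    toy_spin_unlock();
    return toy_sleep_bugs;
}

/* After the patch: the device list is guarded by a mutex. */
int find_dev_with_mutex(void)
{
    toy_sleep_bugs = 0;
    toy_mutex_lock();
    toy_get_netdev();            /* sleeping is allowed here */
    toy_mutex_unlock();
    return toy_sleep_bugs;
}
```

In this model the spinlock variant reports one violation and the mutex variant
reports none, which mirrors the reasoning given for the conversion.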
fifteenhex pushed a commit to fifteenhex/linux that referenced this pull request Aug 1, 2020
kyak pushed a commit to kyak/linux-odroid that referenced this pull request Dec 16, 2022
Correct cooling device maps for odroid xu3/4
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 27, 2023
With latest upstream llvm18, the following test cases failed:
  $ ./test_progs -j
  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  #77      fentry_fexit:FAIL
  #78/1    fentry_test/fentry:FAIL
  #78      fentry_test:FAIL
  #82/1    fexit_test/fexit:FAIL
  #82      fexit_test:FAIL
  #112/1   kprobe_multi_test/skel_api:FAIL
  #112/2   kprobe_multi_test/link_api_addrs:FAIL
  ...
  #112     kprobe_multi_test:FAIL
  #356/17  test_global_funcs/global_func17:FAIL
  #356     test_global_funcs:FAIL

Further analysis shows llvm upstream patch [1] is responsible
for the above failures. For example, for function bpf_fentry_test7()
in net/bpf/test_run.c, without [1], the asm code is:
  0000000000000400 <bpf_fentry_test7>:
     400: f3 0f 1e fa                   endbr64
     404: e8 00 00 00 00                callq   0x409 <bpf_fentry_test7+0x9>
     409: 48 89 f8                      movq    %rdi, %rax
     40c: c3                            retq
     40d: 0f 1f 00                      nopl    (%rax)
and with [1], the asm code is:
  0000000000005d20 <bpf_fentry_test7.specialized.1>:
    5d20: e8 00 00 00 00                callq   0x5d25 <bpf_fentry_test7.specialized.1+0x5>
    5d25: c3                            retq
and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7>
and this caused test failures for #13/#77 etc. except #356.

For test case #356/17, with [1] (progs/test_global_func17.c),
the main prog looks like:
  0000000000000000 <global_func17>:
       0:       b4 00 00 00 2a 00 00 00 w0 = 0x2a
       1:       95 00 00 00 00 00 00 00 exit
which passed verification while the test itself expects a verification
failure.

Let us add 'barrier_var' style asm code in both places to prevent
function specialization which caused selftests failure.

  [1] llvm/llvm-project#72903

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
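
As a rough illustration of what 'barrier_var' style asm code means, the sketch
below uses an empty inline asm statement with a "+r" constraint so that the
compiler has to treat the argument as possibly modified. The names toy_arg and
toy_fentry_target are hypothetical; this is not the kernel's
bpf_fentry_test7() and it assumes a GCC- or Clang-compatible compiler.

```c
#include <assert.h>

/* Hypothetical stand-in for a small test target; not the kernel's code. */
struct toy_arg {
    int a;
};

__attribute__((noinline))
struct toy_arg *toy_fentry_target(struct toy_arg *arg)
{
    /*
     * Empty asm with a "+r" (read-write register) operand: the compiler
     * must assume arg may have changed here, so it cannot replace it with
     * a value it inferred from the call sites.
     */
    __asm__ volatile("" : "+r"(arg));
    return arg;
}
```

The asm statement emits no instructions; it only hides the value from the
optimizer, so the function still returns its argument unchanged.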
alobakin pushed a commit to alobakin/linux that referenced this pull request Nov 27, 2023
With latest upstream llvm18, the following test cases failed:

  $ ./test_progs -j
  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  #77      fentry_fexit:FAIL
  #78/1    fentry_test/fentry:FAIL
  #78      fentry_test:FAIL
  #82/1    fexit_test/fexit:FAIL
  #82      fexit_test:FAIL
  #112/1   kprobe_multi_test/skel_api:FAIL
  #112/2   kprobe_multi_test/link_api_addrs:FAIL
  [...]
  #112     kprobe_multi_test:FAIL
  #356/17  test_global_funcs/global_func17:FAIL
  #356     test_global_funcs:FAIL

Further analysis shows llvm upstream patch [1] is responsible for the above
failures. For example, for function bpf_fentry_test7() in net/bpf/test_run.c,
without [1], the asm code is:

  0000000000000400 <bpf_fentry_test7>:
     400: f3 0f 1e fa                   endbr64
     404: e8 00 00 00 00                callq   0x409 <bpf_fentry_test7+0x9>
     409: 48 89 f8                      movq    %rdi, %rax
     40c: c3                            retq
     40d: 0f 1f 00                      nopl    (%rax)

... and with [1], the asm code is:

  0000000000005d20 <bpf_fentry_test7.specialized.1>:
    5d20: e8 00 00 00 00                callq   0x5d25 <bpf_fentry_test7.specialized.1+0x5>
    5d25: c3                            retq

... and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7>
and this caused test failures for #13/#77 etc. except #356.

For test case #356/17, with [1] (progs/test_global_func17.c), the main prog
looks like:

  0000000000000000 <global_func17>:
       0:       b4 00 00 00 2a 00 00 00 w0 = 0x2a
       1:       95 00 00 00 00 00 00 00 exit

... which passed verification while the test itself expects a verification
failure.

Let us add 'barrier_var' style asm code in both places to prevent function
specialization which caused selftests failure.

  [1] llvm/llvm-project#72903

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231127050342.1945270-1-yonghong.song@linux.dev
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Feb 5, 2024
[ Upstream commit b16904f ]

With latest upstream llvm18, the following test cases failed:

  $ ./test_progs -j
  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  #77      fentry_fexit:FAIL
  #78/1    fentry_test/fentry:FAIL
  #78      fentry_test:FAIL
  #82/1    fexit_test/fexit:FAIL
  #82      fexit_test:FAIL
  #112/1   kprobe_multi_test/skel_api:FAIL
  #112/2   kprobe_multi_test/link_api_addrs:FAIL
  [...]
  #112     kprobe_multi_test:FAIL
  #356/17  test_global_funcs/global_func17:FAIL
  #356     test_global_funcs:FAIL

Further analysis shows llvm upstream patch [1] is responsible for the above
failures. For example, for function bpf_fentry_test7() in net/bpf/test_run.c,
without [1], the asm code is:

  0000000000000400 <bpf_fentry_test7>:
     400: f3 0f 1e fa                   endbr64
     404: e8 00 00 00 00                callq   0x409 <bpf_fentry_test7+0x9>
     409: 48 89 f8                      movq    %rdi, %rax
     40c: c3                            retq
     40d: 0f 1f 00                      nopl    (%rax)

... and with [1], the asm code is:

  0000000000005d20 <bpf_fentry_test7.specialized.1>:
    5d20: e8 00 00 00 00                callq   0x5d25 <bpf_fentry_test7.specialized.1+0x5>
    5d25: c3                            retq

... and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7>
and this caused test failures for #13/#77 etc. except #356.

For test case #356/17, with [1] (progs/test_global_func17.c), the main prog
looks like:

  0000000000000000 <global_func17>:
       0:       b4 00 00 00 2a 00 00 00 w0 = 0x2a
       1:       95 00 00 00 00 00 00 00 exit

... which passed verification while the test itself expects a verification
failure.

Let us add 'barrier_var' style asm code in both places to prevent function
specialization which caused selftests failure.

  [1] llvm/llvm-project#72903

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231127050342.1945270-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Feb 5, 2024
klarasm pushed a commit to klarasm/linux that referenced this pull request Oct 29, 2024
KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically we can confirm that vangogh_tables_init() allocates 152 bytes
for the table, while smu_cmn_init_soft_gpu_metrics() memsets a 168-byte
block.

The root cause appears to be that when GPU metrics tables for v2_4 parts
were added, the allocation was not enlarged to fit.

The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.

v2:
 * Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
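
The "largest possible allocation" approach described in this message can be
sketched as below. The struct names and helpers are hypothetical, and the
layouts are placeholders whose sizes only echo the 152 and 168 bytes from the
report (the 160-byte one is invented for illustration); the real gpu_metrics
structures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layouts; only the sizes matter for this sketch. */
struct toy_metrics_v2_2 { unsigned char bytes[152]; };
struct toy_metrics_v2_3 { unsigned char bytes[160]; };
struct toy_metrics_v2_4 { unsigned char bytes[168]; };

static size_t toy_max_size(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Size the table for the largest layout any code path might write. */
size_t toy_metrics_table_size(void)
{
    size_t size = sizeof(struct toy_metrics_v2_2);

    size = toy_max_size(size, sizeof(struct toy_metrics_v2_3));
    size = toy_max_size(size, sizeof(struct toy_metrics_v2_4));
    return size;
}

/* Allocate once at init time; a later memset of any version then fits. */
void *toy_alloc_metrics_table(void)
{
    return calloc(1, toy_metrics_table_size());
}
```

A common helper that maps the part version to the exact size, as the message
suggests for later, would avoid over-allocating for older parts.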
mj22226 pushed a commit to mj22226/linux that referenced this pull request Nov 1, 2024
KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically we can confirm that vangogh_tables_init() allocates 152 bytes
for the table, while smu_cmn_init_soft_gpu_metrics() memsets a 168-byte
block.

The root cause appears to be that when GPU metrics tables for v2_4 parts
were added, the allocation was not enlarged to fit.

The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.

v2:
 * Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0880f58)
Cc: stable@vger.kernel.org # v6.6+
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 27, 2025
The Fixes: commit made use of the lower 3 bits of (void *)sk->sk_user_data
for flags, and refactored to simplify adding even more.

This change immediately broke 32-bit usage: in BPF's reuseport_array for
example, 'struct reuseport_array' has an array 'struct sock __rcu *ptrs[]'
whose members must be cleared on socket close via now-broken references
from sk->sk_user_data. This leads to subtle memory corruption and lock
issues that result in kernel hangs and panics while running BPF selftests:

root@qemu-armhf:/usr/libexec/kselftests-bpf# test_progs -a select_reuseport
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_config:PASS:netns_new 0 nsec
#356/1   select_reuseport/reuseport_sockarray IPv4/TCP LOOPBACK test_err_inner_map:OK
[...]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 87 at kernel/locking/lockdep.c:238 __lock_acquire+0xac0/0xd1c
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: bpf_testmod(OE) bpf_preload
CPU: 0 UID: 0 PID: 87 Comm: test_progs Tainted: G           OE       6.17.0-rc1-00233-ge37b36224f81-dirty #114 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Generic DT based system
Call trace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c01e2ebc r6:00000080 r5:60010093 r4:c14d3d80
 show_stack from dump_stack_lvl+0x90/0xc0
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c01e2ebc r6:00000009 r5:000000ee r4:c14c5bc4
 dump_stack from __warn+0x8c/0x1b4
 __warn from warn_slowpath_fmt+0x130/0x1a4
 r8:c01e2ebc r7:c14bd144 r6:c14c5bc4 r5:c3cad400 r4:c1cf8a04
 warn_slowpath_fmt from __lock_acquire+0xac0/0xd1c
 r8:c2896b50 r7:00000000 r6:c58863b8 r5:c3cad400 r4:c3cadcc0
 __lock_acquire from lock_acquire.part.0+0xbc/0x240
 r10:00000000 r9:1c0ed000 r8:00000000 r7:60010013 r6:c1b902f0 r5:c1b902f0
 r4:df865cd0
 lock_acquire.part.0 from lock_acquire+0x90/0x168
 r10:c5886100 r9:c46a6c04 r8:00000000 r7:00000000 r6:00000000 r5:00000000
 r4:c58863b8
 lock_acquire from _raw_write_lock_bh+0x54/0x90
 r9:c46a6c04 r8:00000000 r7:00000055 r6:c58863b8 r5:c58863a8 r4:c0394774
 _raw_write_lock_bh from bpf_fd_reuseport_array_update_elem+0x16c/0x26c
 r6:c59a4000 r5:c5191400 r4:c58863a8
 bpf_fd_reuseport_array_update_elem from bpf_map_update_value+0x454/0x5dc
 r10:c329a901 r9:c329a900 r8:c1cf72f0 r7:c3cad400 r6:c595dc00 r5:00000000
 r4:00000000
 bpf_map_update_value from map_update_elem+0x210/0x430
 r10:c329a901 r9:00000004 r8:c595df40 r7:df865ec0 r6:c329a900 r5:c46a6c00
 r4:c46a6cf8
 map_update_elem from __sys_bpf+0x594/0xc94
 r10:00000000 r9:befb18b0 r8:00000051 r7:00000000 r6:00000002 r5:df865eb0
 r4:00000020
 __sys_bpf from sys_bpf+0x34/0x3c
 r10:00000182 r9:c3cad400 r8:c0100234 r7:00000182 r6:00000002 r5:befb18b0
 r4:00000020
 sys_bpf from ret_fast_syscall+0x0/0x1c
Exception stack(0xdf865fa8 to 0xdf865ff0)
5fa0:                   00000020 befb18b0 00000002 befb18b0 00000020 00000000
5fc0: 00000020 befb18b0 00000002 00000182 00839395 b6fa3ce0 00000000 012ac774
5fe0: befb1880 befb1870 00863133 b6ec3312
irq event stamp: 260676
hardirqs last  enabled at (260676): [<c0149fac>] __local_bh_enable_ip+0xc4/0x1b0
hardirqs last disabled at (260675): [<c014a024>] __local_bh_enable_ip+0x13c/0x1b0
softirqs last  enabled at (260668): [<c0a1c31c>] release_sock+0x94/0x98
softirqs last disabled at (260674): [<c03946f4>] bpf_fd_reuseport_array_update_elem+0xec/0x26c
---[ end trace 0000000000000000 ]---

Reviewing kernel usage of sk->sk_user_data and the current flag bits:

    #define SK_USER_DATA_NOCOPY    1UL
    #define SK_USER_DATA_BPF       2UL
    #define SK_USER_DATA_PSOCK     4UL

reveals that SK_USER_DATA_PSOCK and SK_USER_DATA_BPF both imply
SK_USER_DATA_NOCOPY, and suggests we can instead use an equivalent
2-bit enum like:

    enum sk_user_data {
        SK_USER_DATA_NONE       = 0,
        SK_USER_DATA_NOCOPY     = 1,
        SK_USER_DATA_BPF        = 2,
        SK_USER_DATA_PSOCK      = 3,
    };

Implement this to fix the pointer corruption, and update related call
signatures and comments to clarify the change from multiple flag bits to
an enum value, with a note highlighting the 2-bit limitation.

Fixes: 2a01337 ("net: fix refcount bug in sk_psock_get (2)")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
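
The 2-bit limitation can be illustrated with a small user-space sketch of
pointer tagging. The enum mirrors the one proposed in this message; the
toy_* helpers and the toy_slots array are hypothetical, and the sketch
assumes the breakage comes from tagged pointers that are only 4-byte aligned,
as pointers to 32-bit slots would be.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the enum proposed in the message; the helpers are hypothetical. */
enum sk_user_data {
    SK_USER_DATA_NONE   = 0,
    SK_USER_DATA_NOCOPY = 1,
    SK_USER_DATA_BPF    = 2,
    SK_USER_DATA_PSOCK  = 3,
};

/* Two low bits are free whenever the pointer is at least 4-byte aligned. */
#define TOY_TAG_MASK ((uintptr_t)3)

/* 8-byte aligned array of 4-byte slots: &toy_slots[1] is only 4-byte aligned. */
static _Alignas(8) uint32_t toy_slots[2];

uintptr_t toy_pack(void *ptr, enum sk_user_data tag)
{
    return (uintptr_t)ptr | (uintptr_t)tag;
}

void *toy_ptr(uintptr_t v)
{
    return (void *)(v & ~TOY_TAG_MASK);
}

enum sk_user_data toy_tag(uintptr_t v)
{
    return (enum sk_user_data)(v & TOY_TAG_MASK);
}

/* Old-style decode that masks three flag bits instead of two. */
void *toy_ptr_3bit(uintptr_t v)
{
    return (void *)(v & ~(uintptr_t)7);
}
```

With a 4-byte aligned pointer, the 2-bit decode returns the original address,
while the 3-bit decode clears an address bit and yields a different pointer,
which is the kind of corruption the message describes.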
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 27, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 1, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 10, 2025
The Fixes: commit made use of the lower 3 bits of (void *)sk->sk_user_data
for flags, and refactored to simplify adding even more.

This change immediately broke 32-bit usage: in BPF's reuseport_array for
example, 'struct reuseport_array' has an array 'struct sock __rcu *ptrs[]'
whose members must be cleared on socket close via now-broken references
from sk->sk_user_data. This leads to subtle memory corruption and lock
issues that result in kernel hangs and panics while running BPF selftests:

root@qemu-armhf:/usr/libexec/kselftests-bpf# test_progs -a select_reuseport
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_config:PASS:netns_new 0 nsec
#356/1   select_reuseport/reuseport_sockarray IPv4/TCP LOOPBACK test_err_inner_map:OK
[...]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 87 at kernel/locking/lockdep.c:238 __lock_acquire+0xac0/0xd1c
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: bpf_testmod(OE) bpf_preload
CPU: 0 UID: 0 PID: 87 Comm: test_progs Tainted: G           OE       6.17.0-rc1-00233-ge37b36224f81-dirty #114 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Generic DT based system
Call trace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c01e2ebc r6:00000080 r5:60010093 r4:c14d3d80
 show_stack from dump_stack_lvl+0x90/0xc0
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c01e2ebc r6:00000009 r5:000000ee r4:c14c5bc4
 dump_stack from __warn+0x8c/0x1b4
 __warn from warn_slowpath_fmt+0x130/0x1a4
 r8:c01e2ebc r7:c14bd144 r6:c14c5bc4 r5:c3cad400 r4:c1cf8a04
 warn_slowpath_fmt from __lock_acquire+0xac0/0xd1c
 r8:c2896b50 r7:00000000 r6:c58863b8 r5:c3cad400 r4:c3cadcc0
 __lock_acquire from lock_acquire.part.0+0xbc/0x240
 r10:00000000 r9:1c0ed000 r8:00000000 r7:60010013 r6:c1b902f0 r5:c1b902f0
 r4:df865cd0
 lock_acquire.part.0 from lock_acquire+0x90/0x168
 r10:c5886100 r9:c46a6c04 r8:00000000 r7:00000000 r6:00000000 r5:00000000
 r4:c58863b8
 lock_acquire from _raw_write_lock_bh+0x54/0x90
 r9:c46a6c04 r8:00000000 r7:00000055 r6:c58863b8 r5:c58863a8 r4:c0394774
 _raw_write_lock_bh from bpf_fd_reuseport_array_update_elem+0x16c/0x26c
 r6:c59a4000 r5:c5191400 r4:c58863a8
 bpf_fd_reuseport_array_update_elem from bpf_map_update_value+0x454/0x5dc
 r10:c329a901 r9:c329a900 r8:c1cf72f0 r7:c3cad400 r6:c595dc00 r5:00000000
 r4:00000000
 bpf_map_update_value from map_update_elem+0x210/0x430
 r10:c329a901 r9:00000004 r8:c595df40 r7:df865ec0 r6:c329a900 r5:c46a6c00
 r4:c46a6cf8
 map_update_elem from __sys_bpf+0x594/0xc94
 r10:00000000 r9:befb18b0 r8:00000051 r7:00000000 r6:00000002 r5:df865eb0
 r4:00000020
 __sys_bpf from sys_bpf+0x34/0x3c
 r10:00000182 r9:c3cad400 r8:c0100234 r7:00000182 r6:00000002 r5:befb18b0
 r4:00000020
 sys_bpf from ret_fast_syscall+0x0/0x1c
Exception stack(0xdf865fa8 to 0xdf865ff0)
5fa0:                   00000020 befb18b0 00000002 befb18b0 00000020 00000000
5fc0: 00000020 befb18b0 00000002 00000182 00839395 b6fa3ce0 00000000 012ac774
5fe0: befb1880 befb1870 00863133 b6ec3312
irq event stamp: 260676
hardirqs last  enabled at (260676): [<c0149fac>] __local_bh_enable_ip+0xc4/0x1b0
hardirqs last disabled at (260675): [<c014a024>] __local_bh_enable_ip+0x13c/0x1b0
softirqs last  enabled at (260668): [<c0a1c31c>] release_sock+0x94/0x98
softirqs last disabled at (260674): [<c03946f4>] bpf_fd_reuseport_array_update_elem+0xec/0x26c
---[ end trace 0000000000000000 ]---
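
The corruption behind the splat above can be modelled in a few lines. This is a simplified sketch, not the kernel's code: the demo_* helpers and DEMO_FLAGS_MASK are hypothetical names, and the address is just a 4-byte-aligned value borrowed from the register dump. It assumes a 32-bit build only guarantees 4-byte pointer alignment, so bit 2 of a valid pointer may be set.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits as introduced by the Fixes: commit. */
#define SK_USER_DATA_NOCOPY  1UL
#define SK_USER_DATA_BPF     2UL
#define SK_USER_DATA_PSOCK   4UL

/* Hypothetical name: mask covering all three flag bits. */
#define DEMO_FLAGS_MASK \
	(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF | SK_USER_DATA_PSOCK)

/* Store flags in the low bits of an address (illustrative helper). */
static uintptr_t demo_tag(uintptr_t addr, uintptr_t flags)
{
	/* Only the three defined flag bits may be passed in. */
	assert((flags & ~(uintptr_t)DEMO_FLAGS_MASK) == 0);
	return addr | flags;
}

/* Recover the address by clearing the low three bits. */
static uintptr_t demo_untag(uintptr_t tagged)
{
	return tagged & ~(uintptr_t)DEMO_FLAGS_MASK;
}

/* Test a single flag bit in the tagged value. */
static int demo_has_flag(uintptr_t tagged, uintptr_t flag)
{
	return (tagged & flag) != 0;
}
```

With the 4-byte-aligned address 0xc46a6c04 tagged NOCOPY|BPF, demo_untag() returns 0xc46a6c00, which is a different address, and demo_has_flag() reports SK_USER_DATA_PSOCK as set even though it was never requested. An 8-byte-aligned address such as 0xc46a6c08 round-trips unchanged, which is why a 64-bit build would not be expected to show the problem.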

Reviewing kernel usage of sk->sk_user_data and the current flag bits:

    #define SK_USER_DATA_NOCOPY    1UL
    #define SK_USER_DATA_BPF       2UL
    #define SK_USER_DATA_PSOCK     4UL

reveals that SK_USER_DATA_PSOCK and SK_USER_DATA_BPF both imply
SK_USER_DATA_NOCOPY, and suggests we can instead use an equivalent
2-bit enum like:

    enum sk_user_data {
        SK_USER_DATA_NONE       = 0,
        SK_USER_DATA_NOCOPY     = 1,
        SK_USER_DATA_BPF        = 2,
        SK_USER_DATA_PSOCK      = 3,
    };

Implement this to fix the pointer corruption, and update related call
signatures and comments to clarify the change from multiple flag bits to
an enum value, with a note highlighting the 2-bit limitation.
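
As a rough illustration of the 2-bit encoding described above (a minimal sketch under stated assumptions, not the patch itself): the enum values are taken from this message, while DEMO_TYPE_MASK and the demo_* helpers are hypothetical names. The sketch assumes, as stated above, that any non-NONE value implies NOCOPY.

```c
#include <assert.h>
#include <stdint.h>

/* Enum values as proposed in this commit message. */
enum sk_user_data {
	SK_USER_DATA_NONE   = 0,
	SK_USER_DATA_NOCOPY = 1,
	SK_USER_DATA_BPF    = 2,
	SK_USER_DATA_PSOCK  = 3,
};

/* Hypothetical name: only two low bits are used, so 4-byte alignment suffices. */
#define DEMO_TYPE_MASK 3UL

/* Pack a type value into the low two bits of a pointer. */
static uintptr_t demo_tag(void *ptr, enum sk_user_data type)
{
	uintptr_t p = (uintptr_t)ptr;

	/* The pointer must leave the two low bits free. */
	assert((p & (uintptr_t)DEMO_TYPE_MASK) == 0);
	return p | (uintptr_t)type;
}

/* Recover the original pointer by clearing the two type bits. */
static void *demo_ptr(uintptr_t tagged)
{
	return (void *)(tagged & ~(uintptr_t)DEMO_TYPE_MASK);
}

/* Extract the type value from the two low bits. */
static enum sk_user_data demo_type(uintptr_t tagged)
{
	return (enum sk_user_data)(tagged & (uintptr_t)DEMO_TYPE_MASK);
}

/* BPF and PSOCK both imply NOCOPY, so any non-NONE type counts. */
static int demo_is_nocopy(uintptr_t tagged)
{
	return demo_type(tagged) != SK_USER_DATA_NONE;
}
```

Because the type is a value rather than a set of independent bits, a reader compares demo_type() for equality instead of testing individual flags, and a fifth type cannot be added without finding another spare bit.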

Fixes: 2a01337 ("net: fix refcount bug in sk_psock_get (2)")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 11, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 15, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 20, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 21, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 21, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 21, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 27, 2025
The Fixes: commit made use of the lower 3 bits of (void *)sk->sk_user_data
for flags, and refactored to simplify adding even more.

This change immediately broke 32-bit usage: in BPF's reuseport_array for
example, 'struct reuseport_array' has an array 'struct sock __rcu *ptrs[]'
whose members must be cleared on socket close via now-broken references
from sk->sk_user_data. This leads to subtle memory corruption and lock
issues that result in kernel hangs and panics while running BPF selftests:

root@qemu-armhf:/usr/libexec/kselftests-bpf# test_progs -a select_reuseport
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_config:PASS:netns_new 0 nsec
#356/1   select_reuseport/reuseport_sockarray IPv4/TCP LOOPBACK test_err_inner_map:OK
[...]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 87 at kernel/locking/lockdep.c:238 __lock_acquire+0xac0/0xd1c
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: bpf_testmod(OE) bpf_preload
CPU: 0 UID: 0 PID: 87 Comm: test_progs Tainted: G           OE       6.17.0-rc1-00233-ge37b36224f81-dirty #114 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Generic DT based system
Call trace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c01e2ebc r6:00000080 r5:60010093 r4:c14d3d80
 show_stack from dump_stack_lvl+0x90/0xc0
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c01e2ebc r6:00000009 r5:000000ee r4:c14c5bc4
 dump_stack from __warn+0x8c/0x1b4
 __warn from warn_slowpath_fmt+0x130/0x1a4
 r8:c01e2ebc r7:c14bd144 r6:c14c5bc4 r5:c3cad400 r4:c1cf8a04
 warn_slowpath_fmt from __lock_acquire+0xac0/0xd1c
 r8:c2896b50 r7:00000000 r6:c58863b8 r5:c3cad400 r4:c3cadcc0
 __lock_acquire from lock_acquire.part.0+0xbc/0x240
 r10:00000000 r9:1c0ed000 r8:00000000 r7:60010013 r6:c1b902f0 r5:c1b902f0
 r4:df865cd0
 lock_acquire.part.0 from lock_acquire+0x90/0x168
 r10:c5886100 r9:c46a6c04 r8:00000000 r7:00000000 r6:00000000 r5:00000000
 r4:c58863b8
 lock_acquire from _raw_write_lock_bh+0x54/0x90
 r9:c46a6c04 r8:00000000 r7:00000055 r6:c58863b8 r5:c58863a8 r4:c0394774
 _raw_write_lock_bh from bpf_fd_reuseport_array_update_elem+0x16c/0x26c
 r6:c59a4000 r5:c5191400 r4:c58863a8
 bpf_fd_reuseport_array_update_elem from bpf_map_update_value+0x454/0x5dc
 r10:c329a901 r9:c329a900 r8:c1cf72f0 r7:c3cad400 r6:c595dc00 r5:00000000
 r4:00000000
 bpf_map_update_value from map_update_elem+0x210/0x430
 r10:c329a901 r9:00000004 r8:c595df40 r7:df865ec0 r6:c329a900 r5:c46a6c00
 r4:c46a6cf8
 map_update_elem from __sys_bpf+0x594/0xc94
 r10:00000000 r9:befb18b0 r8:00000051 r7:00000000 r6:00000002 r5:df865eb0
 r4:00000020
 __sys_bpf from sys_bpf+0x34/0x3c
 r10:00000182 r9:c3cad400 r8:c0100234 r7:00000182 r6:00000002 r5:befb18b0
 r4:00000020
 sys_bpf from ret_fast_syscall+0x0/0x1c
Exception stack(0xdf865fa8 to 0xdf865ff0)
5fa0:                   00000020 befb18b0 00000002 befb18b0 00000020 00000000
5fc0: 00000020 befb18b0 00000002 00000182 00839395 b6fa3ce0 00000000 012ac774
5fe0: befb1880 befb1870 00863133 b6ec3312
irq event stamp: 260676
hardirqs last  enabled at (260676): [<c0149fac>] __local_bh_enable_ip+0xc4/0x1b0
hardirqs last disabled at (260675): [<c014a024>] __local_bh_enable_ip+0x13c/0x1b0
softirqs last  enabled at (260668): [<c0a1c31c>] release_sock+0x94/0x98
softirqs last disabled at (260674): [<c03946f4>] bpf_fd_reuseport_array_update_elem+0xec/0x26c
---[ end trace 0000000000000000 ]---

Reviewing kernel usage of sk->sk_user_data and the current flag bits:

    #define SK_USER_DATA_NOCOPY    1UL
    #define SK_USER_DATA_BPF       2UL
    #define SK_USER_DATA_PSOCK     4UL

reveals that SK_USER_DATA_PSOCK and SK_USER_DATA_BPF both imply
SK_USER_DATA_NOCOPY, and suggests we can instead use an equivalent
2-bit enum like:

    enum sk_user_data {
        SK_USER_DATA_NONE       = 0,
        SK_USER_DATA_NOCOPY     = 1,
        SK_USER_DATA_BPF        = 2,
        SK_USER_DATA_PSOCK      = 3,
    };

Implement this to fix the pointer corruption, and update related call
signatures and comments to clarify the change from multiple flag bits to
an enum value, with a note highlighting the 2-bit limitation.

Fixes: 2a01337 ("net: fix refcount bug in sk_psock_get (2)")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 27, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 27, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 27, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 27, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 28, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 30, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 3, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 5, 2025
The Fixes: commit made use of the lower 3 bits of (void *)sk->sk_user_data
for flags, and refactored to simplify adding even more.

This change immediately broke 32-bit usage: in BPF's reuseport_array for
example, 'struct reuseport_array' has an array 'struct sock __rcu *ptrs[]'
whose members must be cleared on socket close via now-broken references
from sk->sk_user_data. This leads to subtle memory corruption and lock
issues that result in kernel hangs and panics while running BPF selftests:

root@qemu-armhf:/usr/libexec/kselftests-bpf# test_progs -a select_reuseport
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_config:PASS:netns_new 0 nsec
#356/1   select_reuseport/reuseport_sockarray IPv4/TCP LOOPBACK test_err_inner_map:OK
[...]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 87 at kernel/locking/lockdep.c:238 __lock_acquire+0xac0/0xd1c
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: bpf_testmod(OE) bpf_preload
CPU: 0 UID: 0 PID: 87 Comm: test_progs Tainted: G           OE       6.17.0-rc1-00233-ge37b36224f81-dirty #114 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Generic DT based system
Call trace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c01e2ebc r6:00000080 r5:60010093 r4:c14d3d80
 show_stack from dump_stack_lvl+0x90/0xc0
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c01e2ebc r6:00000009 r5:000000ee r4:c14c5bc4
 dump_stack from __warn+0x8c/0x1b4
 __warn from warn_slowpath_fmt+0x130/0x1a4
 r8:c01e2ebc r7:c14bd144 r6:c14c5bc4 r5:c3cad400 r4:c1cf8a04
 warn_slowpath_fmt from __lock_acquire+0xac0/0xd1c
 r8:c2896b50 r7:00000000 r6:c58863b8 r5:c3cad400 r4:c3cadcc0
 __lock_acquire from lock_acquire.part.0+0xbc/0x240
 r10:00000000 r9:1c0ed000 r8:00000000 r7:60010013 r6:c1b902f0 r5:c1b902f0
 r4:df865cd0
 lock_acquire.part.0 from lock_acquire+0x90/0x168
 r10:c5886100 r9:c46a6c04 r8:00000000 r7:00000000 r6:00000000 r5:00000000
 r4:c58863b8
 lock_acquire from _raw_write_lock_bh+0x54/0x90
 r9:c46a6c04 r8:00000000 r7:00000055 r6:c58863b8 r5:c58863a8 r4:c0394774
 _raw_write_lock_bh from bpf_fd_reuseport_array_update_elem+0x16c/0x26c
 r6:c59a4000 r5:c5191400 r4:c58863a8
 bpf_fd_reuseport_array_update_elem from bpf_map_update_value+0x454/0x5dc
 r10:c329a901 r9:c329a900 r8:c1cf72f0 r7:c3cad400 r6:c595dc00 r5:00000000
 r4:00000000
 bpf_map_update_value from map_update_elem+0x210/0x430
 r10:c329a901 r9:00000004 r8:c595df40 r7:df865ec0 r6:c329a900 r5:c46a6c00
 r4:c46a6cf8
 map_update_elem from __sys_bpf+0x594/0xc94
 r10:00000000 r9:befb18b0 r8:00000051 r7:00000000 r6:00000002 r5:df865eb0
 r4:00000020
 __sys_bpf from sys_bpf+0x34/0x3c
 r10:00000182 r9:c3cad400 r8:c0100234 r7:00000182 r6:00000002 r5:befb18b0
 r4:00000020
 sys_bpf from ret_fast_syscall+0x0/0x1c
Exception stack(0xdf865fa8 to 0xdf865ff0)
5fa0:                   00000020 befb18b0 00000002 befb18b0 00000020 00000000
5fc0: 00000020 befb18b0 00000002 00000182 00839395 b6fa3ce0 00000000 012ac774
5fe0: befb1880 befb1870 00863133 b6ec3312
irq event stamp: 260676
hardirqs last  enabled at (260676): [<c0149fac>] __local_bh_enable_ip+0xc4/0x1b0
hardirqs last disabled at (260675): [<c014a024>] __local_bh_enable_ip+0x13c/0x1b0
softirqs last  enabled at (260668): [<c0a1c31c>] release_sock+0x94/0x98
softirqs last disabled at (260674): [<c03946f4>] bpf_fd_reuseport_array_update_elem+0xec/0x26c
---[ end trace 0000000000000000 ]---

Reviewing kernel usage of sk->sk_user_data and the current flag bits:

    #define SK_USER_DATA_NOCOPY    1UL
    #define SK_USER_DATA_BPF       2UL
    #define SK_USER_DATA_PSOCK     4UL

reveals that SK_USER_DATA_PSOCK and SK_USER_DATA_BPF both imply
SK_USER_DATA_NOCOPY, and suggests we can instead use an equivalent
2-bit enum like:

    enum sk_user_data {
        SK_USER_DATA_NONE       = 0,
        SK_USER_DATA_NOCOPY     = 1,
        SK_USER_DATA_BPF        = 2,
        SK_USER_DATA_PSOCK      = 3,
    };

Implement this to fix the pointer corruption, and update related call
signatures and comments to clarify the change from multiple flag bits to
an enum value, with a note highlighting the 2-bit limitation.

Fixes: 2a01337 ("net: fix refcount bug in sk_psock_get (2)")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 5, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 7, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 15, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 16, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 20, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 22, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 23, 2025
The Fixes: commit made use of the lower 3 bits of (void *)sk->sk_user_data
for flags, and refactored to simplify adding even more.

This change immediately broke 32-bit usage: in BPF's reuseport_array for
example, 'struct reuseport_array' has an array 'struct sock __rcu *ptrs[]'
whose members must be cleared on socket close via now-broken references
from sk->sk_user_data. This leads to subtle memory corruption and lock
issues that result in kernel hangs and panics while running BPF selftests:

root@qemu-armhf:/usr/libexec/kselftests-bpf# test_progs -a select_reuseport
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_config:PASS:netns_new 0 nsec
#356/1   select_reuseport/reuseport_sockarray IPv4/TCP LOOPBACK test_err_inner_map:OK
[...]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 87 at kernel/locking/lockdep.c:238 __lock_acquire+0xac0/0xd1c
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: bpf_testmod(OE) bpf_preload
CPU: 0 UID: 0 PID: 87 Comm: test_progs Tainted: G           OE       6.17.0-rc1-00233-ge37b36224f81-dirty #114 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Generic DT based system
Call trace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c01e2ebc r6:00000080 r5:60010093 r4:c14d3d80
 show_stack from dump_stack_lvl+0x90/0xc0
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c01e2ebc r6:00000009 r5:000000ee r4:c14c5bc4
 dump_stack from __warn+0x8c/0x1b4
 __warn from warn_slowpath_fmt+0x130/0x1a4
 r8:c01e2ebc r7:c14bd144 r6:c14c5bc4 r5:c3cad400 r4:c1cf8a04
 warn_slowpath_fmt from __lock_acquire+0xac0/0xd1c
 r8:c2896b50 r7:00000000 r6:c58863b8 r5:c3cad400 r4:c3cadcc0
 __lock_acquire from lock_acquire.part.0+0xbc/0x240
 r10:00000000 r9:1c0ed000 r8:00000000 r7:60010013 r6:c1b902f0 r5:c1b902f0
 r4:df865cd0
 lock_acquire.part.0 from lock_acquire+0x90/0x168
 r10:c5886100 r9:c46a6c04 r8:00000000 r7:00000000 r6:00000000 r5:00000000
 r4:c58863b8
 lock_acquire from _raw_write_lock_bh+0x54/0x90
 r9:c46a6c04 r8:00000000 r7:00000055 r6:c58863b8 r5:c58863a8 r4:c0394774
 _raw_write_lock_bh from bpf_fd_reuseport_array_update_elem+0x16c/0x26c
 r6:c59a4000 r5:c5191400 r4:c58863a8
 bpf_fd_reuseport_array_update_elem from bpf_map_update_value+0x454/0x5dc
 r10:c329a901 r9:c329a900 r8:c1cf72f0 r7:c3cad400 r6:c595dc00 r5:00000000
 r4:00000000
 bpf_map_update_value from map_update_elem+0x210/0x430
 r10:c329a901 r9:00000004 r8:c595df40 r7:df865ec0 r6:c329a900 r5:c46a6c00
 r4:c46a6cf8
 map_update_elem from __sys_bpf+0x594/0xc94
 r10:00000000 r9:befb18b0 r8:00000051 r7:00000000 r6:00000002 r5:df865eb0
 r4:00000020
 __sys_bpf from sys_bpf+0x34/0x3c
 r10:00000182 r9:c3cad400 r8:c0100234 r7:00000182 r6:00000002 r5:befb18b0
 r4:00000020
 sys_bpf from ret_fast_syscall+0x0/0x1c
Exception stack(0xdf865fa8 to 0xdf865ff0)
5fa0:                   00000020 befb18b0 00000002 befb18b0 00000020 00000000
5fc0: 00000020 befb18b0 00000002 00000182 00839395 b6fa3ce0 00000000 012ac774
5fe0: befb1880 befb1870 00863133 b6ec3312
irq event stamp: 260676
hardirqs last  enabled at (260676): [<c0149fac>] __local_bh_enable_ip+0xc4/0x1b0
hardirqs last disabled at (260675): [<c014a024>] __local_bh_enable_ip+0x13c/0x1b0
softirqs last  enabled at (260668): [<c0a1c31c>] release_sock+0x94/0x98
softirqs last disabled at (260674): [<c03946f4>] bpf_fd_reuseport_array_update_elem+0xec/0x26c
---[ end trace 0000000000000000 ]---

Reviewing kernel usage of sk->sk_user_data and the current flag bits:

    #define SK_USER_DATA_NOCOPY    1UL
    #define SK_USER_DATA_BPF       2UL
    #define SK_USER_DATA_PSOCK     4UL

reveals that SK_USER_DATA_PSOCK and SK_USER_DATA_BPF both imply
SK_USER_DATA_NOCOPY, and suggests we can instead use an equivalent
2-bit enum like:

    enum sk_user_data {
        SK_USER_DATA_NONE       = 0,
        SK_USER_DATA_NOCOPY     = 1,
        SK_USER_DATA_BPF        = 2,
        SK_USER_DATA_PSOCK      = 3,
    };

Implement this to fix the pointer corruption, and update related call
signatures and comments to clarify the change from multiple flag bits to
an enum value, with a note highlighting the 2-bit limitation.
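
The 2-bit encoding described above could be sketched as follows. This is a minimal user-space illustration, not the patch itself: the enum values come from the commit message, while `USER_DATA_KIND_MASK` and the `user_data_*()` helpers are hypothetical names. Treating every non-`NONE` kind as "no copy" is an assumption drawn from the observation that both BPF and PSOCK imply NOCOPY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Enum values as proposed in the commit message. */
enum sk_user_data {
    SK_USER_DATA_NONE   = 0,
    SK_USER_DATA_NOCOPY = 1,
    SK_USER_DATA_BPF    = 2,
    SK_USER_DATA_PSOCK  = 3,
};

/* Hypothetical name: mask covering the 2 low bits that hold the enum value. */
#define USER_DATA_KIND_MASK ((uintptr_t)3)

/* Store the kind in the 2 low bits; the pointer must be at least 4-byte aligned. */
static inline uintptr_t user_data_pack(void *ptr, enum sk_user_data kind)
{
    uintptr_t addr = (uintptr_t)ptr;

    assert((addr & USER_DATA_KIND_MASK) == 0);
    return addr | (uintptr_t)kind;
}

/* Extract the enum value from the 2 low bits. */
static inline enum sk_user_data user_data_kind(uintptr_t packed)
{
    return (enum sk_user_data)(packed & USER_DATA_KIND_MASK);
}

/* Extract the original pointer by clearing only the 2 low bits. */
static inline void *user_data_ptr(uintptr_t packed)
{
    return (void *)(packed & ~USER_DATA_KIND_MASK);
}

/* Assumption: any kind other than NONE means the data must not be copied. */
static inline bool user_data_nocopy(uintptr_t packed)
{
    return user_data_kind(packed) != SK_USER_DATA_NONE;
}
```

Because only two bits are masked, a 4-byte-aligned pointer survives the round trip unchanged. The cost is that the kinds become mutually exclusive values rather than independent flags, and no further kinds fit without more alignment.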

Fixes: 2a01337 ("net: fix refcount bug in sk_psock_get (2)")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 23, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 23, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 25, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 26, 2025