Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syzkaller: possible deadlock in sk_clone_lock #447

Closed
cpaasch opened this issue Oct 18, 2023 · 5 comments
Closed

syzkaller: possible deadlock in sk_clone_lock #447

cpaasch opened this issue Oct 18, 2023 · 5 comments

Comments

@cpaasch
Copy link
Member

cpaasch commented Oct 18, 2023

syzkaller-ID: 664c5311cf44e3aa732eaa6b2e79eb4a8961ec08

HEAD: 7a5720a

Trace:

============================================
WARNING: possible recursive locking detected
6.6.0-rc4-g7a5720a344e7 #57 Not tainted
--------------------------------------------
syz-executor.4/9364 is trying to acquire lock:
ffff8880171052f0 (k-slock-AF_INET6){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff8880171052f0 (k-slock-AF_INET6){+.-.}-{2:2}, at: sk_clone_lock+0x129/0x6c0 net/core/sock.c:2310

but task is already holding lock:
ffff88805f3750b0 (k-slock-AF_INET6){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88805f3750b0 (k-slock-AF_INET6){+.-.}-{2:2}, at: sk_clone_lock+0x129/0x6c0 net/core/sock.c:2310

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(k-slock-AF_INET6);
  lock(k-slock-AF_INET6);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by syz-executor.4/9364:
 #0: ffff888032c9b970 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1722 [inline]
 #0: ffff888032c9b970 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x41/0x8e0 net/mptcp/protocol.c:1792
 #1: ffff888059892470 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1722 [inline]
 #1: ffff888059892470 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x7f/0x200 net/mptcp/protocol.c:1737
 #2: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:303 [inline]
 #2: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:749 [inline]
 #2: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x37/0xa00 net/ipv4/ip_output.c:468
 #3: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:303 [inline]
 #3: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:749 [inline]
 #3: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x1c7/0x850 net/ipv4/ip_output.c:226
 #4: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:303 [inline]
 #4: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:749 [inline]
 #4: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x1a6/0x3e0 net/core/dev.c:5968
 #5: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:303 [inline]
 #5: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:749 [inline]
 #5: ffffffff8315b6f8 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xb9/0x260 net/ipv4/ip_input.c:232
 #6: ffff88805f3750b0 (k-slock-AF_INET6){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
 #6: ffff88805f3750b0 (k-slock-AF_INET6){+.-.}-{2:2}, at: sk_clone_lock+0x129/0x6c0 net/core/sock.c:2310

stack backtrace:
CPU: 1 PID: 9364 Comm: syz-executor.4 Not tainted 6.6.0-rc4-g7a5720a344e7 #57
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xad/0xf0 lib/dump_stack.c:106
 check_deadlock kernel/locking/lockdep.c:3062 [inline]
 validate_chain kernel/locking/lockdep.c:3855 [inline]
 __lock_acquire+0x12b6/0x2dd0 kernel/locking/lockdep.c:5136
 lock_acquire+0xdd/0x230 kernel/locking/lockdep.c:5753
 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
 _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
 spin_lock include/linux/spinlock.h:351 [inline]
 sk_clone_lock+0x129/0x6c0 net/core/sock.c:2310
 mptcp_sk_clone_init+0x2c/0x410 net/mptcp/protocol.c:3173
 subflow_syn_recv_sock+0x41d/0x7e0 net/mptcp/subflow.c:818
 tcp_check_req+0x794/0x950 net/ipv4/tcp_minisocks.c:811
 tcp_v4_rcv+0xeee/0x16c0 net/ipv4/tcp_ipv4.c:2078
 ip_protocol_deliver_rcu+0x300/0x5e0 net/ipv4/ip_input.c:205
 ip_local_deliver_finish+0x175/0x260 net/ipv4/ip_input.c:233
 NF_HOOK+0x25d/0x2d0 include/linux/netfilter.h:304
 NF_HOOK+0x25d/0x2d0 include/linux/netfilter.h:304
 __netif_receive_skb+0xe4/0x210 net/core/dev.c:5527
 process_backlog+0x24f/0x3e0 net/core/dev.c:5969
 __napi_poll+0x46/0x330 net/core/dev.c:6531
 napi_poll net/core/dev.c:6598 [inline]
 net_rx_action+0x216/0x4e0 net/core/dev.c:6731
 __do_softirq+0x158/0x3e6 kernel/softirq.c:553
 do_softirq+0x8b/0xd0 kernel/softirq.c:454
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0x10c/0x120 kernel/softirq.c:381
 local_bh_enable include/linux/bottom_half.h:33 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:819 [inline]
 __dev_queue_xmit+0x107c/0x1c80 net/core/dev.c:4371
 dev_queue_xmit include/linux/netdevice.h:3092 [inline]
 neigh_hh_output include/net/neighbour.h:526 [inline]
 neigh_output include/net/neighbour.h:540 [inline]
 ip_finish_output2+0x693/0x850 net/ipv4/ip_output.c:233
 __ip_queue_xmit+0x8dc/0xa00 net/ipv4/ip_output.c:533
 __tcp_transmit_skb+0xdf5/0x1000 net/ipv4/tcp_output.c:1408
 tcp_rcv_state_process+0x16ed/0x1750 net/ipv4/tcp_input.c:6383
 tcp_v4_do_rcv+0x3cc/0x610 net/ipv4/tcp_ipv4.c:1752
 __release_sock+0xcf/0x150 net/core/sock.c:2976
 release_sock+0x38/0x110 net/core/sock.c:3518
 mptcp_sendmsg_fastopen+0xb9/0x200 net/mptcp/protocol.c:1743
 mptcp_sendmsg+0x81d/0x8e0 net/mptcp/protocol.c:1798
 __sock_sendmsg+0x83/0xd0 net/socket.c:730
 __sys_sendto+0x20d/0x2b0 net/socket.c:2194
 __do_sys_sendto net/socket.c:2206 [inline]
 __se_sys_sendto net/socket.c:2202 [inline]
 __x64_sys_sendto+0x28/0x30 net/socket.c:2202
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7f52b32bc6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f52b25e9cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007f52b32bc6a9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000020000040 R09: 0000000000000010
R10: 000000002002c011 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
 </TASK>
netlink: 36 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 36 bytes leftover after parsing attributes in process `syz-executor.5'.
FAULT_INJECTION: forcing a failure.
name fail_usercopy, interval 1, probability 0, space 0, times 0
CPU: 2 PID: 10015 Comm: syz-executor.7 Not tainted 6.6.0-rc4-g7a5720a344e7 #57
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xad/0xf0 lib/dump_stack.c:106
 should_fail_ex+0x156/0x160 lib/fault-inject.c:52
 _copy_from_user+0x2f/0xd0 lib/usercopy.c:15
 copy_from_user include/linux/uaccess.h:183 [inline]
 copy_msghdr_from_user net/socket.c:2488 [inline]
 sendmsg_copy_msghdr net/socket.c:2589 [inline]
 ___sys_sendmsg+0x7e/0x190 net/socket.c:2608
FAULT_INJECTION: forcing a failure.
name fail_usercopy, interval 1, probability 0, space 0, times 0
 __x64_sys_sendmsg+0xd8/0x150 net/socket.c:2641
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fbb9c0896a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007fbb9b3b6cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007fbb9c0896a9
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
 </TASK>
CPU: 0 PID: 10032 Comm: syz-executor.1 Not tainted 6.6.0-rc4-g7a5720a344e7 #57
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xad/0xf0 lib/dump_stack.c:106
 should_fail_ex+0x156/0x160 lib/fault-inject.c:52
 _copy_from_user+0x2f/0xd0 lib/usercopy.c:15
 copy_from_user include/linux/uaccess.h:183 [inline]
 move_addr_to_kernel net/socket.c:253 [inline]
 __sys_bind+0x103/0x200 net/socket.c:1843
 __do_sys_bind net/socket.c:1860 [inline]
 __se_sys_bind net/socket.c:1858 [inline]
 __x64_sys_bind+0x1c/0x30 net/socket.c:1858
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fa9db0ca6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007fa9da3f7cd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007fa9db0ca6a9
RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
 </TASK>
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 2 PID: 10048 Comm: syz-executor.7 Not tainted 6.6.0-rc4-g7a5720a344e7 #57
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xad/0xf0 lib/dump_stack.c:106
 should_fail_ex+0x156/0x160 lib/fault-inject.c:52
 should_failslab+0x9/0x20 mm/slab_common.c:1509
 slab_pre_alloc_hook+0x59/0x310 mm/slab.h:711
 kmem_cache_alloc_node+0x57/0x2a0 mm/slub.c:3460
 __alloc_skb+0xe2/0x260 net/core/skbuff.c:640
 alloc_skb include/linux/skbuff.h:1286 [inline]
 pfkey_sendmsg+0xdd/0x840 net/key/af_key.c:3685
 __sock_sendmsg+0xa4/0xd0 net/socket.c:730
 ____sys_sendmsg+0x22d/0x320 net/socket.c:2558
 ___sys_sendmsg+0x143/0x190 net/socket.c:2612
 __x64_sys_sendmsg+0xd8/0x150 net/socket.c:2641
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fbb9c0896a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007fbb9b3b6cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007fbb9c0896a9
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
 </TASK>

Kconfig:
Kconfig_k5_lockdep.txt

No reproducer yet.

@pabeni
Copy link

pabeni commented Oct 18, 2023

============================================
WARNING: possible recursive locking detected
6.6.0-rc4-g7a5720a344e7 #57 Not tainted
--------------------------------------------
syz-executor.4/9364 is trying to acquire lock:
ffff8880171052f0 (k-slock-AF_INET6){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff8880171052f0 (k-slock-AF_INET6){+.-.}-{2:2}, at: sk_clone_lock+0x129/0x6c0 net/core/sock.c:2310

but task is already holding lock:
ffff88805f3750b0 (k-slock-AF_INET6){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88805f3750b0 (k-slock-AF_INET6){+.-.}-{2:2}, at: sk_clone_lock+0x129/0x6c0 net/core/sock.c:2310

[...]

__do_softirq+0x158/0x3e6 kernel/softirq.c:553
do_softirq+0x8b/0xd0 kernel/softirq.c:454


__local_bh_enable_ip+0x10c/0x120 kernel/softirq.c:381
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:819 [inline]
__dev_queue_xmit+0x107c/0x1c80 net/core/dev.c:4371
dev_queue_xmit include/linux/netdevice.h:3092 [inline]
neigh_hh_output include/net/neighbour.h:526 [inline]
neigh_output include/net/neighbour.h:540 [inline]
ip_finish_output2+0x693/0x850 net/ipv4/ip_output.c:233
__ip_queue_xmit+0x8dc/0xa00 net/ipv4/ip_output.c:533
__tcp_transmit_skb+0xdf5/0x1000 net/ipv4/tcp_output.c:1408
tcp_rcv_state_process+0x16ed/0x1750 net/ipv4/tcp_input.c:6383
tcp_v4_do_rcv+0x3cc/0x610 net/ipv4/tcp_ipv4.c:1752
__release_sock+0xcf/0x150 net/core/sock.c:2976
release_sock+0x38/0x110 net/core/sock.c:3518
[...]

The issue does not look like mptcp related (or memory corruption happens before the reported splat) because the above quoted stacktrace looks buggy: release_sock() disables local BH processing, the rcu_read_lock_bh()/rcu_read_unlock_bh() pair in __dev_queue_xmit() should not lead to irq processing, but indeed __do_softirq() is invoked.

it looks like current->preempt_count is corrupted in between such calls?!? I think the lock debug options enabled in the current build should trigger if we have a pair of locking primitives mismatched ?!?

All in all it looks confusing, and IIRC this is the 2nd stack trace we see in recent time that could be possibly a side effect of some random memory corruption. Perhaps it would be worthy increase the number of runners/configs with kasan enabled (and ev decreasing the number of config without such option, as needed)?!?

Have you observed similar splats without any mptcp reference?

@cpaasch
Copy link
Member Author

cpaasch commented Oct 18, 2023

I'm updating all my instances with KASAN. Any other "Memory Debugging" options you think may help ?

I do wonder if the fail-injections could play a role here ?

@cpaasch
Copy link
Member Author

cpaasch commented Oct 18, 2023

And no - I don't see other similar splats.

@pabeni
Copy link

pabeni commented Nov 29, 2023

@cpaasch: unless you observed more instances of this one I suggest to close it

@cpaasch
Copy link
Member Author

cpaasch commented Nov 29, 2023

Yes! Hasn't happened since > a month.

@cpaasch cpaasch closed this as completed Nov 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants