Power8: Hit with "Oops: Kernel access of bad area, sig: 11" on latest nightly #14

sathnaga · 2017-08-31T09:06:29Z

Kernel Version: 4.13.0-3.rc3.dev.gitec0d270.el7.centos.ppc64le
Hit a few minutes after a fresh boot, while running avocado tests (the run had just started).
Most commands (sosreport, service restart, etc.) get stuck after the crash.

[  909.585268] list_del corruption. prev->next should be c000000f23120760, but was c000000f23121760
[  909.585448] ------------[ cut here ]------------
[  909.585547] WARNING: CPU: 64 PID: 14123 at lib/list_debug.c:53 __list_del_entry_valid+0xd0/0x100
[  909.585705] Modules linked in: vhost_net vhost tap act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables ses enclosure scsi_transport_sas i2c_opal i2c_core powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc kvm_hv kvm_pr kvm xfs libcrc32c tg3 ptp pps_core
[  909.586812] CPU: 64 PID: 14123 Comm: qemu-system-ppc Not tainted 4.13.0-3.rc3.dev.gitec0d270.el7.centos.ppc64le #1
[  909.586963] task: c000000f0c9cc600 task.stack: c000000f061a8000
[  909.587026] NIP: c0000000005a0770 LR: c0000000005a076c CTR: 00000000300304d0
[  909.587100] REGS: c000000f061ab6c0 TRAP: 0700   Not tainted  (4.13.0-3.rc3.dev.gitec0d270.el7.centos.ppc64le)
[  909.587197] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>
[  909.587205]   CR: 42024422  XER: 20000000
[  909.587291] CFAR: c00000000016e9c8 SOFTE: 1 
[  909.587291] GPR00: c0000000005a076c c000000f061ab940 c000000001397a00 0000000000000054 
[  909.587291] GPR04: 0000000000000000 c000000000098244 9000000000009033 0000000000000000 
[  909.587291] GPR08: 0000000000000001 0000000000000007 0000000000000006 9000000000001003 
[  909.587291] GPR12: 0000000000004400 c00000000fda8000 0000000000000000 0000000000000000 
[  909.587291] GPR16: 0000000000000000 0000000124cb8058 0000000124cb8038 00000001250ed8b8 
[  909.587291] GPR20: 00000001250ed8b0 00000001250ed8d0 c00000000138d820 c000000000d9c238 
[  909.587291] GPR24: 0000000000000001 5deadbeef0000100 c000000f061abb80 c000000000f24840 
[  909.587291] GPR28: c0000000013cbe50 0000000000000001 c000000f231215e0 c000000f23120750 
[  909.587927] NIP [c0000000005a0770] __list_del_entry_valid+0xd0/0x100
[  909.587990] LR [c0000000005a076c] __list_del_entry_valid+0xcc/0x100
[  909.588052] Call Trace:
[  909.588079] [c000000f061ab940] [c0000000005a076c] __list_del_entry_valid+0xcc/0x100 (unreliable)
[  909.588167] [c000000f061ab9a0] [c000000000988bbc] tcf_chain_destroy+0x2c/0xa0
[  909.588243] [c000000f061ab9d0] [c000000000988c84] tcf_block_put+0x54/0x90
[  909.588308] [c000000f061aba00] [d000000014d3178c] htb_destroy_class.isra.11+0x5c/0x80 [sch_htb]
[  909.588401] [c000000f061aba30] [d000000014d318a8] htb_destroy+0xf8/0x1b0 [sch_htb]
[  909.588476] [c000000f061abab0] [c0000000009818a4] qdisc_destroy+0xe4/0x170
[  909.588539] [c000000f061abae0] [c00000000098332c] dev_shutdown+0xbc/0x100
[  909.588604] [c000000f061abb20] [c00000000093f248] rollback_registered_many+0x2f8/0x560
[  909.588679] [c000000f061abbf0] [c00000000093f520] rollback_registered+0x70/0xb0
[  909.588755] [c000000f061abc40] [c000000000941908] unregister_netdevice_queue+0x128/0x180
[  909.588832] [c000000f061abcc0] [c00000000077a6cc] __tun_detach+0x22c/0x460
[  909.588895] [c000000f061abd20] [c00000000077a938] tun_chr_close+0x38/0x60
[  909.588959] [c000000f061abd50] [c00000000035abf8] __fput+0xd8/0x280
[  909.589024] [c000000f061abdb0] [c000000000120f20] task_work_run+0x140/0x1a0
[  909.589089] [c000000f061abe00] [c00000000001d810] do_notify_resume+0xf0/0x100
[  909.589164] [c000000f061abe30] [c00000000000bf44] ret_from_except_lite+0x70/0x74
[  909.589238] Instruction dump:
[  909.589295] 4bffffd4 3c62ff9b 3863f6d0 4bbce235 60000000 0fe00000 38600000 4bffffb8 
[  909.589435] 3c62ff9b 3863f690 4bbce219 60000000 <0fe00000> 38600000 4bffff9c 3c62ff9b 
[  909.589577] ---[ end trace c2b424e83e247e4b ]---
[  909.589685] Unable to handle kernel paging request for data at address 0x00000000
[  909.589823] Faulting instruction address: 0xc000000000988b48
[  909.589939] Oops: Kernel access of bad area, sig: 11 [#1]
[  909.590030] SMP NR_CPUS=1024 
[  909.590030] NUMA 
[  909.590101] PowerNV
[  909.590197] Modules linked in: vhost_net vhost tap act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables ses enclosure scsi_transport_sas i2c_opal i2c_core powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc kvm_hv kvm_pr kvm xfs libcrc32c tg3 ptp pps_core
[  909.591279] CPU: 64 PID: 14123 Comm: qemu-system-ppc Tainted: G        W       4.13.0-3.rc3.dev.gitec0d270.el7.centos.ppc64le #1
[  909.591481] task: c000000f0c9cc600 task.stack: c000000f061a8000
[  909.591596] NIP: c000000000988b48 LR: c000000000988c04 CTR: 00000000300304d0
[  909.591733] REGS: c000000f061ab6f0 TRAP: 0300   Tainted: G        W        (4.13.0-3.rc3.dev.gitec0d270.el7.centos.ppc64le)
[  909.591913] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[  909.591919]   CR: 42024422  XER: 20000000
[  909.592080] CFAR: c0000000000087d8 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1 
[  909.592080] GPR00: c000000000988c04 c000000f061ab970 c000000001397a00 c000000f23120750 
[  909.592080] GPR04: 0000000000000000 c000000000098244 9000000000009033 0000000000000000 
[  909.592080] GPR08: 0000000000000001 0000000000000000 5deadbeef0000100 9000000000001003 
[  909.592080] GPR12: 0000000000004400 c00000000fda8000 0000000000000000 0000000000000000 
[  909.592080] GPR16: 0000000000000000 0000000124cb8058 0000000124cb8038 00000001250ed8b8 
[  909.592080] GPR20: 00000001250ed8b0 00000001250ed8d0 c00000000138d820 c000000000d9c238 
[  909.592080] GPR24: 0000000000000001 5deadbeef0000100 c000000f061abb80 c000000000f24840 
[  909.592080] GPR28: c0000000013cbe50 0000000000000001 c000000f231215e0 c000000f23120750 
[  909.593263] NIP [c000000000988b48] tcf_chain_flush+0x28/0x70
[  909.593377] LR [c000000000988c04] tcf_chain_destroy+0x74/0xa0
[  909.593491] Call Trace:
[  909.593540] [c000000f061ab970] [0000000000000001] 0x1 (unreliable)
[  909.593654] [c000000f061ab9a0] [c000000000988c04] tcf_chain_destroy+0x74/0xa0
[  909.593783] [c000000f061ab9d0] [c000000000988c84] tcf_block_put+0x54/0x90
[  909.593847] [c000000f061aba00] [d000000014d3178c] htb_destroy_class.isra.11+0x5c/0x80 [sch_htb]
[  909.593935] [c000000f061aba30] [d000000014d318a8] htb_destroy+0xf8/0x1b0 [sch_htb]
[  909.594013] [c000000f061abab0] [c0000000009818a4] qdisc_destroy+0xe4/0x170
[  909.594076] [c000000f061abae0] [c00000000098332c] dev_shutdown+0xbc/0x100
[  909.594140] [c000000f061abb20] [c00000000093f248] rollback_registered_many+0x2f8/0x560
[  909.594217] [c000000f061abbf0] [c00000000093f520] rollback_registered+0x70/0xb0
[  909.594292] [c000000f061abc40] [c000000000941908] unregister_netdevice_queue+0x128/0x180
[  909.594369] [c000000f061abcc0] [c00000000077a6cc] __tun_detach+0x22c/0x460
[  909.594433] [c000000f061abd20] [c00000000077a938] tun_chr_close+0x38/0x60
[  909.594496] [c000000f061abd50] [c00000000035abf8] __fput+0xd8/0x280
[  909.594563] [c000000f061abdb0] [c000000000120f20] task_work_run+0x140/0x1a0
[  909.594628] [c000000f061abe00] [c00000000001d810] do_notify_resume+0xf0/0x100
[  909.594704] [c000000f061abe30] [c00000000000bf44] ret_from_except_lite+0x70/0x74
[  909.594778] Instruction dump:
[  909.594816] 7c0803a6 4e800020 3c4c00a1 3842eee0 7c0802a6 60000000 7c0802a6 fbe1fff8 
[  909.594895] f8010010 f821ffd1 7c7f1b78 e9230008 <e9490000> 2faa0000 419e001c 39400000 
[  909.594975] ---[ end trace c2b424e83e247e4c ]---
[  909.601138]

Mirrored with LTC bug #158177
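The `list_del corruption` warning comes from the CONFIG_DEBUG_LIST checks in lib/list_debug.c. Before an entry is unlinked, these checks confirm that the entry's neighbours still point back at it. The sketch below is a hypothetical user-space model of those checks. The `mem` dictionary, the `list_del_entry_valid`/`list_del` helpers and the `HEAD` address are illustrative and are not kernel code.

It assumes that ppc64 uses 0x5deadbeef0000000 as POISON_POINTER_DELTA. That makes LIST_POISON1 equal to 0x5deadbeef0000100, the value visible in GPR25 of both register dumps and in GPR10 of the second. list_del() writes that value into `->next` of an entry it removes, so a register holding it may hint that the code was handling an already-deleted entry. The model does not prove this.

Replaying the addresses from the warning reproduces its text. It also shows that the expected and observed pointers differ only in bit 12 (0x1000), which is an observation rather than a diagnosis.

```python
# Hypothetical user-space model of the CONFIG_DEBUG_LIST checks behind the
# "list_del corruption" warning. Addresses are ints; `mem` maps a node's
# address to a mutable [next, prev] pair. Illustration only, not kernel code.

# Assumption: ppc64 uses 0x5deadbeef0000000 as POISON_POINTER_DELTA.
POISON_POINTER_DELTA = 0x5DEADBEEF0000000
LIST_POISON1 = 0x100 + POISON_POINTER_DELTA  # stored in ->next by list_del()


def list_del_entry_valid(mem, entry):
    """Mimic __list_del_entry_valid(); return None if sane, else the warning.

    The LIST_POISON2 check on ->prev is left out of this sketch.
    """
    nxt, prev = mem[entry]
    if nxt == LIST_POISON1:
        return (f"list_del corruption, {entry:016x}->next is "
                f"LIST_POISON1 ({LIST_POISON1:016x})")
    if mem[prev][0] != entry:
        return (f"list_del corruption. prev->next should be {entry:016x}, "
                f"but was {mem[prev][0]:016x}")
    if mem[nxt][1] != entry:
        return (f"list_del corruption. next->prev should be {entry:016x}, "
                f"but was {mem[nxt][1]:016x}")
    return None


def list_del(mem, entry):
    """Unlink `entry` only if the checks pass, then poison ->next regardless."""
    warning = list_del_entry_valid(mem, entry)
    if warning is None:
        nxt, prev = mem[entry]
        mem[prev][0] = nxt
        mem[nxt][1] = prev
    mem[entry][0] = LIST_POISON1  # the kernel also poisons ->prev (not modelled)
    return warning


# Addresses from the warning; HEAD is a hypothetical neighbour (not in the log).
ENTRY = 0xC000000F23120760   # the entry being deleted
SEEN = 0xC000000F23121760    # what prev->next actually contained
HEAD = 0x10
mem = {HEAD: [SEEN, ENTRY], ENTRY: [HEAD, HEAD]}

print(list_del(mem, ENTRY))  # prev->next mismatch, same text as the log
print(list_del(mem, ENTRY))  # a second delete now trips the LIST_POISON1 check
print(hex(LIST_POISON1))     # 0x5deadbeef0000100, cf. GPR25 / GPR10
print(hex(ENTRY ^ SEEN))     # 0x1000: the two pointers differ in one bit
```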

cdeadmin · 2017-08-31T13:35:31Z

------- Comment From viparash@in.ibm.com 2017-08-31 09:26:40 EDT-------
(In reply to comment #1)

I see two issues here:

Issue 1

>
Subsequently, it crashes further in tcf_chain_flush() with a segmentation fault.
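The segmentation fault in the second oops can be read off its register dump. It shows DAR 0 and GPR09 0, and the instruction dump marks `e9490000` as the faulting word, immediately preceded by `e9230008`.

The hypothetical decoder below turns those words into mnemonics. The `decode` and `sign16` helpers are illustrative, and only the `ld` (DS-form) and `twi` (D-form) encodings are handled. Under this decoding:

- tcf_chain_flush+0x28 is `ld r10,0(r9)`.
- r9 had just been loaded by `ld r9,8(r3)` from offset 8 of the object in r3 (c000000f23120750).
- A NULL value at that offset would therefore produce exactly the NULL read reported.

The first dump's `0fe00000` decodes to `twi 31,r0,0`. That is an unconditional trap, consistent with how powerpc emits WARN() and with that dump's TRAP: 0700.

```python
# Minimal decoder for the two PowerPC instruction forms that appear at the
# "<...>" markers in the oops instruction dumps. Illustration only; every
# other opcode is printed as a raw ".long".

def sign16(value):
    """Interpret the low 16 bits of `value` as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(word):
    opcode = word >> 26           # primary opcode: most-significant six bits
    rt = (word >> 21) & 0x1F      # RT for ld, TO for twi
    ra = (word >> 16) & 0x1F
    if opcode == 58 and (word & 0x3) == 0:
        # DS-form "load doubleword": the low two bits select ld (XO = 0).
        return f"ld r{rt},{sign16(word & 0xFFFC)}(r{ra})"
    if opcode == 3:
        # D-form "trap word immediate"; TO = 31 traps unconditionally.
        return f"twi {rt},r{ra},{sign16(word)}"
    return f".long 0x{word:08x}"


print(decode(0x0FE00000))  # -> twi 31,r0,0   (__list_del_entry_valid+0xd0)
print(decode(0xE9230008))  # -> ld r9,8(r3)   (word before the fault)
print(decode(0xE9490000))  # -> ld r10,0(r9)  (tcf_chain_flush+0x28)
```

Because r9 is 0 in the second dump, the last load reads address 0, which agrees with "Unable to handle kernel paging request for data at address 0x00000000".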

cdeadmin · 2017-09-26T10:45:32Z

------- Comment From satheera@in.ibm.com 2017-09-26 06:44:04 EDT-------
I am no longer hitting this issue with the latest nightly devel build, 4.13.0-4.dev.git49564cb.el7.centos.ppc64le.

------- Comment From satheera@in.ibm.com 2017-09-26 06:45:15 EDT-------
Closing as per the previous comment.

cdeadmin closed this as completed Sep 26, 2017

cdeadmin mentioned this issue on Feb 23, 2018: hard lockup detected when vm starts with ftrace on #22 (Closed)