Latest devel build update +reboot crashed host #18

sathnaga · 2017-10-25T02:59:28Z (edited by cdeadmin)

<cde:info>Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=160569</cde:info>

Action: yum update + reboot
https://ltc-jenkins.aus.stglabs.ibm.com/job/HostOS_CI/842/consoleText

         Stopping Replay Read-Ahead Data...
[  OK  ] Reached target Shutdown.
[119099.239708] Unable to handle kernel paging request for data at address 0x00000010
[119099.239794] Faulting instruction address: 0xd00000000730064c
cpu 0x0: Vector: 300 (Data Access) at [c0000007f86077d0]
    pc: d00000000730064c: bm_evict_inode+0x2c/0x80 [binfmt_misc]
    lr: c00000000039003c: evict+0xfc/0x260
    sp: c0000007f8607a50
   msr: 900000010280b033
   dar: 10
 dsisr: 40000000
  current = 0xc0000007f8580080
  paca    = 0xc00000000fd60000   softe: 0        irq_happened: 0x01
    pid   = 1, comm = systemd
Linux version 4.14.0-1.rc4.dev.gitb27fc5c.el7.centos.ppc64le (mockbuild@host-os-jenkins-slave03.aus.stglabs.ibm.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-17) (GCC)) #1 SMP Fri Oct 20 22:55:44 -02 2017
enter ? for help
[c0000007f8607a80] c00000000039003c evict+0xfc/0x260
[c0000007f8607ac0] c000000000389258 dentry_unlink_inode+0x148/0x1c0
[c0000007f8607af0] c00000000038ad58 __dentry_kill+0xe8/0x2a0
[c0000007f8607b30] c00000000038b634 shrink_dentry_list+0x1e4/0x4e0
[c0000007f8607ba0] c00000000038bb84 shrink_dcache_parent+0x54/0xb0
[c0000007f8607c00] c00000000038bc08 do_one_tree+0x28/0x60
[c0000007f8607c30] c00000000038ce4c shrink_dcache_for_umount+0x4c/0xc0
[c0000007f8607ca0] c00000000036a92c generic_shutdown_super+0x3c/0x190
[c0000007f8607d10] c00000000036af08 kill_litter_super+0x48/0x70
[c0000007f8607d40] c00000000036b45c deactivate_locked_super+0xac/0xf0
[c0000007f8607d70] c000000000397f94 cleanup_mnt+0x64/0xb0
[c0000007f8607da0] c0000000001287c0 task_work_run+0x140/0x1a0
[c0000007f8607e00] c00000000001ca70 do_notify_resume+0xf0/0x100
[c0000007f8607e30] c00000000000bec4 ret_from_except_lite+0x70/0x74
--- Exception: c00 (System Call) at 00007fff8c6a50a8
SP (7fffee70e770) is in userspace
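The trace above faults at data address 0x10 inside `bm_evict_inode` (module `binfmt_misc`) while systemd unmounts a filesystem during shutdown. When comparing this against crashes seen in later reproduction attempts, it can help to reduce each xmon dump to its list of symbol frames. The following is a minimal Python sketch that assumes only the `pc:`/`lr:` and `[sp] address symbol+off/size [module]` line shapes visible in this log. `Frame`, `FRAME_RE` and `parse_frames` are hypothetical helper names, not part of xmon or any existing tool.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    """One symbolized address from an xmon register dump or backtrace."""
    addr: int              # instruction address as printed by xmon
    func: str              # kernel symbol name
    offset: int            # offset of addr into the symbol
    size: int              # total size of the symbol
    module: Optional[str]  # loadable module name, if shown in [brackets]


# Assumption: line shapes copied from this log, e.g.
#     pc: d00000000730064c: bm_evict_inode+0x2c/0x80 [binfmt_misc]
#   [c0000007f8607a80] c00000000039003c evict+0xfc/0x260
# The bracketed stack pointer is skipped because it is followed by "]",
# not by an optional colon and whitespace.
FRAME_RE = re.compile(
    r"\b(?P<addr>[0-9a-f]{16}):?\s+"
    r"(?P<func>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frames(log: str) -> List[Frame]:
    """Return one Frame per line that carries a symbol+offset/size entry."""
    frames = []
    for line in log.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(
                addr=int(m["addr"], 16),
                func=m["func"],
                offset=int(m["off"], 16),
                size=int(m["size"], 16),
                module=m["module"],
            ))
    return frames


# A few lines taken from the crash logs in this issue.
SAMPLE = """\
    pc: d00000000730064c: bm_evict_inode+0x2c/0x80 [binfmt_misc]
[c0000007f8607a80] c00000000039003c evict+0xfc/0x260
[c0000007f8607ac0] c000000000389258 dentry_unlink_inode+0x148/0x1c0
[c000000ffff53cc0] c000000ffff53cf0 (unreliable)
"""

if __name__ == "__main__":
    for f in parse_frames(SAMPLE):
        print(f.func, hex(f.offset), f.module or "-")
```

Lines without a `symbol+off/size` entry, such as the `(unreliable)` frame, are dropped. Applied to both traces in this issue, the frame lists share no functions: this one runs through unmount and inode eviction, while the later XFS-stress crash runs through IPI handling from `snooze_loop`. That is consistent with the second crash being tracked as a separate bug.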

sathnaga · 2017-10-30T09:00:32Z

Tried to recreate with multiple reboots; unable to hit the issue.
Tried running host stress and filesystem tests and hit a different host crash, reported as #20 ("hit with host crash while running host stress tests").

cdeadmin · 2017-10-30T09:05:30Z

While trying to reproduce with host stress, I hit the host crash below during XFS stress tests:

enter ? for help
[link register   ] c0000000002543f0 irq_work_run+0x30/0x50
[c000000ffff53cc0] c000000ffff53cf0 (unreliable)
[c000000ffff53cf0] c0000000001b7ca0 flush_smp_call_function_queue+0xf0/0x200
[c000000ffff53d70] c0000000000477ec smp_ipi_demux_relaxed+0x9c/0x110
[c000000ffff53db0] c0000000000903d4 icp_native_ipi_action+0x64/0x80
[c000000ffff53dd0] c000000000179420 __handle_irq_event_percpu+0x90/0x2d0
[c000000ffff53e90] c000000000179698 handle_irq_event_percpu+0x38/0x90
[c000000ffff53ed0] c00000000017fcf4 handle_percpu_irq+0x84/0xd0
[c000000ffff53f00] c000000000177b7c generic_handle_irq+0x4c/0x80
[c000000ffff53f20] c0000000000165d4 __do_irq+0x94/0x200
[c000000ffff53f90] c000000000029fa4 call_do_irq+0x14/0x24
[c0000007f87f3a50] c0000000000167dc do_IRQ+0x9c/0x110
[c0000007f87f3aa0] c000000000008c58 hardware_interrupt_common+0x158/0x160
--- Exception: 501 (Hardware Interrupt) at c0000000008eb664 snooze_loop+0xa4/0x190
[c0000007f87f3d90] c0000007f87f3dc0 (unreliable)
[c0000007f87f3dd0] c0000000008e83a4 cpuidle_enter_state+0xc4/0x3d0
[c0000007f87f3e30] c00000000015f73c call_cpuidle+0x4c/0x80
[c0000007f87f3e50] c00000000015fbe0 do_idle+0x2b0/0x350
[c0000007f87f3ec0] c00000000015fe8c cpu_startup_entry+0x3c/0x50
[c0000007f87f3ef0] c000000000048a74 start_secondary+0x4e4/0x530
[c0000007f87f3f90] c00000000000b16c start_secondary_prolog+0x10/0x14
b:mon>

jenkins_job_log.txt
It looks like this patch fixes this issue: https://www.spinics.net/lists/linux-fsdevel/msg117031.html

cdeadmin closed this as completed Oct 30, 2017