Updated README for MarkDown (This is not a final version but a draft). #16
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned. Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned. This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash. This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s"
#0 [d72d3ad0] crash_kexec at c028cfdb
#1 [d72d3b24] oops_end at c05c5322
#2 [d72d3b38] __bad_area_nosemaphore at c0227e60
#3 [d72d3bec] bad_area at c0227fb6
#4 [d72d3c00] do_page_fault at c05c72e
#5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000
    DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50
    CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002
#6 [d72d3cb4] isolate_migratepages at c030b15a
#7 [d72d3d14] zone_watermark_ok at c02d26cb
#8 [d72d3d2c] compact_zone at c030b8d
#9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431
    DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788
    SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50
    CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202

It was also reported by Herbert van den Bergh against a 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned. Let's say we have a
case like this

    H = MAX_ORDER_NR_PAGES boundary
    | = pageblock boundary
    m = cc->migrate_pfn
    f = cc->free_pfn
    o = memory hole

    H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is
beyond the hole. When isolate_migratepages started, it scans from
migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory
hole. It checks pfn_valid() on the first PFN but then scans into the
hole where there are not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
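
The boundary check described in this commit message can be illustrated with a small userspace sketch. This is not the kernel patch: `toy_zone`, `toy_pfn_valid()`, `toy_scan()` and the tiny block sizes are hypothetical stand-ins invented for illustration, and the sketch assumes, as the diagram does, that a hole covers whole MAX_ORDER_NR_PAGES blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy sizes; a real kernel derives these per architecture. */
#define PAGEBLOCK_NR_PAGES 4UL
#define MAX_ORDER_NR_PAGES 8UL

/* Toy memory map: PFNs in [hole_start, hole_end) have no struct page. */
struct toy_zone {
    unsigned long hole_start;
    unsigned long hole_end;
};

/* Stand-in for pfn_valid(): false inside the hole. */
static bool toy_pfn_valid(const struct toy_zone *z, unsigned long pfn)
{
    return pfn < z->hole_start || pfn >= z->hole_end;
}

/*
 * Scan [low_pfn, end_pfn) and return how many PFNs were touched.
 * The validity check is repeated every time the scan enters a new
 * MAX_ORDER_NR_PAGES block, not only on the first PFN.
 * *bad_touches counts PFNs touched inside the hole; it should stay 0.
 */
static unsigned long toy_scan(const struct toy_zone *z,
                              unsigned long low_pfn, unsigned long end_pfn,
                              unsigned long *bad_touches)
{
    unsigned long touched = 0;

    *bad_touches = 0;
    /* The first PFN is assumed to have been validated by the caller. */
    assert(toy_pfn_valid(z, low_pfn));

    for (; low_pfn < end_pfn; low_pfn++) {
        if ((low_pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
            !toy_pfn_valid(z, low_pfn)) {
            /* Skip this block; the loop increment lands on the next one. */
            low_pfn += MAX_ORDER_NR_PAGES - 1;
            continue;
        }
        if (!toy_pfn_valid(z, low_pfn))
            (*bad_touches)++;
        touched++;
    }
    return touched;
}
```

With a hole at PFNs 8 to 15 and a scan starting at PFN 6, the sketch touches PFNs 6 and 7, skips the hole at the block boundary, and resumes at PFN 16 if the range extends that far.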
Printing the "start_ip" for every secondary cpu is very noisy on a large
system - and doesn't add any value. Drop this message.
Console log before:
Booting Node 0, Processors #1
smpboot cpu 1: start_ip = 96000
#2
smpboot cpu 2: start_ip = 96000
#3
smpboot cpu 3: start_ip = 96000
#4
smpboot cpu 4: start_ip = 96000
...
#31
smpboot cpu 31: start_ip = 96000
Brought up 32 CPUs
Console log after:
Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 Ok.
Booting Node 1, Processors #8 #9 #10 #11 #12 #13 #14 #15 Ok.
Booting Node 0, Processors #16 #17 #18 #19 #20 #21 #22 #23 Ok.
Booting Node 1, Processors #24 #25 #26 #27 #28 #29 #30 #31
Brought up 32 CPUs
Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add a parameter to avoid using MSI/MSI-X for PCIe native hotplug; it's
known to be buggy on some platforms.

In my environment, the following stack trace is sometimes shown while
shutting down.

irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 1081, comm: reboot Not tainted 3.2.0 #1
Call Trace:
<IRQ> [<ffffffff810cec1d>] __report_bad_irq+0x3d/0xe0
[<ffffffff810cee1c>] note_interrupt+0x15c/0x210
[<ffffffff810cc485>] handle_irq_event_percpu+0xb5/0x210
[<ffffffff810cc621>] handle_irq_event+0x41/0x70
[<ffffffff810cf675>] handle_fasteoi_irq+0x55/0xc0
[<ffffffff81015356>] handle_irq+0x46/0xb0
[<ffffffff814fbe9d>] do_IRQ+0x5d/0xe0
[<ffffffff814f146e>] common_interrupt+0x6e/0x6e
[<ffffffff8106b040>] ? __do_softirq+0x60/0x210
[<ffffffff8108aeb1>] ? hrtimer_interrupt+0x151/0x240
[<ffffffff814fb5ec>] call_softirq+0x1c/0x30
[<ffffffff810152d5>] do_softirq+0x65/0xa0
[<ffffffff8106ae9d>] irq_exit+0xbd/0xe0
[<ffffffff814fbf8e>] smp_apic_timer_interrupt+0x6e/0x99
[<ffffffff814f9e5e>] apic_timer_interrupt+0x6e/0x80
<EOI> [<ffffffff814f0fb1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[<ffffffff812629fc>] pci_bus_write_config_word+0x6c/0x80
[<ffffffff81266fc2>] pci_intx+0x52/0xa0
[<ffffffff8127de3d>] pci_intx_for_msi+0x1d/0x30
[<ffffffff8127e4fb>] pci_msi_shutdown+0x7b/0x110
[<ffffffff81269d34>] pci_device_shutdown+0x34/0x50
[<ffffffff81326c4f>] device_shutdown+0x2f/0x140
[<ffffffff8107b981>] kernel_restart_prepare+0x31/0x40
[<ffffffff8107b9e6>] kernel_restart+0x16/0x60
[<ffffffff8107bbfd>] sys_reboot+0x1ad/0x220
[<ffffffff814f4b90>] ? do_page_fault+0x1e0/0x460
[<ffffffff811942d0>] ? __sync_filesystem+0x90/0x90
[<ffffffff8105c9aa>] ? __cond_resched+0x2a/0x40
[<ffffffff814ef090>] ? _cond_resched+0x30/0x40
[<ffffffff81169e17>] ? iterate_supers+0xb7/0xd0
[<ffffffff814f9382>] system_call_fastpath+0x16/0x1b
handlers:
[<ffffffff8138a0f0>] usb_hcd_irq
[<ffffffff8138a0f0>] usb_hcd_irq
[<ffffffff8138a0f0>] usb_hcd_irq
Disabling IRQ #16

An unwanted interrupt is generated when the PCI driver switches from
MSI/MSI-X to INTx while shutting down the device. The interrupt does
not happen if MSI/MSI-X is not used on the device.

I confirmed that this problem does not happen if pcie_hp=nomsi was
specified, and hotplug operation worked fine as usual.

v2: Automatically disable MSI/MSI-X for the following device:
    PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02)
v3: Based on the review comment, combine the if statements.
v4: Removed module parameter.
    Move some code to build pciehp as a module.
    Move device specific code to drivers/pci/quirks.c.
v5: Drop the device specific code until getting a vendor statement.

Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
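
As a rough illustration of how an option such as pcie_hp=nomsi could steer the interrupt choice, here is a userspace sketch. It is not the kernel's parsing code: `toy_cmdline_has_nomsi()`, `toy_hotplug_irq_mode()` and the `toy_irq_mode` enum are hypothetical names, and only the option string itself comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical interrupt modes for the hotplug controller. */
enum toy_irq_mode { TOY_IRQ_MSI, TOY_IRQ_INTX };

/* Return true if the command line contains the whole token "pcie_hp=nomsi". */
static bool toy_cmdline_has_nomsi(const char *cmdline)
{
    static const char opt[] = "pcie_hp=nomsi";
    const size_t len = sizeof(opt) - 1;
    const char *p = cmdline;

    assert(cmdline != NULL);

    while ((p = strstr(p, opt)) != NULL) {
        /* Accept the match only when it is delimited by spaces or string ends. */
        bool starts_token = (p == cmdline) || (p[-1] == ' ');
        bool ends_token = (p[len] == '\0') || (p[len] == ' ');

        if (starts_token && ends_token)
            return true;
        p += len;
    }
    return false;
}

/* Fall back to INTx when MSI is unavailable or the user opted out. */
static enum toy_irq_mode toy_hotplug_irq_mode(const char *cmdline,
                                              bool msi_capable)
{
    if (!msi_capable || toy_cmdline_has_nomsi(cmdline))
        return TOY_IRQ_INTX;
    return TOY_IRQ_MSI;
}
```

The sketch matches the option as a whole token so that a longer string such as `pcie_hp=nomsix` is not mistaken for it; whether the kernel is that strict is not stated in the message.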
Vivek reported a kernel crash:

[ 94.217015] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[ 94.218004] IP: [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
[ 94.218004] PGD 13abda067 PUD 137d52067 PMD 0
[ 94.218004] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 94.218004] CPU 0
[ 94.218004] Modules linked in: [last unloaded: scsi_wait_scan]
[ 94.218004]
[ 94.218004] Pid: 0, comm: swapper/0 Not tainted 3.2.0+ #16 Hewlett-Packard HP xw6600 Workstation/0A9Ch
[ 94.218004] RIP: 0010:[<ffffffff81142fae>] [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
[ 94.218004] RSP: 0018:ffff88013fc03de0 EFLAGS: 00010006
[ 94.218004] RAX: ffffffff81e0d020 RBX: ffff880138b3c680 RCX: 00000001801c001b
[ 94.218004] RDX: 00000000003aac1d RSI: ffff880138b3c680 RDI: ffffffff81142fae
[ 94.218004] RBP: ffff88013fc03e10 R08: ffff880137830238 R09: 0000000000000001
[ 94.218004] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 94.218004] R13: ffffea0004e2cf00 R14: ffffffff812f6eb6 R15: 0000000000000246
[ 94.218004] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[ 94.218004] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 94.218004] CR2: 000000000000001c CR3: 00000001395ab000 CR4: 00000000000006f0
[ 94.218004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 94.218004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 94.218004] Process swapper/0 (pid: 0, threadinfo ffffffff81e00000, task ffffffff81e0d020)
[ 94.218004] Stack:
[ 94.218004] 0000000000000102 ffff88013fc0db20 ffffffff81e22700 ffff880139500f00
[ 94.218004] 0000000000000001 000000000000000a ffff88013fc03e20 ffffffff812f6eb6
[ 94.218004] ffff88013fc03e90 ffffffff810c8da2 ffffffff81e01fd8 ffff880137830240
[ 94.218004] Call Trace:
[ 94.218004] <IRQ>
[ 94.218004] [<ffffffff812f6eb6>] icq_free_icq_rcu+0x16/0x20
[ 94.218004] [<ffffffff810c8da2>] __rcu_process_callbacks+0x1c2/0x420
[ 94.218004] [<ffffffff810c9038>] rcu_process_callbacks+0x38/0x250
[ 94.218004] [<ffffffff810405ee>] __do_softirq+0xce/0x3e0
[ 94.218004] [<ffffffff8108ed04>] ? clockevents_program_event+0x74/0x100
[ 94.218004] [<ffffffff81090104>] ? tick_program_event+0x24/0x30
[ 94.218004] [<ffffffff8183ed1c>] call_softirq+0x1c/0x30
[ 94.218004] [<ffffffff8100422d>] do_softirq+0x8d/0xc0
[ 94.218004] [<ffffffff81040c3e>] irq_exit+0xae/0xe0
[ 94.218004] [<ffffffff8183f4be>] smp_apic_timer_interrupt+0x6e/0x99
[ 94.218004] [<ffffffff8183e330>] apic_timer_interrupt+0x70/0x80

Once a queue is quiesced, it's not supposed to have any elvpriv data or
icq's, and elevator switching depends on that. The request alloc path
followed the rule for elvpriv data but forgot to apply it to icq's,
leading to the crash above during elevator switch.

Fix it by not allocating icq's if ELVPRIV is not set for the request.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
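
The rule stated in this message, that a request without ELVPRIV must not get an icq either, can be sketched in userspace as follows. Everything here (`toy_queue`, `toy_request`, `toy_get_request()`, `TOY_REQ_ELVPRIV`) is a hypothetical model for illustration and does not mirror the block layer's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the request's ELVPRIV marker. */
#define TOY_REQ_ELVPRIV (1u << 0)

struct toy_queue {
    bool quiesced;  /* true while an elevator switch is in progress */
    int nr_icqs;    /* icq's currently attached to this queue */
};

struct toy_request {
    unsigned int flags;
    bool has_icq;
};

/* Set up a request, attaching elevator-private state only when allowed. */
static void toy_get_request(struct toy_queue *q, struct toy_request *rq)
{
    assert(q != NULL && rq != NULL);

    rq->flags = 0;
    rq->has_icq = false;

    /* Existing rule: no elevator-private data once the queue is quiesced. */
    if (!q->quiesced)
        rq->flags |= TOY_REQ_ELVPRIV;

    /* Rule from the fix: allocate an icq only if ELVPRIV is set. */
    if (rq->flags & TOY_REQ_ELVPRIV) {
        rq->has_icq = true;
        q->nr_icqs++;
    }
}
```

Tying the icq decision to the same flag means a quiesced queue cannot pick up new icq's, which is the invariant the elevator switch relies on according to the message.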
…S block during isolation for migration

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned. Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned. This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash. This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s"
#0 [d72d3ad0] crash_kexec at c028cfdb
#1 [d72d3b24] oops_end at c05c5322
#2 [d72d3b38] __bad_area_nosemaphore at c0227e60
#3 [d72d3bec] bad_area at c0227fb6
#4 [d72d3c00] do_page_fault at c05c72e
#5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000
    DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50
    CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002
#6 [d72d3cb4] isolate_migratepages at c030b15a
#7 [d72d3d14] zone_watermark_ok at c02d26cb
#8 [d72d3d2c] compact_zone at c030b8d
#9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431
    DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788
    SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50
    CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202

It was also reported by Herbert van den Bergh against a 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned. Let's say we have a
case like this

    H = MAX_ORDER_NR_PAGES boundary
    | = pageblock boundary
    m = cc->migrate_pfn
    f = cc->free_pfn
    o = memory hole

    H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is
beyond the hole. When isolate_migratepages started, it scans from
migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory
hole. It checks pfn_valid() on the first PFN but then scans into the
hole where there are not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do
not update the real num tx queues. netdev_queue_update_kobjects() is
already called via remove_queue_kobjects() at NETREG_UNREGISTERING time.
So, when an upper layer driver, e.g., the FCoE protocol stack, is
monitoring the netdev event of NETDEV_UNREGISTER and calls back to the
LLD's ndo_fcoe_disable() to remove extra queues allocated for FCoE, the
associated txq sysfs kobjects are already removed, and trying to update
the real num queues would cause something like below:

...
PID: 25138 TASK: ffff88021e64c440 CPU: 3 COMMAND: "kworker/3:3"
#0 [ffff88021f007760] machine_kexec at ffffffff810226d9
#1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
#2 [ffff88021f0078a0] oops_end at ffffffff813bca78
#3 [ffff88021f0078d0] no_context at ffffffff81029e72
#4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
#5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
#7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611 RSP: ffff88021f007bc0 RFLAGS: 00010246
    RAX: ffff88021e64c440 RBX: ffffffff8156cc63 RCX: 0000000000000004
    RDX: ffffffff8156cc63 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: ffff88021f007be0 R8: 0000000000000004 R9: 0000000000000008
    R10: ffffffff816fed00 R11: 0000000000000004 R12: 0000000000000000
    R13: ffffffff8156cc63 R14: 0000000000000000 R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
#19 [ffff88021f007e68] worker_thread at ffffffff81060513
#20 [ffff88021f007ee8] kthread at ffffffff810648b6
#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
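
The guard this message describes can be sketched in userspace as below. The `toy_netdev` structure, the `TOY_NETREG_*` states and `toy_set_real_num_tx_queues()` are hypothetical names modelled loosely on the description; the return values and the `kobject_updates` counter are assumptions made so the behaviour can be observed, not the kernel's actual interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registration states, named after those in the message. */
enum toy_reg_state {
    TOY_NETREG_REGISTERED,
    TOY_NETREG_UNREGISTERING,
    TOY_NETREG_UNREGISTERED
};

struct toy_netdev {
    enum toy_reg_state reg_state;
    unsigned int num_tx_queues;       /* queues allocated */
    unsigned int real_num_tx_queues;  /* queues currently in use */
    unsigned int kobject_updates;     /* how often sysfs would be touched */
};

/*
 * Change the number of active TX queues. Returns -1 for an out-of-range
 * count and 0 otherwise. While the device is being unregistered the call
 * changes nothing, because the per-queue sysfs objects are already gone.
 */
static int toy_set_real_num_tx_queues(struct toy_netdev *dev, unsigned int txq)
{
    assert(dev != NULL);

    if (txq < 1 || txq > dev->num_tx_queues)
        return -1;

    if (dev->reg_state == TOY_NETREG_UNREGISTERING ||
        dev->reg_state == TOY_NETREG_UNREGISTERED)
        return 0;  /* skip: nothing left in sysfs to update */

    dev->kobject_updates++;  /* stands in for the sysfs kobject update */
    dev->real_num_tx_queues = txq;
    return 0;
}
```

In this sketch a driver callback that runs during NETDEV_UNREGISTER can still call the function safely; the request is simply ignored instead of reaching the removed sysfs entries.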
…S block during isolation for migration

BugLink: http://bugs.launchpad.net/bugs/931719

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned. Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned. This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash. This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s"
#0 [d72d3ad0] crash_kexec at c028cfdb
#1 [d72d3b24] oops_end at c05c5322
#2 [d72d3b38] __bad_area_nosemaphore at c0227e60
#3 [d72d3bec] bad_area at c0227fb6
#4 [d72d3c00] do_page_fault at c05c72e
#5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000
    DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50
    CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002
#6 [d72d3cb4] isolate_migratepages at c030b15a
#7 [d72d3d14] zone_watermark_ok at c02d26cb
#8 [d72d3d2c] compact_zone at c030b8d
#9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431
    DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788
    SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50
    CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202

It was also reported by Herbert van den Bergh against a 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned. Let's say we have a
case like this

    H = MAX_ORDER_NR_PAGES boundary
    | = pageblock boundary
    m = cc->migrate_pfn
    f = cc->free_pfn
    o = memory hole

    H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is
beyond the hole. When isolate_migratepages started, it scans from
migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory
hole. It checks pfn_valid() on the first PFN but then scans into the
hole where there are not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. 
BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. 
BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. 
BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. 
BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. 
BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nonsense. Markdown is for wikis, not for help files.

It's also for READMEs.
struct btf_ptr {
	void *ptr;
	__u32 type_id;
	__u32 flags;
};
test_task_btf:PASS:bpf_iter_task_btf__open_and_load 0 nsec
do_btf_read:PASS:attach_iter 0 nsec
do_btf_read:PASS:create_iter 0 nsec
do_btf_read:PASS:read 0 nsec
do_btf_read:FAIL:check for btf representation of task_struct in iter data unexpected check for btf representation of task_struct in iter data: '(struct task_struct)' is not a substring of 'Raw BTF task
'
test_task_btf:FAIL:no task iteration, did BPF program run? unexpected no task iteration, did BPF program run?: actual 0 == expected 0
#16/13 bpf_iter/task_btf:FAIL
static struct btf_ptr ptr = {};
long ret;
[...]
ret = bpf_seq_printf_btf(seq, &ptr, sizeof(ptr), 0);
--> -EINVAL
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
clang generates a call to __msan_instrument_asm_store with a size of 1 byte. Manually call the kmsan helper to indicate the correct number of bytes written. If fpu_vstl is called with 'index' > 0, it writes at least 2 bytes, but kmsan marks only the first byte as written.

This change fixes the following kmsan reports:

[ 36.563119] =====================================================
[ 36.563594] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70
[ 36.563852] virtqueue_add+0x35c6/0x7c70
[ 36.564016] virtqueue_add_outbuf+0xa0/0xb0
[ 36.564266] start_xmit+0x288c/0x4a20
[ 36.564460] dev_hard_start_xmit+0x302/0x900
[ 36.564649] sch_direct_xmit+0x340/0xea0
[ 36.564894] __dev_queue_xmit+0x2e94/0x59b0
[ 36.565058] neigh_resolve_output+0x936/0xb40
[ 36.565278] __neigh_update+0x2f66/0x3a60
[ 36.565499] neigh_update+0x52/0x60
[ 36.565683] arp_process+0x1588/0x2de0
[ 36.565916] NF_HOOK+0x1da/0x240
[ 36.566087] arp_rcv+0x3e4/0x6e0
[ 36.566306] __netif_receive_skb_list_core+0x1374/0x15a0
[ 36.566527] netif_receive_skb_list_internal+0x1116/0x17d0
[ 36.566710] napi_complete_done+0x376/0x740
[ 36.566918] virtnet_poll+0x1bae/0x2910
[ 36.567130] __napi_poll+0xf4/0x830
[ 36.567294] net_rx_action+0x97c/0x1ed0
[ 36.567556] handle_softirqs+0x306/0xe10
[ 36.567731] irq_exit_rcu+0x14c/0x2e0
[ 36.567910] do_io_irq+0xd4/0x120
[ 36.568139] io_int_handler+0xc2/0xe8
[ 36.568299] arch_cpu_idle+0xb0/0xc0
[ 36.568540] arch_cpu_idle+0x76/0xc0
[ 36.568726] default_idle_call+0x40/0x70
[ 36.568953] do_idle+0x1d6/0x390
[ 36.569486] cpu_startup_entry+0x9a/0xb0
[ 36.569745] rest_init+0x1ea/0x290
[ 36.570029] start_kernel+0x95e/0xb90
[ 36.570348] startup_continue+0x2e/0x40
[ 36.570703]
[ 36.570798] Uninit was created at:
[ 36.571002] kmem_cache_alloc_node_noprof+0x9e8/0x10e0
[ 36.571261] kmalloc_reserve+0x12a/0x470
[ 36.571553] __alloc_skb+0x310/0x860
[ 36.571844] __ip_append_data+0x483e/0x6a30
[ 36.572170] ip_append_data+0x11c/0x1e0
[ 36.572477] raw_sendmsg+0x1c8c/0x2180
[ 36.572818] inet_sendmsg+0xe6/0x190
[ 36.573142] __sys_sendto+0x55e/0x8e0
[ 36.573392] __s390x_sys_socketcall+0x19ae/0x2ba0
[ 36.573571] __do_syscall+0x12e/0x240
[ 36.573823] system_call+0x6e/0x90
[ 36.573976]
[ 36.574017] Byte 35 of 98 is uninitialized
[ 36.574082] Memory access of size 98 starts at 0000000007aa0012
[ 36.574218]
[ 36.574325] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G B N 6.17.0-dirty #16 NONE
[ 36.574541] Tainted: [B]=BAD_PAGE, [N]=TEST
[ 36.574617] Hardware name: IBM 3931 A01 703 (KVM/Linux)
[ 36.574755] =====================================================
[ 63.532541] =====================================================
[ 63.533639] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70
[ 63.533989] virtqueue_add+0x35c6/0x7c70
[ 63.534940] virtqueue_add_outbuf+0xa0/0xb0
[ 63.535861] start_xmit+0x288c/0x4a20
[ 63.536708] dev_hard_start_xmit+0x302/0x900
[ 63.537020] sch_direct_xmit+0x340/0xea0
[ 63.537997] __dev_queue_xmit+0x2e94/0x59b0
[ 63.538819] neigh_resolve_output+0x936/0xb40
[ 63.539793] ip_finish_output2+0x1ee2/0x2200
[ 63.540784] __ip_finish_output+0x272/0x7a0
[ 63.541765] ip_finish_output+0x4e/0x5e0
[ 63.542791] ip_output+0x166/0x410
[ 63.543771] ip_push_pending_frames+0x1a2/0x470
[ 63.544753] raw_sendmsg+0x1f06/0x2180
[ 63.545033] inet_sendmsg+0xe6/0x190
[ 63.546006] __sys_sendto+0x55e/0x8e0
[ 63.546859] __s390x_sys_socketcall+0x19ae/0x2ba0
[ 63.547730] __do_syscall+0x12e/0x240
[ 63.548019] system_call+0x6e/0x90
[ 63.548989]
[ 63.549779] Uninit was created at:
[ 63.550691] kmem_cache_alloc_node_noprof+0x9e8/0x10e0
[ 63.550975] kmalloc_reserve+0x12a/0x470
[ 63.551969] __alloc_skb+0x310/0x860
[ 63.552949] __ip_append_data+0x483e/0x6a30
[ 63.553902] ip_append_data+0x11c/0x1e0
[ 63.554912] raw_sendmsg+0x1c8c/0x2180
[ 63.556719] inet_sendmsg+0xe6/0x190
[ 63.557534] __sys_sendto+0x55e/0x8e0
[ 63.557875] __s390x_sys_socketcall+0x19ae/0x2ba0
[ 63.558869] __do_syscall+0x12e/0x240
[ 63.559832] system_call+0x6e/0x90
[ 63.560780]
[ 63.560972] Byte 35 of 98 is uninitialized
[ 63.561741] Memory access of size 98 starts at 0000000005704312
[ 63.561950]
[ 63.562824] CPU: 3 UID: 0 PID: 192 Comm: ping Tainted: G B N 6.17.0-dirty #16 NONE
[ 63.563868] Tainted: [B]=BAD_PAGE, [N]=TEST
[ 63.564751] Hardware name: IBM 3931 A01 703 (KVM/Linux)
[ 63.564986] =====================================================

Fixes: dcd3e1d ("s390/checksum: provide csum_partial_copy_nocheck()")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
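The mismatch described in the commit message above can be illustrated with a small user-space model. This is a sketch under assumptions, not the kernel change: every `toy_` name is hypothetical, the shadow array is a simplification of how an uninitialized-memory checker tracks bytes, and the store is assumed to write `index + 1` bytes capped at the 16-byte register size, which is consistent with the statement that 'index' > 0 writes at least 2 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_VECTOR_SIZE 16u	/* a 128-bit vector register holds 16 bytes */

/* Assumed store size for a given 'index': index + 1, capped at 16 bytes. */
static size_t toy_vstl_size(unsigned int index)
{
	size_t size = (size_t)index + 1;

	return size < TOY_VECTOR_SIZE ? size : TOY_VECTOR_SIZE;
}

/* Toy shadow memory: 1 = byte believed uninitialized, 0 = initialized. */
static void toy_unpoison(unsigned char *shadow, size_t size)
{
	assert(size <= TOY_VECTOR_SIZE);
	memset(shadow, 0, size);
}

/*
 * Return how many bytes the store really wrote that the toy checker
 * still believes are uninitialized, i.e. potential false positives.
 */
static size_t toy_false_positives(unsigned int index, int manual_fix)
{
	unsigned char shadow[TOY_VECTOR_SIZE];
	size_t written = toy_vstl_size(index);
	size_t i, stale = 0;

	memset(shadow, 1, sizeof(shadow));	/* fresh buffer: all poisoned */
	toy_unpoison(shadow, 1);		/* compile-time size of 1 byte */
	if (manual_fix)
		toy_unpoison(shadow, written);	/* explicit call, runtime size */

	for (i = 0; i < written; i++)
		stale += shadow[i];
	return stale;
}
```

In this model an 'index' of 7 leaves seven written bytes marked as uninitialized unless the explicit call with the runtime size is made, which mirrors why the reports appear only when 'index' > 0.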
With the kasan debug options enabled, the following errors are observed in the kernel log:

[ 4.564453] =============================
[ 4.568951] [ BUG: Invalid wait context ]
[ 4.573451] 6.6.71+ #16 Not tainted
[ 4.577368] -----------------------------
[ 4.581864] swapper/0/1 is trying to lock:
[ 4.586461] ffff000801ef2018 (syscon:125:(&syscon_config)->lock){....}-{3:3}, at: regmap_lock_spinlock+0x20/0x48
[ 4.597931] other info that might help us debug this:
[ 4.603599] context-{5:5}
[ 4.606539] 2 locks held by swapper/0/1:
[ 4.610943] #0: ffff0008019000f8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x154/0x308
[ 4.620459] #1: ffff8000841911f8 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0xac/0x160
[ 4.630558] stack backtrace:
[ 4.633793] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.6.71+ #16
[ 4.640634] Hardware name: Edelweiss TF307-MB-S-D/BM1BM1-D, BIOS 5.6 07/26/2022
[ 4.648836] Call trace:
[ 4.651581] dump_backtrace+0xa4/0x130
[ 4.655796] show_stack+0x20/0x38
[ 4.659522] dump_stack_lvl+0x60/0xb0
[ 4.663638] dump_stack+0x1c/0x28
[ 4.667363] __lock_acquire+0xd24/0x2da8
[ 4.671775] lock_acquire+0x308/0x470
[ 4.675892] _raw_spin_lock_irqsave+0x80/0xb8
[ 4.680789] regmap_lock_spinlock+0x20/0x48
[ 4.685490] regmap_read+0x64/0xc0
[ 4.689314] bm1000_pcie_link_up+0x118/0x1b8
[ 4.694112] dw_pcie_link_up+0x44/0x90
[ 4.698324] dw_pcie_other_conf_map_bus+0x44/0x110
[ 4.703707] pci_generic_config_read+0x5c/0xf8
[ 4.708699] dw_pcie_rd_other_conf+0x58/0xf8
[ 4.713495] pci_bus_read_config_dword+0xe0/0x160
[ 4.718777] pci_bus_generic_read_dev_vendor_id+0x3c/0x228
[ 4.724939] pci_scan_single_device+0x114/0x1c0
[ 4.730029] pci_scan_slot+0xdc/0x2c0
[ 4.734144] pci_scan_child_bus_extend+0x58/0x3b0
[ 4.739429] pci_scan_bridge_extend+0x1e8/0x890
[ 4.744519] pci_scan_child_bus_extend+0x158/0x3b0
[ 4.749900] pci_scan_root_bus_bridge+0xa8/0x150
[ 4.755088] pci_host_probe+0x20/0xf0
[ 4.759203] dw_pcie_host_init+0x3e8/0x9e8
[ 4.763804] bm1000_add_pcie_port+0x130/0x3d8
[ 4.768698] baikal_pcie_probe+0x140/0x198
[ 4.773300] platform_probe+0x94/0x150
[ 4.777513] really_probe+0x254/0x5b8
[ 4.781627] __driver_probe_device+0xcc/0x238
[ 4.786520] driver_probe_device+0x64/0x1b0
[ 4.791219] __driver_attach+0x160/0x308
[ 4.795624] bus_for_each_dev+0xe4/0x168
[ 4.800035] driver_attach+0x3c/0x58
[ 4.804054] bus_add_driver+0x188/0x310
[ 4.808364] driver_register+0xb0/0x1f8
[ 4.812673] __platform_driver_register+0x4c/0x68
[ 4.817956] baikal_pcie_driver_init+0x28/0x40
[ 4.822951] do_one_initcall+0xe0/0x4c8
[ 4.827262] kernel_init_freeable+0x394/0x500
[ 4.832160] kernel_init+0x38/0x230
[ 4.836084] ret_from_fork+0x10/0x20

This is caused by attempting to take a spinlock (the syscon regmap lock) while a raw_spinlock (pci_lock) is held. To fix it, a raw_spinlock is also used for the regmap.
struct btf_ptr {
void *ptr;
__u32 type_id;
__u32 flags;
};
test_task_btf:PASS:bpf_iter_task_btf__open_and_load 0 nsec
do_btf_read:PASS:attach_iter 0 nsec
do_btf_read:PASS:create_iter 0 nsec
do_btf_read:PASS:read 0 nsec
do_btf_read:FAIL:check for btf representation of task_struct in iter data unexpected check for btf representation of task_struct in iter data: '(struct task_struct)' is not a substring of 'Raw BTF task
'
test_task_btf:FAIL:no task iteration, did BPF program run? unexpected no task iteration, did BPF program run?: actual 0 == expected 0
#16/13 bpf_iter/task_btf:FAIL
static struct btf_ptr ptr = {};
long ret;
[...]
ret = bpf_seq_printf_btf(seq, &ptr, sizeof(ptr), 0);
--> -EINVAL
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, #16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, #16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
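The difference between the two accessors in the commit above can be modeled in userspace. This is only a sketch under stated assumptions, not the kernel implementation: every `model_*` name is hypothetical, the per-CPU area is a plain array, and `model_this_cpu_offset` stands in for whatever fast per-CPU base the architecture provides. It shows why the generic form needs the CPU number and an offset-table lookup, while the local form needs neither.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the governor's per-CPU data. */
struct model_teo_cpu {
    int last_state_idx;
};

/* One copy of the data per CPU, laid out back to back in one area. */
static struct model_teo_cpu model_area[MODEL_NR_CPUS];

/* Byte offset of each CPU's copy from the CPU 0 copy. */
static size_t model_per_cpu_offset[MODEL_NR_CPUS];

/* Offset of the running CPU; assumed to be cheaply available. */
static size_t model_this_cpu_offset;

static void model_setup(int running_cpu)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        model_per_cpu_offset[cpu] = (size_t)cpu * sizeof(struct model_teo_cpu);
    model_this_cpu_offset = model_per_cpu_offset[running_cpu];
}

/* Generic form: needs the CPU number plus a load from the offset table. */
static struct model_teo_cpu *model_per_cpu_ptr(int cpu)
{
    return (struct model_teo_cpu *)((char *)&model_area[0] +
                                    model_per_cpu_offset[cpu]);
}

/* Local form: only adds the running CPU's offset, no table lookup. */
static struct model_teo_cpu *model_this_cpu_ptr(void)
{
    return (struct model_teo_cpu *)((char *)&model_area[0] +
                                    model_this_cpu_offset);
}
```

After `model_setup(2)`, both `model_per_cpu_ptr(2)` and `model_this_cpu_ptr()` return the same address, which is the property the commit relies on when the callback always runs on the CPU it describes.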
This series enables the display subsystem on the StarFive JH7110 SoC.
This hardware has a complex set of dependencies that this series aims to
solve.
The dom_vout (Video Output) block is a wrapper containing the display
controller (dc8200), the clock generator (voutcrg), and the HDMI IP, all
of which are managed by a single power domain (PD_VOUT).
More importantly, the HDMI IP is a monolithic block (controller and PHY
in one register space) that has a circular dependency with voutcrg:
1. The HDMI Controller needs clocks (like sysclk, mclk) from voutcrg to
function.
2. The voutcrg (for its pixel MUXes) needs the variable pixel clock,
which is generated by the HDMI PHY.
This series breaks this dependency loop by modeling the hardware
correctly:
1. A new vout-subsystem wrapper driver is added. It manages the shared
PD_VOUT power domain and top level bus clocks. It uses
of_platform_populate() to ensure its children (hdmi_mfd, voutcrg,
dc8200) are probed only after power is on.
2. The monolithic hdmi node is refactored into an MFD. A new hdmi-mfd
parent driver is added, which maps the shared register space and
creates a regmap.
3. The MFD populates two children:
- hdmi-phy: A new PHY driver that binds to the MFD. Its only
dependency is the xin24m reference clock. It acts as the clock
provider for the variable pixel clock (hdmi_pclk).
- hdmi-controller: A new DRM bridge driver. It consumes clocks from
voutcrg and the hdmi_pclk/PHY from its sibling hdmi-phy driver.
4. The generic inno-hdmi bridge library is refactored to accept a regmap
from a parent MFD, making this model possible.
This MFD split breaks the circular dependency, as the kernel's deferred
probe can now find a correct, linear probe order: hdmi-phy (probes
first) -> voutcrg (probes second) -> hdmi-controller (probes third).
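The probe order described above can be illustrated with a small userspace simulation. This is a sketch, not driver-core code: the `model_*` names are hypothetical, the dependency table is an assumption derived from the description (hdmi-phy needs only xin24m, which is treated as always available), and "deferred probe" is reduced to retrying until a pass makes no progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_DEPS 2
#define MODEL_NDEV 3

struct model_dev {
    const char *name;
    int deps[MODEL_MAX_DEPS]; /* indices into model_devs, -1 = unused */
    bool probed;
};

/* Devices are listed in a deliberately unhelpful order. */
static struct model_dev model_devs[MODEL_NDEV] = {
    { "hdmi-controller", { 1, 2 }, false },   /* needs voutcrg and hdmi-phy */
    { "voutcrg",         { 2, -1 }, false },  /* needs hdmi_pclk from hdmi-phy */
    { "hdmi-phy",        { -1, -1 }, false }, /* xin24m assumed always present */
};

static bool model_deps_ready(const struct model_dev *d)
{
    for (int i = 0; i < MODEL_MAX_DEPS; i++)
        if (d->deps[i] >= 0 && !model_devs[d->deps[i]].probed)
            return false;
    return true;
}

/*
 * Retry deferred devices until a full pass makes no progress.
 * Stores the probe order in order[] and returns how many devices probed.
 */
static int model_probe_all(const char *order[MODEL_NDEV])
{
    int done = 0;
    bool progress = true;

    while (progress) {
        progress = false;
        for (int i = 0; i < MODEL_NDEV; i++) {
            struct model_dev *d = &model_devs[i];

            if (d->probed || !model_deps_ready(d))
                continue; /* already bound, or deferred to a later pass */
            d->probed = true;
            order[done++] = d->name;
            progress = true;
        }
    }
    return done;
}
```

In this model, two entries that depended on each other would never become ready, and the loop would end with neither probed; splitting the PHY out removes that cycle.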
This series provides all the necessary dt-bindings, the new drivers, the
modification to inno-hdmi, and the final device tree changes to enable
the display.
Series depends on patchsets that are not merged yet:
- dc8200 driver [1]
- th1520 reset (dependency of dc8200 series) [2]
- inno-hdmi bridge [3]
Testing:
I've tested on my monitor using `modetest` for the following modes:
#0 2560x1440 59.95 2560 2608 2640 2720 1440 1443 1448 1481 241500
flags: phsync, nvsync; type: preferred, driver [DOESN'T WORK]
#1 2048x1080 60.00 2048 2096 2128 2208 1080 1083 1093 1111 147180
flags: phsync, nvsync; type: driver [DOESN'T WORK]
#2 2048x1080 24.00 2048 2096 2128 2208 1080 1083 1093 1099 58230
flags: phsync, nvsync; type: driver [DOESN'T WORK]
#3 1920x1080 60.00 1920 2008 2052 2200 1080 1084 1089 1125 148500
flags: phsync, pvsync; type: driver [WORKS]
#4 1920x1080 59.94 1920 2008 2052 2200 1080 1084 1089 1125 148352
flags: phsync, pvsync; type: driver [WORKS]
#5 1920x1080 50.00 1920 2448 2492 2640 1080 1084 1089 1125 148500
flags: phsync, pvsync; type: driver [WORKS]
#6 1600x1200 60.00 1600 1664 1856 2160 1200 1201 1204 1250 162000
flags: phsync, pvsync; type: driver [WORKS]
#7 1280x1024 75.02 1280 1296 1440 1688 1024 1025 1028 1066 135000
flags: phsync, pvsync; type: driver [WORKS]
#8 1280x1024 60.02 1280 1328 1440 1688 1024 1025 1028 1066 108000
flags: phsync, pvsync; type: driver [WORKS]
#9 1152x864 75.00 1152 1216 1344 1600 864 865 868 900 108000
flags: phsync, pvsync; type: driver [WORKS]
#10 1280x720 60.00 1280 1390 1430 1650 720 725 730 750 74250
flags: phsync, pvsync; type: driver [WORKS]
#11 1280x720 59.94 1280 1390 1430 1650 720 725 730 750 74176
flags: phsync, pvsync; type: driver [WORKS]
#12 1280x720 50.00 1280 1720 1760 1980 720 725 730 750 74250
flags: phsync, pvsync; type: driver [WORKS]
#13 1024x768 75.03 1024 1040 1136 1312 768 769 772 800 78750
flags: phsync, pvsync; type: driver [WORKS]
#14 1024x768 60.00 1024 1048 1184 1344 768 771 777 806 65000
flags: nhsync, nvsync; type: driver [WORKS]
#15 800x600 75.00 800 816 896 1056 600 601 604 625 49500
flags: phsync, pvsync; type: driver [WORKS]
#16 800x600 60.32 800 840 968 1056 600 601 605 628 40000
flags: phsync, pvsync; type: driver [WORKS]
#17 720x576 50.00 720 732 796 864 576 581 586 625 27000
flags: nhsync, nvsync; type: driver [WORKS]
#18 720x480 60.00 720 736 798 858 480 489 495 525 27027
flags: nhsync, nvsync; type: driver [WORKS]
#19 720x480 59.94 720 736 798 858 480 489 495 525 27000
flags: nhsync, nvsync; type: driver [WORKS]
#20 640x480 75.00 640 656 720 840 480 481 484 500 31500
flags: nhsync, nvsync; type: driver [WORKS]
#21 640x480 60.00 640 656 752 800 480 490 492 525 25200
flags: nhsync, nvsync; type: driver [WORKS]
#22 640x480 59.94 640 656 752 800 480 490 492 525 25175
flags: nhsync, nvsync; type: driver [WORKS]
#23 720x400 70.08 720 738 846 900 400 412 414 449 28320
flags: nhsync, pvsync; type: driver [DOESN'T WORK]
I believe this is a PHY tuning issue that can be fixed in the new
phy-jh7110-inno-hdmi.c driver without changing the overall architecture.
I plan to continue debugging these modes and will submit follow up fixes
as needed.
The core architectural plumbing is sound and ready for review.
Notes:
- The JH7110 does not have a centralized MAINTAINERS entry like the
TH1520, and driver maintainership seems fragmented. I have therefore
added a MAINTAINERS entry for the display subsystem and am willing to
help with its maintenance.
- I am aware that the new phy-jh7110-inno-hdmi.c driver (patch 12) is a
near duplicate of the existing phy-rockchip-inno-hdmi.c. This
duplication is intentional and temporary for this RFC series. My goal
is to first get feedback on the overall architecture (the vout-subsystem
wrapper, the hdmi-mfd split, and the dual-function PHY/CLK driver).
If this architectural approach is acceptable, I will rework the PHY
driver for a formal v1 submission. This will involve refactoring the
common logic from the Rockchip PHY into a generic core driver that both
the Rockchip and this new StarFive PHY driver will use.
Many thanks to Icenowy Zheng, who developed the dc8200 driver and
helped me understand how the SoC and the display pipeline work.
[1] - https://lore.kernel.org/all/20250921083446.790374-1-uwu@icenowy.me/
[2] - https://lore.kernel.org/all/20251014131032.49616-1-ziyao@disroot.org/
[3] - https://lore.kernel.org/all/20251016083843.76675-1-andyshrk@163.com/
To: Michal Wilczynski <m.wilczynski@samsung.com>
To: Conor Dooley <conor@kernel.org>
To: Rob Herring <robh@kernel.org>
To: Krzysztof Kozlowski <krzk+dt@kernel.org>
To: Emil Renner Berthing <kernel@esmil.dk>
To: Hal Feng <hal.feng@starfivetech.com>
To: Michael Turquette <mturquette@baylibre.com>
To: Stephen Boyd <sboyd@kernel.org>
To: Conor Dooley <conor+dt@kernel.org>
To: Xingyu Wu <xingyu.wu@starfivetech.com>
To: Vinod Koul <vkoul@kernel.org>
To: Kishon Vijay Abraham I <kishon@kernel.org>
To: Andrzej Hajda <andrzej.hajda@intel.com>
To: Neil Armstrong <neil.armstrong@linaro.org>
To: Robert Foss <rfoss@kernel.org>
To: Laurent Pinchart <Laurent.pinchart@ideasonboard.com>
To: Jonas Karlman <jonas@kwiboo.se>
To: Jernej Skrabec <jernej.skrabec@gmail.com>
To: David Airlie <airlied@gmail.com>
To: Simona Vetter <simona@ffwll.ch>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: Maxime Ripard <mripard@kernel.org>
To: Thomas Zimmermann <tzimmermann@suse.de>
To: Lee Jones <lee@kernel.org>
To: Philipp Zabel <p.zabel@pengutronix.de>
To: Paul Walmsley <paul.walmsley@sifive.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Albert Ou <aou@eecs.berkeley.edu>
To: Alexandre Ghiti <alex@ghiti.fr>
To: Marek Szyprowski <m.szyprowski@samsung.com>
To: Icenowy Zheng <uwu@icenowy.me>
To: Maud Spierings <maudspierings@gocontroll.com>
To: Andy Yan <andyshrk@163.com>
To: Heiko Stuebner <heiko@sntech.de>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-clk@vger.kernel.org
Cc: linux-phy@lists.infradead.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-riscv@lists.infradead.org
---
Changes in v2:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v1: https://lore.kernel.org/r/20251108-jh7110-clean-send-v1-0-06bf43bb76b1@samsung.com
--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
"series": {
"revision": 2,
"change-id": "20251031-jh7110-clean-send-7d2242118026",
"prefixes": [
"RFC"
],
"prerequisites": [
"message-id: <20251014131032.49616-1-ziyao@disroot.org>",
"message-id: <20251016083843.76675-1-andyshrk@163.com>",
"message-id: <20250921083446.790374-1-uwu@icenowy.me>",
"base-commit: v6.17-rc6"
],
"history": {
"v1": [
"20251108-jh7110-clean-send-v1-0-06bf43bb76b1@samsung.com"
]
}
}
}
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, #16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, #16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to read device state when the device is not actually present. e.g.:

[exception RIP: qed_get_current_link+17]
#8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
#9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
#10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
#11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
#12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
#13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
#14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
#15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
#16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
#17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

crash> struct net_device.state ffff9a9d21336000
state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100). The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd ("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
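The decoding of `state = 5` in the commit above can be checked with a few lines of C. This is a minimal sketch: the `MODEL_*` names and helpers are hypothetical, and the bit positions are taken only from the masks quoted in the message (0b1, 0b10, 0b100).

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions matching the masks quoted in the commit message. */
enum model_link_state {
    MODEL_LINK_STATE_START = 0,     /* mask 0b001 */
    MODEL_LINK_STATE_PRESENT = 1,   /* mask 0b010 */
    MODEL_LINK_STATE_NOCARRIER = 2, /* mask 0b100 */
};

/* Returns true if the given bit is set in the state word. */
static bool model_test_bit(unsigned long state, int bit)
{
    return (state >> bit) & 1UL;
}

/* The kind of presence check a caller would do before touching the driver. */
static bool model_device_present(unsigned long state)
{
    return model_test_bit(state, MODEL_LINK_STATE_PRESENT);
}
```

With `state = 5` (0b101), the START and NOCARRIER bits are set and `model_device_present(5)` is false, which matches the crash dump reading.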
A false-positive kmsan report is detected when running the ping command.

The inline assembly instruction 'vstl' can write a varying number of bytes depending on the value of the 'index' argument. If 'index' > 0, 'vstl' writes at least 2 bytes.

clang generates the kmsan write helper call depending on the inline assembly constraints. Constraints are evaluated at compile time, but the value of the 'index' argument is known only at runtime.

clang currently generates a call to __msan_instrument_asm_store with 1 byte as the size. Manually call the kmsan function to indicate the correct number of bytes written and fix the false-positive report.

This change fixes the following kmsan reports:

[ 36.563119] =====================================================
[ 36.563594] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70
[ 36.563852] virtqueue_add+0x35c6/0x7c70
[ 36.564016] virtqueue_add_outbuf+0xa0/0xb0
[ 36.564266] start_xmit+0x288c/0x4a20
[ 36.564460] dev_hard_start_xmit+0x302/0x900
[ 36.564649] sch_direct_xmit+0x340/0xea0
[ 36.564894] __dev_queue_xmit+0x2e94/0x59b0
[ 36.565058] neigh_resolve_output+0x936/0xb40
[ 36.565278] __neigh_update+0x2f66/0x3a60
[ 36.565499] neigh_update+0x52/0x60
[ 36.565683] arp_process+0x1588/0x2de0
[ 36.565916] NF_HOOK+0x1da/0x240
[ 36.566087] arp_rcv+0x3e4/0x6e0
[ 36.566306] __netif_receive_skb_list_core+0x1374/0x15a0
[ 36.566527] netif_receive_skb_list_internal+0x1116/0x17d0
[ 36.566710] napi_complete_done+0x376/0x740
[ 36.566918] virtnet_poll+0x1bae/0x2910
[ 36.567130] __napi_poll+0xf4/0x830
[ 36.567294] net_rx_action+0x97c/0x1ed0
[ 36.567556] handle_softirqs+0x306/0xe10
[ 36.567731] irq_exit_rcu+0x14c/0x2e0
[ 36.567910] do_io_irq+0xd4/0x120
[ 36.568139] io_int_handler+0xc2/0xe8
[ 36.568299] arch_cpu_idle+0xb0/0xc0
[ 36.568540] arch_cpu_idle+0x76/0xc0
[ 36.568726] default_idle_call+0x40/0x70
[ 36.568953] do_idle+0x1d6/0x390
[ 36.569486] cpu_startup_entry+0x9a/0xb0
[ 36.569745] rest_init+0x1ea/0x290
[ 36.570029] start_kernel+0x95e/0xb90
[ 36.570348] startup_continue+0x2e/0x40
[ 36.570703]
[ 36.570798] Uninit was created at:
[ 36.571002] kmem_cache_alloc_node_noprof+0x9e8/0x10e0
[ 36.571261] kmalloc_reserve+0x12a/0x470
[ 36.571553] __alloc_skb+0x310/0x860
[ 36.571844] __ip_append_data+0x483e/0x6a30
[ 36.572170] ip_append_data+0x11c/0x1e0
[ 36.572477] raw_sendmsg+0x1c8c/0x2180
[ 36.572818] inet_sendmsg+0xe6/0x190
[ 36.573142] __sys_sendto+0x55e/0x8e0
[ 36.573392] __s390x_sys_socketcall+0x19ae/0x2ba0
[ 36.573571] __do_syscall+0x12e/0x240
[ 36.573823] system_call+0x6e/0x90
[ 36.573976]
[ 36.574017] Byte 35 of 98 is uninitialized
[ 36.574082] Memory access of size 98 starts at 0000000007aa0012
[ 36.574218]
[ 36.574325] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G B N 6.17.0-dirty #16 NONE
[ 36.574541] Tainted: [B]=BAD_PAGE, [N]=TEST
[ 36.574617] Hardware name: IBM 3931 A01 703 (KVM/Linux)
[ 36.574755] =====================================================

[ 63.532541] =====================================================
[ 63.533639] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70
[ 63.533989] virtqueue_add+0x35c6/0x7c70
[ 63.534940] virtqueue_add_outbuf+0xa0/0xb0
[ 63.535861] start_xmit+0x288c/0x4a20
[ 63.536708] dev_hard_start_xmit+0x302/0x900
[ 63.537020] sch_direct_xmit+0x340/0xea0
[ 63.537997] __dev_queue_xmit+0x2e94/0x59b0
[ 63.538819] neigh_resolve_output+0x936/0xb40
[ 63.539793] ip_finish_output2+0x1ee2/0x2200
[ 63.540784] __ip_finish_output+0x272/0x7a0
[ 63.541765] ip_finish_output+0x4e/0x5e0
[ 63.542791] ip_output+0x166/0x410
[ 63.543771] ip_push_pending_frames+0x1a2/0x470
[ 63.544753] raw_sendmsg+0x1f06/0x2180
[ 63.545033] inet_sendmsg+0xe6/0x190
[ 63.546006] __sys_sendto+0x55e/0x8e0
[ 63.546859] __s390x_sys_socketcall+0x19ae/0x2ba0
[ 63.547730] __do_syscall+0x12e/0x240
[ 63.548019] system_call+0x6e/0x90
[ 63.548989]
[ 63.549779] Uninit was created at:
[ 63.550691] kmem_cache_alloc_node_noprof+0x9e8/0x10e0
[ 63.550975] kmalloc_reserve+0x12a/0x470
[ 63.551969] __alloc_skb+0x310/0x860
[ 63.552949] __ip_append_data+0x483e/0x6a30
[ 63.553902] ip_append_data+0x11c/0x1e0
[ 63.554912] raw_sendmsg+0x1c8c/0x2180
[ 63.556719] inet_sendmsg+0xe6/0x190
[ 63.557534] __sys_sendto+0x55e/0x8e0
[ 63.557875] __s390x_sys_socketcall+0x19ae/0x2ba0
[ 63.558869] __do_syscall+0x12e/0x240
[ 63.559832] system_call+0x6e/0x90
[ 63.560780]
[ 63.560972] Byte 35 of 98 is uninitialized
[ 63.561741] Memory access of size 98 starts at 0000000005704312
[ 63.561950]
[ 63.562824] CPU: 3 UID: 0 PID: 192 Comm: ping Tainted: G B N 6.17.0-dirty #16 NONE
[ 63.563868] Tainted: [B]=BAD_PAGE, [N]=TEST
[ 63.564751] Hardware name: IBM 3931 A01 703 (KVM/Linux)
[ 63.564986] =====================================================

Fixes: dcd3e1d ("s390/checksum: provide csum_partial_copy_nocheck()")
Signed-off-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
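The size that has to be reported to kmsan in the commit above depends on the runtime 'index' value. The sketch below shows one way to compute it, under a loudly stated assumption: 'index' is taken to be the highest byte index stored, so index 0 stores one byte and index 1 stores two (consistent with "at least 2 bytes" for index > 0), capped at a 16-byte vector. `model_vstl_bytes_written()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_VECTOR_BYTES 16 /* one 128-bit vector register */

/*
 * Assumption: 'index' is the highest byte index to store, so the store
 * covers bytes 0..index, capped at the full vector width.
 */
static size_t model_vstl_bytes_written(unsigned int index)
{
    if (index >= MODEL_VECTOR_BYTES - 1)
        return MODEL_VECTOR_BYTES;
    return (size_t)index + 1;
}
```

A fixed compile-time size of 1 byte agrees with this only for index 0; for any larger index the instrumentation would under-report the store, which is the mismatch the commit describes.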
BugLink: https://bugs.launchpad.net/bugs/2125142

kernfs_remove() used to accept a NULL kernfs_node parameter and bail out, but the recent per-fs lock change introduced a regression: the parameter is now dereferenced without a NULL check, so the kernel crashes. This patch checks for a NULL kernfs_node in kernfs_remove() and, if it is NULL, simply returns.

Quote from the bug report by Jirka:

```
The bug is triggered by running NAS Parallel benchmark suite on SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error log:

[ 247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 247.036009] #PF: supervisor read access in kernel mode
[ 247.036009] #PF: error_code(0x0000) - not-present page
[ 247.036009] PGD 0 P4D 0
[ 247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
[ 247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
[ 247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 2.0b 03/07/2018
[ 247.058060] RIP: 0010:kernfs_remove+0x8/0x50
[ 247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4 ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83 c4 60
[ 247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
[ 247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
[ 247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
[ 247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
[ 247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
[ 247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
[ 247.122048] FS: 00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000) knlGS:0000000000000000
[ 247.122048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
[ 247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 247.122048] PKRU: 55555554
[ 247.122048] Call Trace:
[ 247.122048] <TASK>
[ 247.122048] rdt_kill_sb+0x29d/0x350
[ 247.122048] deactivate_locked_super+0x36/0xa0
[ 247.122048] cleanup_mnt+0x131/0x190
[ 247.122048] task_work_run+0x5c/0x90
[ 247.122048] exit_to_user_mode_prepare+0x229/0x230
[ 247.122048] syscall_exit_to_user_mode+0x18/0x40
[ 247.122048] do_syscall_64+0x48/0x90
[ 247.122048] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 247.122048] RIP: 0033:0x7f01be2d735b
```

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215696
Link: https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BEO3cw@mail.gmail.com/
Fixes: 393c371 (kernfs: switch global kernfs_rwsem lock to per-fs lock)
Cc: stable@vger.kernel.org
Reported-by: Jirka Hladky <jhladky@redhat.com>
Tested-by: Jirka Hladky <jhladky@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Link: https://lore.kernel.org/r/20220427172152.3505364-1-minchan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ad8d869)
Signed-off-by: Ghadi Elie Rahme <ghadi.rahme@canonical.com>
Acked-by: Alessio Faina <alessio.faina@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Mehmet Basaran <mehmet.basaran@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
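The NULL guard described in the kernfs_remove commit message above can be illustrated with a small user-space sketch. This is not the kernel code: `struct kn_model` and `kn_model_remove()` are hypothetical stand-ins, and the `int` return value exists only so the behaviour can be observed (the commit message says only that the function should "just return"). The point is that the check has to run before any field of the node is read, which is what the oops log suggests was missing (RDI is 0 and the faulting address in CR2 is 0x8).

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a kernfs node. The layout is illustrative
 * only and does not mirror the kernel's struct kernfs_node.
 */
struct kn_model {
    struct kn_model *parent;
    int removed;
};

/*
 * Model of a remove routine that tolerates a NULL argument.
 * Returns 0 when it bailed out because kn was NULL, 1 when it marked
 * the node as removed. The return value is an assumption made for
 * demonstration purposes.
 */
int kn_model_remove(struct kn_model *kn)
{
    /* Guard first: nothing below may touch kn before this check. */
    if (!kn)
        return 0;

    /* Only now is it safe to dereference the node. */
    kn->removed = 1;
    return 1;
}
```

With the guard in place, a caller that passes NULL during teardown (as the umount path in the trace appears to) gets a no-op instead of a fault.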
struct btf_ptr {
void *ptr;
__u32 type_id;
__u32 flags;
};
test_task_btf:PASS:bpf_iter_task_btf__open_and_load 0 nsec
do_btf_read:PASS:attach_iter 0 nsec
do_btf_read:PASS:create_iter 0 nsec
do_btf_read:PASS:read 0 nsec
do_btf_read:FAIL:check for btf representation of task_struct in iter data unexpected check for btf representation of task_struct in iter data: '(struct task_struct)' is not a substring of 'Raw BTF task
'
test_task_btf:FAIL:no task iteration, did BPF program run? unexpected no task iteration, did BPF program run?: actual 0 == expected 0
#16/13 bpf_iter/task_btf:FAIL
static struct btf_ptr ptr = {};
long ret;
[...]
ret = bpf_seq_printf_btf(seq, &ptr, sizeof(ptr), 0);
--> -EINVAL
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
This is a Markdown version of the README file. It must be renamed to README.md to render properly (I think).