Skip to content

Conversation

@andrewshadura
Copy link
Contributor

Checking for just two variants of standard timings for
1366x768 isn't quite correct, let's check for ranges
instead.

Signed-off-by: Andrew Shadura andrew@beldisplaytech.com

Checking for just two variants of standard timings for
1366x768 isn't quite correct, let's check for ranges
instead.

Signed-off-by: Andrew Shadura <andrew@beldisplaytech.com>
jkstrick pushed a commit to jkstrick/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513
torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6
torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
zachariasmaladroit pushed a commit to galaxys-cm7miui-kernel/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513
torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6
torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
@peterwillcn
Copy link

please add LinuxPPS in kernel support ntpd for gps...
http://wiki.enneenne.com/index.php/LinuxPPS_installation

tworaz pushed a commit to tworaz/linux that referenced this pull request Feb 13, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xXorAa pushed a commit to xXorAa/linux that referenced this pull request Feb 17, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Feb 23, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Mar 1, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Mar 19, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Mar 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 2, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 9, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 11, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 12, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
…S block during isolation for migration

BugLink: http://bugs.launchpad.net/bugs/931719

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 19, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 4, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 4, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 5, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 7, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 9, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 14, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nomis pushed a commit to nomis/linux that referenced this pull request May 15, 2012
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 16, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 17, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 21, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 23, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 24, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
johnstultz-work pushed a commit to johnstultz-work/linux-dev that referenced this pull request Nov 20, 2025
[ Upstream commit 1626e5e ]

While performing the rq locking dance in dispatch_to_local_dsq(), we may
trigger the following lock imbalance condition, in particular when
multiple tasks are rapidly changing CPU affinity (i.e., running a
`stress-ng --race-sched 0`):

[   13.413579] =====================================
[   13.413660] WARNING: bad unlock balance detected!
[   13.413729] 6.13.0-virtme torvalds#15 Not tainted
[   13.413792] -------------------------------------
[   13.413859] kworker/1:1/80 is trying to release lock (&rq->__lock) at:
[   13.413954] [<ffffffff873c6c48>] dispatch_to_local_dsq+0x108/0x1a0
[   13.414111] but there are no more locks to release!
[   13.414176]
[   13.414176] other info that might help us debug this:
[   13.414258] 1 lock held by kworker/1:1/80:
[   13.414318]  #0: ffff8b66feb41698 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0x90
[   13.414612]
[   13.414612] stack backtrace:
[   13.415255] CPU: 1 UID: 0 PID: 80 Comm: kworker/1:1 Not tainted 6.13.0-virtme torvalds#15
[   13.415505] Workqueue:  0x0 (events)
[   13.415567] Sched_ext: dsp_local_on (enabled+all), task: runnable_at=-2ms
[   13.415570] Call Trace:
[   13.415700]  <TASK>
[   13.415744]  dump_stack_lvl+0x78/0xe0
[   13.415806]  ? dispatch_to_local_dsq+0x108/0x1a0
[   13.415884]  print_unlock_imbalance_bug+0x11b/0x130
[   13.415965]  ? dispatch_to_local_dsq+0x108/0x1a0
[   13.416226]  lock_release+0x231/0x2c0
[   13.416326]  _raw_spin_unlock+0x1b/0x40
[   13.416422]  dispatch_to_local_dsq+0x108/0x1a0
[   13.416554]  flush_dispatch_buf+0x199/0x1d0
[   13.416652]  balance_one+0x194/0x370
[   13.416751]  balance_scx+0x61/0x1e0
[   13.416848]  prev_balance+0x43/0xb0
[   13.416947]  __pick_next_task+0x6b/0x1b0
[   13.417052]  __schedule+0x20d/0x1740

This happens because dispatch_to_local_dsq() is racing with
dispatch_dequeue() and, when the latter wins, we incorrectly assume that
the task has been moved to dst_rq.

Fix by properly tracking the currently locked rq.

Fixes: 4d3ca89 ("sched_ext: Refactor consume_remote_task()")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 25ddd8f)
Signed-off-by: John Stultz <jstultz@google.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 20, 2025
The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
AirFortressIlikara referenced this pull request in AOSC-Tracking/linux Nov 24, 2025
Use BPF_TRAMP_F_INDIRECT flag to detect struct ops and emit proper
prologue and epilogue for this case.

With this patch, all of the struct_ops related testcases (except
struct_ops_multi_pages) passed on LoongArch.

The testcase struct_ops_multi_pages failed is because the actual
image_pages_cnt is 40 which is bigger than MAX_TRAMP_IMAGE_PAGES.

Before:

  $ sudo ./test_progs -t struct_ops -d struct_ops_multi_pages
  ...
  WATCHDOG: test case struct_ops_module/struct_ops_load executes for 10 seconds...

After:

  $ sudo ./test_progs -t struct_ops -d struct_ops_multi_pages
  ...
  #15      bad_struct_ops:OK
  ...
  torvalds#399     struct_ops_autocreate:OK
  ...
  torvalds#400     struct_ops_kptr_return:OK
  ...
  torvalds#401     struct_ops_maybe_null:OK
  ...
  torvalds#402     struct_ops_module:OK
  ...
  torvalds#404     struct_ops_no_cfi:OK
  ...
  torvalds#405     struct_ops_private_stack:SKIP
  ...
  torvalds#406     struct_ops_refcounted:OK
  Summary: 8/25 PASSED, 3 SKIPPED, 0 FAILED

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: maoxiaochuan <maoxiaochuan@loongson.cn>
AirFortressIlikara referenced this pull request in AOSC-Tracking/linux Nov 24, 2025
Use BPF_TRAMP_F_INDIRECT flag to detect struct ops and emit proper
prologue and epilogue for this case.

With this patch, all of the struct_ops related testcases (except
struct_ops_multi_pages) passed on LoongArch.

The testcase struct_ops_multi_pages failed is because the actual
image_pages_cnt is 40 which is bigger than MAX_TRAMP_IMAGE_PAGES.

Before:

  $ sudo ./test_progs -t struct_ops -d struct_ops_multi_pages
  ...
  WATCHDOG: test case struct_ops_module/struct_ops_load executes for 10 seconds...

After:

  $ sudo ./test_progs -t struct_ops -d struct_ops_multi_pages
  ...
  #15      bad_struct_ops:OK
  ...
  torvalds#399     struct_ops_autocreate:OK
  ...
  torvalds#400     struct_ops_kptr_return:OK
  ...
  torvalds#401     struct_ops_maybe_null:OK
  ...
  torvalds#402     struct_ops_module:OK
  ...
  torvalds#404     struct_ops_no_cfi:OK
  ...
  torvalds#405     struct_ops_private_stack:SKIP
  ...
  torvalds#406     struct_ops_refcounted:OK
  Summary: 8/25 PASSED, 3 SKIPPED, 0 FAILED

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: maoxiaochuan <maoxiaochuan@loongson.cn>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Nov 27, 2025
Marc Kleine-Budde <mkl@pengutronix.de> says:

Similarly to how CAN FD reuses the bittiming logic of Classical CAN, CAN XL
also reuses the entirety of CAN FD features, and, on top of that, adds new
features which are specific to CAN XL.

A so-called 'mixed-mode' is intended to have (XL-tolerant) CAN FD nodes and
CAN XL nodes on one CAN segment, where the FD-controllers can talk CC/FD
and the XL-controllers can talk CC/FD/XL. This mixed-mode utilizes the
known error-signalling (ES) for sending CC/FD/XL frames. For CAN FD and CAN
XL the tranceiver delay compensation (TDC) is supported to use common CAN
and CAN-SIG transceivers.

The CANXL-only mode disables the error-signalling in the CAN XL controller.
This mode does not allow CC/FD frames to be sent but additionally offers a
CAN XL transceiver mode switching (TMS) to send CAN XL frames with up to
20Mbit/s data rate. The TMS utilizes a PWM configuration which is added to
the netlink interface.

Configured with CAN_CTRLMODE_FD and CAN_CTRLMODE_XL this leads to:

FD=0 XL=0 CC-only mode         (ES=1)
FD=1 XL=0 FD/CC mixed-mode     (ES=1)
FD=1 XL=1 XL/FD/CC mixed-mode  (ES=1)
FD=0 XL=1 XL-only mode         (ES=0, TMS optional)

Patch #1 print defined ctrlmode strings capitalized to increase the
readability and to be in line with the 'ip' tool (iproute2).

Patch #2 is a small clean-up which makes can_calc_bittiming() use
NL_SET_ERR_MSG() instead of netdev_err().

Patch #3 adds a check in can_dev_dropped_skb() to drop CAN FD frames
when CAN FD is turned off.

Patch #4 adds CAN_CTRLMODE_RESTRICTED. Note that contrary to the other
CAN_CTRL_MODE_XL_* that are introduced in the later patches, this control
mode is not specific to CAN XL. The nuance is that because this restricted
mode was only added in ISO 11898-1:2024, it is made mandatory for CAN XL
devices but optional for other protocols. This is why this patch is added
as a preparation before introducing the core CAN XL logic.

Patch #5 adds all the CAN XL features which are inherited from CAN FD: the
nominal bittiming, the data bittiming and the TDC.

Patch torvalds#6 add a new CAN_CTRLMODE_XL_TMS control mode which is specific to
CAN XL to enable the transceiver mode switching (TMS) in XL-only mode.

Patch torvalds#7 adds a check in can_dev_dropped_skb() to drop CAN CC/FD frames
when the CAN XL controller is in CAN XL-only mode. The introduced
can_dev_in_xl_only_mode() function also determines the error-signalling
configuration for the CAN XL controllers.

Patch torvalds#8 to torvalds#11 add the PWM logic for the CAN XL TMS mode.

Patch torvalds#12 to torvalds#14 add different default sample-points for standard CAN and
CAN SIG transceivers (with TDC) and CAN XL transceivers using PWM in the
CAN XL TMS mode.

Patch torvalds#15 add a dummy_can driver for netlink testing and debugging.

Patch torvalds#16 check CAN frame type (CC/FD/XL) when writing those frames to the
CAN_RAW socket and reject them if it's not supported by the CAN interface.

Patch torvalds#17 increase the resolution when printing the bitrate error and
round-up the value to 0.01% in the case the resolution would still provide
values which would lead to 0.00%.

Link: https://patch.msgid.link/20251126-canxl-v8-0-e7e3eb74f889@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
morbidrsa pushed a commit to morbidrsa/linux that referenced this pull request Dec 1, 2025
The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Dec 6, 2025
commit 9d274c1 upstream.

We have been seeing crashes on duplicate keys in
btrfs_set_item_key_safe():

  BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192)
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.c:2620!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 torvalds#6
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
  RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs]

With the following stack trace:

  #0  btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4)
  #1  btrfs_drop_extents (fs/btrfs/file.c:411:4)
  #2  log_one_extent (fs/btrfs/tree-log.c:4732:9)
  #3  btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9)
  #4  btrfs_log_inode (fs/btrfs/tree-log.c:6626:9)
  #5  btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8)
  torvalds#6  btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8)
  torvalds#7  btrfs_sync_file (fs/btrfs/file.c:1933:8)
  torvalds#8  vfs_fsync_range (fs/sync.c:188:9)
  torvalds#9  vfs_fsync (fs/sync.c:202:9)
  torvalds#10 do_fsync (fs/sync.c:212:9)
  torvalds#11 __do_sys_fdatasync (fs/sync.c:225:9)
  torvalds#12 __se_sys_fdatasync (fs/sync.c:223:1)
  torvalds#13 __x64_sys_fdatasync (fs/sync.c:223:1)
  torvalds#14 do_syscall_x64 (arch/x86/entry/common.c:52:14)
  torvalds#15 do_syscall_64 (arch/x86/entry/common.c:83:7)
  torvalds#16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121)

So we're logging a changed extent from fsync, which is splitting an
extent in the log tree. But this split part already exists in the tree,
triggering the BUG().

This is the state of the log tree at the time of the crash, dumped with
drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py)
to get more details than btrfs_print_leaf() gives us:

  >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"])
  leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610
  leaf 33439744 flags 0x100000000000000
  fs uuid e5bd3946-400c-4223-8923-190ef1f18677
  chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da
          item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160
                  generation 7 transid 9 size 8192 nbytes 8473563889606862198
                  block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                  sequence 204 flags 0x10(PREALLOC)
                  atime 1716417703.220000000 (2024-05-22 15:41:43)
                  ctime 1716417704.983333333 (2024-05-22 15:41:44)
                  mtime 1716417704.983333333 (2024-05-22 15:41:44)
                  otime 17592186044416.000000000 (559444-03-08 01:40:16)
          item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13
                  index 195 namelen 3 name: 193
          item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37
                  location key (0 UNKNOWN.0 0) type XATTR
                  transid 7 data_len 1 name_len 6
                  name: user.a
                  data a
          item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53
                  generation 9 type 1 (regular)
                  extent data disk byte 303144960 nr 12288
                  extent data offset 0 nr 4096 ram 12288
                  extent compression 0 (none)
          item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 4096 nr 8192
          item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 8192 nr 4096
  ...

So the real problem happened earlier: notice that items 4 (4k-12k) and 5
(8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and
item 5 starts at i_size.

Here is the state of the filesystem tree at the time of the crash:

  >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root
  >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0))
  >>> print_extent_buffer(nodes[0])
  leaf 30425088 level 0 items 184 generation 9 owner 5
  leaf 30425088 flags 0x100000000000000
  fs uuid e5bd3946-400c-4223-8923-190ef1f18677
  chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da
  	...
          item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160
                  generation 7 transid 7 size 4096 nbytes 12288
                  block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                  sequence 6 flags 0x10(PREALLOC)
                  atime 1716417703.220000000 (2024-05-22 15:41:43)
                  ctime 1716417703.220000000 (2024-05-22 15:41:43)
                  mtime 1716417703.220000000 (2024-05-22 15:41:43)
                  otime 1716417703.220000000 (2024-05-22 15:41:43)
          item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13
                  index 195 namelen 3 name: 193
          item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37
                  location key (0 UNKNOWN.0 0) type XATTR
                  transid 7 data_len 1 name_len 6
                  name: user.a
                  data a
          item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53
                  generation 9 type 1 (regular)
                  extent data disk byte 303144960 nr 12288
                  extent data offset 0 nr 8192 ram 12288
                  extent compression 0 (none)
          item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 8192 nr 4096

Item 5 in the log tree corresponds to item 183 in the filesystem tree,
but nothing matches item 4. Furthermore, item 183 is the last item in
the leaf.

btrfs_log_prealloc_extents() is responsible for logging prealloc extents
beyond i_size. It first truncates any previously logged prealloc extents
that start beyond i_size. Then, it walks the filesystem tree and copies
the prealloc extent items to the log tree.

If it hits the end of a leaf, then it calls btrfs_next_leaf(), which
unlocks the tree and does another search. However, while the filesystem
tree is unlocked, an ordered extent completion may modify the tree. In
particular, it may insert an extent item that overlaps with an extent
item that was already copied to the log tree.

This may manifest in several ways depending on the exact scenario,
including an EEXIST error that is silently translated to a full sync,
overlapping items in the log tree, or this crash. This particular crash
is triggered by the following sequence of events:

- Initially, the file has i_size=4k, a regular extent from 0-4k, and a
  prealloc extent beyond i_size from 4k-12k. The prealloc extent item is
  the last item in its B-tree leaf.
- The file is fsync'd, which copies its inode item and both extent items
  to the log tree.
- An xattr is set on the file, which sets the
  BTRFS_INODE_COPY_EVERYTHING flag.
- The range 4k-8k in the file is written using direct I/O. i_size is
  extended to 8k, but the ordered extent is still in flight.
- The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this
  calls copy_inode_items_to_log(), which calls
  btrfs_log_prealloc_extents().
- btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the
  filesystem tree. Since it starts before i_size, it skips it. Since it
  is the last item in its B-tree leaf, it calls btrfs_next_leaf().
- btrfs_next_leaf() unlocks the path.
- The ordered extent completion runs, which converts the 4k-8k part of
  the prealloc extent to written and inserts the remaining prealloc part
  from 8k-12k.
- btrfs_next_leaf() does a search and finds the new prealloc extent
  8k-12k.
- btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into
  the log tree. Note that it overlaps with the 4k-12k prealloc extent
  that was copied to the log tree by the first fsync.
- fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k
  extent that was written.
- This tries to drop the range 4k-8k in the log tree, which requires
  adjusting the start of the 4k-12k prealloc extent in the log tree to
  8k.
- btrfs_set_item_key_safe() sees that there is already an extent
  starting at 8k in the log tree and calls BUG().

Fix this by detecting when we're about to insert an overlapping file
extent item in the log tree and truncating the part that would overlap.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ZXlieC pushed a commit to Xlie-Electronic-Customs/linux that referenced this pull request Dec 7, 2025
… 'T'

When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.

The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.

Stack trace as below:

Perf: Segmentation fault
-------- backtrace --------
    #0 0x55d365 in ui__signal_backtrace setup.c:0
    #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
    #2 0x570f08 in arch__is perf[570f08]
    #3 0x562186 in annotate_get_insn_location perf[562186]
    #4 0x562626 in __hist_entry__get_data_type annotate.c:0
    #5 0x56476d in annotation_line__write perf[56476d]
    torvalds#6 0x54e2db in annotate_browser__write annotate.c:0
    torvalds#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
    torvalds#8 0x54dc9e in annotate_browser__refresh annotate.c:0
    torvalds#9 0x54c03d in __ui_browser__refresh browser.c:0
    torvalds#10 0x54ccf8 in ui_browser__run perf[54ccf8]
    torvalds#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
    torvalds#12 0x552293 in do_annotate hists.c:0
    torvalds#13 0x55941c in evsel__hists_browse hists.c:0
    torvalds#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
    torvalds#15 0x42ff02 in cmd_report perf[42ff02]
    torvalds#16 0x494008 in run_builtin perf.c:0
    torvalds#17 0x494305 in handle_internal_command perf.c:0
    torvalds#18 0x410547 in main perf[410547]
    torvalds#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
    torvalds#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
    torvalds#21 0x410b75 in _start perf[410b75]

Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 8, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 8, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 9, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 9, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
reigadegr pushed a commit to reigadegr/linux that referenced this pull request Dec 10, 2025
BugLink: https://bugs.launchpad.net/bugs/2131259

[ Upstream commit 48918ca ]

The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this cancel the workload when opening the events
fails.

Before:
```
$ perf test -vv 7
  7: PERF_RECORD_* events & perf_sample fields:
 --- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
 ------------------------------------------------------------
perf_event_attr:
  type                    	   0 (PERF_TYPE_HARDWARE)
  config                  	   0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                	   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
 ------------------------------------------------------------
perf_event_attr:
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0x9 (PERF_COUNT_SW_DUMMY)
  sample_type                      IP|TID|TIME|CPU
  read_format                      ID|LOST
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
 ------------------------------------------------------------
sys_perf_event_open: pid 1189569  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
 ---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
 ---- unexpected signal (6) ----
iFailed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
    #0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
    #1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
    #2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
    #3 0x7f29ce849cc2 in raise raise.c:27
    #4 0x7f29ce8324ac in abort abort.c:81
    #5 0x565358f662d4 in check_leaks builtin-test.c:226
    torvalds#6 0x565358f6682e in run_test_child builtin-test.c:344
    torvalds#7 0x565358ef7121 in start_command run-command.c:128
    torvalds#8 0x565358f67273 in start_test builtin-test.c:545
    torvalds#9 0x565358f6771d in __cmd_test builtin-test.c:647
    torvalds#10 0x565358f682bd in cmd_test builtin-test.c:849
    torvalds#11 0x565358ee5ded in run_builtin perf.c:349
    torvalds#12 0x565358ee6085 in handle_internal_command perf.c:401
    torvalds#13 0x565358ee61de in run_argv perf.c:448
    torvalds#14 0x565358ee6527 in main perf.c:555
    torvalds#15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
    torvalds#16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    torvalds#17 0x565358e391c1 in _start perf[851c1]
  7: PERF_RECORD_* events & perf_sample fields                       : FAILED!
```

After:
```
$ perf test 7
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Fixes: 16d00fe ("perf tests: Move test__PERF_RECORD into separate object")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Alice C. Munduruca <alice.munduruca@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 10, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 10, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 10, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Dec 12, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Dec 12, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 13, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 14, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 16, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 16, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 19, 2025
[ Upstream commit 03b3bcd ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released
to ensure no one is can access the request_queue. This fixes a reported
use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e torvalds#15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 19, 2025
The usage of BPF_TRAMP_F_ORIG_STACK in __arch_prepare_bpf_trampoline() is
wrong, and it should be BPF_TRAMP_F_CALL_ORIG, which caused crash as
Andreas reported:

  Insufficient stack space to handle exception!
  Task stack:     [0xff20000000010000..0xff20000000014000]
  Overflow stack: [0xff600000ffdad070..0xff600000ffdae070]
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  epc : copy_from_kernel_nofault+0xa/0x198
   ra : bpf_probe_read_kernel+0x20/0x60
  epc : ffffffff802b732a ra : ffffffff801e6070 sp : ff2000000000ffe0
   gp : ffffffff82262ed0 tp : 0000000000000000 t0 : ffffffff80022320
   t1 : ffffffff801e6056 t2 : 0000000000000000 s0 : ff20000000010040
   s1 : 0000000000000008 a0 : ff20000000010050 a1 : ff60000083b3d320
   a2 : 0000000000000008 a3 : 0000000000000097 a4 : 0000000000000000
   a5 : 0000000000000000 a6 : 0000000000000021 a7 : 0000000000000003
   s2 : ff20000000010050 s3 : ff6000008459fc18 s4 : ff60000083b3d340
   s5 : ff20000000010060 s6 : 0000000000000000 s7 : ff20000000013aa8
   s8 : 0000000000000000 s9 : 0000000000008000 s10: 000000000058dcb0
   s11: 000000000058dca7 t3 : 000000006925116d t4 : ff6000008090f026
   t5 : 00007fff9b0cbaa8 t6 : 0000000000000016
  status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000005
  Kernel panic - not syncing: Kernel stack overflow
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  Call Trace:
  [<ffffffff8001a1f8>] dump_backtrace+0x28/0x38
  [<ffffffff80002502>] show_stack+0x3a/0x50
  [<ffffffff800122be>] dump_stack_lvl+0x56/0x80
  [<ffffffff80012300>] dump_stack+0x18/0x22
  [<ffffffff80002abe>] vpanic+0xf6/0x328
  [<ffffffff80002d2e>] panic+0x3e/0x40
  [<ffffffff80019ef0>] handle_bad_stack+0x98/0xa0
  [<ffffffff801e6070>] bpf_probe_read_kernel+0x20/0x60

Just fix it.

Fixes: 47c9214 ("bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Closes: https://lore.kernel.org/bpf/874ipnkfvt.fsf@igel.home/
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 19, 2025
The usage of BPF_TRAMP_F_ORIG_STACK in __arch_prepare_bpf_trampoline() is
wrong, and it should be BPF_TRAMP_F_CALL_ORIG, which caused crash as
Andreas reported:

  Insufficient stack space to handle exception!
  Task stack:     [0xff20000000010000..0xff20000000014000]
  Overflow stack: [0xff600000ffdad070..0xff600000ffdae070]
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  epc : copy_from_kernel_nofault+0xa/0x198
   ra : bpf_probe_read_kernel+0x20/0x60
  epc : ffffffff802b732a ra : ffffffff801e6070 sp : ff2000000000ffe0
   gp : ffffffff82262ed0 tp : 0000000000000000 t0 : ffffffff80022320
   t1 : ffffffff801e6056 t2 : 0000000000000000 s0 : ff20000000010040
   s1 : 0000000000000008 a0 : ff20000000010050 a1 : ff60000083b3d320
   a2 : 0000000000000008 a3 : 0000000000000097 a4 : 0000000000000000
   a5 : 0000000000000000 a6 : 0000000000000021 a7 : 0000000000000003
   s2 : ff20000000010050 s3 : ff6000008459fc18 s4 : ff60000083b3d340
   s5 : ff20000000010060 s6 : 0000000000000000 s7 : ff20000000013aa8
   s8 : 0000000000000000 s9 : 0000000000008000 s10: 000000000058dcb0
   s11: 000000000058dca7 t3 : 000000006925116d t4 : ff6000008090f026
   t5 : 00007fff9b0cbaa8 t6 : 0000000000000016
  status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000005
  Kernel panic - not syncing: Kernel stack overflow
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  Call Trace:
  [<ffffffff8001a1f8>] dump_backtrace+0x28/0x38
  [<ffffffff80002502>] show_stack+0x3a/0x50
  [<ffffffff800122be>] dump_stack_lvl+0x56/0x80
  [<ffffffff80012300>] dump_stack+0x18/0x22
  [<ffffffff80002abe>] vpanic+0xf6/0x328
  [<ffffffff80002d2e>] panic+0x3e/0x40
  [<ffffffff80019ef0>] handle_bad_stack+0x98/0xa0
  [<ffffffff801e6070>] bpf_probe_read_kernel+0x20/0x60

Just fix it.

Fixes: 47c9214 ("bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Closes: https://lore.kernel.org/bpf/874ipnkfvt.fsf@igel.home/
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
RevySR pushed a commit to revyos/linux that referenced this pull request Dec 20, 2025
The usage of BPF_TRAMP_F_ORIG_STACK in __arch_prepare_bpf_trampoline() is
wrong, and it should be BPF_TRAMP_F_CALL_ORIG, which caused crash as
Andreas reported:

  Insufficient stack space to handle exception!
  Task stack:     [0xff20000000010000..0xff20000000014000]
  Overflow stack: [0xff600000ffdad070..0xff600000ffdae070]
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  epc : copy_from_kernel_nofault+0xa/0x198
   ra : bpf_probe_read_kernel+0x20/0x60
  epc : ffffffff802b732a ra : ffffffff801e6070 sp : ff2000000000ffe0
   gp : ffffffff82262ed0 tp : 0000000000000000 t0 : ffffffff80022320
   t1 : ffffffff801e6056 t2 : 0000000000000000 s0 : ff20000000010040
   s1 : 0000000000000008 a0 : ff20000000010050 a1 : ff60000083b3d320
   a2 : 0000000000000008 a3 : 0000000000000097 a4 : 0000000000000000
   a5 : 0000000000000000 a6 : 0000000000000021 a7 : 0000000000000003
   s2 : ff20000000010050 s3 : ff6000008459fc18 s4 : ff60000083b3d340
   s5 : ff20000000010060 s6 : 0000000000000000 s7 : ff20000000013aa8
   s8 : 0000000000000000 s9 : 0000000000008000 s10: 000000000058dcb0
   s11: 000000000058dca7 t3 : 000000006925116d t4 : ff6000008090f026
   t5 : 00007fff9b0cbaa8 t6 : 0000000000000016
  status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000005
  Kernel panic - not syncing: Kernel stack overflow
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  Call Trace:
  [<ffffffff8001a1f8>] dump_backtrace+0x28/0x38
  [<ffffffff80002502>] show_stack+0x3a/0x50
  [<ffffffff800122be>] dump_stack_lvl+0x56/0x80
  [<ffffffff80012300>] dump_stack+0x18/0x22
  [<ffffffff80002abe>] vpanic+0xf6/0x328
  [<ffffffff80002d2e>] panic+0x3e/0x40
  [<ffffffff80019ef0>] handle_bad_stack+0x98/0xa0
  [<ffffffff801e6070>] bpf_probe_read_kernel+0x20/0x60

Just fix it.

Fixes: 47c9214 ("bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Closes: https://lore.kernel.org/bpf/874ipnkfvt.fsf@igel.home/
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20251219142948.204312-1-dongml2@chinatelecom.cn
Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 20, 2025
The usage of BPF_TRAMP_F_ORIG_STACK in __arch_prepare_bpf_trampoline() is
wrong, and it should be BPF_TRAMP_F_CALL_ORIG, which caused crash as
Andreas reported:

  Insufficient stack space to handle exception!
  Task stack:     [0xff20000000010000..0xff20000000014000]
  Overflow stack: [0xff600000ffdad070..0xff600000ffdae070]
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  epc : copy_from_kernel_nofault+0xa/0x198
   ra : bpf_probe_read_kernel+0x20/0x60
  epc : ffffffff802b732a ra : ffffffff801e6070 sp : ff2000000000ffe0
   gp : ffffffff82262ed0 tp : 0000000000000000 t0 : ffffffff80022320
   t1 : ffffffff801e6056 t2 : 0000000000000000 s0 : ff20000000010040
   s1 : 0000000000000008 a0 : ff20000000010050 a1 : ff60000083b3d320
   a2 : 0000000000000008 a3 : 0000000000000097 a4 : 0000000000000000
   a5 : 0000000000000000 a6 : 0000000000000021 a7 : 0000000000000003
   s2 : ff20000000010050 s3 : ff6000008459fc18 s4 : ff60000083b3d340
   s5 : ff20000000010060 s6 : 0000000000000000 s7 : ff20000000013aa8
   s8 : 0000000000000000 s9 : 0000000000008000 s10: 000000000058dcb0
   s11: 000000000058dca7 t3 : 000000006925116d t4 : ff6000008090f026
   t5 : 00007fff9b0cbaa8 t6 : 0000000000000016
  status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000005
  Kernel panic - not syncing: Kernel stack overflow
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  Call Trace:
  [<ffffffff8001a1f8>] dump_backtrace+0x28/0x38
  [<ffffffff80002502>] show_stack+0x3a/0x50
  [<ffffffff800122be>] dump_stack_lvl+0x56/0x80
  [<ffffffff80012300>] dump_stack+0x18/0x22
  [<ffffffff80002abe>] vpanic+0xf6/0x328
  [<ffffffff80002d2e>] panic+0x3e/0x40
  [<ffffffff80019ef0>] handle_bad_stack+0x98/0xa0
  [<ffffffff801e6070>] bpf_probe_read_kernel+0x20/0x60

Just fix it.

Fixes: 47c9214 ("bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251219142948.204312-1-dongml2@chinatelecom.cn

Closes: https://lore.kernel.org/bpf/874ipnkfvt.fsf@igel.home/
RevySR pushed a commit to revyos/linux that referenced this pull request Dec 22, 2025
The usage of BPF_TRAMP_F_ORIG_STACK in __arch_prepare_bpf_trampoline() is
wrong, and it should be BPF_TRAMP_F_CALL_ORIG, which caused crash as
Andreas reported:

  Insufficient stack space to handle exception!
  Task stack:     [0xff20000000010000..0xff20000000014000]
  Overflow stack: [0xff600000ffdad070..0xff600000ffdae070]
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  epc : copy_from_kernel_nofault+0xa/0x198
   ra : bpf_probe_read_kernel+0x20/0x60
  epc : ffffffff802b732a ra : ffffffff801e6070 sp : ff2000000000ffe0
   gp : ffffffff82262ed0 tp : 0000000000000000 t0 : ffffffff80022320
   t1 : ffffffff801e6056 t2 : 0000000000000000 s0 : ff20000000010040
   s1 : 0000000000000008 a0 : ff20000000010050 a1 : ff60000083b3d320
   a2 : 0000000000000008 a3 : 0000000000000097 a4 : 0000000000000000
   a5 : 0000000000000000 a6 : 0000000000000021 a7 : 0000000000000003
   s2 : ff20000000010050 s3 : ff6000008459fc18 s4 : ff60000083b3d340
   s5 : ff20000000010060 s6 : 0000000000000000 s7 : ff20000000013aa8
   s8 : 0000000000000000 s9 : 0000000000008000 s10: 000000000058dcb0
   s11: 000000000058dca7 t3 : 000000006925116d t4 : ff6000008090f026
   t5 : 00007fff9b0cbaa8 t6 : 0000000000000016
  status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000005
  Kernel panic - not syncing: Kernel stack overflow
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  Call Trace:
  [<ffffffff8001a1f8>] dump_backtrace+0x28/0x38
  [<ffffffff80002502>] show_stack+0x3a/0x50
  [<ffffffff800122be>] dump_stack_lvl+0x56/0x80
  [<ffffffff80012300>] dump_stack+0x18/0x22
  [<ffffffff80002abe>] vpanic+0xf6/0x328
  [<ffffffff80002d2e>] panic+0x3e/0x40
  [<ffffffff80019ef0>] handle_bad_stack+0x98/0xa0
  [<ffffffff801e6070>] bpf_probe_read_kernel+0x20/0x60

Just fix it.

Fixes: 47c9214 ("bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Closes: https://lore.kernel.org/bpf/874ipnkfvt.fsf@igel.home/
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20251219142948.204312-1-dongml2@chinatelecom.cn
Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
jbrun3t added a commit to jbrun3t/linux that referenced this pull request Dec 22, 2025
This relocates register pokes of the HDMI VPU encoder out of the
HDMI phy driver. As far as HDMI is concerned, the sequence in which
the setup is done remains mostly the same.

This was tested with modetest, cycling through the following resolutions:
  #0 3840x2160 60.00
  #1 3840x2160 59.94
  #2 3840x2160 50.00
  #3 3840x2160 30.00
  #4 3840x2160 29.97
  #5 3840x2160 25.00
  torvalds#6 3840x2160 24.00
  torvalds#7 3840x2160 23.98
  torvalds#8 1920x1080 60.00
  torvalds#9 1920x1080 60.00
  torvalds#10 1920x1080 59.94
  torvalds#11 1920x1080i 30.00
  torvalds#12 1920x1080i 29.97
  torvalds#13 1920x1080 50.00
  torvalds#14 1920x1080i 25.00
  torvalds#15 1920x1080 30.00
  torvalds#16 1920x1080 29.97
  torvalds#17 1920x1080 25.00
  torvalds#18 1920x1080 24.00
  torvalds#19 1920x1080 23.98
  torvalds#20 1280x1024 60.02
  torvalds#21 1152x864 59.97
  torvalds#22 1280x720 60.00
  torvalds#23 1280x720 59.94
  torvalds#24 1280x720 50.00
  torvalds#25 1024x768 60.00
  torvalds#26 800x600 60.32
  torvalds#27 720x576 50.00
  torvalds#28 720x480 59.94

No regression to report.

This is part of an effort to clean up Amlogic HDMI related drivers which
should eventually allow to stop using the component API and HHI syscon.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
RevySR pushed a commit to revyos/linux that referenced this pull request Dec 25, 2025
The usage of BPF_TRAMP_F_ORIG_STACK in __arch_prepare_bpf_trampoline() is
wrong, and it should be BPF_TRAMP_F_CALL_ORIG, which caused crash as
Andreas reported:

  Insufficient stack space to handle exception!
  Task stack:     [0xff20000000010000..0xff20000000014000]
  Overflow stack: [0xff600000ffdad070..0xff600000ffdae070]
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  epc : copy_from_kernel_nofault+0xa/0x198
   ra : bpf_probe_read_kernel+0x20/0x60
  epc : ffffffff802b732a ra : ffffffff801e6070 sp : ff2000000000ffe0
   gp : ffffffff82262ed0 tp : 0000000000000000 t0 : ffffffff80022320
   t1 : ffffffff801e6056 t2 : 0000000000000000 s0 : ff20000000010040
   s1 : 0000000000000008 a0 : ff20000000010050 a1 : ff60000083b3d320
   a2 : 0000000000000008 a3 : 0000000000000097 a4 : 0000000000000000
   a5 : 0000000000000000 a6 : 0000000000000021 a7 : 0000000000000003
   s2 : ff20000000010050 s3 : ff6000008459fc18 s4 : ff60000083b3d340
   s5 : ff20000000010060 s6 : 0000000000000000 s7 : ff20000000013aa8
   s8 : 0000000000000000 s9 : 0000000000008000 s10: 000000000058dcb0
   s11: 000000000058dca7 t3 : 000000006925116d t4 : ff6000008090f026
   t5 : 00007fff9b0cbaa8 t6 : 0000000000000016
  status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000005
  Kernel panic - not syncing: Kernel stack overflow
  CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.18.0-rc5+ torvalds#15 PREEMPT(voluntary)
  Hardware name: riscv-virtio qemu/qemu, BIOS 2025.10 10/01/2025
  Call Trace:
  [<ffffffff8001a1f8>] dump_backtrace+0x28/0x38
  [<ffffffff80002502>] show_stack+0x3a/0x50
  [<ffffffff800122be>] dump_stack_lvl+0x56/0x80
  [<ffffffff80012300>] dump_stack+0x18/0x22
  [<ffffffff80002abe>] vpanic+0xf6/0x328
  [<ffffffff80002d2e>] panic+0x3e/0x40
  [<ffffffff80019ef0>] handle_bad_stack+0x98/0xa0
  [<ffffffff801e6070>] bpf_probe_read_kernel+0x20/0x60

Just fix it.

Fixes: 47c9214 ("bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Closes: https://lore.kernel.org/bpf/874ipnkfvt.fsf@igel.home/
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20251219142948.204312-1-dongml2@chinatelecom.cn
Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants