Skip to content

Conversation

@reejk
Copy link

@reejk reejk commented May 11, 2012

No description provided.

bootc pushed a commit to bootc/linux that referenced this pull request May 12, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 14, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
bergwolf referenced this pull request in bergwolf/linux May 15, 2012
…S block during isolation for migration

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bergwolf referenced this pull request in bergwolf/linux May 15, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 #6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 #7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 #9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
#19 [ffff88021f007e68] worker_thread at ffffffff81060513
#20 [ffff88021f007ee8] kthread at ffffffff810648b6
#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
bergwolf referenced this pull request in bergwolf/linux May 15, 2012
Printing the "start_ip" for every secondary cpu is very noisy on a large
system - and doesn't add any value. Drop this message.

Console log before:
Booting Node   0, Processors  #1
smpboot cpu 1: start_ip = 96000
 #2
smpboot cpu 2: start_ip = 96000
 #3
smpboot cpu 3: start_ip = 96000
 #4
smpboot cpu 4: start_ip = 96000
       ...
 #31
smpboot cpu 31: start_ip = 96000
Brought up 32 CPUs

Console log after:
Booting Node   0, Processors  #1 #2 #3 #4 #5 #6 #7 Ok.
Booting Node   1, Processors  #8 #9 #10 #11 #12 #13 #14 #15 Ok.
Booting Node   0, Processors  #16 #17 #18 #19 #20 #21 #22 #23 Ok.
Booting Node   1, Processors  #24 #25 #26 #27 #28 #29 #30 #31
Brought up 32 CPUs

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
bergwolf referenced this pull request in bergwolf/linux May 15, 2012
This patch simplifies the buffer allocation functions
for MCI and removes unneeded memset calls. Also, a couple
of unused variables are removed and a memory leak in DMA
allocation is fixed.

[ 1263.788267] WARNING: at /home/sujith/dev/wireless-testing/lib/dma-debug.c:875 check_unmap+0x173/0x7e0()
[ 1263.788273] ath9k 0000:06:00.0: DMA-API: device driver frees DMA memory with different size
               [device address=0x0000000071908000] [map size=512 bytes] [unmap size=256 bytes]
[ 1263.788345] Pid: 774, comm: rmmod Tainted: G        W  O 3.3.0-rc3-wl #18
[ 1263.788348] Call Trace:
[ 1263.788355]  [<ffffffff8105110f>] warn_slowpath_common+0x7f/0xc0
[ 1263.788359]  [<ffffffff81051206>] warn_slowpath_fmt+0x46/0x50
[ 1263.788363]  [<ffffffff8125a713>] check_unmap+0x173/0x7e0
[ 1263.788368]  [<ffffffff8123fc22>] ? prio_tree_left+0x32/0xc0
[ 1263.788373]  [<ffffffff8125aded>] debug_dma_free_coherent+0x6d/0x80
[ 1263.788381]  [<ffffffffa0701c3c>] ath_mci_cleanup+0x7c/0x110 [ath9k]
[ 1263.788387]  [<ffffffffa06f4033>] ath9k_deinit_softc+0x113/0x120 [ath9k]
[ 1263.788392]  [<ffffffffa06f55cc>] ath9k_deinit_device+0x5c/0x70 [ath9k]
[ 1263.788397]  [<ffffffffa0704934>] ath_pci_remove+0x54/0xa0 [ath9k]
[ 1263.788401]  [<ffffffff81267d06>] pci_device_remove+0x46/0x110
[ 1263.788406]  [<ffffffff813102bc>] __device_release_driver+0x7c/0xe0
[ 1263.788410]  [<ffffffff81310a00>] driver_detach+0xd0/0xe0
[ 1263.788414]  [<ffffffff81310118>] bus_remove_driver+0x88/0xe0
[ 1263.788418]  [<ffffffff813111c2>] driver_unregister+0x62/0xa0
[ 1263.788421]  [<ffffffff812680c4>] pci_unregister_driver+0x44/0xc0
[ 1263.788427]  [<ffffffffa0705015>] ath_pci_exit+0x15/0x20 [ath9k]
[ 1263.788432]  [<ffffffffa070a92d>] ath9k_exit+0x15/0x31 [ath9k]
[ 1263.788436]  [<ffffffff810b971c>] sys_delete_module+0x18c/0x270
[ 1263.788441]  [<ffffffff81436edd>] ? retint_swapgs+0x13/0x1b
[ 1263.788446]  [<ffffffff812483be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1263.788450]  [<ffffffff814378e9>] system_call_fastpath+0x16/0x1b
[ 1263.788453] ---[ end trace 3ab4d030ffde40d4 ]---

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
nomis pushed a commit to nomis/linux that referenced this pull request May 15, 2012
commit 87121ca upstream.

Oprofile may crash in a KVM guest while unlaoding modules. This
happens if oprofile_arch_init() fails and oprofile switches to the hr
timer mode as a fallback. In this case oprofile_arch_exit() is called,
but it never was initialized properly which causes the crash. This
patch fixes this.

oprofile: using timer interrupt.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
PGD 41da3f067 PUD 41d80e067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
CPU 5
Modules linked in: oprofile(-)

Pid: 2382, comm: modprobe Not tainted 3.1.0-rc7-00018-g709a39d torvalds#18 Advanced Micro Device Anaheim/Anaheim
RIP: 0010:[<ffffffff8123c226>]  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
RSP: 0018:ffff88041de1de98  EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffffffffa00060e0 RCX: dead000000200200
RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620
RBP: ffff88041de1dea8 R08: 0000000000000001 R09: 0000000000000082
R10: 0000000000000000 R11: ffff88041de1dde8 R12: 0000000000000080
R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210
FS:  00007f9ae5bef700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000041ca44000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 2382, threadinfo ffff88041de1c000, task ffff88042db6d040)
Stack:
 ffff88041de1deb8 ffffffffa0006770 ffff88041de1deb8 ffffffffa000251e
 ffff88041de1dec8 ffffffffa00022c2 ffff88041de1ded8 ffffffffa0004993
 ffff88041de1df78 ffffffff81073115 656c69666f72706f 0000000000610200
Call Trace:
 [<ffffffffa000251e>] op_nmi_exit+0x15/0x17 [oprofile]
 [<ffffffffa00022c2>] oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa0004993>] oprofile_exit+0x13/0x15 [oprofile]
 [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f
 [<ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b
Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81
 89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b
RIP  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
 RSP <ffff88041de1de98>
CR2: 0000000000000008
---[ end trace 06d4e95b6aa3b437 ]---

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
@reejk reejk closed this May 16, 2012
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 16, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 17, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
turl pushed a commit to allwinner-dev-team/linux-allwinner that referenced this pull request May 19, 2012
commit 87121ca upstream.

Oprofile may crash in a KVM guest while unlaoding modules. This
happens if oprofile_arch_init() fails and oprofile switches to the hr
timer mode as a fallback. In this case oprofile_arch_exit() is called,
but it never was initialized properly which causes the crash. This
patch fixes this.

oprofile: using timer interrupt.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
PGD 41da3f067 PUD 41d80e067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
CPU 5
Modules linked in: oprofile(-)

Pid: 2382, comm: modprobe Not tainted 3.1.0-rc7-00018-g709a39d torvalds#18 Advanced Micro Device Anaheim/Anaheim
RIP: 0010:[<ffffffff8123c226>]  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
RSP: 0018:ffff88041de1de98  EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffffffffa00060e0 RCX: dead000000200200
RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620
RBP: ffff88041de1dea8 R08: 0000000000000001 R09: 0000000000000082
R10: 0000000000000000 R11: ffff88041de1dde8 R12: 0000000000000080
R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210
FS:  00007f9ae5bef700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000041ca44000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 2382, threadinfo ffff88041de1c000, task ffff88042db6d040)
Stack:
 ffff88041de1deb8 ffffffffa0006770 ffff88041de1deb8 ffffffffa000251e
 ffff88041de1dec8 ffffffffa00022c2 ffff88041de1ded8 ffffffffa0004993
 ffff88041de1df78 ffffffff81073115 656c69666f72706f 0000000000610200
Call Trace:
 [<ffffffffa000251e>] op_nmi_exit+0x15/0x17 [oprofile]
 [<ffffffffa00022c2>] oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa0004993>] oprofile_exit+0x13/0x15 [oprofile]
 [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f
 [<ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b
Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81
 89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b
RIP  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
 RSP <ffff88041de1de98>
CR2: 0000000000000008
---[ end trace 06d4e95b6aa3b437 ]---

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
turl pushed a commit to allwinner-dev-team/linux-allwinner that referenced this pull request May 19, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 21, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 23, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 24, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nwnk pushed a commit to nwnk/linux that referenced this pull request May 29, 2012
Oprofile may crash in a KVM guest while unlaoding modules. This
happens if oprofile_arch_init() fails and oprofile switches to the hr
timer mode as a fallback. In this case oprofile_arch_exit() is called,
but it never was initialized properly which causes the crash. This
patch fixes this.

oprofile: using timer interrupt.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
PGD 41da3f067 PUD 41d80e067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
CPU 5
Modules linked in: oprofile(-)

Pid: 2382, comm: modprobe Not tainted 3.1.0-rc7-00018-g709a39d torvalds#18 Advanced Micro Device Anaheim/Anaheim
RIP: 0010:[<ffffffff8123c226>]  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
RSP: 0018:ffff88041de1de98  EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffffffffa00060e0 RCX: dead000000200200
RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620
RBP: ffff88041de1dea8 R08: 0000000000000001 R09: 0000000000000082
R10: 0000000000000000 R11: ffff88041de1dde8 R12: 0000000000000080
R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210
FS:  00007f9ae5bef700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000041ca44000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 2382, threadinfo ffff88041de1c000, task ffff88042db6d040)
Stack:
 ffff88041de1deb8 ffffffffa0006770 ffff88041de1deb8 ffffffffa000251e
 ffff88041de1dec8 ffffffffa00022c2 ffff88041de1ded8 ffffffffa0004993
 ffff88041de1df78 ffffffff81073115 656c69666f72706f 0000000000610200
Call Trace:
 [<ffffffffa000251e>] op_nmi_exit+0x15/0x17 [oprofile]
 [<ffffffffa00022c2>] oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa0004993>] oprofile_exit+0x13/0x15 [oprofile]
 [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f
 [<ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b
Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81
 89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b
RIP  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
 RSP <ffff88041de1de98>
CR2: 0000000000000008
---[ end trace 06d4e95b6aa3b437 ]---

CC: stable@kernel.org # 2.6.37+
Signed-off-by: Robert Richter <robert.richter@amd.com>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 30, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jadonk pushed a commit to jadonk/linux that referenced this pull request May 31, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Jun 5, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Jun 6, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Jun 7, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Jun 14, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
torvalds pushed a commit that referenced this pull request Jun 16, 2012
The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

We ran a topology sanity check test we have here on it and
it all looks ok... hopefully :).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Jun 20, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Jun 26, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jul 17, 2025
We can write quite large journal entries.

It's possible that JOURNAL_ENTRY_SIZE_MAX is far larger than it needs to
be and should be reduced to something more reasonable (1-2MB), but
performance can be sensitive to anything that affects journal
pipelining - we've had issues there in the past, so we shouldn't change
that without first doing performance testing.

For now, apply the simple fix to deal with the following, which caused a
mount to fail:

[43579.348135] mount.bcachefs: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_ZERO|__GFP_COMP), nodemask=(null),cpuset=openrc.sshd,mems_allowed=0
[43579.349792] CPU: 3 UID: 0 PID: 14702 Comm: mount.bcachefs Tainted: G     U              6.16.0-rc4 torvalds#18 VOLUNTARY
[43579.349798] Tainted: [U]=USER
[43579.349799] Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
[43579.349800] Call Trace:
[43579.349803]  <TASK>
[43579.349806] dump_stack_lvl (lib/dump_stack.c:122)
[43579.349814] warn_alloc (mm/page_alloc.c:3744)
[43579.349821] ? __alloc_pages_direct_compact (./arch/x86/include/asm/jump_label.h:36 ./include/linux/delayacct.h:211 mm/page_alloc.c:3882)
[43579.349827] __alloc_pages_slowpath.constprop.0 (mm/page_alloc.c:4699)
[43579.349833] __alloc_frozen_pages_noprof (mm/page_alloc.c:4972)
[43579.349838] __alloc_pages_noprof (mm/page_alloc.c:4994)
[43579.349843] ___kmalloc_large_node (./include/linux/gfp.h:284 ./include/linux/gfp.h:311 mm/slub.c:4272)
[43579.349848] __kmalloc_large_noprof (./arch/x86/include/asm/bitops.h:414 ./include/asm-generic/getorder.h:46 mm/slub.c:4292)
[43579.349852] bch2_dev_journal_init (fs/bcachefs/journal.c:1629 (discriminator 4))
[43579.349857] __bch2_dev_attach_bdev (fs/bcachefs/super.c:1588)
[43579.349861] ? kernfs_create_link (fs/kernfs/symlink.c:48)
[43579.349865] bch2_dev_attach_bdev (fs/bcachefs/super.c:1630)
[43579.349868] bch2_fs_open (fs/bcachefs/super.c:2440)
[43579.349874] bch2_fs_get_tree (fs/bcachefs/fs.c:2474)
[43579.349880] vfs_get_tree (fs/super.c:1805)

Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jul 21, 2025
We can write quite large journal entries.

It's possible that JOURNAL_ENTRY_SIZE_MAX is far larger than it needs to
be and should be reduced to something more reasonable (1-2MB), but
performance can be sensitive to anything that affects journal
pipelining - we've had issues there in the past, so we shouldn't change
that without first doing performance testing.

For now, apply the simple fix to deal with the following, which caused a
mount to fail:

[43579.348135] mount.bcachefs: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_ZERO|__GFP_COMP), nodemask=(null),cpuset=openrc.sshd,mems_allowed=0
[43579.349792] CPU: 3 UID: 0 PID: 14702 Comm: mount.bcachefs Tainted: G     U              6.16.0-rc4 torvalds#18 VOLUNTARY
[43579.349798] Tainted: [U]=USER
[43579.349799] Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
[43579.349800] Call Trace:
[43579.349803]  <TASK>
[43579.349806] dump_stack_lvl (lib/dump_stack.c:122)
[43579.349814] warn_alloc (mm/page_alloc.c:3744)
[43579.349821] ? __alloc_pages_direct_compact (./arch/x86/include/asm/jump_label.h:36 ./include/linux/delayacct.h:211 mm/page_alloc.c:3882)
[43579.349827] __alloc_pages_slowpath.constprop.0 (mm/page_alloc.c:4699)
[43579.349833] __alloc_frozen_pages_noprof (mm/page_alloc.c:4972)
[43579.349838] __alloc_pages_noprof (mm/page_alloc.c:4994)
[43579.349843] ___kmalloc_large_node (./include/linux/gfp.h:284 ./include/linux/gfp.h:311 mm/slub.c:4272)
[43579.349848] __kmalloc_large_noprof (./arch/x86/include/asm/bitops.h:414 ./include/asm-generic/getorder.h:46 mm/slub.c:4292)
[43579.349852] bch2_dev_journal_init (fs/bcachefs/journal.c:1629 (discriminator 4))
[43579.349857] __bch2_dev_attach_bdev (fs/bcachefs/super.c:1588)
[43579.349861] ? kernfs_create_link (fs/kernfs/symlink.c:48)
[43579.349865] bch2_dev_attach_bdev (fs/bcachefs/super.c:1630)
[43579.349868] bch2_fs_open (fs/bcachefs/super.c:2440)
[43579.349874] bch2_fs_get_tree (fs/bcachefs/fs.c:2474)
[43579.349880] vfs_get_tree (fs/super.c:1805)

Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jul 23, 2025
We can write quite large journal entries.

It's possible that JOURNAL_ENTRY_SIZE_MAX is far larger than it needs to
be and should be reduced to something more reasonable (1-2MB), but
performance can be sensitive to anything that affects journal
pipelining - we've had issues there in the past, so we shouldn't change
that without first doing performance testing.

For now, apply the simple fix to deal with the following, which caused a
mount to fail:

[43579.348135] mount.bcachefs: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_ZERO|__GFP_COMP), nodemask=(null),cpuset=openrc.sshd,mems_allowed=0
[43579.349792] CPU: 3 UID: 0 PID: 14702 Comm: mount.bcachefs Tainted: G     U              6.16.0-rc4 torvalds#18 VOLUNTARY
[43579.349798] Tainted: [U]=USER
[43579.349799] Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
[43579.349800] Call Trace:
[43579.349803]  <TASK>
[43579.349806] dump_stack_lvl (lib/dump_stack.c:122)
[43579.349814] warn_alloc (mm/page_alloc.c:3744)
[43579.349821] ? __alloc_pages_direct_compact (./arch/x86/include/asm/jump_label.h:36 ./include/linux/delayacct.h:211 mm/page_alloc.c:3882)
[43579.349827] __alloc_pages_slowpath.constprop.0 (mm/page_alloc.c:4699)
[43579.349833] __alloc_frozen_pages_noprof (mm/page_alloc.c:4972)
[43579.349838] __alloc_pages_noprof (mm/page_alloc.c:4994)
[43579.349843] ___kmalloc_large_node (./include/linux/gfp.h:284 ./include/linux/gfp.h:311 mm/slub.c:4272)
[43579.349848] __kmalloc_large_noprof (./arch/x86/include/asm/bitops.h:414 ./include/asm-generic/getorder.h:46 mm/slub.c:4292)
[43579.349852] bch2_dev_journal_init (fs/bcachefs/journal.c:1629 (discriminator 4))
[43579.349857] __bch2_dev_attach_bdev (fs/bcachefs/super.c:1588)
[43579.349861] ? kernfs_create_link (fs/kernfs/symlink.c:48)
[43579.349865] bch2_dev_attach_bdev (fs/bcachefs/super.c:1630)
[43579.349868] bch2_fs_open (fs/bcachefs/super.c:2440)
[43579.349874] bch2_fs_get_tree (fs/bcachefs/fs.c:2474)
[43579.349880] vfs_get_tree (fs/super.c:1805)

Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
roxell pushed a commit to roxell/linux that referenced this pull request Jul 25, 2025
We can write quite large journal entries.

It's possible that JOURNAL_ENTRY_SIZE_MAX is far larger than it needs to
be and should be reduced to something more reasonable (1-2MB), but
performance can be sensitive to anything that affects journal
pipelining - we've had issues there in the past, so we shouldn't change
that without first doing performance testing.

For now, apply the simple fix to deal with the following, which caused a
mount to fail:

[43579.348135] mount.bcachefs: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_ZERO|__GFP_COMP), nodemask=(null),cpuset=openrc.sshd,mems_allowed=0
[43579.349792] CPU: 3 UID: 0 PID: 14702 Comm: mount.bcachefs Tainted: G     U              6.16.0-rc4 torvalds#18 VOLUNTARY
[43579.349798] Tainted: [U]=USER
[43579.349799] Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
[43579.349800] Call Trace:
[43579.349803]  <TASK>
[43579.349806] dump_stack_lvl (lib/dump_stack.c:122)
[43579.349814] warn_alloc (mm/page_alloc.c:3744)
[43579.349821] ? __alloc_pages_direct_compact (./arch/x86/include/asm/jump_label.h:36 ./include/linux/delayacct.h:211 mm/page_alloc.c:3882)
[43579.349827] __alloc_pages_slowpath.constprop.0 (mm/page_alloc.c:4699)
[43579.349833] __alloc_frozen_pages_noprof (mm/page_alloc.c:4972)
[43579.349838] __alloc_pages_noprof (mm/page_alloc.c:4994)
[43579.349843] ___kmalloc_large_node (./include/linux/gfp.h:284 ./include/linux/gfp.h:311 mm/slub.c:4272)
[43579.349848] __kmalloc_large_noprof (./arch/x86/include/asm/bitops.h:414 ./include/asm-generic/getorder.h:46 mm/slub.c:4292)
[43579.349852] bch2_dev_journal_init (fs/bcachefs/journal.c:1629 (discriminator 4))
[43579.349857] __bch2_dev_attach_bdev (fs/bcachefs/super.c:1588)
[43579.349861] ? kernfs_create_link (fs/kernfs/symlink.c:48)
[43579.349865] bch2_dev_attach_bdev (fs/bcachefs/super.c:1630)
[43579.349868] bch2_fs_open (fs/bcachefs/super.c:2440)
[43579.349874] bch2_fs_get_tree (fs/bcachefs/fs.c:2474)
[43579.349880] vfs_get_tree (fs/super.c:1805)

Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
roxell pushed a commit to roxell/linux that referenced this pull request Jul 28, 2025
We can write quite large journal entries.

It's possible that JOURNAL_ENTRY_SIZE_MAX is far larger than it needs to
be and should be reduced to something more reasonable (1-2MB), but
performance can be sensitive to anything that affects journal
pipelining - we've had issues there in the past, so we shouldn't change
that without first doing performance testing.

For now, apply the simple fix to deal with the following, which caused a
mount to fail:

[43579.348135] mount.bcachefs: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_ZERO|__GFP_COMP), nodemask=(null),cpuset=openrc.sshd,mems_allowed=0
[43579.349792] CPU: 3 UID: 0 PID: 14702 Comm: mount.bcachefs Tainted: G     U              6.16.0-rc4 torvalds#18 VOLUNTARY
[43579.349798] Tainted: [U]=USER
[43579.349799] Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
[43579.349800] Call Trace:
[43579.349803]  <TASK>
[43579.349806] dump_stack_lvl (lib/dump_stack.c:122)
[43579.349814] warn_alloc (mm/page_alloc.c:3744)
[43579.349821] ? __alloc_pages_direct_compact (./arch/x86/include/asm/jump_label.h:36 ./include/linux/delayacct.h:211 mm/page_alloc.c:3882)
[43579.349827] __alloc_pages_slowpath.constprop.0 (mm/page_alloc.c:4699)
[43579.349833] __alloc_frozen_pages_noprof (mm/page_alloc.c:4972)
[43579.349838] __alloc_pages_noprof (mm/page_alloc.c:4994)
[43579.349843] ___kmalloc_large_node (./include/linux/gfp.h:284 ./include/linux/gfp.h:311 mm/slub.c:4272)
[43579.349848] __kmalloc_large_noprof (./arch/x86/include/asm/bitops.h:414 ./include/asm-generic/getorder.h:46 mm/slub.c:4292)
[43579.349852] bch2_dev_journal_init (fs/bcachefs/journal.c:1629 (discriminator 4))
[43579.349857] __bch2_dev_attach_bdev (fs/bcachefs/super.c:1588)
[43579.349861] ? kernfs_create_link (fs/kernfs/symlink.c:48)
[43579.349865] bch2_dev_attach_bdev (fs/bcachefs/super.c:1630)
[43579.349868] bch2_fs_open (fs/bcachefs/super.c:2440)
[43579.349874] bch2_fs_get_tree (fs/bcachefs/fs.c:2474)
[43579.349880] vfs_get_tree (fs/super.c:1805)

Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
vincent-mailhol added a commit to vincent-mailhol/linux that referenced this pull request Sep 3, 2025
In November last year, I sent an RFC to introduce CAN XL [1]. That
RFC, despite positive feedback, was put on hold due to some unanswered
question concerning the PWM encoding [2].

While stuck, some small preparation work was done in parallel in [3]
by refactoring the struct can_priv and doing some trivial clean-up and
renaming. [3] received zero feedback but was eventually merged after
splitting it in smaller parts and resending it.

Finally, in July this year, we clarified the remaining mysteries about
PWM calculation, thus unlocking the series. Summer being a bit busy
because of some personal matters brings us to now.

After doing all the refactoring and adding all the CAN XL features,
the final result is roughly 30 patches, probably too much for a single
series. So I am splitting it in two:

  - preparation (this series)
  - CAN XL

And so, this series continues and finishes the preparation work done
in [3]. It contains all the refactoring needed to smoothly introduce
CAN XL. The goal is to:

  - split the function in smaller pieces: CAN XL will introduce a fair
    amount of code. And some functions which are already fairly long
    (86 lines for can_validate(), 216 lines for can_changelink())
    would grow to disproportionate sizes if the CAN XL logic were to
    be inlined in those functions.

  - repurpose the existing code to handle both CAN FD and CAN XL: a
    huge part of CAN XL simply reuses the CAN FD logic. All the
    existing CAN FD logic is made more generic to handle both CAN FD
    and XL.

In more details:

  - Patch #1 moves struct data_bittiming_params from dev.h to
    bittiming.h and patch #2 makes can_get_relative_tdco() FD agnostic
    before also moving it to bittiming.h.

  - Patch #3 adds some comments to netlink.h tagging which IFLA
    symbols are FD specific.

  - Patch #4 to torvalds#7 are refactoring can_validate() and
    can_validate_bittiming().

  - Patch torvalds#8 to torvalds#12 are refactoring can_changelink() and
    can_tdc_changelink().

  - Patch torvalds#13 and torvalds#14 are refactoring can_get_size() and
    can_tdc_get_size().

  - Patch torvalds#15 to torvalds#18 are refactoring can_fill_info() and
    can_tdc_fill_info().

  - Patch torvalds#19 makes can_calc_tdco() FD agnostic.

  - Patch torvalds#20 adds can_get_ctrlmode_str() which converts control mode
    flags into strings. This is done in preparation of patch torvalds#20.

  - Patch torvalds#21 is the final patch and improves the user experience by
    providing detailed error messages whenever invalid parameters are
    provided. All those error messages came into handy when debugging
    the upcoming CAN XL patches.

Aside from the last patch, the other changes do not impact any of the
existing functionalities.

The follow up series which introduces CAN XL is nearly completed but
will be sent only once this one is approved: one thing at a time, I do
not want to overwhelm people (including myself).

[1] https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/
[2] https://lore.kernel.org/linux-can/c4771c16-c578-4a6d-baee-918fe276dbe9@wanadoo.fr/
[3] https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/

To: Marc Kleine-Budde <mkl@pengutronix.de>
To: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Vincent Mailhol <mailhol@kernel.org>
Cc: Stéphane Grosjean <stephane.grosjean@hms-networks.com>
Cc: Robert Nawrath <mbro1689@gmail.com>
Cc: Minh Le <minh.le.aj@renesas.com>
Cc: Duy Nguyen <duy.nguyen.rh@renesas.com>
Cc: linux-can@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>

---
Changes in v2:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v1: https://lore.kernel.org/r/20250903-canxl-netlink-prep-v1-0-904bd6037cd9@kernel.org



--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 2,
    "change-id": "20250831-canxl-netlink-prep-9dbf8498fd9d",
    "prefixes": [],
    "prerequisites": [
      "base-commit: net-next/main"
    ],
    "history": {
      "v1": [
        "20250903-canxl-netlink-prep-v1-0-904bd6037cd9@kernel.org"
      ]
    }
  }
}
vincent-mailhol added a commit to vincent-mailhol/linux that referenced this pull request Sep 7, 2025
In November last year, I sent an RFC to introduce CAN XL [1]. That
RFC, despite positive feedback, was put on hold due to some unanswered
question concerning the PWM encoding [2].

While stuck, some small preparation work was done in parallel in [3]
by refactoring the struct can_priv and doing some trivial clean-up and
renaming. [3] received zero feedback but was eventually merged after
splitting it in smaller parts and resending it.

Finally, in July this year, we clarified the remaining mysteries about
PWM calculation, thus unlocking the series. Summer being a bit busy
because of some personal matters brings us to now.

After doing all the refactoring and adding all the CAN XL features,
the final result is roughly 30 patches, probably too much for a single
series. So I am splitting it in two:

  - preparation (this series)
  - CAN XL

And so, this series continues and finishes the preparation work done
in [3]. It contains all the refactoring needed to smoothly introduce
CAN XL. The goal is to:

  - split the function in smaller pieces: CAN XL will introduce a fair
    amount of code. And some functions which are already fairly long
    (86 lines for can_validate(), 216 lines for can_changelink())
    would grow to disproportionate sizes if the CAN XL logic were to
    be inlined in those functions.

  - repurpose the existing code to handle both CAN FD and CAN XL: a
    huge part of CAN XL simply reuses the CAN FD logic. All the
    existing CAN FD logic is made more generic to handle both CAN FD
    and XL.

In more details:

  - Patch #1 moves struct data_bittiming_params from dev.h to
    bittiming.h and patch #2 makes can_get_relative_tdco() FD agnostic
    before also moving it to bittiming.h.

  - Patch #3 adds some comments to netlink.h tagging which IFLA
    symbols are FD specific.

  - Patch #4 to torvalds#6 are refactoring can_validate() and
    can_validate_bittiming().

  - Patch torvalds#7 to torvalds#11 are refactoring can_changelink() and
    can_tdc_changelink().

  - Patch torvalds#12 and torvalds#13 are refactoring can_get_size() and
    can_tdc_get_size().

  - Patch torvalds#14 to torvalds#17 are refactoring can_fill_info() and
    can_tdc_fill_info().

  - Patch torvalds#18 makes can_calc_tdco() FD agnostic.

  - Patch torvalds#19 adds can_get_ctrlmode_str() which converts control mode
    flags into strings. This is done in preparation of patch torvalds#20.

  - Patch torvalds#20 is the final patch and improves the user experience by
    providing detailed error messages whenever invalid parameters are
    provided. All those error messages came into handy when debugging
    the upcoming CAN XL patches.

Aside from the last patch, the other changes do not impact any of the
existing functionalities.

The follow up series which introduces CAN XL is nearly completed but
will be sent only once this one is approved: one thing at a time, I do
not want to overwhelm people (including myself).

[1] https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/
[2] https://lore.kernel.org/linux-can/c4771c16-c578-4a6d-baee-918fe276dbe9@wanadoo.fr/
[3] https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/

To: Marc Kleine-Budde <mkl@pengutronix.de>
To: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Vincent Mailhol <mailhol@kernel.org>
Cc: Stéphane Grosjean <stephane.grosjean@hms-networks.com>
Cc: Robert Nawrath <mbro1689@gmail.com>
Cc: Minh Le <minh.le.aj@renesas.com>
Cc: Duy Nguyen <duy.nguyen.rh@renesas.com>
Cc: linux-can@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>

---
Changes in v2:

  - Move can_validate()'s comment block to can_validate_databittiming().
    Consequently,

      [PATCH 07/21] can: netlink: remove comment in can_validate()

    from v1 is removed.

  - Change any occurrences of WARN_ON(1) into return -EOPNOTSUPP to
    suppress the three gcc warnings which were reported by the kernel
    test robot:
    Link: https://lore.kernel.org/linux-can/202509050259.NjPdQyAD-lkp@intel.com/
    Link: https://lore.kernel.org/linux-can/202509050404.ZLQknagH-lkp@intel.com/
    Link: https://lore.kernel.org/linux-can/202509050541.1FKRbqOi-lkp@intel.com/

  - Link to v1: https://lore.kernel.org/r/20250903-canxl-netlink-prep-v1-0-904bd6037cd9@kernel.org

--- b4-submit-tracking ---
{
  "series": {
    "revision": 2,
    "change-id": "20250831-canxl-netlink-prep-9dbf8498fd9d",
    "prefixes": [],
    "prerequisites": [
      "base-commit: net-next/main"
    ],
    "history": {
      "v1": [
        "20250903-canxl-netlink-prep-v1-0-904bd6037cd9@kernel.org"
      ]
    }
  }
}
vincent-mailhol added a commit to vincent-mailhol/linux that referenced this pull request Sep 20, 2025
In November last year, I sent an RFC to introduce CAN XL [1]. That
RFC, despite positive feedback, was put on hold due to some unanswered
question concerning the PWM encoding [2].

While stuck, some small preparation work was done in parallel in [3]
by refactoring the struct can_priv and doing some trivial clean-up and
renaming. Initially, [3] received zero feedback but was eventually
merged after splitting it in smaller parts and resending it.

Finally, in July this year, we clarified the remaining mysteries about
PWM calculation, thus unlocking the series. Summer being a bit busy
because of some personal matters brings us to now.

After doing all the refactoring and adding all the CAN XL features,
the final result is roughly 30 patches, probably too much for a single
series. So I am splitting it in two:

  - preparation (this series)
  - CAN XL (will come later, after this series get ACK-ed)

And so, this series continues and finishes the preparation work done
in [3]. It contains all the refactoring needed to smoothly introduce
CAN XL. The goal is to:

  - split the functions in smaller pieces: CAN XL will introduce a
    fair amount of code. And some functions which are already fairly
    long (86 lines for can_validate(), 216 lines for can_changelink())
    would grow to disproportionate sizes if the CAN XL logic were to
    be inlined in those functions.

  - repurpose the existing code to handle both CAN FD and CAN XL: a
    huge part of CAN XL simply reuses the CAN FD logic. All the
    existing CAN FD logic is made more generic to handle both CAN FD
    and XL.

In more details:

  - Patch #1 moves struct data_bittiming_params from dev.h to
    bittiming.h and patch #2 makes can_get_relative_tdco() FD agnostic
    before also moving it to bittiming.h.

  - Patch #3 adds some comments to netlink.h tagging which IFLA
    symbols are FD specific.

  - Patches #4 to torvalds#6 are refactoring can_validate() and
    can_validate_bittiming().

  - Patches torvalds#7 to torvalds#11 are refactoring can_changelink() and
    can_tdc_changelink().

  - Patches torvalds#12 and torvalds#13 are refactoring can_get_size() and
    can_tdc_get_size().

  - Patches torvalds#14 to torvalds#17 are refactoring can_fill_info() and
    can_tdc_fill_info().

  - Patch torvalds#18 makes can_calc_tdco() FD agnostic.

  - Patch torvalds#19 adds can_get_ctrlmode_str() which converts control mode
    flags into strings. This is done in preparation of patch torvalds#20.

  - Patch torvalds#20 is the final patch and improves the user experience by
    providing detailed error messages whenever invalid parameters are
    provided. All those error messages came into handy when debugging
    the upcoming CAN XL patches.

Aside from the last patch, the other changes do not impact any of the
existing functionalities.

The follow up series which introduces CAN XL is nearly completed but
will be sent only once this one is approved: one thing at a time, I do
not want to overwhelm people (including myself).

[1] https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/
[2] https://lore.kernel.org/linux-can/c4771c16-c578-4a6d-baee-918fe276dbe9@wanadoo.fr/
[3] https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/

To: Marc Kleine-Budde <mkl@pengutronix.de>
To: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Vincent Mailhol <mailhol@kernel.org>
Cc: Stéphane Grosjean <stephane.grosjean@hms-networks.com>
Cc: Robert Nawrath <mbro1689@gmail.com>
Cc: Minh Le <minh.le.aj@renesas.com>
Cc: Duy Nguyen <duy.nguyen.rh@renesas.com>
Cc: linux-can@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>

---
Changes in v3:

  - Add a static_assert() in can_validate_databittiming() to prove
    that the nla attributes were already correctly aligned.

Link to v2: https://lore.kernel.org/r/20250910-canxl-netlink-prep-v2-0-f128d4083721@kernel.org

Changes in v2:

  - Move can_validate()'s comment block to can_validate_databittiming().
    Consequently,

      [PATCH 07/21] can: netlink: remove comment in can_validate()

    from v1 is removed.

  - Change any occurrences of WARN_ON(1) into return -EOPNOTSUPP to
    suppress the three gcc warnings which were reported by the kernel
    test robot:
    Link: https://lore.kernel.org/linux-can/202509050259.NjPdQyAD-lkp@intel.com/
    Link: https://lore.kernel.org/linux-can/202509050404.ZLQknagH-lkp@intel.com/
    Link: https://lore.kernel.org/linux-can/202509050541.1FKRbqOi-lkp@intel.com/

  - Small rewrite of patch torvalds#12 "can: netlink: make can_tdc_get_size()
    FD agnostic" description to add more details.

Link to v1: https://lore.kernel.org/r/20250903-canxl-netlink-prep-v1-0-904bd6037cd9@kernel.org

--- b4-submit-tracking ---
{
  "series": {
    "revision": 3,
    "change-id": "20250831-canxl-netlink-prep-9dbf8498fd9d",
    "prefixes": [],
    "prerequisites": [
      "base-commit: net-next/main"
    ],
    "history": {
      "v1": [
        "20250903-canxl-netlink-prep-v1-0-904bd6037cd9@kernel.org"
      ],
      "v2": [
        "20250910-canxl-netlink-prep-v2-0-f128d4083721@kernel.org"
      ]
    }
  }
}
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Sep 24, 2025
…CAN XL step 3/3"

Vincent Mailhol <mailhol@kernel.org> says:

In November last year, I sent an RFC to introduce CAN XL [1]. That
RFC, despite positive feedback, was put on hold due to some unanswered
question concerning the PWM encoding [2].

While stuck, some small preparation work was done in parallel in [3]
by refactoring the struct can_priv and doing some trivial clean-up and
renaming. Initially, [3] received zero feedback but was eventually
merged after splitting it in smaller parts and resending it.

Finally, in July this year, we clarified the remaining mysteries about
PWM calculation, thus unlocking the series. Summer being a bit busy
because of some personal matters brings us to now.

After doing all the refactoring and adding all the CAN XL features,
the final result is more than 30 patches, definitively too much for a
single series. So I am splitting the remaining changes three:

  - can: rework the CAN MTU logic [4]
  - can: netlink: preparation before introduction of CAN XL (this series)
  - CAN XL (will come right after the two preparation series get merged)

And thus, this series continues and finishes the preparation work done
in [3] and [4]. It contains all the refactoring needed to smoothly
introduce CAN XL. The goal is to:

  - split the functions in smaller pieces: CAN XL will introduce a
    fair amount of code. And some functions which are already fairly
    long (86 lines for can_validate(), 215 lines for can_changelink())
    would grow to disproportionate sizes if the CAN XL logic were to
    be inlined in those functions.

  - repurpose the existing code to handle both CAN FD and CAN XL: a
    huge part of CAN XL simply reuses the CAN FD logic. All the
    existing CAN FD logic is made more generic to handle both CAN FD
    and XL.

In more details:

  - Patch #1 moves struct data_bittiming_params from dev.h to
    bittiming.h and patch #2 makes can_get_relative_tdco() FD agnostic
    before also moving it to bittiming.h.

  - Patch #3 adds some comments to netlink.h tagging which IFLA
    symbols are FD specific.

  - Patches #4 to torvalds#6 are refactoring can_validate() and
    can_validate_bittiming().

  - Patches torvalds#7 to torvalds#11 are refactoring can_changelink() and
    can_tdc_changelink().

  - Patches torvalds#12 and torvalds#13 are refactoring can_get_size() and
    can_tdc_get_size().

  - Patches torvalds#14 to torvalds#17 are refactoring can_fill_info() and
    can_tdc_fill_info().

  - Patch torvalds#18 makes can_calc_tdco() FD agnostic.

  - Patch torvalds#19 adds can_get_ctrlmode_str() which converts control mode
    flags into strings. This is done in preparation of patch torvalds#20.

  - Patch torvalds#20 is the final patch and improves the user experience by
    providing detailed error messages whenever invalid parameters are
    provided. All those error messages came into handy when debugging
    the upcoming CAN XL patches.

Aside from the last patch, the other changes do not impact any of the
existing functionalities.

The follow up series which introduces CAN XL is nearly completed but
will be sent only once this one is approved: one thing at a time, I do
not want to overwhelm people (including myself).

[1] https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/
[2] https://lore.kernel.org/linux-can/c4771c16-c578-4a6d-baee-918fe276dbe9@wanadoo.fr/
[3] https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/
[4] https://lore.kernel.org/linux-can/20250923-can-fix-mtu-v2-0-984f9868db69@kernel.org/

Link: https://patch.msgid.link/20250923-canxl-netlink-prep-v4-0-e720d28f66fe@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Sep 25, 2025
We can write quite large journal entries.

It's possible that JOURNAL_ENTRY_SIZE_MAX is far larger than it needs to
be and should be reduced to something more reasonable (1-2MB), but
performance can be sensitive to anything that affects journal
pipelining - we've had issues there in the past, so we shouldn't change
that without first doing performance testing.

For now, apply the simple fix to deal with the following, which caused a
mount to fail:

[43579.348135] mount.bcachefs: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_ZERO|__GFP_COMP), nodemask=(null),cpuset=openrc.sshd,mems_allowed=0
[43579.349792] CPU: 3 UID: 0 PID: 14702 Comm: mount.bcachefs Tainted: G     U              6.16.0-rc4 torvalds#18 VOLUNTARY
[43579.349798] Tainted: [U]=USER
[43579.349799] Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
[43579.349800] Call Trace:
[43579.349803]  <TASK>
[43579.349806] dump_stack_lvl (lib/dump_stack.c:122)
[43579.349814] warn_alloc (mm/page_alloc.c:3744)
[43579.349821] ? __alloc_pages_direct_compact (./arch/x86/include/asm/jump_label.h:36 ./include/linux/delayacct.h:211 mm/page_alloc.c:3882)
[43579.349827] __alloc_pages_slowpath.constprop.0 (mm/page_alloc.c:4699)
[43579.349833] __alloc_frozen_pages_noprof (mm/page_alloc.c:4972)
[43579.349838] __alloc_pages_noprof (mm/page_alloc.c:4994)
[43579.349843] ___kmalloc_large_node (./include/linux/gfp.h:284 ./include/linux/gfp.h:311 mm/slub.c:4272)
[43579.349848] __kmalloc_large_noprof (./arch/x86/include/asm/bitops.h:414 ./include/asm-generic/getorder.h:46 mm/slub.c:4292)
[43579.349852] bch2_dev_journal_init (fs/bcachefs/journal.c:1629 (discriminator 4))
[43579.349857] __bch2_dev_attach_bdev (fs/bcachefs/super.c:1588)
[43579.349861] ? kernfs_create_link (fs/kernfs/symlink.c:48)
[43579.349865] bch2_dev_attach_bdev (fs/bcachefs/super.c:1630)
[43579.349868] bch2_fs_open (fs/bcachefs/super.c:2440)
[43579.349874] bch2_fs_get_tree (fs/bcachefs/fs.c:2474)
[43579.349880] vfs_get_tree (fs/super.c:1805)

Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Sep 26, 2025
…CAN XL step 3/3"

Vincent Mailhol <mailhol@kernel.org> says:

In November last year, I sent an RFC to introduce CAN XL [1]. That
RFC, despite positive feedback, was put on hold due to some unanswered
question concerning the PWM encoding [2].

While stuck, some small preparation work was done in parallel in [3]
by refactoring the struct can_priv and doing some trivial clean-up and
renaming. Initially, [3] received zero feedback but was eventually
merged after splitting it in smaller parts and resending it.

Finally, in July this year, we clarified the remaining mysteries about
PWM calculation, thus unlocking the series. Summer being a bit busy
because of some personal matters brings us to now.

After doing all the refactoring and adding all the CAN XL features,
the final result is more than 30 patches, definitively too much for a
single series. So I am splitting the remaining changes three:

  - can: rework the CAN MTU logic [4]
  - can: netlink: preparation before introduction of CAN XL (this series)
  - CAN XL (will come right after the two preparation series get merged)

And thus, this series continues and finishes the preparation work done
in [3] and [4]. It contains all the refactoring needed to smoothly
introduce CAN XL. The goal is to:

  - split the functions in smaller pieces: CAN XL will introduce a
    fair amount of code. And some functions which are already fairly
    long (86 lines for can_validate(), 215 lines for can_changelink())
    would grow to disproportionate sizes if the CAN XL logic were to
    be inlined in those functions.

  - repurpose the existing code to handle both CAN FD and CAN XL: a
    huge part of CAN XL simply reuses the CAN FD logic. All the
    existing CAN FD logic is made more generic to handle both CAN FD
    and XL.

In more details:

  - Patch #1 moves struct data_bittiming_params from dev.h to
    bittiming.h and patch #2 makes can_get_relative_tdco() FD agnostic
    before also moving it to bittiming.h.

  - Patch #3 adds some comments to netlink.h tagging which IFLA
    symbols are FD specific.

  - Patches #4 to torvalds#6 are refactoring can_validate() and
    can_validate_bittiming().

  - Patches torvalds#7 to torvalds#11 are refactoring can_changelink() and
    can_tdc_changelink().

  - Patches torvalds#12 and torvalds#13 are refactoring can_get_size() and
    can_tdc_get_size().

  - Patches torvalds#14 to torvalds#17 are refactoring can_fill_info() and
    can_tdc_fill_info().

  - Patch torvalds#18 makes can_calc_tdco() FD agnostic.

  - Patch torvalds#19 adds can_get_ctrlmode_str() which converts control mode
    flags into strings. This is done in preparation of patch torvalds#20.

  - Patch torvalds#20 is the final patch and improves the user experience by
    providing detailed error messages whenever invalid parameters are
    provided. All those error messages came into handy when debugging
    the upcoming CAN XL patches.

Aside from the last patch, the other changes do not impact any of the
existing functionalities.

The follow up series which introduces CAN XL is nearly completed but
will be sent only once this one is approved: one thing at a time, I do
not want to overwhelm people (including myself).

[1] https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/
[2] https://lore.kernel.org/linux-can/c4771c16-c578-4a6d-baee-918fe276dbe9@wanadoo.fr/
[3] https://lore.kernel.org/linux-can/20241110155902.72807-16-mailhol.vincent@wanadoo.fr/
[4] https://lore.kernel.org/linux-can/20250923-can-fix-mtu-v2-0-984f9868db69@kernel.org/

Link: https://patch.msgid.link/20250923-canxl-netlink-prep-v4-0-e720d28f66fe@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 13, 2025
…s' and 'T'

When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annoate the same symbol, the annoate browser
will crash. Stack trace as below:

Perf: Segmentation fault
-------- backtrace --------
    #0 0x55d365 in ui__signal_backtrace setup.c:0
    #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
    #2 0x570f08 in arch__is perf[570f08]
    #3 0x562186 in annotate_get_insn_location perf[562186]
    #4 0x562626 in __hist_entry__get_data_type annotate.c:0
    #5 0x56476d in annotation_line__write perf[56476d]
    torvalds#6 0x54e2db in annotate_browser__write annotate.c:0
    torvalds#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
    torvalds#8 0x54dc9e in annotate_browser__refresh annotate.c:0
    torvalds#9 0x54c03d in __ui_browser__refresh browser.c:0
    torvalds#10 0x54ccf8 in ui_browser__run perf[54ccf8]
    torvalds#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
    torvalds#12 0x552293 in do_annotate hists.c:0
    torvalds#13 0x55941c in evsel__hists_browse hists.c:0
    torvalds#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
    torvalds#15 0x42ff02 in cmd_report perf[42ff02]
    torvalds#16 0x494008 in run_builtin perf.c:0
    torvalds#17 0x494305 in handle_internal_command perf.c:0
    torvalds#18 0x410547 in main perf[410547]
    torvalds#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
    torvalds#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
    torvalds#21 0x410b75 in _start perf[410b75]

Signed-off-by: Tianyou Li <tianyou.li@intel.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 14, 2025
Commit 1d2da79 ("pinctrl: renesas: rzg2l: Avoid configuring ISEL in
gpio_irq_{en,dis}able*()") dropped the configuration of ISEL from
struct irq_chip::{irq_enable, irq_disable} APIs and moved it to
struct gpio_chip::irq::{child_to_parent_hwirq,
child_irq_domain_ops::free} APIs to fix spurious IRQs.

After commit 1d2da79 ("pinctrl: renesas: rzg2l: Avoid configuring ISEL
in gpio_irq_{en,dis}able*()"), ISEL was no longer configured properly on
resume. This is because the pinctrl resume code used
struct irq_chip::irq_enable  (called from rzg2l_gpio_irq_restore()) to
reconfigure the wakeup interrupts. Some drivers (e.g. Ethernet) may also
reconfigure non-wakeup interrupts on resume through their own code,
eventually calling struct irq_chip::irq_enable.

Fix this by adding ISEL configuration back into the
struct irq_chip::irq_enable API and on resume path for wakeup interrupts.

As struct irq_chip::irq_enable needs now to lock to update the ISEL,
convert the struct rzg2l_pinctrl::lock to a raw spinlock and replace the
locking API calls with the raw variants. Otherwise the lockdep reports
invalid wait context when probing the adv7511 module on RZ/G2L:

 [ BUG: Invalid wait context ]
 6.17.0-rc5-next-20250911-00001-gfcfac22533c9 torvalds#18 Not tainted
 -----------------------------
 (udev-worker)/165 is trying to lock:
 ffff00000e3664a8 (&pctrl->lock){....}-{3:3}, at: rzg2l_gpio_irq_enable+0x38/0x78
 other info that might help us debug this:
 context-{5:5}
 3 locks held by (udev-worker)/165:
 #0: ffff00000e890108 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x90/0x1ac
 #1: ffff000011c07240 (request_class){+.+.}-{4:4}, at: __setup_irq+0xb4/0x6dc
 #2: ffff000011c070c8 (lock_class){....}-{2:2}, at: __setup_irq+0xdc/0x6dc
 stack backtrace:
 CPU: 1 UID: 0 PID: 165 Comm: (udev-worker) Not tainted 6.17.0-rc5-next-20250911-00001-gfcfac22533c9 torvalds#18 PREEMPT
 Hardware name: Renesas SMARC EVK based on r9a07g044l2 (DT)
 Call trace:
 show_stack+0x18/0x24 (C)
 dump_stack_lvl+0x90/0xd0
 dump_stack+0x18/0x24
 __lock_acquire+0xa14/0x20b4
 lock_acquire+0x1c8/0x354
 _raw_spin_lock_irqsave+0x60/0x88
 rzg2l_gpio_irq_enable+0x38/0x78
 irq_enable+0x40/0x8c
 __irq_startup+0x78/0xa4
 irq_startup+0x108/0x16c
 __setup_irq+0x3c0/0x6dc
 request_threaded_irq+0xec/0x1ac
 devm_request_threaded_irq+0x80/0x134
 adv7511_probe+0x928/0x9a4 [adv7511]
 i2c_device_probe+0x22c/0x3dc
 really_probe+0xbc/0x2a0
 __driver_probe_device+0x78/0x12c
 driver_probe_device+0x40/0x164
 __driver_attach+0x9c/0x1ac
 bus_for_each_dev+0x74/0xd0
 driver_attach+0x24/0x30
 bus_add_driver+0xe4/0x208
 driver_register+0x60/0x128
 i2c_register_driver+0x48/0xd0
 adv7511_init+0x5c/0x1000 [adv7511]
 do_one_initcall+0x64/0x30c
 do_init_module+0x58/0x23c
 load_module+0x1bcc/0x1d40
 init_module_from_file+0x88/0xc4
 idempotent_init_module+0x188/0x27c
 __arm64_sys_finit_module+0x68/0xac
 invoke_syscall+0x48/0x110
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x4c/0x160
 el0t_64_sync_handler+0xa0/0xe4
 el0t_64_sync+0x198/0x19c

Having ISEL configuration back into the struct irq_chip::irq_enable API
should be safe with respect to spurious IRQs, as in the probe case IRQs
are enabled anyway in struct gpio_chip::irq::child_to_parent_hwirq. No
spurious IRQs were detected on suspend/resume, boot, ethernet link
insert/remove tests (executed on RZ/G3S). Boot, ethernet link
insert/remove tests were also executed successfully on RZ/G2L.

Fixes: 1d2da79 ("pinctrl: renesas: rzg2l: Avoid configuring ISEL in gpio_irq_{en,dis}able*(")
Cc: stable@vger.kernel.org
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250912095308.3603704-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 15, 2025
…s' and 'T'

When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.

The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.

Stack trace as below:

Perf: Segmentation fault
-------- backtrace --------
    #0 0x55d365 in ui__signal_backtrace setup.c:0
    #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
    #2 0x570f08 in arch__is perf[570f08]
    #3 0x562186 in annotate_get_insn_location perf[562186]
    #4 0x562626 in __hist_entry__get_data_type annotate.c:0
    #5 0x56476d in annotation_line__write perf[56476d]
    torvalds#6 0x54e2db in annotate_browser__write annotate.c:0
    torvalds#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
    torvalds#8 0x54dc9e in annotate_browser__refresh annotate.c:0
    torvalds#9 0x54c03d in __ui_browser__refresh browser.c:0
    torvalds#10 0x54ccf8 in ui_browser__run perf[54ccf8]
    torvalds#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
    torvalds#12 0x552293 in do_annotate hists.c:0
    torvalds#13 0x55941c in evsel__hists_browse hists.c:0
    torvalds#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
    torvalds#15 0x42ff02 in cmd_report perf[42ff02]
    torvalds#16 0x494008 in run_builtin perf.c:0
    torvalds#17 0x494305 in handle_internal_command perf.c:0
    torvalds#18 0x410547 in main perf[410547]
    torvalds#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
    torvalds#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
    torvalds#21 0x410b75 in _start perf[410b75]

Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 16, 2025
…s' and 'T'

When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.

The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.

Stack trace as below:

Perf: Segmentation fault
-------- backtrace --------
    #0 0x55d365 in ui__signal_backtrace setup.c:0
    #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
    #2 0x570f08 in arch__is perf[570f08]
    #3 0x562186 in annotate_get_insn_location perf[562186]
    #4 0x562626 in __hist_entry__get_data_type annotate.c:0
    #5 0x56476d in annotation_line__write perf[56476d]
    torvalds#6 0x54e2db in annotate_browser__write annotate.c:0
    torvalds#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
    torvalds#8 0x54dc9e in annotate_browser__refresh annotate.c:0
    torvalds#9 0x54c03d in __ui_browser__refresh browser.c:0
    torvalds#10 0x54ccf8 in ui_browser__run perf[54ccf8]
    torvalds#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
    torvalds#12 0x552293 in do_annotate hists.c:0
    torvalds#13 0x55941c in evsel__hists_browse hists.c:0
    torvalds#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
    torvalds#15 0x42ff02 in cmd_report perf[42ff02]
    torvalds#16 0x494008 in run_builtin perf.c:0
    torvalds#17 0x494305 in handle_internal_command perf.c:0
    torvalds#18 0x410547 in main perf[410547]
    torvalds#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
    torvalds#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
    torvalds#21 0x410b75 in _start perf[410b75]

Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 20, 2025
…s' and 'T'

When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.

The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.

Stack trace as below:

Perf: Segmentation fault
-------- backtrace --------
    #0 0x55d365 in ui__signal_backtrace setup.c:0
    #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
    #2 0x570f08 in arch__is perf[570f08]
    #3 0x562186 in annotate_get_insn_location perf[562186]
    #4 0x562626 in __hist_entry__get_data_type annotate.c:0
    #5 0x56476d in annotation_line__write perf[56476d]
    torvalds#6 0x54e2db in annotate_browser__write annotate.c:0
    torvalds#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
    torvalds#8 0x54dc9e in annotate_browser__refresh annotate.c:0
    torvalds#9 0x54c03d in __ui_browser__refresh browser.c:0
    torvalds#10 0x54ccf8 in ui_browser__run perf[54ccf8]
    torvalds#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
    torvalds#12 0x552293 in do_annotate hists.c:0
    torvalds#13 0x55941c in evsel__hists_browse hists.c:0
    torvalds#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
    torvalds#15 0x42ff02 in cmd_report perf[42ff02]
    torvalds#16 0x494008 in run_builtin perf.c:0
    torvalds#17 0x494305 in handle_internal_command perf.c:0
    torvalds#18 0x410547 in main perf[410547]
    torvalds#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
    torvalds#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
    torvalds#21 0x410b75 in _start perf[410b75]

Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 20, 2025
…s' and 'T'

When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.

The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.

Stack trace as below:

Perf: Segmentation fault
-------- backtrace --------
    #0 0x55d365 in ui__signal_backtrace setup.c:0
    #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
    #2 0x570f08 in arch__is perf[570f08]
    #3 0x562186 in annotate_get_insn_location perf[562186]
    #4 0x562626 in __hist_entry__get_data_type annotate.c:0
    #5 0x56476d in annotation_line__write perf[56476d]
    torvalds#6 0x54e2db in annotate_browser__write annotate.c:0
    torvalds#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
    torvalds#8 0x54dc9e in annotate_browser__refresh annotate.c:0
    torvalds#9 0x54c03d in __ui_browser__refresh browser.c:0
    torvalds#10 0x54ccf8 in ui_browser__run perf[54ccf8]
    torvalds#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
    torvalds#12 0x552293 in do_annotate hists.c:0
    torvalds#13 0x55941c in evsel__hists_browse hists.c:0
    torvalds#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
    torvalds#15 0x42ff02 in cmd_report perf[42ff02]
    torvalds#16 0x494008 in run_builtin perf.c:0
    torvalds#17 0x494305 in handle_internal_command perf.c:0
    torvalds#18 0x410547 in main perf[410547]
    torvalds#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
    torvalds#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
    torvalds#21 0x410b75 in _start perf[410b75]

Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 22, 2025
… 'T'

When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.

The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.

Stack trace as below:

Perf: Segmentation fault
-------- backtrace --------
    #0 0x55d365 in ui__signal_backtrace setup.c:0
    #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
    #2 0x570f08 in arch__is perf[570f08]
    #3 0x562186 in annotate_get_insn_location perf[562186]
    #4 0x562626 in __hist_entry__get_data_type annotate.c:0
    #5 0x56476d in annotation_line__write perf[56476d]
    torvalds#6 0x54e2db in annotate_browser__write annotate.c:0
    torvalds#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
    torvalds#8 0x54dc9e in annotate_browser__refresh annotate.c:0
    torvalds#9 0x54c03d in __ui_browser__refresh browser.c:0
    torvalds#10 0x54ccf8 in ui_browser__run perf[54ccf8]
    torvalds#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
    torvalds#12 0x552293 in do_annotate hists.c:0
    torvalds#13 0x55941c in evsel__hists_browse hists.c:0
    torvalds#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
    torvalds#15 0x42ff02 in cmd_report perf[42ff02]
    torvalds#16 0x494008 in run_builtin perf.c:0
    torvalds#17 0x494305 in handle_internal_command perf.c:0
    torvalds#18 0x410547 in main perf[410547]
    torvalds#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
    torvalds#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
    torvalds#21 0x410b75 in _start perf[410b75]

Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 22, 2025
The following command will hang consistently when the GPU is being used
due to non regular files (e.g. /dev/dri/renderD129, /dev/dri/card2)
being opened to read build IDs:

 $ perf record -asdg -e cpu-clock -- true

Change to non blocking reads to avoid the hang here:

  #0  __libc_pread64 (offset=<optimised out>, count=0, buf=0x7fffffffa4a0, fd=237) at ../sysdeps/unix/sysv/linux/pread64.c:25
  #1  __libc_pread64 (fd=237, buf=0x7fffffffa4a0, count=0, offset=0) at ../sysdeps/unix/sysv/linux/pread64.c:23
  #2  ?? () from /lib/x86_64-linux-gnu/libelf.so.1
  #3  read_build_id (filename=0x5555562df333 "/dev/dri/card2", bid=0x7fffffffb680, block=true) at util/symbol-elf.c:880
  #4  filename__read_build_id (filename=0x5555562df333 "/dev/dri/card2", bid=0x7fffffffb680, block=true) at util/symbol-elf.c:924
  #5  dsos__read_build_ids_cb (dso=0x5555562df1d0, data=0x7fffffffb750) at util/dsos.c:84
  torvalds#6  __dsos__for_each_dso (dsos=0x55555623de68, cb=0x5555557e7030 <dsos__read_build_ids_cb>, data=0x7fffffffb750) at util/dsos.c:59
  torvalds#7  dsos__for_each_dso (dsos=0x55555623de68, cb=0x5555557e7030 <dsos__read_build_ids_cb>, data=0x7fffffffb750) at util/dsos.c:503
  torvalds#8  dsos__read_build_ids (dsos=0x55555623de68, with_hits=true) at util/dsos.c:107
  torvalds#9  machine__read_build_ids (machine=0x55555623da58, with_hits=true) at util/build-id.c:950
  torvalds#10 perf_session__read_build_ids (session=0x55555623d840, with_hits=true) at util/build-id.c:956
  torvalds#11 write_build_id (ff=0x7fffffffb958, evlist=0x5555562323d0) at util/header.c:327
  torvalds#12 do_write_feat (ff=0x7fffffffb958, type=2, p=0x7fffffffb950, evlist=0x5555562323d0, fc=0x0) at util/header.c:3588
  torvalds#13 perf_header__adds_write (header=0x55555623d840, evlist=0x5555562323d0, fd=3, fc=0x0) at util/header.c:3632
  torvalds#14 perf_session__do_write_header (session=0x55555623d840, evlist=0x5555562323d0, fd=3, at_exit=true, fc=0x0, write_attrs_after_data=false) at util/header.c:3756
  torvalds#15 perf_session__write_header (session=0x55555623d840, evlist=0x5555562323d0, fd=3, at_exit=true) at util/header.c:3796
  torvalds#16 record__finish_output (rec=0x5555561838d8 <record>) at builtin-record.c:1899
  torvalds#17 __cmd_record (rec=0x5555561838d8 <record>, argc=2, argv=0x7fffffffe320) at builtin-record.c:2967
  torvalds#18 cmd_record (argc=2, argv=0x7fffffffe320) at builtin-record.c:4453
  torvalds#19 run_builtin (p=0x55555618cbb0 <commands+288>, argc=9, argv=0x7fffffffe320) at perf.c:349
  torvalds#20 handle_internal_command (argc=9, argv=0x7fffffffe320) at perf.c:401
  torvalds#21 run_argv (argcp=0x7fffffffe16c, argv=0x7fffffffe160) at perf.c:445
  torvalds#22 main (argc=9, argv=0x7fffffffe320) at perf.c:553

Fixes: 53b00ff ("perf record: Make --buildid-mmap the default")
Signed-off-by: James Clark <james.clark@linaro.org>
mwilczy pushed a commit to mwilczy/linux that referenced this pull request Nov 10, 2025
This series enables the display subsystem on the StarFive JH7110 SoC.
This hardware has a complex set of dependencies that this series aims to
solve.

The dom_vout (Video Output) block is a wrapper containing the display
controller (dc8200), the clock generator (voutcrg), and the HDMI IP, all
of which are managed by a single power domain (PD_VOUT).

More importantly, the HDMI IP is a monolithic block (controller and PHY
in one register space) that has a circular dependency with voutcrg:
1. The HDMI Controller needs clocks (like sysclk, mclk) from voutcrg to
   function.
2. The voutcrg (for its pixel MUXes) needs the variable pixel clock,
   which is generated by the HDMI PHY.

This series breaks this dependency loop by modeling the hardware
correctly:
1. A new vout-subsystem wrapper driver is added. It manages the shared
   PD_VOUT power domain and top level bus clocks. It uses
   of_platform_populate() to ensure its children (hdmi_mfd, voutcrg,
   dc8200) are probed only after power is on.
2. The monolithic hdmi node is refactored into an MFD. A new hdmi-mfd
   parent driver is added, which maps the shared register space and
   creates a regmap.
3. The MFD populates two children:
   - hdmi-phy: A new PHY driver that binds to the MFD. Its only
     dependency is the xin24m reference clock. It acts as the clock
     provider for the variable pixel clock (hdmi_pclk).
   - hdmi-controller: A new DRM bridge driver. It consumes clocks from
     voutcrg and the hdmi_pclk/PHY from its sibling hdmi-phy driver.
4. The generic inno-hdmi bridge library is refactored to accept a regmap
   from a parent MFD, making this model possible.

This MFD split breaks the circular dependency, as the kernel's deferred
probe can now find a correct, linear probe order: hdmi-phy (probes
first) -> voutcrg (probes second) -> hdmi-controller (probes third).

This series provides all the necessary dt-bindings, the new drivers, the
modification to inno-hdmi, and the final device tree changes to enable
the display.

Series depends on patchsets that are not merged yet:
 - dc8200 driver [1]
 - th1520 reset (dependency of dc8200 series) [2]
 - inno-hdmi bridge [3]

Testing:
I've tested on my monitor using `modetest` for following modes:
  #0 2560x1440 59.95 2560 2608 2640 2720 1440 1443 1448 1481 241500
     flags: phsync, nvsync; type: preferred, driver [DOESN"T WORK]
  #1 2048x1080 60.00 2048 2096 2128 2208 1080 1083 1093 1111 147180
     flags: phsync, nvsync; type: driver    [DOESN"T WORK]
  #2 2048x1080 24.00 2048 2096 2128 2208 1080 1083 1093 1099 58230
     flags: phsync, nvsync; type: driver     [DOESN'T WORK]
  #3 1920x1080 60.00 1920 2008 2052 2200 1080 1084 1089 1125 148500
     flags: phsync, pvsync; type: driver    [WORKS]
  #4 1920x1080 59.94 1920 2008 2052 2200 1080 1084 1089 1125 148352
     flags: phsync, pvsync; type: driver    [WORKS]
  #5 1920x1080 50.00 1920 2448 2492 2640 1080 1084 1089 1125 148500
     flags: phsync, pvsync; type: driver    [WORKS]
  torvalds#6 1600x1200 60.00 1600 1664 1856 2160 1200 1201 1204 1250 162000
     flags: phsync, pvsync; type: driver    [WORKS]
  torvalds#7 1280x1024 75.02 1280 1296 1440 1688 1024 1025 1028 1066 135000
     flags: phsync, pvsync; type: driver    [WORKS]
  torvalds#8 1280x1024 60.02 1280 1328 1440 1688 1024 1025 1028 1066 108000
     flags: phsync, pvsync; type: driver    [WORKS]
  torvalds#9 1152x864 75.00 1152 1216 1344 1600 864 865 868 900 108000 flags:
     phsync, pvsync; type: driver   [WORKS]
  torvalds#10 1280x720 60.00 1280 1390 1430 1650 720 725 730 750 74250 flags:
      phsync, pvsync; type: driver   [WORKS]
  torvalds#11 1280x720 59.94 1280 1390 1430 1650 720 725 730 750 74176 flags:
      phsync, pvsync; type: driver   [WORKS]
  torvalds#12 1280x720 50.00 1280 1720 1760 1980 720 725 730 750 74250 flags:
      phsync, pvsync; type: driver   [WORKS]
  torvalds#13 1024x768 75.03 1024 1040 1136 1312 768 769 772 800 78750 flags:
      phsync, pvsync; type: driver   [WORKS]
  torvalds#14 1024x768 60.00 1024 1048 1184 1344 768 771 777 806 65000 flags:
      nhsync, nvsync; type: driver   [WORKS]
  torvalds#15 800x600 75.00 800 816 896 1056 600 601 604 625 49500 flags:
      phsync, pvsync; type: driver  [WORKS]
  torvalds#16 800x600 60.32 800 840 968 1056 600 601 605 628 40000 flags:
      phsync, pvsync; type: driver  [WORKS]
  torvalds#17 720x576 50.00 720 732 796 864 576 581 586 625 27000 flags: nhsync,
      nvsync; type: driver   [WORKS]
  torvalds#18 720x480 60.00 720 736 798 858 480 489 495 525 27027 flags: nhsync,
      nvsync; type: driver   [WORKS]
  torvalds#19 720x480 59.94 720 736 798 858 480 489 495 525 27000 flags: nhsync,
      nvsync; type: driver   [WORKS]
  torvalds#20 640x480 75.00 640 656 720 840 480 481 484 500 31500 flags: nhsync,
      nvsync; type: driver   [WORKS]
  torvalds#21 640x480 60.00 640 656 752 800 480 490 492 525 25200 flags: nhsync,
      nvsync; type: driver   [WORKS]
  torvalds#22 640x480 59.94 640 656 752 800 480 490 492 525 25175 flags: nhsync,
      nvsync; type: driver   [WORKS]
  torvalds#23 720x400 70.08 720 738 846 900 400 412 414 449 28320 flags: nhsync,
      pvsync; type: driver   [DOESN'T WORK]

I believe this is a PHY tuning issue that can be fixed in the new
phy-jh7110-inno-hdmi.c driver without changing the overall architecture.
I plan to continue debugging these modes and will submit follow up fixes
as needed.

The core architectural plumbing is sound and ready for review.

Notes:
- The JH7110 does not have a centralized MAINTAINERS entry like the
  TH1520, and driver maintainership seems fragmented. I have therefore
  added a MAINTAINERS entry for the display subsystem and am willing to
  help with its maintenance.
- I am aware that the new phy-jh7110-inno-hdmi.c driver (patch 12) is a
  near duplicate of the existing phy-rockchip-inno-hdmi.c. This
  duplication is intentional and temporary for this RFC series.  My goal
  is to first get feedback on the overall architecture (the vout-subsystem
  wrapper, the hdmi-mfd split, and the dual-function PHY/CLK driver).

  If this architectural approach is acceptable, I will rework the PHY
  driver for a formal v1 submission. This will involve refactoring the
  common logic from the Rockchip PHY into a generic core driver that both
  the Rockchip and this new StarFive PHY driver will use. 

Many thanks to the Icenowy Zheng who developed a dc8200 driver, as well
as helped me understand how the SoC and the display pipeline works.

[1] - https://lore.kernel.org/all/20250921083446.790374-1-uwu@icenowy.me/
[2] - https://lore.kernel.org/all/20251014131032.49616-1-ziyao@disroot.org/
[3] - https://lore.kernel.org/all/20251016083843.76675-1-andyshrk@163.com/

# Describe the purpose of this series. The information you put here
# will be used by the project maintainer to make a decision whether
# your patches should be reviewed, and in what priority order. Please be
# very detailed and link to any relevant discussions or sites that the
# maintainer can review to better understand your proposed changes. If you
# only have a single patch in your series, the contents of the cover
# letter will be appended to the "under-the-cut" portion of the patch.

# Lines starting with # will be removed from the cover letter. You can
# use them to add notes or reminders to yourself. If you want to use
# markdown headers in your cover letter, start the line with ">#".

# You can add trailers to the cover letter. Any email addresses found in
# these trailers will be added to the addresses specified/generated
# during the b4 send stage. You can also run "b4 prep --auto-to-cc" to
# auto-populate the To: and Cc: trailers based on the code being
# modified.

To: Michal Wilczynski <m.wilczynski@samsung.com>
To: Conor Dooley <conor@kernel.org>
To: Rob Herring <robh@kernel.org>
To: Krzysztof Kozlowski <krzk+dt@kernel.org>
To: Emil Renner Berthing <kernel@esmil.dk>
To: Hal Feng <hal.feng@starfivetech.com>
To: Michael Turquette <mturquette@baylibre.com>
To: Stephen Boyd <sboyd@kernel.org>
To: Conor Dooley <conor+dt@kernel.org>
To: Xingyu Wu <xingyu.wu@starfivetech.com>
To: Vinod Koul <vkoul@kernel.org>
To: Kishon Vijay Abraham I <kishon@kernel.org>
To: Andrzej Hajda <andrzej.hajda@intel.com>
To: Neil Armstrong <neil.armstrong@linaro.org>
To: Robert Foss <rfoss@kernel.org>
To: Laurent Pinchart <Laurent.pinchart@ideasonboard.com>
To: Jonas Karlman <jonas@kwiboo.se>
To: Jernej Skrabec <jernej.skrabec@gmail.com>
To: David Airlie <airlied@gmail.com>
To: Simona Vetter <simona@ffwll.ch>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: Maxime Ripard <mripard@kernel.org>
To: Thomas Zimmermann <tzimmermann@suse.de>
To: Lee Jones <lee@kernel.org>
To: Philipp Zabel <p.zabel@pengutronix.de>
To: Paul Walmsley <paul.walmsley@sifive.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Albert Ou <aou@eecs.berkeley.edu>
To: Alexandre Ghiti <alex@ghiti.fr>
To: Marek Szyprowski <m.szyprowski@samsung.com>
To: Icenowy Zheng <uwu@icenowy.me>
To: Maud Spierings <maudspierings@gocontroll.com>
To: Andy Yan <andyshrk@163.com>
To: Heiko Stuebner <heiko@sntech.de>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-clk@vger.kernel.org
Cc: linux-phy@lists.infradead.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-riscv@lists.infradead.org

---
Changes in v2:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v1: https://lore.kernel.org/r/20251108-jh7110-clean-send-v1-0-06bf43bb76b1@samsung.com



--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 2,
    "change-id": "20251031-jh7110-clean-send-7d2242118026",
    "prefixes": [
      "RFC"
    ],
    "prerequisites": [
      "message-id: <20251014131032.49616-1-ziyao@disroot.org>",
      "message-id: <20251016083843.76675-1-andyshrk@163.com>",
      "message-id: <20250921083446.790374-1-uwu@icenowy.me>",
      "base-commit: v6.17-rc6"
    ],
    "history": {
      "v1": [
        "20251108-jh7110-clean-send-v1-0-06bf43bb76b1@samsung.com"
      ]
    }
  }
}
ruslanbay pushed a commit to ruslanbay/linux that referenced this pull request Nov 21, 2025
Uprobe needs to fetch args into a percpu buffer, and then copy to ring
buffer to avoid non-atomic context problem.

Sometimes user-space strings, arrays can be very large, but the size of
percpu buffer is only page size. And store_trace_args() won't check
whether these data exceeds a single page or not, caused out-of-bounds
memory access.

It could be reproduced by following steps:
1. build kernel with CONFIG_KASAN enabled
2. save follow program as test.c

```
\#include <stdio.h>
\#include <stdlib.h>
\#include <string.h>

// If string length large than MAX_STRING_SIZE, the fetch_store_strlen()
// will return 0, cause __get_data_size() return shorter size, and
// store_trace_args() will not trigger out-of-bounds access.
// So make string length less than 4096.
\#define STRLEN 4093

void generate_string(char *str, int n)
{
     int i;
     for (i = 0; i < n; ++i)
     {
         char c = i % 26 + 'a';
         str[i] = c;
     }
     str[n-1] = '\0';
}

void print_string(char *str)
{
     printf("%s\n", str);
}

int main()
{
     char tmp[STRLEN];

     generate_string(tmp, STRLEN);
     print_string(tmp);

     return 0;
}
```
3. compile program
`gcc -o test test.c`

4. get the offset of `print_string()`
```
objdump -t test | grep -w print_string
0000000000401199 g     F .text  000000000000001b              print_string
```

5. configure uprobe with offset 0x1199
```
off=0x1199

cd/sys/kernel/debug/tracing/
echo "p /root/test:${off} arg1=+0(%di):ustring arg2=\$comm arg3=+0(%di):ustring"
  > uprobe_events
echo 1 > events/uprobes/enable
echo 1 > tracing_on
```

6. run `test`, and kasan will report error.
==================================================================
BUG: KASAN: use-after-free in strncpy_from_user+0x1d6/0x1f0
Write of size 8 at addr ffff88812311c004 by task test/499CPU: 0 UID: 0 PID: 499 Comm: test Not tainted 6.12.0-rc3+ torvalds#18
Hardware name: Red Hat KVM, BIOS 1.16.0-4.al8 04/01/2014
Call Trace:
  <TASK>
  dump_stack_lvl+0x55/0x70
  print_address_description.constprop.0+0x27/0x310
  kasan_report+0x10f/0x120
  ? strncpy_from_user+0x1d6/0x1f0
  strncpy_from_user+0x1d6/0x1f0
  ? rmqueue.constprop.0+0x70d/0x2ad0
  process_fetch_insn+0xb26/0x1470
  ? __pfx_process_fetch_insn+0x10/0x10
  ? _raw_spin_lock+0x85/0xe0
  ? __pfx__raw_spin_lock+0x10/0x10
  ? __pte_offset_map+0x1f/0x2d0
  ? unwind_next_frame+0xc5f/0x1f80
  ? arch_stack_walk+0x68/0xf0
  ? is_bpf_text_address+0x23/0x30
  ? kernel_text_address.part.0+0xbb/0xd0
  ? __kernel_text_address+0x66/0xb0
  ? unwind_get_return_address+0x5e/0xa0
  ? __pfx_stack_trace_consume_entry+0x10/0x10
  ? arch_stack_walk+0xa2/0xf0
  ? _raw_spin_lock_irqsave+0x8b/0xf0
  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
  ? depot_alloc_stack+0x4c/0x1f0
  ? _raw_spin_unlock_irqrestore+0xe/0x30
  ? stack_depot_save_flags+0x35d/0x4f0
  ? kasan_save_stack+0x34/0x50
  ? kasan_save_stack+0x24/0x50
  ? mutex_lock+0x91/0xe0
  ? __pfx_mutex_lock+0x10/0x10
  prepare_uprobe_buffer.part.0+0x2cd/0x500
  uprobe_dispatcher+0x2c3/0x6a0
  ? __pfx_uprobe_dispatcher+0x10/0x10
  ? __kasan_slab_alloc+0x4d/0x90
  handler_chain+0xdd/0x3e0
  handle_swbp+0x26e/0x3d0
  ? __pfx_handle_swbp+0x10/0x10
  ? uprobe_pre_sstep_notifier+0x151/0x1b0
  irqentry_exit_to_user_mode+0xe2/0x1b0
  asm_exc_int3+0x39/0x40
RIP: 0033:0x401199
Code: 01 c2 0f b6 45 fb 88 02 83 45 fc 01 8b 45 fc 3b 45 e4 7c b7 8b 45 e4 48 98 48 8d 50 ff 48 8b 45 e8 48 01 d0 ce
RSP: 002b:00007ffdf00576a8 EFLAGS: 00000206
RAX: 00007ffdf00576b0 RBX: 0000000000000000 RCX: 0000000000000ff2
RDX: 0000000000000ffc RSI: 0000000000000ffd RDI: 00007ffdf00576b0
RBP: 00007ffdf00586b0 R08: 00007feb2f9c0d20 R09: 00007feb2f9c0d20
R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000401040
R13: 00007ffdf0058780 R14: 0000000000000000 R15: 0000000000000000
  </TASK>

This commit enforces the buffer's maxlen less than a page-size to avoid
store_trace_args() out-of-memory access.

Link:https://lore.kernel.org/all/20241015060148.1108331-1-mqaio@linux.alibaba.com/

Fixes: dcad1a2 ("tracing/uprobes: Fetch args before reserving a ring buffer")
Signed-off-by: Qiao Ma<mqaio@linux.alibaba.com>
Signed-off-by: Masami Hiramatsu (Google)<mhiramat@kernel.org>

CVE-2024-50067
(backported from commit 373b933)
[mpellizzer: Context adjusted due to missing commit:
 - 25f00e4 tracing/probes: Support $argN in return probe (kprobe and fprobe)]
Signed-off-by: Massimiliano Pellizzer<massimiliano.pellizzer@canonical.com>
Acked-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Sarah Emery <sarah.emery@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
dasty pushed a commit to StreamUnlimited/linux that referenced this pull request Nov 27, 2025
…the Crashkernel Scenario

commit 4058c39 upstream.

The issue is that before entering the crash kernel, the DWC USB controller
did not perform operations such as resetting the interrupt mask bits.
After entering the crash kernel,before the USB interrupt handler
registration was completed while loading the DWC USB driver,an GINTSTS_SOF
interrupt was received.This triggered the misroute_irq process within the
GIC handling framework,ultimately leading to the misrouting of the
interrupt,causing it to be handled by the wrong interrupt handler
and resulting in the issue.

Summary:In a scenario where the kernel triggers a panic and enters
the crash kernel,it is necessary to ensure that the interrupt mask
bit is not enabled before the interrupt registration is complete.
If an interrupt reaches the CPU at this moment,it will certainly
not be handled correctly,especially in cases where this interrupt
is reported frequently.

Please refer to the Crashkernel dmesg information as follows
(the message on line 3 was added before devm_request_irq is
called by the dwc2_driver_probe function):
[    5.866837][    T1] dwc2 JMIC0010:01: supply vusb_d not found, using dummy regulator
[    5.874588][    T1] dwc2 JMIC0010:01: supply vusb_a not found, using dummy regulator
[    5.882335][    T1] dwc2 JMIC0010:01: before devm_request_irq  irq: [71], gintmsk[0xf300080e], gintsts[0x04200009]
[    5.892686][    C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-jmnd1.2_RC torvalds#18
[    5.900327][    C0] Hardware name: CMSS HyperCard4-25G/HyperCard4-25G, BIOS 1.6.4 Jul  8 2024
[    5.908836][    C0] Call trace:
[    5.911965][    C0]  dump_backtrace+0x0/0x1f0
[    5.916308][    C0]  show_stack+0x20/0x30
[    5.920304][    C0]  dump_stack+0xd8/0x140
[    5.924387][    C0]  pcie_xxx_handler+0x3c/0x1d8
[    5.930121][    C0]  __handle_irq_event_percpu+0x64/0x1e0
[    5.935506][    C0]  handle_irq_event+0x80/0x1d0
[    5.940109][    C0]  try_one_irq+0x138/0x174
[    5.944365][    C0]  misrouted_irq+0x134/0x140
[    5.948795][    C0]  note_interrupt+0x1d0/0x30c
[    5.953311][    C0]  handle_irq_event+0x13c/0x1d0
[    5.958001][    C0]  handle_fasteoi_irq+0xd4/0x260
[    5.962779][    C0]  __handle_domain_irq+0x88/0xf0
[    5.967555][    C0]  gic_handle_irq+0x9c/0x2f0
[    5.971985][    C0]  el1_irq+0xb8/0x140
[    5.975807][    C0]  __setup_irq+0x3dc/0x7cc
[    5.980064][    C0]  request_threaded_irq+0xf4/0x1b4
[    5.985015][    C0]  devm_request_threaded_irq+0x80/0x100
[    5.990400][    C0]  dwc2_driver_probe+0x1b8/0x6b0
[    5.995178][    C0]  platform_drv_probe+0x5c/0xb0
[    5.999868][    C0]  really_probe+0xf8/0x51c
[    6.004125][    C0]  driver_probe_device+0xfc/0x170
[    6.008989][    C0]  device_driver_attach+0xc8/0xd0
[    6.013853][    C0]  __driver_attach+0xe8/0x1b0
[    6.018369][    C0]  bus_for_each_dev+0x7c/0xdc
[    6.022886][    C0]  driver_attach+0x2c/0x3c
[    6.027143][    C0]  bus_add_driver+0xdc/0x240
[    6.031573][    C0]  driver_register+0x80/0x13c
[    6.036090][    C0]  __platform_driver_register+0x50/0x5c
[    6.041476][    C0]  dwc2_platform_driver_init+0x24/0x30
[    6.046774][    C0]  do_one_initcall+0x50/0x25c
[    6.051291][    C0]  do_initcall_level+0xe4/0xfc
[    6.055894][    C0]  do_initcalls+0x80/0xa4
[    6.060064][    C0]  kernel_init_freeable+0x198/0x240
[    6.065102][    C0]  kernel_init+0x1c/0x12c

Signed-off-by: Shawn Shao <shawn.shao@jaguarmicro.com>
Link: https://lore.kernel.org/r/20240830031709.134-1-shawn.shao@jaguarmicro.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Peter Suti <peter.suti@streamunlimited.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 2, 2025
Commit a5fb3ff ("PCI: Allow PCI bridges to go to D3Hot on all
non-x86") was bisected to break various non-x86 RISC Unix systems,
e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot
on modern ARM64, PPC64 and RISCV ISAs besides new enough x86.

Sun Blade 1000:
ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
ERROR(0):
TPC<MakeIocReady+0xc/0x278 [mptbase]>
ERROR(0): M_SYND(0),  E_SYND(0), Privileged
ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
Kernel panic - not syncing: Irrecoverable deferred error trap.
CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 torvalds#18
Call Trace:
[<00000000004294b0>] panic+0xf0/0x370
[<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
[<0000000000405e88>] c_deferred+0x18/0x24
[<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
[<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
[<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
[<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
[<0000000000788308>] local_pci_probe+0x24/0x70
[<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
[<000000000082633c>] really_probe+0x13c/0x29c
[<0000000000826590>] __driver_probe_device+0xf4/0x104
[<0000000000826614>] driver_probe_device+0x24/0xa0
[<000000000082683c>] __driver_attach+0xe8/0x104
[<0000000000824da0>] bus_for_each_dev+0x58/0x84
[<0000000000825508>] bus_add_driver+0xdc/0x1f8
[<0000000000827110>] driver_register+0x70/0x120

Niagara T1:
mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
NON-RESUMABLE ERROR: Reporting on cpu 31
NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
NON-RESUMABLE ERROR: type [precise nonresumable]
NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
Kernel panic - not syncing: Non-resumable error.
CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE  Debian 6.16.12-2+sparc64.1
Call Trace:
[<00000000004373c4>] dump_stack+0x8/0x18
[<0000000000429540>] panic+0xf4/0x398
[<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
[<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
[<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
[<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
[<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
[<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
[<0000000000b3fab0>] local_pci_probe+0x30/0x80
[<0000000000b405d4>] pci_device_probe+0xb4/0x240
[<0000000000bfd348>] really_probe+0xc8/0x400
[<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
[<0000000000bfd8c8>] driver_probe_device+0x28/0x100
[<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
[<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
[<0000000000bfcafc>] driver_attach+0x1c/0x40
Press Stop-A (L1-A) from sun keyboard or send break
twice on console to return to the boot prom

Fixes: a5fb3ff ("PCI: Allow PCI bridges to go to D3Hot on all non-x86")
Signed-off-by: René Rebe <rene@exactco.de>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 3, 2025
Commit a5fb3ff ("PCI: Allow PCI bridges to go to D3Hot on all
non-x86") was bisected to break some SPARC64 systems, see two oopses
below. Fix by not allowing D3Hot on sparc64 while this is being
further analyzed.

Sun Blade 1000:
ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
ERROR(0):
TPC<MakeIocReady+0xc/0x278 [mptbase]>
ERROR(0): M_SYND(0),  E_SYND(0), Privileged
ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
Kernel panic - not syncing: Irrecoverable deferred error trap.
CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 torvalds#18
Call Trace:
[<00000000004294b0>] panic+0xf0/0x370
[<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
[<0000000000405e88>] c_deferred+0x18/0x24
[<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
[<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
[<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
[<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
[<0000000000788308>] local_pci_probe+0x24/0x70
[<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
[<000000000082633c>] really_probe+0x13c/0x29c
[<0000000000826590>] __driver_probe_device+0xf4/0x104
[<0000000000826614>] driver_probe_device+0x24/0xa0
[<000000000082683c>] __driver_attach+0xe8/0x104
[<0000000000824da0>] bus_for_each_dev+0x58/0x84
[<0000000000825508>] bus_add_driver+0xdc/0x1f8
[<0000000000827110>] driver_register+0x70/0x120

Niagara T1:
mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
NON-RESUMABLE ERROR: Reporting on cpu 31
NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
NON-RESUMABLE ERROR: type [precise nonresumable]
NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
Kernel panic - not syncing: Non-resumable error.
CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE  Debian 6.16.12-2+sparc64.1
Call Trace:
[<00000000004373c4>] dump_stack+0x8/0x18
[<0000000000429540>] panic+0xf4/0x398
[<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
[<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
[<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
[<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
[<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
[<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
[<0000000000b3fab0>] local_pci_probe+0x30/0x80
[<0000000000b405d4>] pci_device_probe+0xb4/0x240
[<0000000000bfd348>] really_probe+0xc8/0x400
[<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
[<0000000000bfd8c8>] driver_probe_device+0x28/0x100
[<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
[<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
[<0000000000bfcafc>] driver_attach+0x1c/0x40
Press Stop-A (L1-A) from sun keyboard or send break
twice on console to return to the boot prom

Fixes: a5fb3ff ("PCI: Allow PCI bridges to go to D3Hot on all non-x86")
Cc: stable@kernel.org
Signed-off-by: René Rebe <rene@exactco.de>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 5, 2025
The xfstests' test-case generic/480 leaves HFS+ volume
in corrupted state:

sudo ./check generic/480
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.17.0-rc1+ #4 SMP PREEMPT_DYNAMIC Wed Oct 1 15:02:44 PDT 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/480 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent
(see XFSTESTS-2/xfstests-dev/results//generic/480.full for details)

Ran: generic/480
Failures: generic/480
Failed 1 of 1 tests

sudo fsck.hfsplus -d /dev/loop51
** /dev/loop51
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
CheckHardLinks: found 1 pre-Leopard file inodes.
Incorrect number of file hard links
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
invalid VHB nextCatalogID
Volume header needs minor repair
(2, 0)
Verify Status: VIStat = 0x8000, ABTStat = 0x0000 EBTStat = 0x0000
CBTStat = 0x0000 CatStat = 0x00000002
** Repairing volume.
Incorrect flags for file hard link (id = 19)
(It should be 0x22 instead of 0x2)
Incorrect flags for file inode (id = 18)
(It should be 0x22 instead of 0x2)
first link ID=0 is < 16 for fileinode=18
Error getting first link ID for inode = 18 (result=2)
Invalid first link in hard link chain (id = 18)
(It should be 19 instead of 0)
Indirect node 18 needs link count adjustment
(It should be 1 instead of 2)
** Rechecking volume.
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled was repaired successfully.

The generic/480 test executes such steps on final phase:

"Now remove of the links of our file and create
a new file with the same name and in the same
parent directory, and finally fsync this new file."

unlink $SCRATCH_MNT/testdir/bar
touch $SCRATCH_MNT/testdir/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/bar

"Simulate a power failure and mount the filesystem
to check that replaying the fsync log/journal
succeeds, that is the mount operation does not fail."

_flakey_drop_and_remount

The key issue in HFS+ logic is that hfsplus_link(),
hfsplus_unlink(), hfsplus_rmdir(), hfsplus_symlink(),
and hfsplus_mknod() methods don't call
hfsplus_cat_write_inode() for the case of modified
inode objects. As a result, even if hfsplus_file_fsync()
is trying to flush the dirty Catalog File, but because of
not calling hfsplus_cat_write_inode() not all modified
inodes save the new state into Catalog File's records.
Finally, simulation of power failure results in inconsistent
state of Catalog File and FSCK tool reports about
volume corruption.

This patch adds calling of hfsplus_cat_write_inode()
method for modified inodes in hfsplus_link(),
hfsplus_unlink(), hfsplus_rmdir(), hfsplus_symlink(),
and hfsplus_mknod() methods. Also, it adds debug output
in several methods.

sudo ./check generic/480
FSTYP         -- hfsplus
PLATFORM      -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc1+ torvalds#18 SMP PREEMPT_DYNAMIC Thu Dec  4 12:24:45 PST 2025
MKFS_OPTIONS  -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/480 16s ...  16s
Ran: generic/480
Passed all 1 tests

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 7, 2025
The xfstests' test-case generic/498 leaves HFS+ volume
in corrupted state:

sudo ./check generic/498
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc1+ torvalds#18 SMP PREEMPT_DYNAMIC Thu Dec 4 12:24:45 PST 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/498 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent
(see XFSTESTS-2/xfstests-dev/results//generic/498.full for details)

Ran: generic/498
Failures: generic/498
Failed 1 of 1 tests

sudo fsck.hfsplus -d /dev/loop51
** /dev/loop51
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
Invalid leaf record count
(It should be 16 instead of 2)
** Checking multi-linked files.
CheckHardLinks: found 1 pre-Leopard file inodes.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000
CBTStat = 0x8000 CatStat = 0x00000000
** Repairing volume.
** Rechecking volume.
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
CheckHardLinks: found 1 pre-Leopard file inodes.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled was repaired successfully.

The generic/498 test executes such steps on final phase:

mkdir $SCRATCH_MNT/A
mkdir $SCRATCH_MNT/B
mkdir $SCRATCH_MNT/A/C
touch $SCRATCH_MNT/B/foo
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/B/foo

ln $SCRATCH_MNT/B/foo $SCRATCH_MNT/A/C/foo
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/A

"Simulate a power failure and mount the filesystem
to check that what we explicitly fsync'ed exists."

_flakey_drop_and_remount

The FSCK tool complains about "Invalid leaf record count".
HFS+ b-tree header contains leaf_count field is updated
by hfs_brec_insert() and hfs_brec_remove(). The hfs_brec_insert()
is involved into hard link creation process. However,
modified in-core leaf_count field is stored into HFS+
b-tree header by hfs_btree_write() method. But,
unfortunately, hfs_btree_write() hasn't been called
by hfsplus_cat_write_inode() and hfsplus_file_fsync()
stores not fully consistent state of the Catalog File's
b-tree.

This patch adds calling hfs_btree_write() method in
the hfsplus_cat_write_inode() with the goal of
storing consistent state of Catalog File's b-tree.
Finally, it makes FSCK tool happy.

sudo ./check generic/498
FSTYP         -- hfsplus
PLATFORM      -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc1+ torvalds#22 SMP PREEMPT_DYNAMIC Sat Dec  6 17:01:31 PST 2025
MKFS_OPTIONS  -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/498 33s ...  31s
Ran: generic/498
Passed all 1 tests

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
ZXlieC pushed a commit to Xlie-Electronic-Customs/linux that referenced this pull request Dec 7, 2025
… 'T'

When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.

The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.

Stack trace as below:

Perf: Segmentation fault
-------- backtrace --------
    #0 0x55d365 in ui__signal_backtrace setup.c:0
    #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
    #2 0x570f08 in arch__is perf[570f08]
    #3 0x562186 in annotate_get_insn_location perf[562186]
    #4 0x562626 in __hist_entry__get_data_type annotate.c:0
    #5 0x56476d in annotation_line__write perf[56476d]
    torvalds#6 0x54e2db in annotate_browser__write annotate.c:0
    torvalds#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
    torvalds#8 0x54dc9e in annotate_browser__refresh annotate.c:0
    torvalds#9 0x54c03d in __ui_browser__refresh browser.c:0
    torvalds#10 0x54ccf8 in ui_browser__run perf[54ccf8]
    torvalds#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
    torvalds#12 0x552293 in do_annotate hists.c:0
    torvalds#13 0x55941c in evsel__hists_browse hists.c:0
    torvalds#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
    torvalds#15 0x42ff02 in cmd_report perf[42ff02]
    torvalds#16 0x494008 in run_builtin perf.c:0
    torvalds#17 0x494305 in handle_internal_command perf.c:0
    torvalds#18 0x410547 in main perf[410547]
    torvalds#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
    torvalds#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
    torvalds#21 0x410b75 in _start perf[410b75]

Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
hbirth pushed a commit to hbirth/linux that referenced this pull request Dec 10, 2025
…k-rhel9_6-570.12.1

fuse: fix memory leak in fuse-over-io-uring argument copies
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 19, 2025
If the BIOS generates a very small ARM Processor Error, or
an incomplete one, the current logic will fail to deferrence

	err->section_length
and
	ctx_info->size

Add checks to avoid that. With such changes, such GHESv2
records won't cause OOPSes like this:

[    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
[    1.495449] Modules linked in:
[    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty torvalds#18 PREEMPT
[    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
[    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    1.497199] pc : log_arm_hw_error+0x5c/0x200
[    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220

0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
70		err_info = (struct cper_arm_err_info *)(err + 1);
71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
72		ctx_err = (u8 *)ctx_info;
73
74		for (n = 0; n < err->context_info_num; n++) {
75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
77			ctx_len += sz;
78		}
79

and similar ones while trying to access section_length on an
error dump with too small size.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
jbrun3t added a commit to jbrun3t/linux that referenced this pull request Dec 22, 2025
This relocates register pokes of the HDMI VPU encoder out of the
HDMI phy driver. As far as HDMI is concerned, the sequence in which
the setup is done remains mostly the same.

This was tested with modetest, cycling through the following resolutions:
  #0 3840x2160 60.00
  #1 3840x2160 59.94
  #2 3840x2160 50.00
  #3 3840x2160 30.00
  #4 3840x2160 29.97
  #5 3840x2160 25.00
  torvalds#6 3840x2160 24.00
  torvalds#7 3840x2160 23.98
  torvalds#8 1920x1080 60.00
  torvalds#9 1920x1080 60.00
  torvalds#10 1920x1080 59.94
  torvalds#11 1920x1080i 30.00
  torvalds#12 1920x1080i 29.97
  torvalds#13 1920x1080 50.00
  torvalds#14 1920x1080i 25.00
  torvalds#15 1920x1080 30.00
  torvalds#16 1920x1080 29.97
  torvalds#17 1920x1080 25.00
  torvalds#18 1920x1080 24.00
  torvalds#19 1920x1080 23.98
  torvalds#20 1280x1024 60.02
  torvalds#21 1152x864 59.97
  torvalds#22 1280x720 60.00
  torvalds#23 1280x720 59.94
  torvalds#24 1280x720 50.00
  torvalds#25 1024x768 60.00
  torvalds#26 800x600 60.32
  torvalds#27 720x576 50.00
  torvalds#28 720x480 59.94

No regression to report.

This is part of an effort to clean up Amlogic HDMI related drivers which
should eventually allow to stop using the component API and HHI syscon.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant