@smartinez87

There was an extra 'an' in this doc, so I removed it.

@jacobh

jacobh commented Sep 6, 2011

Thank you, kind sir; your commit will not go unnoticed.

@bdonlan
Contributor

bdonlan commented Sep 6, 2011

Please note that pull requests are not the proper procedure for submitting patches to the Linux kernel. Linus put the kernel up here because kernel.org's master mirror is down; it seems that he doesn't like the pull request system[1], but GitHub does not allow him to disable it. Please read Documentation/SubmittingPatches: you must write a proper commit message, add a Signed-off-by line, and submit to the Linux kernel mailing list, CCing the affected maintainers (i.e., not Linus in most cases).

[1] - http://blueparen.com/node/12

@smartinez87
Author

Can you please point me to a URL where I can read that patch-submission documentation? Thanks!

@snarkyMcSnark

smartinez87, this is pretty silly stuff, these stunt-style pull requests that have been coming into this repo lately. Sure, it's open source and you want to help fix it, but as bdonlan notes above, there are proper guidelines to follow when submitting patches. A simpler solution (lifted wholesale from reddit, btw): someone volunteers to run the "typo in the readme" branch. People send pull requests to them. When that branch has a delta of more than a couple fucking kilobytes, a reasonable pull request can be sent to the main project.

Also, in the future, please look at the Kernel Janitors site for things related to code-quality cleanups in the kernel.

Let's not distract and annoy Linus with silly trivialities like this; it just makes you look like a jackass.

@dovydasm

dovydasm commented Sep 6, 2011

Bravo!

@smartinez87
Author

Hey, I just don't care about this. I just noticed the typo and wanted the people who can do something about it to know and fix it. If no one cares about the docs, I care even less.

@smartinez87 smartinez87 closed this Sep 6, 2011
@VM2

VM2 commented Sep 7, 2011

@snarkyMcSnark is right. @smartinez87 is just unnecessarily trying to create work for a high-profile project just to be part of the commit history. His background points to the same. He claims to be a core contributor to the Rails project, although his entire commit history consists solely of frivolous grammatical and whitespace changes to the documentation. In fact, he has no original commits for documentation either, just small formatting changes to existing commits. This is entirely true.

@diegoviola, instead of you two trying to fix whitespace issues and unnecessarily trying to police other contributors, you should work on something useful. These are all valid arguments, the original committer has a bad history of doing this, and three people have already pointed that out.

damentz referenced this pull request in zen-kernel/zen-kernel Sep 27, 2011
commit fe47ae7 upstream.

The lockdep warning below detects a possible A->B/B->A locking
dependency of mm->mmap_sem and dcookie_mutex. The order in
sync_buffer() is mm->mmap_sem/dcookie_mutex, while in
sys_lookup_dcookie() it is vice versa.

Fixing it in sys_lookup_dcookie() by unlocking dcookie_mutex before
copy_to_user().

oprofiled/4432 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<ffffffff810b444b>] might_fault+0x53/0xa3

but task is already holding lock:
 (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (dcookie_mutex){+.+.+.}:
       [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
       [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309
       [<ffffffff81124e5c>] get_dcookie+0x30/0x144
       [<ffffffffa0000fba>] sync_buffer+0x196/0x3ec [oprofile]
       [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile]
       [<ffffffff81467b96>] notifier_call_chain+0x37/0x63
       [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67
       [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16
       [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c
       [<ffffffff81039e8f>] do_exit+0x2a/0x6fc
       [<ffffffff8103a5e4>] do_group_exit+0x83/0xae
       [<ffffffff8103a626>] sys_exit_group+0x17/0x1b
       [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b

-> #0 (&mm->mmap_sem){++++++}:
       [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
       [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
       [<ffffffff810b4478>] might_fault+0x80/0xa3
       [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149
       [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

1 lock held by oprofiled/4432:
 #0:  (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149

stack backtrace:
Pid: 4432, comm: oprofiled Not tainted 2.6.39-00008-ge5a450d #9
Call Trace:
 [<ffffffff81063193>] print_circular_bug+0xae/0xbc
 [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
 [<ffffffff8102ef13>] ? get_parent_ip+0x11/0x42
 [<ffffffff810b444b>] ? might_fault+0x53/0xa3
 [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
 [<ffffffff810b444b>] ? might_fault+0x53/0xa3
 [<ffffffff810d7d54>] ? path_put+0x22/0x27
 [<ffffffff810b4478>] might_fault+0x80/0xa3
 [<ffffffff810b444b>] ? might_fault+0x53/0xa3
 [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149
 [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b

References: https://bugzilla.kernel.org/show_bug.cgi?id=13809
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
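
The fix described in the commit message above (release dcookie_mutex before the copy that may take mmap_sem) can be illustrated with a small user-space sketch. This is Python, not kernel code: the two `threading.Lock` objects, the `cookie_table` dictionary and the function bodies are hypothetical stand-ins that borrow the kernel names only for readability.

```python
import threading

# Hypothetical stand-ins for the two kernel locks named in the commit message.
mmap_sem = threading.Lock()
dcookie_mutex = threading.Lock()

# Hypothetical cookie -> path table, protected by dcookie_mutex.
cookie_table = {42: "/usr/bin/oprofiled"}


def copy_to_user(buffer, text):
    """Stand-in for copy_to_user(): it may fault and therefore take mmap_sem."""
    with mmap_sem:
        buffer.append(text)
    return len(text)


def lookup_dcookie(cookie, buffer):
    """Look up a path, dropping dcookie_mutex before the copy that needs mmap_sem."""
    with dcookie_mutex:
        path = cookie_table.get(cookie)
    # dcookie_mutex is released here, so mmap_sem is never taken while it is held.
    if path is None:
        return -1
    return copy_to_user(buffer, path)


def sync_buffer(cookie, path):
    """The other code path: takes mmap_sem first, then dcookie_mutex."""
    with mmap_sem:
        with dcookie_mutex:
            cookie_table[cookie] = path
```

With this shape, the only nesting that ever occurs is mmap_sem then dcookie_mutex, so the A->B/B->A cycle that lockdep reported cannot form.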
pmundt pushed a commit to pmundt/linux-sh that referenced this pull request Oct 28, 2011
In commit 5ec094c "nfsd4: extend state
lock over seqid replay logic" I modified the exit logic of all the
seqid-based procedures except nfsd4_locku().  Fix the oversight.

The result of the bug was a double-unlock while handling the LOCKU
procedure, and a warning like:

[  142.150014] WARNING: at kernel/mutex-debug.c:78 debug_mutex_unlock+0xda/0xe0()
...
[  142.152927] Pid: 742, comm: nfsd Not tainted 3.1.0-rc1-SLIM+ #9
[  142.152927] Call Trace:
[  142.152927]  [<ffffffff8105fa4f>] warn_slowpath_common+0x7f/0xc0
[  142.152927]  [<ffffffff8105faaa>] warn_slowpath_null+0x1a/0x20
[  142.152927]  [<ffffffff810960ca>] debug_mutex_unlock+0xda/0xe0
[  142.152927]  [<ffffffff813e4200>] __mutex_unlock_slowpath+0x80/0x140
[  142.152927]  [<ffffffff813e42ce>] mutex_unlock+0xe/0x10
[  142.152927]  [<ffffffffa03bd3f5>] nfs4_lock_state+0x35/0x40 [nfsd]
[  142.152927]  [<ffffffffa03b0b71>] nfsd4_proc_compound+0x2a1/0x690 [nfsd]
[  142.152927]  [<ffffffffa039f9fb>] nfsd_dispatch+0xeb/0x230 [nfsd]
[  142.152927]  [<ffffffffa02b1055>] svc_process_common+0x345/0x690 [sunrpc]
[  142.152927]  [<ffffffff81058d10>] ? try_to_wake_up+0x280/0x280
[  142.152927]  [<ffffffffa02b16e2>] svc_process+0x102/0x150 [sunrpc]
[  142.152927]  [<ffffffffa039f0bd>] nfsd+0xbd/0x160 [nfsd]
[  142.152927]  [<ffffffffa039f000>] ? 0xffffffffa039efff
[  142.152927]  [<ffffffff8108230c>] kthread+0x8c/0xa0
[  142.152927]  [<ffffffff813e8694>] kernel_thread_helper+0x4/0x10
[  142.152927]  [<ffffffff81082280>] ? kthread_worker_fn+0x190/0x190
[  142.152927]  [<ffffffff813e8690>] ? gs_change+0x13/0x13

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Tested-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
torvalds pushed a commit that referenced this pull request Dec 15, 2011
If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appended to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not the normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
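
The gdb session above shows `start = 0x0` and `end = 0xffffffffffffffff` for `copied=0x0`. A minimal Python sketch of why a zero-length copy yields that value, and of the guard the commit message describes, follows. The formula `end = start + copied - 1` and the 4 KiB page size are assumptions inferred from the printed values, and both function names are hypothetical; the real helper checks more than this.

```python
PAGE_SIZE = 0x1000        # assumed 4 KiB page, matching len=0x1000 in the trace
U64_MASK = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


def last_byte_offset(pos, copied):
    """Offset of the last copied byte within its page (assumed formula)."""
    start = pos & (PAGE_SIZE - 1)
    # With copied == 0 this is start - 1, which wraps around when start == 0.
    return (start + copied - 1) & U64_MASK


def should_update_disksize(pos, copied, i_disksize):
    """Simplified guard: skip the size update entirely when nothing was copied."""
    if copied == 0:
        return False
    return pos + copied > i_disksize
```

Calling `last_byte_offset(0x108000, 0)` reproduces the wrapped `end` value seen in the session, while `should_update_disksize(0x108000, 0, 0xd9000)` returns `False` because of the early return.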
em-and-m pushed a commit to em-and-m/linux that referenced this pull request Jan 8, 2012
qeth layer3 recovery invokes its set_multicast_list function, which
invokes function __vlan_find_dev_deep requiring rcu_read_lock or
rtnl lock. This causes kernel messages:

kernel: [ INFO: suspicious rcu_dereference_check() usage. ]
kernel: ---------------------------------------------------
kernel: net/8021q/vlan_core.c:70 invoked rcu_dereference_check() without protection!

kernel: stack backtrace:
kernel: CPU: 0 Not tainted 3.1.0 #9
kernel: Process qeth_recover (pid: 2078, task: 000000007e584680, ksp: 000000007e3e3930)
kernel: 000000007e3e3d08 000000007e3e3c88 0000000000000002 0000000000000000
kernel:       000000007e3e3d28 000000007e3e3ca0 000000007e3e3ca0 00000000005e77ce
kernel:       0000000000000000 0000000000000001 ffffffffffffffff 0000000000000001
kernel:       000000000000000d 000000000000000c 000000007e3e3cf0 0000000000000000
kernel:       0000000000000000 0000000000100a18 000000007e3e3c88 000000007e3e3cc8
kernel: Call Trace:
kernel: ([<0000000000100926>] show_trace+0xee/0x144)
kernel: [<00000000005d395c>] __vlan_find_dev_deep+0xb0/0x108
kernel: [<00000000004acd3a>] qeth_l3_set_multicast_list+0x976/0xe38
kernel: [<00000000004ae0f4>] __qeth_l3_set_online+0x75c/0x1498
kernel: [<00000000004aefec>] qeth_l3_recover+0xc4/0x1d0
kernel: [<0000000000185372>] kthread+0xa6/0xb0
kernel: [<00000000005ed4c6>] kernel_thread_starter+0x6/0xc
kernel: [<00000000005ed4c0>] kernel_thread_starter+0x0/0xc

The patch makes sure the rtnl lock is held once qeth recovery invokes
its set_multicast_list function.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tworaz pushed a commit to tworaz/linux that referenced this pull request Jan 9, 2012
commit fe47ae7 upstream.

The lockdep warning below detects a possible A->B/B->A locking
dependency of mm->mmap_sem and dcookie_mutex. The order in
sync_buffer() is mm->mmap_sem/dcookie_mutex, while in
sys_lookup_dcookie() it is vice versa.

Fixing it in sys_lookup_dcookie() by unlocking dcookie_mutex before
copy_to_user().

oprofiled/4432 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<ffffffff810b444b>] might_fault+0x53/0xa3

but task is already holding lock:
 (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (dcookie_mutex){+.+.+.}:
       [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
       [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309
       [<ffffffff81124e5c>] get_dcookie+0x30/0x144
       [<ffffffffa0000fba>] sync_buffer+0x196/0x3ec [oprofile]
       [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile]
       [<ffffffff81467b96>] notifier_call_chain+0x37/0x63
       [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67
       [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16
       [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c
       [<ffffffff81039e8f>] do_exit+0x2a/0x6fc
       [<ffffffff8103a5e4>] do_group_exit+0x83/0xae
       [<ffffffff8103a626>] sys_exit_group+0x17/0x1b
       [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b

-> #0 (&mm->mmap_sem){++++++}:
       [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
       [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
       [<ffffffff810b4478>] might_fault+0x80/0xa3
       [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149
       [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

1 lock held by oprofiled/4432:
 #0:  (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149

stack backtrace:
Pid: 4432, comm: oprofiled Not tainted 2.6.39-00008-ge5a450d #9
Call Trace:
 [<ffffffff81063193>] print_circular_bug+0xae/0xbc
 [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
 [<ffffffff8102ef13>] ? get_parent_ip+0x11/0x42
 [<ffffffff810b444b>] ? might_fault+0x53/0xa3
 [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
 [<ffffffff810b444b>] ? might_fault+0x53/0xa3
 [<ffffffff810d7d54>] ? path_put+0x22/0x27
 [<ffffffff810b4478>] might_fault+0x80/0xa3
 [<ffffffff810b444b>] ? might_fault+0x53/0xa3
 [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149
 [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b

References: https://bugzilla.kernel.org/show_bug.cgi?id=13809
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Pfiver pushed a commit to Pfiver/linux that referenced this pull request Jan 16, 2012
$ wget "http://pkgs.fedoraproject.org/gitweb/?p=kernel.git;a=blob_plain;f=mac80211_offchannel_rework_revert.patch;h=859799714cd85a58450ecde4a1dabc5adffd5100;hb=refs/heads/f16" -O mac80211_offchannel_rework_revert.patch
$ patch -p1 --dry-run < mac80211_offchannel_rework_revert.patch
patching file net/mac80211/ieee80211_i.h
Hunk #1 succeeded at 702 (offset 8 lines).
Hunk #2 succeeded at 712 (offset 8 lines).
Hunk #3 succeeded at 1143 (offset -57 lines).
patching file net/mac80211/main.c
patching file net/mac80211/offchannel.c
Hunk #1 succeeded at 18 (offset 1 line).
Hunk #2 succeeded at 42 (offset 1 line).
Hunk #3 succeeded at 78 (offset 1 line).
Hunk #4 succeeded at 96 (offset 1 line).
Hunk #5 succeeded at 162 (offset 1 line).
Hunk #6 succeeded at 182 (offset 1 line).
patching file net/mac80211/rx.c
Hunk #1 succeeded at 421 (offset 4 lines).
Hunk #2 succeeded at 2864 (offset 87 lines).
patching file net/mac80211/scan.c
Hunk #1 succeeded at 213 (offset 1 line).
Hunk #2 succeeded at 256 (offset 2 lines).
Hunk #3 succeeded at 288 (offset 2 lines).
Hunk #4 succeeded at 333 (offset 2 lines).
Hunk #5 succeeded at 482 (offset 2 lines).
Hunk #6 succeeded at 498 (offset 2 lines).
Hunk #7 succeeded at 516 (offset 2 lines).
Hunk #8 succeeded at 530 (offset 2 lines).
Hunk #9 succeeded at 555 (offset 2 lines).
patching file net/mac80211/tx.c
Hunk #1 succeeded at 259 (offset 1 line).
patching file net/mac80211/work.c
Hunk #1 succeeded at 899 (offset -2 lines).
Hunk #2 succeeded at 949 (offset -2 lines).
Hunk #3 succeeded at 1046 (offset -2 lines).
Hunk #4 succeeded at 1054 (offset -2 lines).
jkstrick pushed a commit to jkstrick/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 #6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 #7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 #9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
#19 [ffff88021f007e68] worker_thread at ffffffff81060513
#20 [ffff88021f007ee8] kthread at ffffffff810648b6
#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
zachariasmaladroit pushed a commit to galaxys-cm7miui-kernel/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 #6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 #7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 #9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
#19 [ffff88021f007e68] worker_thread at ffffffff81060513
#20 [ffff88021f007ee8] kthread at ffffffff810648b6
#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
tworaz pushed a commit to tworaz/linux that referenced this pull request Feb 13, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against a 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Let's say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
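
The scanning problem described in the commit message above can be sketched in user-space Python. The tiny `MAX_ORDER_NR_PAGES` value, the `holes` list and both function names are hypothetical, and holes are assumed to start and end on `MAX_ORDER_NR_PAGES` boundaries, as in the diagram. It illustrates re-validating at each boundary rather than only at the first PFN; it is not the actual patch.

```python
MAX_ORDER_NR_PAGES = 8  # hypothetical value, kept tiny for illustration


def pfn_valid(pfn, holes):
    """A PFN is valid when it falls outside every (start, end) hole range."""
    return not any(start <= pfn < end for start, end in holes)


def scan_pfns(start_pfn, end_pfn, holes):
    """Return the PFNs a scanner may touch between start_pfn and end_pfn."""
    scanned = []
    valid = False
    for pfn in range(start_pfn, end_pfn):
        # Validate the first PFN, then again at every MAX_ORDER_NR_PAGES
        # boundary, because the starting PFN is not necessarily aligned.
        if pfn == start_pfn or pfn % MAX_ORDER_NR_PAGES == 0:
            valid = pfn_valid(pfn, holes)
        if valid:
            scanned.append(pfn)
    return scanned
```

For a scan that starts at PFN 12 with a hole covering PFNs 16 to 31, `scan_pfns(12, 36, [(16, 32)])` touches only 12 to 15 and 32 to 35. A scanner that validated PFN 12 alone would walk straight into the hole.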
xXorAa pushed a commit to xXorAa/linux that referenced this pull request Feb 17, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against a 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Let's say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
torvalds pushed a commit that referenced this pull request Feb 22, 2012
…s are not initialised

The current ARM local timer code registers CPUFREQ notifiers even when
twd_timer_setup() isn't called. That seems to be wrong and would
eventually lead to a kernel crash on CPU frequency transitions on SoCs
where the local timer doesn't exist or is broken because of a hardware
bug. Fix it by testing twd_evt and *__this_cpu_ptr(twd_evt).

The issue was observed with v3.3-rc3 and building an OMAP2+ kernel
on OMAP3 SOC which doesn't have TWD.

Below is the dump for reference :

 Unable to handle kernel paging request at virtual address 007e900
 pgd = cdc20000
 [007e9000] *pgd=00000000
 Internal error: Oops: 5 [#1] SMP
 Modules linked in:
 CPU: 0    Not tainted  (3.3.0-rc3-pm+debug+initramfs #9)
 PC is at twd_update_frequency+0x34/0x48
 LR is at twd_update_frequency+0x10/0x48
 pc : [<c001382c>]    lr : [<c0013808>]    psr: 60000093
 sp : ce311dd8  ip : 00000000  fp : 00000000
 r10: 00000000  r9 : 00000001  r8 : ce310000
 r7 : c0440458  r6 : c00137f8  r5 : 00000000  r4 : c0947a74
 r3 : 00000000  r2 : 007e9000  r1 : 00000000  r0 : 00000000
 Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment usr
 Control: 10c5387d  Table: 8dc20019  DAC: 00000015
 Process sh (pid: 599, stack limit = 0xce3102f8)
 Stack: (0xce311dd8 to 0xce312000)
 1dc0:                                                       6000c
 1de0: 00000001 00000002 00000000 00000000 00000000 00000000 00000
 1e00: ffffffff c093d8f0 00000000 ce311ebc 00000001 00000001 ce310
 1e20: c001386c c0437c4c c0e95b60 c0e95ba8 00000001 c0e95bf ffff4
 1e40: 00000000 00000000 c005ef74 ce310000 c0435cf0 ce311ebc 00000
 1e60: ce352b40 0007a120 c08d5108 c08ba040 c08ba040 c005f030 00000
 1e80: c08bc554 c032fe2c 0007a120 c08d4b64 ce352b40 c08d8618 ffff8
 1ea0: c08ba040 c033364c ce311ecc c0433b50 00000002 ffffffea c0330
 1ec0: 0007a120 0007a120 22222201 00000000 22222222 00000000 ce357
 1ee0: ce3d6000 cdc2aed8 ce352ba0 c0470164 00000002 c032f47c 00034
 1f00: c0331cac ce352b40 00000007 c032f6d0 ce352bbc 0003d090 c0930
 1f20: c093d8bc c03306a4 00000007 ce311f80 00000007 cdc2aec0 ce358
 1f40: ce8d20c0 00000007 b6fe5000 ce311f80 00000007 ce310000 0000c
 1f60: c000de74 ce98740 ce8d20c0 b6fe5000 00000000 00000000 0000c
 1f80: 00000000 00000000 001fbac8 00000000 00000007 001fbac8 00004
 1fa0: c000df04 c000dd60 00000007 001fbac8 00000001 b6fe5000 00000
 1fc0: 00000007 001fbac8 00000007 00000004 b6fe5000 00000000 00202
 1fe0: 00000000 beb565f8 00101ffc 00008e8c 60000010 00000001 00000
 [<c001382c>] (twd_update_frequency+0x34/0x48) from [<c008ac4c>] )
 [<c008ac4c>] (smp_call_function_single+0x17c/0x1c8) from [<c0013)
 [<c0013890>] (twd_cpufreq_transition+0x24/0x30) from [<c0437c4c>)
 [<c0437c4c>] (notifier_call_chain+0x44/0x84) from [<c005efe4>] ()
 [<c005efe4>] (__srcu_notifier_call_chain+0x70/0xa4) from [<c005f)
 [<c005f030>] (srcu_notifier_call_chain+0x18/0x20) from [<c032fe2)
 [<c032fe2c>] (cpufreq_notify_transition+0xc8/0x1b0) from [<c0333)
 [<c033364c>] (omap_target+0x1b4/0x28c) from [<c032f47c>] (__cpuf)
 [<c032f47c>] (__cpufreq_driver_target+0x50/0x64) from [<c0331d24)
 [<c0331d24>] (cpufreq_set+0x78/0x98) from [<c032f6d0>] (store_sc)
 [<c032f6d0>] (store_scaling_setspeed+0x5c/0x74) from [<c03306a4>)
 [<c03306a4>] (store+0x58/0x74) from [<c014d868>] (sysfs_write_fi)
 [<c014d868>] (sysfs_write_file+0x80/0xb4) from [<c00f2c2c>] (vfs)
 [<c00f2c2c>] (vfs_write+0xa8/0x138) from [<c00f2e9c>] (sys_write)
 [<c00f2e9c>] (sys_write+0x40/0x6c) from [<c000dd60>] (ret_fast_s)
 Code: e594300c e792210c e1a01000 e5840004 (e7930002)
 ---[ end trace 5da3b5167c1ecdda ]---

Reported-by: Kevin Hilman <khilman@ti.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
koenkooi referenced this pull request in koenkooi/linux Feb 23, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against a 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Let's say we have a case
like this:

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.
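
The scan described above can be modelled in userspace C. This is a sketch under stated assumptions, not the patch itself: mem_model, model_pfn_valid() and scan_range() are hypothetical names, MODEL_MAX_ORDER_NR_PAGES is an arbitrarily small power of two, and the sketch assumes that holes are aligned to that boundary and that the scan starts on a valid PFN. The validity check is repeated whenever the scanner crosses such a boundary, and an invalid block is skipped whole.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_ORDER_NR_PAGES 8UL /* hypothetical value, power of two */

/* Hypothetical memory map: PFNs in [hole_start, hole_end) have no struct page. */
struct mem_model {
    unsigned long hole_start;
    unsigned long hole_end;
};

static bool model_pfn_valid(const struct mem_model *m, unsigned long pfn)
{
    return pfn < m->hole_start || pfn >= m->hole_end;
}

/*
 * Returns how many PFNs in [low_pfn, end_pfn) were inspected.
 * Assumes the hole is aligned to MODEL_MAX_ORDER_NR_PAGES and that
 * low_pfn itself is valid, as in the diagram above.
 */
static unsigned long scan_range(const struct mem_model *m,
                                unsigned long low_pfn, unsigned long end_pfn)
{
    unsigned long inspected = 0;

    for (; low_pfn < end_pfn; low_pfn++) {
        /* Re-check validity each time a MAX_ORDER-sized boundary is crossed. */
        if ((low_pfn & (MODEL_MAX_ORDER_NR_PAGES - 1)) == 0 &&
            !model_pfn_valid(m, low_pfn)) {
            /* Skip the whole invalid block; the loop increment adds the last step. */
            low_pfn += MODEL_MAX_ORDER_NR_PAGES - 1;
            continue;
        }
        /* Everything reached here must be safe to touch. */
        assert(model_pfn_valid(m, low_pfn));
        inspected++;
    }
    return inspected;
}
```

Starting at PFN 13 with a hole covering PFNs 16 to 31, a scan of one 8-page window inspects only PFNs 13, 14 and 15 and never touches the hole.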

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
torvalds pushed a commit that referenced this pull request Feb 24, 2012
There is an issue when memcg unregisters events that were attached to
the same eventfd:

- On the first call, mem_cgroup_usage_unregister_event() removes all
  events attached to a given eventfd, and if no events are left,
  thresholds->primary becomes NULL;

- Since several events were registered, the cgroups core will call
  mem_cgroup_usage_unregister_event() again, but now the kernel will oops,
  as the function doesn't expect that thresholds->primary may be NULL.

It's a good question whether mem_cgroup_usage_unregister_event()
should actually remove all events in one go, but currently it can't
do any better as the cftype->unregister_event callback doesn't pass
any private event-associated cookie. So, let's fix the issue by
simply checking for thresholds->primary.
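
The double-unregister sequence above can be sketched as a small userspace model. The names thresholds_model and model_unregister_event() are hypothetical, and the threshold array is reduced to a list of eventfd numbers; only the early return on a NULL primary array corresponds to the check the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: primary holds one eventfd number per registered event. */
struct thresholds_model {
    int *primary;
    int count;
};

/*
 * Removes every event attached to eventfd in one go and returns how many
 * were removed. A second call for the same eventfd finds primary == NULL
 * and returns 0 instead of dereferencing it.
 */
static int model_unregister_event(struct thresholds_model *t, int eventfd)
{
    int kept = 0, removed = 0;

    if (!t->primary) /* the added check */
        return 0;

    for (int i = 0; i < t->count; i++) {
        if (t->primary[i] == eventfd)
            removed++;
        else
            t->primary[kept++] = t->primary[i];
    }

    t->count = kept;
    if (kept == 0) {
        /* No events left: the primary array goes away entirely. */
        free(t->primary);
        t->primary = NULL;
    }
    return removed;
}
```

Registering two events on the same eventfd and unregistering twice removes both on the first call and makes the second call a no-op.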

FWIW, w/o the patch the following oops may be observed:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
 IP: [<ffffffff810be32c>] mem_cgroup_usage_unregister_event+0x9c/0x1f0
 Pid: 574, comm: kworker/0:2 Not tainted 3.3.0-rc4+ #9 Bochs Bochs
 RIP: 0010:[<ffffffff810be32c>]  [<ffffffff810be32c>] mem_cgroup_usage_unregister_event+0x9c/0x1f0
 RSP: 0018:ffff88001d0b9d60  EFLAGS: 00010246
 Process kworker/0:2 (pid: 574, threadinfo ffff88001d0b8000, task ffff88001de91cc0)
 Call Trace:
  [<ffffffff8107092b>] cgroup_event_remove+0x2b/0x60
  [<ffffffff8103db94>] process_one_work+0x174/0x450
  [<ffffffff8103e413>] worker_thread+0x123/0x2d0

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
koenkooi referenced this pull request in koenkooi/linux Mar 1, 2012
…S block during isolation for migration

commit 0bf380b upstream.

koenkooi referenced this pull request in koenkooi/linux Mar 19, 2012
…S block during isolation for migration

commit 0bf380b upstream.

koenkooi referenced this pull request in koenkooi/linux Mar 19, 2012
commit 371528c upstream.

There is an issue when memcg unregisters events that were attached to
the same eventfd:

- On the first call, mem_cgroup_usage_unregister_event() removes all
  events attached to a given eventfd, and if no events are left,
  thresholds->primary becomes NULL;

- Since several events were registered, the cgroups core will call
  mem_cgroup_usage_unregister_event() again, but now the kernel will oops,
  as the function doesn't expect that thresholds->primary may be NULL.

It's a good question whether mem_cgroup_usage_unregister_event()
should actually remove all events in one go, but currently it can't
do any better as the cftype->unregister_event callback doesn't pass
any private event-associated cookie. So, let's fix the issue by
simply checking for thresholds->primary.

FWIW, w/o the patch the following oops may be observed:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
 IP: [<ffffffff810be32c>] mem_cgroup_usage_unregister_event+0x9c/0x1f0
 Pid: 574, comm: kworker/0:2 Not tainted 3.3.0-rc4+ #9 Bochs Bochs
 RIP: 0010:[<ffffffff810be32c>]  [<ffffffff810be32c>] mem_cgroup_usage_unregister_event+0x9c/0x1f0
 RSP: 0018:ffff88001d0b9d60  EFLAGS: 00010246
 Process kworker/0:2 (pid: 574, threadinfo ffff88001d0b8000, task ffff88001de91cc0)
 Call Trace:
  [<ffffffff8107092b>] cgroup_event_remove+0x2b/0x60
  [<ffffffff8103db94>] process_one_work+0x174/0x450
  [<ffffffff8103e413>] worker_thread+0x123/0x2d0

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi referenced this pull request in koenkooi/linux Mar 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

koenkooi referenced this pull request in koenkooi/linux Mar 22, 2012
commit 371528c upstream.

koenkooi referenced this pull request in koenkooi/linux Apr 2, 2012
…S block during isolation for migration

commit 0bf380b upstream.

koenkooi referenced this pull request in koenkooi/linux Apr 2, 2012
commit 371528c upstream.

aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
The evlist holds its own references to the maps, so we don't need to set
the pointers to NULL.  Otherwise the following error is reported by ASan.
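
The ownership rule behind this fix can be sketched with a small reference-counting model. Every name here (map_model, evlist_model and the helpers) is hypothetical and is not the libperf API; the sketch only assumes that attaching a map to the evlist takes an extra reference and that a put on a NULL pointer does nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted map. */
struct map_model {
    int refcnt;
};

static struct map_model *map_model_new(void)
{
    struct map_model *m = calloc(1, sizeof(*m));

    if (m)
        m->refcnt = 1; /* the creator owns the first reference */
    return m;
}

/* Drops one reference; returns 1 if the map was freed, 0 otherwise. */
static int map_model_put(struct map_model *m)
{
    if (!m) /* a NULL pointer drops nothing, which is how the leak arises */
        return 0;
    if (--m->refcnt > 0)
        return 0;
    free(m);
    return 1;
}

/* Hypothetical evlist that keeps its own reference to the map. */
struct evlist_model {
    struct map_model *cpus;
};

static void evlist_model_set_maps(struct evlist_model *ev, struct map_model *cpus)
{
    if (cpus)
        cpus->refcnt++; /* the evlist takes its own reference */
    ev->cpus = cpus;
}

/* Drops the evlist's reference; returns 1 if that freed the map. */
static int evlist_model_release(struct evlist_model *ev)
{
    int freed = map_model_put(ev->cpus);

    ev->cpus = NULL;
    return freed;
}
```

In this model the caller keeps its pointer after attaching the map and drops its own reference at the end. If the pointer were set to NULL first, the final put would be a no-op and one reference would never be released.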

  # perf test -v 4
   4: Read samples using the mmap interface      :
  --- start ---
  test child forked, pid 139782
  mmap size 528384B

  =================================================================
  ==139782==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f1f76daee8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x564ba21a0fea in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x564ba21a1a0f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x564ba21a21cf in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x564ba21a21cf in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x564ba1e48298 in test__basic_mmap tests/mmap-basic.c:55
    #6 0x564ba1e278fb in run_test tests/builtin-test.c:428
    #7 0x564ba1e278fb in test_and_print tests/builtin-test.c:458
    #8 0x564ba1e29a53 in __cmd_test tests/builtin-test.c:679
    #9 0x564ba1e29a53 in cmd_test tests/builtin-test.c:825
    #10 0x564ba1e95cb4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x564ba1d1fa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x564ba1d1fa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x564ba1d1fa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7f1f768e4d09 in __libc_start_main ../csu/libc-start.c:308

    ...
  test child finished with 1
  ---- end ----
  Read samples using the mmap interface: FAILED!
  failed to open shell test directory: /home/namhyung/libexec/perf-core/tests/shell

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20210301140409.184570-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
The evlist holds its own references to the maps, so we don't need to set
the pointers to NULL.  Otherwise the following error is reported by ASan.

Also change the goto label since it doesn't need to have two.

  # perf test -v 24
  24: Number of exit events of a simple workload :
  --- start ---
  test child forked, pid 145915
  mmap size 528384B

  =================================================================
  ==145915==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fc44e50d1f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x561cf50f4d2e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23
    #2 0x561cf4eeb949 in thread_map__new_by_tid util/thread_map.c:63
    #3 0x561cf4db7fd2 in test__task_exit tests/task-exit.c:74
    #4 0x561cf4d798fb in run_test tests/builtin-test.c:428
    #5 0x561cf4d798fb in test_and_print tests/builtin-test.c:458
    #6 0x561cf4d7ba53 in __cmd_test tests/builtin-test.c:679
    #7 0x561cf4d7ba53 in cmd_test tests/builtin-test.c:825
    #8 0x561cf4de7d04 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #9 0x561cf4c71a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #10 0x561cf4c71a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #11 0x561cf4c71a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #12 0x7fc44e042d09 in __libc_start_main ../csu/libc-start.c:308

    ...
  test child finished with 1
  ---- end ----
  Number of exit events of a simple workload: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
The evlist holds the maps with its own refcounts, so we don't need to set
the pointers to NULL.  Otherwise, ASan reports the following error.

Also merge the two goto labels into one, since only one is needed.

  # perf test -v 25
  25: Software clock events period values        :
  --- start ---
  test child forked, pid 149154
  mmap size 528384B
  mmap size 528384B

  =================================================================
  ==149154==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fef5cd071f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x56260d5e8b8e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23
    #2 0x56260d3df7a9 in thread_map__new_by_tid util/thread_map.c:63
    #3 0x56260d2ac6b2 in __test__sw_clock_freq tests/sw-clock.c:65
    #4 0x56260d26d8fb in run_test tests/builtin-test.c:428
    #5 0x56260d26d8fb in test_and_print tests/builtin-test.c:458
    #6 0x56260d26fa53 in __cmd_test tests/builtin-test.c:679
    #7 0x56260d26fa53 in cmd_test tests/builtin-test.c:825
    #8 0x56260d2dbb64 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #9 0x56260d165a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #10 0x56260d165a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #11 0x56260d165a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #12 0x7fef5c83cd09 in __libc_start_main ../csu/libc-start.c:308

    ...
  test child finished with 1
  ---- end ----
  Software clock events period values      : FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
The evlist and the cpu/thread maps should be released together.
Otherwise, ASan reports the following error.

Note that this test still has memory leaks in DSOs so it still fails
even after this change.  I'll take a look at that too.

  # perf test -v 26
  26: Object code reading                        :
  --- start ---
  test child forked, pid 154184
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  symsrc__init: cannot get elf header.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
  Parsing event 'cycles'
  mmap size 528384B
  ...
  =================================================================
  ==154184==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 439 byte(s) in 1 object(s) allocated from:
    #0 0x7fcb66e77037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55ad9b7e821e in dso__new_id util/dso.c:1256
    #2 0x55ad9b8cfd4a in __machine__addnew_vdso util/vdso.c:132
    #3 0x55ad9b8cfd4a in machine__findnew_vdso util/vdso.c:347
    #4 0x55ad9b845b7e in map__new util/map.c:176
    #5 0x55ad9b8415a2 in machine__process_mmap2_event util/machine.c:1787
    #6 0x55ad9b8fab16 in perf_tool__process_synth_event util/synthetic-events.c:64
    #7 0x55ad9b8fab16 in perf_event__synthesize_mmap_events util/synthetic-events.c:499
    #8 0x55ad9b8fbfdf in __event__synthesize_thread util/synthetic-events.c:741
    #9 0x55ad9b8ff3e3 in perf_event__synthesize_thread_map util/synthetic-events.c:833
    #10 0x55ad9b738585 in do_test_code_reading tests/code-reading.c:608
    #11 0x55ad9b73b25d in test__code_reading tests/code-reading.c:722
    #12 0x55ad9b6f28fb in run_test tests/builtin-test.c:428
    #13 0x55ad9b6f28fb in test_and_print tests/builtin-test.c:458
    #14 0x55ad9b6f4a53 in __cmd_test tests/builtin-test.c:679
    #15 0x55ad9b6f4a53 in cmd_test tests/builtin-test.c:825
    #16 0x55ad9b760cc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #17 0x55ad9b5eaa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #18 0x55ad9b5eaa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #19 0x55ad9b5eaa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #20 0x7fcb669acd09 in __libc_start_main ../csu/libc-start.c:308

    ...
  SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Object code reading: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
The evlist and the cpu/thread maps should be released together.
Otherwise, ASan reports the following error.

  $ perf test -v 28
  28: Use a dummy software event to keep tracking:
  --- start ---
  test child forked, pid 156810
  mmap size 528384B

  =================================================================
  ==156810==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f637d2bce8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55cc6295cffa in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x55cc6295da1f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x55cc6295e1df in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x55cc6295e1df in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x55cc626287cf in test__keep_tracking tests/keep-tracking.c:84
    #6 0x55cc625e38fb in run_test tests/builtin-test.c:428
    #7 0x55cc625e38fb in test_and_print tests/builtin-test.c:458
    #8 0x55cc625e5a53 in __cmd_test tests/builtin-test.c:679
    #9 0x55cc625e5a53 in cmd_test tests/builtin-test.c:825
    #10 0x55cc62651cc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x55cc624dba88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x55cc624dba88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x55cc624dba88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7f637cdf2d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Use a dummy software event to keep tracking: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
The evlist and cpu/thread maps should be released together.
Otherwise, ASan reports the following error.

  $ perf test -v 35
  35: Track with sched_switch                    :
  --- start ---
  test child forked, pid 159287
  Using CPUID GenuineIntel-6-8E-C
  mmap size 528384B
  1295 events recorded

  =================================================================
  ==159287==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7fa28d9a2e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x5652f5a5affa in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x5652f5a5ba1f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x5652f5a5c1df in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x5652f5a5c1df in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x5652f5723bbf in test__switch_tracking tests/switch-tracking.c:350
    #6 0x5652f56e18fb in run_test tests/builtin-test.c:428
    #7 0x5652f56e18fb in test_and_print tests/builtin-test.c:458
    #8 0x5652f56e3a53 in __cmd_test tests/builtin-test.c:679
    #9 0x5652f56e3a53 in cmd_test tests/builtin-test.c:825
    #10 0x5652f574fcc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x5652f55d9a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x5652f55d9a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x5652f55d9a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7fa28d4d8d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Track with sched_switch: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
A call to perf_thread_map__put() was missing after the map was used.

  $ perf test -v 43
  43: Synthesize thread map                      :
  --- start ---
  test child forked, pid 162640

  =================================================================
  ==162640==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fd48cdaa1f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x563e6d5f8d0e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23
    #2 0x563e6d3ef69a in thread_map__new_by_pid util/thread_map.c:46
    #3 0x563e6d2cec90 in test__thread_map_synthesize tests/thread-map.c:97
    #4 0x563e6d27d8fb in run_test tests/builtin-test.c:428
    #5 0x563e6d27d8fb in test_and_print tests/builtin-test.c:458
    #6 0x563e6d27fa53 in __cmd_test tests/builtin-test.c:679
    #7 0x563e6d27fa53 in cmd_test tests/builtin-test.c:825
    #8 0x563e6d2ebce4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #9 0x563e6d175a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #10 0x563e6d175a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #11 0x563e6d175a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #12 0x7fd48c8dfd09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 8224 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Synthesize thread map: FAILED!
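
The leak reported above comes down to an unbalanced reference count: a map is created, used, and never released. The pairing rule can be sketched as below. This is a minimal illustration only; `struct demo_map`, `demo_map_new()`, `demo_map_get()`, `demo_map_put()` and `demo_live_maps` are hypothetical names invented for this sketch and are not the libperf implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted map; not the libperf type. */
struct demo_map {
	int refcnt;
	int nr;
};

/* Number of live maps, so a caller can check that nothing leaked. */
int demo_live_maps;

struct demo_map *demo_map_new(int nr)
{
	struct demo_map *map = calloc(1, sizeof(*map));

	if (map) {
		map->refcnt = 1;	/* the creator holds the first reference */
		map->nr = nr;
		demo_live_maps++;
	}
	return map;
}

struct demo_map *demo_map_get(struct demo_map *map)
{
	if (map)
		map->refcnt++;
	return map;
}

void demo_map_put(struct demo_map *map)
{
	if (!map)
		return;
	assert(map->refcnt > 0);
	if (--map->refcnt == 0) {
		demo_live_maps--;
		free(map);
	}
}

/* Creates a map, lends it to a second holder, and drops both references. */
int demo_use_map(void)
{
	struct demo_map *map = demo_map_new(4);
	struct demo_map *held;
	int nr;

	if (!map)
		return -1;
	held = demo_map_get(map);	/* e.g. a container keeping its own reference */
	nr = held->nr;
	demo_map_put(held);		/* the container releases its reference */
	demo_map_put(map);		/* the creator's put: the kind of call that was missing */
	return nr;
}
```

Calling `demo_use_map()` should leave `demo_live_maps` at zero. Dropping either `demo_map_put()` call leaves the object allocated, which is the kind of direct leak LeakSanitizer flags.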

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
The cpu map should be released after it is printed.

  $ perf test -v 52
  52: Print cpu map                              :
  --- start ---
  test child forked, pid 172233

  =================================================================
  ==172233==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 156 byte(s) in 1 object(s) allocated from:
    #0 0x7fc472518e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55e63b378f7a in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x55e63b37a05c in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:237
    #3 0x55e63b056d16 in cpu_map_print tests/cpumap.c:102
    #4 0x55e63b056d16 in test__cpu_map_print tests/cpumap.c:120
    #5 0x55e63afff8fb in run_test tests/builtin-test.c:428
    #6 0x55e63afff8fb in test_and_print tests/builtin-test.c:458
    #7 0x55e63b001a53 in __cmd_test tests/builtin-test.c:679
    #8 0x55e63b001a53 in cmd_test tests/builtin-test.c:825
    #9 0x55e63b06dc44 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #10 0x55e63aef7a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #11 0x55e63aef7a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #12 0x55e63aef7a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #13 0x7fc47204ed09 in __libc_start_main ../csu/libc-start.c:308
  ...

  SUMMARY: AddressSanitizer: 448 byte(s) leaked in 7 allocation(s).
  test child finished with 1
  ---- end ----
  Print cpu map: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
Because guest_irq comes from the KVM_IRQFD API call, it may trigger a
crash in svm_update_pi_irte() due to an out-of-bounds access:

crash> bt
PID: 22218  TASK: ffff951a6ad74980  CPU: 73  COMMAND: "vcpu8"
 #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
 #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
 #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
 #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
 #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
    [exception RIP: svm_update_pi_irte+227]
    RIP: ffffffffc0761b53  RSP: ffffb1ba6707fd08  RFLAGS: 00010086
    RAX: ffffb1ba6707fd78  RBX: ffffb1ba66d91000  RCX: 0000000000000001
    RDX: 00003c803f63f1c0  RSI: 000000000000019a  RDI: ffffb1ba66db2ab8
    RBP: 000000000000019a   R8: 0000000000000040   R9: ffff94ca41b82200
    R10: ffffffffffffffcf  R11: 0000000000000001  R12: 0000000000000001
    R13: 0000000000000001  R14: ffffffffffffffcf  R15: 000000000000005f
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
 #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
 #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
    RIP: 00007f143c36488b  RSP: 00007f143a4e04b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f05780041d0  RCX: 00007f143c36488b
    RDX: 00007f05780041d0  RSI: 000000004008ae6a  RDI: 0000000000000020
    RBP: 00000000000004e8   R8: 0000000000000008   R9: 00007f05780041e0
    R10: 00007f0578004560  R11: 0000000000000246  R12: 00000000000004e0
    R13: 000000000000001a  R14: 00007f1424001c60  R15: 00007f0578003bc0
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

VMX already fixed this in commit 96560e7 (KVM: VMX: Do not BUG() on
out-of-bounds guest IRQ), so we can copy the check from there to fix
this.
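
The general shape of such a fix is to validate the caller-supplied index before using it to index a table. The sketch below shows that pattern only. It assumes a simplified, hypothetical `struct demo_irq_table` and `demo_lookup_irq()` and does not reproduce the KVM routing structures or the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified routing table; not the KVM structure. */
struct demo_irq_table {
	unsigned int nr_entries;
	const int *entries;	/* entries[i] < 0 means "no route" in this sketch */
};

/*
 * Returns the routing entry for guest_irq, or -1 when the index is out of
 * range or has no route. The caller-supplied index is checked before use.
 */
int demo_lookup_irq(const struct demo_irq_table *table, unsigned int guest_irq)
{
	assert(table != NULL);

	if (guest_irq >= table->nr_entries || table->entries[guest_irq] < 0) {
		fprintf(stderr, "no route for guest_irq %u/%u\n",
			guest_irq, table->nr_entries);
		return -1;
	}
	return table->entries[guest_irq];
}
```

With a three-entry table, an index such as 410 is rejected with a warning instead of being dereferenced, so bad input from userspace becomes an error path and not a page fault.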

Co-developed-by: Yi Liu <liu.yi24@zte.com.cn>
Signed-off-by: Yi Liu <liu.yi24@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Message-Id: <20220309113025.44469-1-wang.yi59@zte.com.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
The test should release the maps at the end.

  $ perf test -v 71
  71: Convert perf time to TSC                   :
  --- start ---
  test child forked, pid 178744
  mmap size 528384B
  1st event perf time 59207256505278 tsc 13187166645142
  rdtsc          time 59207256542151 tsc 13187166723020
  2nd event perf time 59207256543749 tsc 13187166726393

  =================================================================
  ==178744==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7faf601f9e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55b620cfc00a in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x55b620cfca2f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x55b620cfd1ef in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x55b620cfd1ef in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x55b6209ef1b2 in test__perf_time_to_tsc tests/perf-time-to-tsc.c:73
    #6 0x55b6209828fb in run_test tests/builtin-test.c:428
    #7 0x55b6209828fb in test_and_print tests/builtin-test.c:458
    #8 0x55b620984a53 in __cmd_test tests/builtin-test.c:679
    #9 0x55b620984a53 in cmd_test tests/builtin-test.c:825
    #10 0x55b6209f0cd4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x55b62087aa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x55b62087aa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x55b62087aa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7faf5fd2fd09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Convert perf time to TSC: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
I got a segfault when using the -r option with event groups.  The option
makes perf stat run the workload multiple times, reusing the evlist
and evsel for each run.

While most of the resources are allocated and freed properly, the id hash
in the evlist was not, and that resulted in the bug.  You can see it with
the address sanitizer as shown below:

  $ perf stat -r 100 -e '{cycles,instructions}' true
  =================================================================
  ==693052==ERROR: AddressSanitizer: heap-use-after-free on
      address 0x6080000003d0 at pc 0x558c57732835 bp 0x7fff1526adb0 sp 0x7fff1526ada8
  WRITE of size 8 at 0x6080000003d0 thread T0
    #0 0x558c57732834 in hlist_add_head /home/namhyung/project/linux/tools/include/linux/list.h:644
    #1 0x558c57732834 in perf_evlist__id_hash /home/namhyung/project/linux/tools/lib/perf/evlist.c:237
    #2 0x558c57732834 in perf_evlist__id_add /home/namhyung/project/linux/tools/lib/perf/evlist.c:244
    #3 0x558c57732834 in perf_evlist__id_add_fd /home/namhyung/project/linux/tools/lib/perf/evlist.c:285
    #4 0x558c5747733e in store_evsel_ids util/evsel.c:2765
    #5 0x558c5747733e in evsel__store_ids util/evsel.c:2782
    #6 0x558c5730b717 in __run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:895
    #7 0x558c5730b717 in run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:1014
    #8 0x558c5730b717 in cmd_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:2446
    #9 0x558c57427c24 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #10 0x558c572b1a48 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #11 0x558c572b1a48 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #12 0x558c572b1a48 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #13 0x7fcadb9f7d09 in __libc_start_main ../csu/libc-start.c:308
    #14 0x558c572b60f9 in _start (/home/namhyung/project/linux/tools/perf/perf+0x45d0f9)

Actually the nodes in the hash table are struct perf_stream_id and
they were freed in the previous run.  Fix it by resetting the hash.
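
The failure mode described here, a hash table whose bucket heads still point at nodes freed by their owner, can be sketched as follows. This is an assumption-laden illustration: `struct demo_table`, `demo_table_add()`, `demo_table_reset()`, `demo_table_count()` and `demo_run()` are hypothetical names, and a singly linked bucket list stands in for the kernel-style hlist used by the real code.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_HASH_SIZE 8	/* hypothetical bucket count */

struct demo_node {
	struct demo_node *next;
	unsigned long id;
};

struct demo_table {
	struct demo_node *heads[DEMO_HASH_SIZE];
};

/* Link a caller-owned node at the head of its bucket. */
void demo_table_add(struct demo_table *t, struct demo_node *n, unsigned long id)
{
	n->id = id;
	n->next = t->heads[id % DEMO_HASH_SIZE];
	t->heads[id % DEMO_HASH_SIZE] = n;
}

/* Forget every node; the table does not own them, so nothing is freed here. */
void demo_table_reset(struct demo_table *t)
{
	for (int i = 0; i < DEMO_HASH_SIZE; i++)
		t->heads[i] = NULL;
}

/* Number of nodes currently reachable from the table. */
int demo_table_count(const struct demo_table *t)
{
	int count = 0;

	for (int i = 0; i < DEMO_HASH_SIZE; i++)
		for (const struct demo_node *n = t->heads[i]; n; n = n->next)
			count++;
	return count;
}

/* One "run": allocate nodes, hash them, then free them and reset the table. */
int demo_run(struct demo_table *t, int nr_ids)
{
	struct demo_node *nodes;
	int count;

	assert(nr_ids > 0);
	nodes = calloc((size_t)nr_ids, sizeof(*nodes));
	if (!nodes)
		return -1;
	for (int i = 0; i < nr_ids; i++)
		demo_table_add(t, &nodes[i], (unsigned long)i);
	count = demo_table_count(t);
	free(nodes);
	demo_table_reset(t);	/* without this, the next run would walk freed nodes */
	return count;
}
```

Calling `demo_run()` repeatedly on the same table mirrors reusing one evlist across repeated runs. The reset after each free is what keeps the second and later runs from touching memory released by the previous one.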

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210225035148.778569-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
Sai Krishna says:

====================
octeontx2: Miscellaneous fixes

This patchset includes the following fixes.

Patch #1 Fix for the race condition while updating APR table

Patch #2 Fix end bit position in NPC scan config

Patch #3 Fix depth of CAM, MEM table entries

Patch #4 Fix to increase the size of DMAC filter flows

Patch #5 Fix driver crash resulting from invalid interface type
information retrieved from firmware

Patch #6 Fix incorrect mask used while installing filters involving
fragmented packets

Patch #7 Fixes for NPC field hash extract w.r.t IPV6 hash reduction,
         IPV6 field hash configuration.

Patch #8 Fix for NPC hardware parser configuration destination
         address hash, IPV6 endianness issues.

Patch #9 Fix for skipping mbox initialization for PFs disabled by firmware.

Patch #10 Fix disabling packet I/O in case of mailbox timeout.

Patch #11 Fix detaching LF resources in case of VF probe fail.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
xarray can't support arbitrary page cache sizes.  The largest supported
page cache size is defined as MAX_PAGECACHE_ORDER by commit 7f71d5b
("mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray").  However,
it's possible to have a 512MB page cache in the huge memory collapsing
path on ARM64 systems whose base page size is 64KB.  A 512MB page cache
exceeds the limit, and a warning is raised when the xarray entry is
split, as shown in the following example.

[root@dhcp-10-26-1-207 ~]# cat /proc/1/smaps | grep KernelPageSize
KernelPageSize:       64 kB
[root@dhcp-10-26-1-207 ~]# cat /tmp/test.c
   :
int main(int argc, char **argv)
{
	const char *filename = TEST_XFS_FILENAME;
	int fd = 0;
	void *buf = (void *)-1, *p;
	int pgsize = getpagesize();
	int ret = 0;

	if (pgsize != 0x10000) {
		fprintf(stdout, "System with 64KB base page size is required!\n");
		return -EPERM;
	}

	system("echo 0 > /sys/devices/virtual/bdi/253:0/read_ahead_kb");
	system("echo 1 > /proc/sys/vm/drop_caches");

	/* Open the xfs file */
	fd = open(filename, O_RDONLY);
	assert(fd > 0);

	/* Create VMA */
	buf = mmap(NULL, TEST_MEM_SIZE, PROT_READ, MAP_SHARED, fd, 0);
	assert(buf != (void *)-1);
	fprintf(stdout, "mapped buffer at 0x%p\n", buf);

	/* Populate VMA */
	ret = madvise(buf, TEST_MEM_SIZE, MADV_NOHUGEPAGE);
	assert(ret == 0);
	ret = madvise(buf, TEST_MEM_SIZE, MADV_POPULATE_READ);
	assert(ret == 0);

	/* Collapse VMA */
	ret = madvise(buf, TEST_MEM_SIZE, MADV_HUGEPAGE);
	assert(ret == 0);
	ret = madvise(buf, TEST_MEM_SIZE, MADV_COLLAPSE);
	if (ret) {
		fprintf(stdout, "Error %d to madvise(MADV_COLLAPSE)\n", errno);
		goto out;
	}

	/* Split xarray entry. Write permission is needed */
	munmap(buf, TEST_MEM_SIZE);
	buf = (void *)-1;
	close(fd);
	fd = open(filename, O_RDWR);
	assert(fd > 0);
	fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
 		  TEST_MEM_SIZE - pgsize, pgsize);
out:
	if (buf != (void *)-1)
		munmap(buf, TEST_MEM_SIZE);
	if (fd > 0)
		close(fd);

	return ret;
}

[root@dhcp-10-26-1-207 ~]# gcc /tmp/test.c -o /tmp/test
[root@dhcp-10-26-1-207 ~]# /tmp/test
 ------------[ cut here ]------------
 WARNING: CPU: 25 PID: 7560 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128
 Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib    \
 nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct      \
 nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4      \
 ip_set rfkill nf_tables nfnetlink vfat fat virtio_balloon drm fuse   \
 xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 virtio_net  \
 sha1_ce net_failover virtio_blk virtio_console failover dimlib virtio_mmio
 CPU: 25 PID: 7560 Comm: test Kdump: loaded Not tainted 6.10.0-rc7-gavin+ #9
 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024
 pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
 pc : xas_split_alloc+0xf8/0x128
 lr : split_huge_page_to_list_to_order+0x1c4/0x780
 sp : ffff8000ac32f660
 x29: ffff8000ac32f660 x28: ffff0000e0969eb0 x27: ffff8000ac32f6c0
 x26: 0000000000000c40 x25: ffff0000e0969eb0 x24: 000000000000000d
 x23: ffff8000ac32f6c0 x22: ffffffdfc0700000 x21: 0000000000000000
 x20: 0000000000000000 x19: ffffffdfc0700000 x18: 0000000000000000
 x17: 0000000000000000 x16: ffffd5f3708ffc70 x15: 0000000000000000
 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
 x11: ffffffffffffffc0 x10: 0000000000000040 x9 : ffffd5f3708e692c
 x8 : 0000000000000003 x7 : 0000000000000000 x6 : ffff0000e0969eb8
 x5 : ffffd5f37289e378 x4 : 0000000000000000 x3 : 0000000000000c40
 x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000
 Call trace:
  xas_split_alloc+0xf8/0x128
  split_huge_page_to_list_to_order+0x1c4/0x780
  truncate_inode_partial_folio+0xdc/0x160
  truncate_inode_pages_range+0x1b4/0x4a8
  truncate_pagecache_range+0x84/0xa0
  xfs_flush_unmap_range+0x70/0x90 [xfs]
  xfs_file_fallocate+0xfc/0x4d8 [xfs]
  vfs_fallocate+0x124/0x2f0
  ksys_fallocate+0x4c/0xa0
  __arm64_sys_fallocate+0x24/0x38
  invoke_syscall.constprop.0+0x7c/0xd8
  do_el0_svc+0xb4/0xd0
  el0_svc+0x44/0x1d8
  el0t_64_sync_handler+0x134/0x150
  el0t_64_sync+0x17c/0x180

Fix it by correcting the supported page cache orders, with different sets
for DAX and other files.  With this corrected, a 512MB page cache is
disallowed on all non-DAX files on ARM64 systems where the base page size
is 64KB.  After this patch is applied, the test program fails with error
-EINVAL returned from __thp_vma_allowable_orders() and the madvise()
system call to collapse the page caches.
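
The sizes quoted above translate into a page order by simple arithmetic: 512MB divided by a 64KB base page is 8192 pages, which is order 13. The sketch below works that out. `demo_folio_order()` and `demo_order_allowed()` are hypothetical helpers written for this illustration, and the limit passed to `demo_order_allowed()` is a parameter here because the actual value of MAX_PAGECACHE_ORDER is not stated in this message.

```c
#include <assert.h>

/*
 * Order of a folio: log2(folio_size / page_size). Both sizes are in bytes and
 * are assumed to be powers of two with folio_size >= page_size.
 */
unsigned int demo_folio_order(unsigned long long folio_size,
			      unsigned long long page_size)
{
	unsigned int order = 0;

	assert(page_size != 0 && folio_size >= page_size);
	while ((page_size << order) < folio_size)
		order++;
	return order;
}

/* Nonzero when the order does not exceed the given limit. */
int demo_order_allowed(unsigned int order, unsigned int max_order)
{
	return order <= max_order;
}
```

For example, `demo_folio_order(512ULL << 20, 64ULL << 10)` gives 13, while a 2MB folio on 4KB pages gives 9. Assuming, purely for illustration, a limit of 11, `demo_order_allowed(13, 11)` is false, which corresponds to the collapse being refused instead of producing an entry the xarray cannot split.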

Link: https://lkml.kernel.org/r/20240715000423.316491-1-gshan@redhat.com
Fixes: 9f8828d ("mm: Use multi-index entries in the page cache")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: <stable@vger.kernel.org>	[5.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
This fixes the following hard lockup in isolate_lru_folios() during memory
reclaim.  If the LRU mostly contains ineligible folios, this may trigger
the watchdog.

watchdog: Watchdog detected hard LOCKUP on cpu 173
RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0
Call Trace:
	_raw_spin_lock_irqsave+0x31/0x40
	folio_lruvec_lock_irqsave+0x5f/0x90
	folio_batch_move_lru+0x91/0x150
	lru_add_drain_per_cpu+0x1c/0x40
	process_one_work+0x17d/0x350
	worker_thread+0x27b/0x3a0
	kthread+0xe8/0x120
	ret_from_fork+0x34/0x50
	ret_from_fork_asm+0x1b/0x30

lruvec->lru_lock owner:

PID: 2865     TASK: ffff888139214d40  CPU: 40   COMMAND: "kswapd0"
 #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555
 #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171
 #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920
 #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4
 #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde
    [exception RIP: isolate_lru_folios+403]
    RIP: ffffffffa597df53  RSP: ffffc90006fb7c28  RFLAGS: 00000002
    RAX: 0000000000000001  RBX: ffffc90006fb7c60  RCX: ffffea04a2196f88
    RDX: ffffc90006fb7c60  RSI: ffffc90006fb7c60  RDI: ffffea04a2197048
    RBP: ffff88812cbd3010   R8: ffffea04a2197008   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000001  R12: ffffea04a2197008
    R13: ffffea04a2197048  R14: ffffc90006fb7de8  R15: 0000000003e3e937
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <NMI exception stack>
 #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
 #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788
 #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0
 #8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354
 #9 [ffffc90006fb7ef8] kthread at ffffffffa5748238
crash>

Scenario:
User processes request a large amount of memory and keep the pages active.
Then a module continuously requests memory from the ZONE_DMA32 area.
Memory reclaim is triggered once the ZONE_DMA32 watermark is reached.
However, the pages on the LRU (active_anon) list are mostly from
the ZONE_NORMAL area.

Reproduce:
Terminal 1: Continuously increase the number of active (anon) pages.
mkdir /tmp/memory
mount -t tmpfs -o size=1024000M tmpfs /tmp/memory
dd if=/dev/zero of=/tmp/memory/block bs=4M
tail /tmp/memory/block

Terminal 2:
vmstat -a 1
The active column keeps increasing:
procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ...
 r  b   swpd   free  inact active   si   so    bi    bo
 1  0   0 1445623076 45898836 83646008    0    0     0
 1  0   0 1445623076 43450228 86094616    0    0     0
 1  0   0 1445623076 41003480 88541364    0    0     0
 1  0   0 1445623076 38557088 90987756    0    0     0
 1  0   0 1445623076 36109688 93435156    0    0     0
 1  0   0 1445619552 33663256 95881632    0    0     0
 1  0   0 1445619804 31217140 98327792    0    0     0
 1  0   0 1445619804 28769988 100774944    0    0     0
 1  0   0 1445619804 26322348 103222584    0    0     0
 1  0   0 1445619804 23875592 105669340    0    0     0

cat /proc/meminfo | head
Active(anon) increases:
MemTotal:       1579941036 kB
MemFree:        1445618500 kB
MemAvailable:   1453013224 kB
Buffers:            6516 kB
Cached:         128653956 kB
SwapCached:            0 kB
Active:         118110812 kB
Inactive:       11436620 kB
Active(anon):   115345744 kB
Inactive(anon):   945292 kB

When Active(anon) reaches 115345744 kB, loading the module with insmod
trips the ZONE_DMA32 watermark.

perf record -e vmscan:mm_vmscan_lru_isolate -aR
perf script
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2
nr_skipped=2 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29
nr_skipped=29 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon

Note nr_scanned=28835844.
28835844 * 4 kB = 115343376 kB, approximately equal to the Active(anon)
value of 115345744 kB.

If Active(anon) is increased to 1000G before insmod trips the
ZONE_DMA32 watermark, a hard lockup occurs.

On my device nr_scanned = 0000000003e3e937 at the time of the hard lockup.
Converted to memory size: 0x0000000003e3e937 * 4 kB = 261072092 kB.
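
The two page-count conversions above can be checked with a few lines of Python. This is only a sketch, assuming a 4 kB base page size (as the "* 4k" in the commit message implies); the helper name pages_to_kb is made up for illustration.

```python
PAGE_SIZE_KB = 4  # assumption: 4 kB base pages, matching the "* 4k" above


def pages_to_kb(nr_pages: int) -> int:
    """Convert a page count to kilobytes."""
    return nr_pages * PAGE_SIZE_KB


# nr_scanned from the perf trace, compared with Active(anon) from /proc/meminfo.
scanned_kb = pages_to_kb(28835844)
active_anon_kb = 115345744
print(scanned_kb)                   # 115343376
print(active_anon_kb - scanned_kb)  # 2368 (kB difference)

# nr_scanned as read from the stack at lockup time (hex in the crash dump).
lockup_pages = int("0000000003e3e937", 16)
print(lockup_pages)                 # 65268023
print(pages_to_kb(lockup_pages))    # 261072092
```

In this reading, one isolation pass had already walked roughly 249 GiB worth of pages when the watchdog fired.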

   [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
    ffffc90006fb7c30: 0000000000000020 0000000000000000
    ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000
    ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8
    ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48
    ffffc90006fb7c70: 0000000000000000 0000000000000000
    ffffc90006fb7c80: 0000000000000000 0000000000000000
    ffffc90006fb7c90: 0000000000000000 0000000000000000
    ffffc90006fb7ca0: 0000000000000000 0000000003e3e937
    ffffc90006fb7cb0: 0000000000000000 0000000000000000
    ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000

About the Fixes:
Why did it take eight years to be discovered?

The problem requires the following conditions to occur:
1. The device memory should be large enough.
2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area.
3. The memory in ZONE_DMA32 needs to reach the watermark.

If memory is not large enough, or if ZONE_DMA32 memory is used
sensibly, the problem is hard to trigger.
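
The conditions above can be illustrated with a toy model of the isolation loop. This is not the kernel code: it is a simplified sketch that assumes, as the perf output suggests (nr_requested=32 but nr_scanned in the millions), that skipped folios do not count toward the request budget. The names Folio and isolate_toy and the zone indices are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

ZONE_DMA32, ZONE_NORMAL = 1, 2  # illustrative zone indices


@dataclass
class Folio:
    zone: int


def isolate_toy(lru, nr_requested, classzone):
    """Scan from the tail of `lru`, skipping folios above `classzone`.

    Assumption: skipped folios do not consume the request budget, so
    the loop ends only after enough eligible folios were seen or the
    list is exhausted.
    """
    taken, skipped = [], []
    nr_scanned = 0
    budget_used = 0
    while budget_used < nr_requested and lru:
        folio = lru.pop()
        nr_scanned += 1
        if folio.zone > classzone:
            skipped.append(folio)  # ineligible: set aside, budget untouched
            continue
        budget_used += 1
        taken.append(folio)
    lru.extend(reversed(skipped))  # put skipped folios back in original order
    return nr_scanned, len(skipped), len(taken)


# LRU holding only ZONE_NORMAL folios while reclaim targets ZONE_DMA32.
all_normal = deque(Folio(ZONE_NORMAL) for _ in range(100_000))
print(isolate_toy(all_normal, 32, ZONE_DMA32))  # (100000, 100000, 0)

# LRU holding only eligible folios: the scan stops after 32.
all_dma32 = deque(Folio(ZONE_DMA32) for _ in range(100_000))
print(isolate_toy(all_dma32, 32, ZONE_DMA32))   # (32, 0, 32)
```

In the model, the work done in a single call grows with the length of the list rather than with nr_requested, which is why the problem only shows up with a very large LRU.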

notes:
The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL,
but other suitable scenarios may also trigger the problem.

Link: https://lkml.kernel.org/r/20241119060842.274072-1-liuye@kylinos.cn
Fixes: 0c17553 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Signed-off-by: liuye <liuye@kylinos.cn>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
BUG: kernel NULL pointer dereference, address: 00000000000002ec
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 28 UID: 0 PID: 343 Comm: kworker/28:1 Kdump: loaded Tainted: G        OE       6.17.0-rc2+ #9 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_ib_is_sg_need_sync+0x9e/0xd0 [smc]
...
Call Trace:
 <TASK>
 smcr_buf_map_link+0x211/0x2a0 [smc]
 __smc_buf_create+0x522/0x970 [smc]
 smc_buf_create+0x3a/0x110 [smc]
 smc_find_rdma_v2_device_serv+0x18f/0x240 [smc]
 ? smc_vlan_by_tcpsk+0x7e/0xe0 [smc]
 smc_listen_find_device+0x1dd/0x2b0 [smc]
 smc_listen_work+0x30f/0x580 [smc]
 process_one_work+0x18c/0x340
 worker_thread+0x242/0x360
 kthread+0xe7/0x220
 ret_from_fork+0x13a/0x160
 ret_from_fork_asm+0x1a/0x30
 </TASK>

When a software RoCE device is used, ibdev->dma_device is a NULL pointer,
which causes the dereference shown above. Add a NULL pointer check to
prevent it.

Fixes: 9b9f038 ("net/smc: optimize for smc_sndbuf_sync_sg_for_device and smc_rmb_sync_sg_for_cpu")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Link: https://patch.msgid.link/20250828124117.2622624-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
aotot pushed a commit to jove-decompiler/linux that referenced this pull request Oct 26, 2025
Steven Rostedt reported a crash with "ftrace=function" kernel command
line:

[    0.159269] BUG: kernel NULL pointer dereference, address: 000000000000001c
[    0.160254] #PF: supervisor read access in kernel mode
[    0.160975] #PF: error_code(0x0000) - not-present page
[    0.161697] PGD 0 P4D 0
[    0.162055] Oops: Oops: 0000 [#1] SMP PTI
[    0.162619] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17.0-rc2-test-00006-g48d06e78b7cb-dirty #9 PREEMPT(undef)
[    0.164141] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    0.165439] RIP: 0010:kmem_cache_alloc_noprof (mm/slub.c:4237)
[ 0.166186] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 e4 f0 48 83 ec 20 8b 05 c9 b6 7e 01 <44> 8b 77 1c 65 4c 8b 2d b5 ea 20 02 4c 89 6c 24 18 41 89 f5 21 f0
[    0.168811] RSP: 0000:ffffffffb2e03b30 EFLAGS: 00010086
[    0.169545] RAX: 0000000001fff33f RBX: 0000000000000000 RCX: 0000000000000000
[    0.170544] RDX: 0000000000002800 RSI: 0000000000002800 RDI: 0000000000000000
[    0.171554] RBP: ffffffffb2e03b80 R08: 0000000000000004 R09: ffffffffb2e03c90
[    0.172549] R10: ffffffffb2e03c90 R11: 0000000000000000 R12: 0000000000000000
[    0.173544] R13: ffffffffb2e03c90 R14: ffffffffb2e03c90 R15: 0000000000000001
[    0.174542] FS:  0000000000000000(0000) GS:ffff9d2808114000(0000) knlGS:0000000000000000
[    0.175684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.176486] CR2: 000000000000001c CR3: 000000007264c001 CR4: 00000000000200b0
[    0.177483] Call Trace:
[    0.177828]  <TASK>
[    0.178123] mas_alloc_nodes (lib/maple_tree.c:176 (discriminator 2) lib/maple_tree.c:1255 (discriminator 2))
[    0.178692] mas_store_gfp (lib/maple_tree.c:5468)
[    0.179223] execmem_cache_add_locked (mm/execmem.c:207)
[    0.179870] execmem_alloc (mm/execmem.c:213 mm/execmem.c:313 mm/execmem.c:335 mm/execmem.c:475)
[    0.180397] ? ftrace_caller (arch/x86/kernel/ftrace_64.S:169)
[    0.180922] ? __pfx_ftrace_caller (arch/x86/kernel/ftrace_64.S:158)
[    0.181517] execmem_alloc_rw (mm/execmem.c:487)
[    0.182052] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:266 arch/x86/kernel/ftrace.c:344 arch/x86/kernel/ftrace.c:474)
[    0.182778] ? ftrace_caller_op_ptr (arch/x86/kernel/ftrace_64.S:182)
[    0.183388] ftrace_update_trampoline (kernel/trace/ftrace.c:7947)
[    0.184024] __register_ftrace_function (kernel/trace/ftrace.c:368)
[    0.184682] ftrace_startup (kernel/trace/ftrace.c:3048)
[    0.185205] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[    0.185877] register_ftrace_function_nolock (kernel/trace/ftrace.c:8717)
[    0.186595] register_ftrace_function (kernel/trace/ftrace.c:8745)
[    0.187254] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[    0.187924] function_trace_init (kernel/trace/trace_functions.c:170)
[    0.188499] tracing_set_tracer (kernel/trace/trace.c:5916 kernel/trace/trace.c:6349)
[    0.189088] register_tracer (kernel/trace/trace.c:2391)
[    0.189642] early_trace_init (kernel/trace/trace.c:11075 kernel/trace/trace.c:11149)
[    0.190204] start_kernel (init/main.c:970)
[    0.190732] x86_64_start_reservations (arch/x86/kernel/head64.c:307)
[    0.191381] x86_64_start_kernel (??:?)
[    0.191955] common_startup_64 (arch/x86/kernel/head_64.S:419)
[    0.192534]  </TASK>
[    0.192839] Modules linked in:
[    0.193267] CR2: 000000000000001c
[    0.193730] ---[ end trace 0000000000000000 ]---

The crash happens because on x86 ftrace allocations from execmem require
maple tree to be initialized.

Move maple tree initialization that depends only on slab availability
earlier in boot so that it will happen right after mm_core_init().

Link: https://lkml.kernel.org/r/20250824130759.1732736-1-rppt@kernel.org
Fixes: 6381150 ("x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20250820184743.0302a8b5@gandalf.local.home/
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 27, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: #6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: #7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: #8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: #9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 28, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: #6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: #7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: #8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: #9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Oct 29, 2025
commit 0570327 upstream.

Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.

Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.

On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:

  PSW:  0704c00180000000 0000000c914e4b38 (klist_put+56)
  GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001
	00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480
	0000000000000001 0000000000000000 0000000000000000 0000000180692828
	00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8
  #0 [3800313fb20] device_del at c9158ad5c
  #1 [3800313fb88] pci_remove_bus_device at c915105ba
  #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198
  #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0
  #4 [3800313fc60] zpci_bus_remove_device at c90fb6104
  #5 [3800313fca0] __zpci_event_availability at c90fb3dca
  #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2
  #7 [3800313fd60] crw_collect_info at c91905822
  #8 [3800313fe10] kthread at c90feb390
  #9 [3800313fe68] __ret_from_fork at c90f6aa64
  #10 [3800313fe98] ret_from_fork at c9194f3f2.

This is because in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs. This is the
reverse of the hotplug events generated by sriov_enable() and
handled via pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this checking racy.

Other races may also be possible, of course, though given that this lack
of locking persisted for so long, observable races seem very rare. Even on s390 the
list corruption was only observed with certain devices since the platform
events are only triggered by config accesses after the removal, so as long
as the removal finished synchronously they would not race. Either way the
locking is missing so fix this by adding it to the sriov_del_vfs() helper.

Likewise, PCI rescan-remove locking is missing in sriov_add_vfs(),
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.
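
The double-remove race described above can be sketched in user space. This is an analogy only, not the kernel change: ToyBus, remove_vf, disable_sriov and hot_unplug_events are hypothetical names, and a threading.Lock stands in for pci_rescan_remove_lock. The point is that the existence check is only reliable when every remover performs both the check and the removal under the same lock.

```python
import threading


class ToyBus:
    """Stand-in for a PCI bus: a set of VF ids plus one rescan/remove lock."""

    def __init__(self, vf_ids):
        self.devices = set(vf_ids)
        self.rescan_remove_lock = threading.Lock()
        self.removed = []  # every successful removal is recorded here

    def remove_vf(self, vf_id):
        # Check and removal happen under the same lock, so a second
        # remover cannot act on a device that is already gone.
        with self.rescan_remove_lock:
            if vf_id not in self.devices:
                return False
            self.devices.remove(vf_id)
            self.removed.append(vf_id)
            return True


def disable_sriov(bus, vf_ids):
    """Driver-initiated path: remove every VF."""
    for vf_id in vf_ids:
        bus.remove_vf(vf_id)


def hot_unplug_events(bus, vf_ids):
    """Platform-initiated path: one hot-unplug event per VF."""
    for vf_id in vf_ids:
        bus.remove_vf(vf_id)


vf_ids = list(range(64))
bus = ToyBus(vf_ids)
threads = [
    threading.Thread(target=disable_sriov, args=(bus, vf_ids)),
    threading.Thread(target=hot_unplug_events, args=(bus, vf_ids)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(bus.removed) == vf_ids)  # True: each VF was removed exactly once
```

The per-device lock granularity here is a choice made for the illustration; where exactly the kernel helpers take the lock is not something this sketch tries to reproduce.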

Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-1-2d0bc938f2a3@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 29, 2025
Provide inline memcpy and memset functions that can be used instead of
the GCC builtins when necessary. The immediate use case is for the text
poking functions to avoid the standard memcpy()/memset() calls because
objtool complains about such dynamic calls within an AC=1 region. See
tools/objtool/Documentation/objtool.txt, warning #9, regarding function
calls with UACCESS enabled.

Some user copy functions such as copy_user_generic() and __clear_user()
have similar rep_{movs,stos} usages. But, those are highly specialized
and hard to combine or reuse for other things. Define these new helpers
for all other usages that need a completely unoptimized, strictly inline
version of memcpy() or memset().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 29, 2025
For patching, the kernel initializes a temporary mm area in the lower
half of the address range. LASS blocks these accesses because its
enforcement relies on bit 63 of the virtual address as opposed to SMAP
which depends on the _PAGE_BIT_USER bit in the page table. Disable LASS
enforcement by toggling the RFLAGS.AC bit during patching to avoid
triggering a #GP fault.

Introduce LASS-specific STAC/CLAC helpers to set the AC bit only on
platforms that need it. Clarify the usage of the new helpers versus the
existing stac()/clac() helpers for SMAP.

The text poking functions use standard memcpy()/memset() while patching
kernel code. However, objtool complains about calling such dynamic
functions within an AC=1 region. See warning #9, regarding function
calls with UACCESS enabled, in tools/objtool/Documentation/objtool.txt.

To pacify objtool, one option is to add memcpy() and memset() to the
list of allowed-functions. However, that would provide a blanket
exemption for all usages of memcpy() and memset(). Instead, replace the
standard calls in the text poking functions with their unoptimized,
always-inlined versions. Since the patched regions are usually small,
no performance impact is expected.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 30, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: #6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: #7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: #8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: #9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 3, 2025
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 3, 2025
Fix a kernel NULL pointer dereference in pick_next_task_fair() caused by
EEVDF scheduler arithmetic overflows when cfs_rq->avg_vruntime
approaches S64_MIN.

The issue occurs when:
1. cfs_rq->avg_vruntime is driven downward by dynamic reweight
   operations on se->vruntime combined with frequent enqueue/dequeue of
   another sched_entity with large se->vlag values. Note that the presence
   of only one other sched_entity (besides the current one) is critical
   because having more would average out the effect and prevent the
   continuous and rapid decrease of cfs_rq->avg_vruntime.
2. These factors `reweight` and `frequent enqueue/dequeue` persistently
   suppress cfs_rq->min_vruntime, causing cfs_rq->avg_vruntime to
   decrease rapidly toward S64_MIN.
3. In vruntime_eligible(), the calculation (int64_t)(vruntime -
   cfs_rq->min_vruntime) * load may overflow downward, becoming a large
   positive value.
4. This causes vruntime_eligible() to incorrectly judge all tasks as
   ineligible, leading to NULL pointer dereference in
   pick_next_task_fair().

The fix addresses this by adjusting the current sched_entity's vruntime
during reweight operations when:
- The entity is cfs_rq->curr and the only running task
- The entity is on the runqueue
- Its vruntime is below min_vruntime

The most straightforward fix would be to adjust the vruntime during
dequeue, but that would require checking and possibly modifying the
curr's vruntime on every dequeue, which has a broader impact and
concurrency concerns. Therefore, we choose to apply the fix in the
reweight path, which is one of the necessary conditions for the problem
to occur.

BUG: kernel NULL pointer dereference, address: 00000000000000a0
RIP: 0010:pick_next_task_fair+0x39b/0xab03

KERNEL: vmlinux  [TAINTED]
DUMPFILE: 127.0.0.1-2025-10-30-13:52:24/vmcore  [PARTIAL DUMP]
CPUS: 4
DATE: Thu Oct 30 05:52:18 UTC 2025
UPTIME: 02:02:50
LOAD AVERAGE: 15.00, 15.00, 15.00
TASKS: 151
NODENAME: SangforOS.localdomain
RELEASE: 6.6.0+
VERSION: #4 SMP Thu Oct 30 11:25:11 CST 2025
MACHINE: x86_64  (2194 Mhz)
MEMORY: 4 GB
PANIC: "Oops: 0000 [#1] SMP PTI" (check log for details)
 PID: 4702
COMMAND: "test_sched_2/-1"
TASK: ffff8881362dcf80  [THREAD_INFO: ffff8881362dcf80]
 CPU: 1
STATE: TASK_UNINTERRUPTIBLE (PANIC)

crash> bt
PID: 4702   TASK: ffff8881362dcf80  CPU: 1   COMMAND: "test_sched_2/-1"
 #0 [ffffc90000fffab0] machine_kexec at ffffffffb567e767
 #1 [ffffc90000fffb10] __crash_kexec at ffffffffb580474a
 #2 [ffffc90000fffbd0] crash_kexec at ffffffffb5805768
 #3 [ffffc90000fffbd8] oops_end at ffffffffb5639599
 #4 [ffffc90000fffbf8] page_fault_oops at ffffffffb56954a8
 #5 [ffffc90000fffc50] exc_page_fault at ffffffffb63424a9
 #6 [ffffc90000fffcb0] asm_exc_page_fault at ffffffffb6400c12
    [exception RIP: pick_next_task_fair+923]
    RIP: ffffffffb576f22b  RSP: ffffc90000fffd60  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffff8881340b4d80  RCX: 82a3cdbe7f1c7aed
    RDX: 01721730951583fc  RSI: 0000000000015f5f  RDI: 00105468401dc9e3
    RBP: ffffc90000fffe18   R8: 00000000000003fa   R9: 0000000000000002
    R10: 0000000000000002  R11: 0000000000000064  R12: ffff8881362dcf80
    R13: ffffc90000fffdc0  R14: ffff8881340b4e00  R15: ffff8881340b4e00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #7 [ffffc90000fffdb0] __schedule at ffffffffb6348cc8
 #8 [ffffc90000fffe20] schedule at ffffffffb63493ab
 #9 [ffffc90000fffe38] schedule_timeout at ffffffffb634eeaf
crash>
crash>
crash> p runqueues
PER-CPU DATA TYPE:
  struct rq runqueues;
PER-CPU ADDRESSES:
  [0]: ffff888134034d80
  [1]: ffff8881340b4d80
  [2]: ffff888134134d80
  [3]: ffff8881341b4d80
crash>
crash> struct -o rq.cfs ffff8881340b4d80
struct rq {
  [ffff8881340b4e00] struct cfs_rq cfs;
}
crash> struct cfs_rq.nr_running,curr,next,tasks_timeline,min_vruntime,avg_vruntime,avg_load,load,exec_clock ffff8881340b4e00
  nr_running = 3,
  curr = 0xffff888139b57c00,
  next = 0xffff888139b57c00,
  tasks_timeline = {
    rb_root = {
      rb_node = 0xffff8881362d80d0
    },
    rb_leftmost = 0xffff8881362d9b50
  },
  min_vruntime = 4596406356396515,
  avg_vruntime = -9137321448325056783,
  avg_load = 88933,
  load = {
    weight = 92109859,
    inv_weight = 0
  },
  exec_clock = 0,
crash> struct sched_entity.on_rq,deadline,min_vruntime,vruntime,load,vlag,slice,exec_start,sum_exec_runtime,prev_sum_exec_runtime,my_q,run_node 0xffff888139b57c00
  on_rq = 1,
  deadline = 4705706610399852,
  min_vruntime = 4493662477571149,
  vruntime = 4698735667604793,
  load = {
    weight = 1042467,
    inv_weight = 0
  },
  vlag = 4493662483537817,
  slice = 2250000,
  exec_start = 7308537586004,
  sum_exec_runtime = 7196457582967,
  prev_sum_exec_runtime = 7196456203065,
  my_q = 0xffff888139b55000,
  run_node = {
    __rb_parent_color = 1,
    rb_right = 0xffff8881362d80d0,
    rb_left = 0x0
  },
crash> struct sched_entity.deadline,min_vruntime,vruntime,load,vlag,slice,exec_start,sum_exec_runtime,prev_sum_exec_runtime,my_q,run_node -l sched_entity.run_node 0xffff8881362d80d0
  deadline = 4493662533339551,
  min_vruntime = 4493662476669436,
  vruntime = 4493662519944203,
  load = {
    weight = 176128,
    inv_weight = 24970740
  },
  vlag = 4493662519002535,
  slice = 2250000,
  exec_start = 7308527703195,
  sum_exec_runtime = 4759831,
  prev_sum_exec_runtime = 2351660,
  my_q = 0x0,
  run_node = {
    __rb_parent_color = 1,
    rb_right = 0x0,
    rb_left = 0xffff8881362d9b50
  },
crash> struct sched_entity.deadline,min_vruntime,vruntime,load,vlag,slice,exec_start,sum_exec_runtime,prev_sum_exec_runtime,my_q,run_node -l sched_entity.run_node 0xffff8881362d9b50
  deadline = 4493662476695393,
  min_vruntime = 4493662476669436,
  vruntime = 4493662476669436,
  load = {
    weight = 90891264,
    inv_weight = 48388
  },
  vlag = 51914,
  slice = 2250000,
  exec_start = 7308536206102,
  sum_exec_runtime = 2102797408,
  prev_sum_exec_runtime = 2102198648,
  my_q = 0x0,
  run_node = {
    __rb_parent_color = 18446612687273951440,
    rb_right = 0x0,
    rb_left = 0x0
  },
crash>

In vruntime_eligible():
	for sched_entity curr [0xffff888139b57c00]: 	avg [-9033150209515029779], (int64_t)(vruntime - cfs_rq->min_vruntime) * load [9204623872495814378], so return false
	for sched_entity root [0xffff8881362d80d0]: 	avg [-9033150209515029779], (int64_t)(vruntime - cfs_rq->min_vruntime) * load [9204833240987634904], so return false
	for sched_entity leftmost [0xffff8881362d9b50]: avg [-9033150209515029779], (int64_t)(vruntime - cfs_rq->min_vruntime) * load [9204829348379068487], so return false
Therefore, no sched_entity on this cfs_rq is eligible to run, which
causes the NULL pointer dereference in pick_next_task_fair().
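
The figures quoted above can be reproduced in user space with a small sketch of the eligibility comparison. This is not the kernel function: check_eligibility_sketch() and its result struct are hypothetical names, and the load value of 89951 used in the usage note is an assumption derived from the dump (avg_load 88933 plus the curr weight 1042467 shifted right by 10 bits, i.e. 1018). The sketch uses the GCC/Clang builtin __builtin_mul_overflow() so that the wrapped 64-bit product can be inspected without relying on undefined signed overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
struct eligibility_sketch {
	int64_t product;	/* 64-bit wrapped value of key * load */
	bool overflowed;	/* true if the exact product did not fit in s64 */
	bool eligible;		/* result of avg >= product */
};

/*
 * Mirrors the comparison described in the commit message:
 *   avg >= (int64_t)(vruntime - min_vruntime) * load
 * but reports whether the multiplication wrapped.
 */
static struct eligibility_sketch
check_eligibility_sketch(int64_t avg, uint64_t vruntime,
			 uint64_t min_vruntime, int64_t load)
{
	struct eligibility_sketch r;
	/* Unsigned subtraction, then reinterpret as a signed key. */
	int64_t key = (int64_t)(vruntime - min_vruntime);

	r.overflowed = __builtin_mul_overflow(key, load, &r.product);
	r.eligible = avg >= r.product;
	return r;
}
```

Under those assumptions, calling check_eligibility_sketch(-9033150209515029779LL, 4493662519944203ULL, 4596406356396515ULL, 89951) for the root entity reports an overflow and a wrapped product of 9204833240987634904, which matches the value in the analysis above. For curr the product (9204623872495814378) does not wrap but is already close to S64_MAX, so with avg this far negative every entity fails the comparison.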

Fixes: 147f3ef ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: wulibin163 <wulibin163@126.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 3, 2025
When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:

  perf record -e cycles-ct -F 1000 -a --overwrite
  Error:
  cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
  perf: Segmentation fault
      #0 0x6466b6 in dump_stack debug.c:366
      #1 0x646729 in sighandler_dump_stack debug.c:378
      #2 0x453fd1 in sigsegv_handler builtin-record.c:722
      #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
      #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
      #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
      #6 0x458090 in record__synthesize builtin-record.c:2075
      #7 0x45a85a in __cmd_record builtin-record.c:2888
      #8 0x45deb6 in cmd_record builtin-record.c:4374
      #9 0x4e5e33 in run_builtin perf.c:349
      #10 0x4e60bf in handle_internal_command perf.c:401
      #11 0x4e6215 in run_argv perf.c:448
      #12 0x4e653a in main perf.c:555
      #13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
      #14 0x43a3ee in _start ??:0

The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.

To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.
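
The shape of the fix can be sketched outside of perf as follows. All types and function names here (session_sketch, synthesize_sketch(), finish_record_sketch()) are hypothetical stand-ins and not the actual perf code; the sketch only shows the pattern of performing the tail synthesis inside the "no error" branch, so that a NULL evlist is never walked on the error path.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the perf structures. */
struct evlist_sketch {
	int nr_entries;
};

struct session_sketch {
	struct evlist_sketch *evlist;	/* stays NULL if opening events failed */
};

static int synthesize_calls;		/* counts calls, for demonstration */
static int synthesized_entries;

/* Dereferences the evlist, so it must not run when evlist is NULL. */
static void synthesize_sketch(struct session_sketch *s)
{
	synthesize_calls++;
	synthesized_entries += s->evlist->nr_entries;
}

/*
 * End-of-recording path: tail synthesis happens only when recording
 * succeeded (status == 0), which implies the evlist was set up.
 */
static int finish_record_sketch(struct session_sketch *s, int status,
				int tail_synthesize)
{
	if (status == 0) {
		if (tail_synthesize)
			synthesize_sketch(s);
	}
	return status;
}
```

With a failed open (status non-zero and evlist left NULL), finish_record_sketch() returns the error without touching the evlist; on success it synthesizes once as before.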

Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>