
Conversation

@PlaidCat
Collaborator

General Process:

Checking Rebuild Commits for Potentially missing commits:

kernel-4.18.0-553.81.1.el8_10

[jmaple@devbox kernel-src-tree]$ cat ciq/ciq_backports/kernel-4.18.0-553.81.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 567757
Number of commits in rpm: 102
Number of commits matched with upstream: 95 (93.14%)
Number of commits in upstream but not in rpm: 567662
Number of commits NOT found in upstream: 7 (6.86%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.81.1.el8_10 for kernel-4.18.0-553.81.1.el8_10
Clean Cherry Picks: 76 (80.00%)
Empty Cherry Picks: 18 (18.95%)
_______________________________

__EMPTY COMMITS__________________________
c4028fa2daa059ac9231ab3a4f57cbae814b3625 powerpc/mm: drop #ifdef CONFIG_MMU in is_ioremap_addr()
c852023e6fd4fa5f75175729e0b55abb062ca799 huge tmpfs: move shmem_huge_enabled() upwards
5e6e5a12a44ca5ff2b130d8d39aaf9b8c026de94 huge tmpfs: shmem_is_huge(vma, inode, index)
ac86f547ca1002aec2ef66b9e64d03f45bbbfbb9 mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
48731c8436c68ce5597dfe72f3836bd6808bedde mm, compaction: rename compact_control->rescan to finish_pageblock
e0228d590beb0d0af345c58a282f01afac5c57f3 mm: zswap: shrink until can accept
54abe19e00cfcc5a72773d15cd00ed19ab763439 writeback: fix dereferencing NULL mapping->host on writeback_page_template
899c6efe58dbe8cb9768057ffc206d03e5a89ce8 mm/vmalloc: extend __find_vmap_area() with one more argument
9ea9cb00a82b53ec39630eac718776d37e41b35a mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement
8446a4deb6b6bc998f1d8d2a85d1a0c64b9e3a71 slab: kmalloc_size_roundup() must not return 0 for non-zero size
b958d4d08fbfe938af24ea06ebbf839b48fa18a9 mm: hugetlb: simplify per-node sysfs creation and removal
a4a00b451ef5e1deb959088e25e248f4ee399792 mm: hugetlb: eliminate memory-less nodes handling
48b5928e18dc27e05cab3dc4c78cd8a15baaf1e5 base/node.c: initialize the accessor list before registering
63fd327016fdfca6f2fa27eba3496bd079eb8ed3 mm: memcontrol: don't throttle dying tasks on memory.high
d9b3ce8769e371554a669f262bbc61c02a40efcc mm: writeback: ratelimit stat flush from mem_cgroup_wb_stats
287d5fedb377ddc232b216b882723305b27ae31a mm: memcg: use larger batches for proactive reclaim
67bab13307c83fb742c2556b06cdc39dbad27f07 mm/hugetlb: wait for hugetlb folios to be freed
a995199384347261bb3f21b2e171fa7f988bd2f8 mm: fix apply_to_existing_page_range()

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values
dlm: move to rinfo for all middle conversion cases

Build

[jmaple@devbox code]$ egrep -B 5 -A 5 "\[TIMER\]|^Starting Build" $(ls -t kbuild* | head -n1)
/mnt/code/kernel-src-tree-build
Running make mrproper...
  CLEAN   scripts/basic
  CLEAN   scripts/kconfig
[TIMER]{MRPROPER}: 5s
x86_64 architecture detected, copying config
'configs/kernel-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky8_10_rebuild-99b4f48215a2"
Making olddefconfig
--
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
--
  LD [M]  sound/usb/usx2y/snd-usb-usx2y.ko
  LD [M]  sound/virtio/virtio_snd.ko
  LD [M]  sound/x86/snd-hdmi-lpe-audio.ko
  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1431s
Making Modules
  INSTALL arch/x86/crypto/blowfish-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx2.ko
  INSTALL arch/x86/crypto/camellia-x86_64.ko
--
  INSTALL sound/usb/usx2y/snd-usb-usx2y.ko
  INSTALL sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL virt/lib/irqbypass.ko
  INSTALL sound/xen/snd_xen_front.ko
  DEPMOD  4.18.0-rocky8_10_rebuild-99b4f48215a2+
[TIMER]{MODULES}: 11s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-rocky8_10_rebuild-99b4f48215a2+ arch/x86/boot/bzImage \
        System.map "/boot"
[TIMER]{INSTALL}: 23s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-rocky8_10_rebuild-99b4f48215a2+ and Index to 2
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 5s
[TIMER]{BUILD}: 1431s
[TIMER]{MODULES}: 11s
[TIMER]{INSTALL}: 23s
[TIMER]{TOTAL} 1475s
Rebooting in 10 seconds

KSelfTest

[jmaple@devbox code]$ ~/workspace/auto_kernel_history_rebuild/Rocky10/rocky10/code/get_kselftest_diff.sh
kselftest.4.18.0-rocky8_10_rebuild-f35ded7732d4+.log
207
kselftest.4.18.0-rocky8_10_rebuild-9646b4b50868+.log
207
kselftest.4.18.0-rocky8_10_rebuild-baea35f64da5+.log
207
kselftest.4.18.0-rocky8_10_rebuild-99b4f48215a2+.log
207
Before: kselftest.4.18.0-rocky8_10_rebuild-baea35f64da5+.log
After: kselftest.4.18.0-rocky8_10_rebuild-99b4f48215a2+.log
Diff:
No differences found.

jira LE-4623
cve CVE-2025-39849
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Dan Carpenter <dan.carpenter@linaro.org>
commit 62b635d

If the ssid->datalen is more than IEEE80211_MAX_SSID_LEN (32) it would
lead to memory corruption so add some bounds checking.

Fixes: c38c701 ("wifi: cfg80211: Set SSID if it is not already set")
	Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/0aaaae4a3ed37c6252363c34ae4904b1604e8e32.1756456951.git.dan.carpenter@linaro.org
	Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(cherry picked from commit 62b635d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Yu Zhao <yuzhao@google.com>
commit 5d3ee42

find_get_pages_range() and find_get_pages_range_tag() already correctly
increment reference count on head when seeing compound page, but they
may still use page index from tail.  Page index from tail is always
zero, so these functions don't work on huge shmem.  This hasn't been a
problem because, AFAIK, nobody calls these functions on (huge) shmem.
Fix them anyway just in case.

Link: http://lkml.kernel.org/r/20190110030838.84446-1-yuzhao@google.com
	Signed-off-by: Yu Zhao <yuzhao@google.com>
	Reviewed-by: William Kucharski <william.kucharski@oracle.com>
	Cc: Matthew Wilcox <willy@infradead.org>
	Cc: Amir Goldstein <amir73il@gmail.com>
	Cc: Dave Chinner <david@fromorbit.com>
	Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Souptick Joarder <jrdr.linux@gmail.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 5d3ee42)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Christoph Hellwig <hch@lst.de>
commit c9e0fc3

This export was added in this merge window, but without any actual
user, or justification for a modular user.

Fixes: a35a3c6 ("powerpc/mm/hash64: Add a variable to track the end of IO mapping")
	Signed-off-by: Christoph Hellwig <hch@lst.de>
	Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit c9e0fc3)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Christophe Leroy <christophe.leroy@c-s.fr>
commit c4028fa
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.81.1.el8_10/c4028fa2.failed

powerpc always selects CONFIG_MMU and CONFIG_MMU is not checked
anywhere else in powerpc code.

Drop the #ifdef and the alternative part of is_ioremap_addr()

Fixes: 9bd3bb6 ("mm/nvdimm: add is_ioremap_addr and use that to check ioremap address")
	Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
	Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/de395e444fb8dd7a6365c3314d78e15ebb3d7d1b.1566382245.git.christophe.leroy@c-s.fr
(cherry picked from commit c4028fa)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	arch/powerpc/include/asm/pgtable.h
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit ec84821

This isn't just a random struct page, it's known to be a head page, and
calling it head makes the function better self-documenting.  The pgoff_t
is less confusing if it's named index instead of offset.  Also add a
couple of comments to explain why we're doing various things.

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Reviewed-by: Christoph Hellwig <hch@lst.de>
	Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
	Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
	Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20200318140253.6141-3-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ec84821)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit 83daf83

No in-tree users (proc, madvise, memcg, mincore) can be built as a module.

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Reviewed-by: Christoph Hellwig <hch@lst.de>
	Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
	Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
	Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: http://lkml.kernel.org/r/20200318140253.6141-8-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 83daf83)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit a065060

If THP is disabled, find_subpage() can become a no-op by using
hpage_nr_pages() instead of compound_nr().  hpage_nr_pages() embeds a
check for PageTail, so we can drop the check here.

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Reviewed-by: Christoph Hellwig <hch@lst.de>
	Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
	Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
	Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: http://lkml.kernel.org/r/20200318140253.6141-5-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a065060)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Kees Cook <keescook@chromium.org>
commit 27d80fa

Variables declared in a switch statement before any case statements cannot
be automatically initialized with compiler instrumentation (as they are
not part of any execution flow).  With GCC's proposed automatic stack
variable initialization feature, this triggers a warning (and they don't
get initialized).  Clang's automatic stack variable initialization (via
CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't
initialize such variables[1].  Note that these warnings (or silent
skipping) happen before the dead-store elimination optimization phase, so
even when the automatic initializations are later elided in favor of
direct initializations, the warnings remain.

To avoid these problems, move such variables into the "case" where they're
used or lift them up into the main function body.

mm/shmem.c: In function `shmem_getpage_gfp':
mm/shmem.c:1816:10: warning: statement will never be executed [-Wswitch-unreachable]
 1816 |   loff_t i_size;
      |          ^~~~~~

[1] https://bugs.llvm.org/show_bug.cgi?id=44916

	Signed-off-by: Kees Cook <keescook@chromium.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Alexander Potapenko <glider@google.com>
Link: http://lkml.kernel.org/r/20200220062312.69165-1-keescook@chromium.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 27d80fa)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Mateusz Nosek <mateusznosek0@gmail.com>
commit 343c3d7

Previously 0 was assigned to variable 'error' but the variable was never
read before reassignment later.  So the assignment can be removed.

	Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
	Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/20200301152832.24595-1-mateusznosek0@gmail.com
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 343c3d7)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Hugh Dickins <hughd@google.com>
commit 71725ed

Yang Shi writes:

Currently, when truncating a shmem file, if the range is partly in a THP
(start or end is in the middle of THP), the pages actually will just get
cleared rather than being freed, unless the range covers the whole THP.
Even though all the subpages are truncated (randomly or sequentially), the
THP may still be kept in page cache.

This might be fine for some usecases which prefer preserving THP, but
balloon inflation is handled in base page size.  So when using shmem THP
as memory backend, QEMU inflation actually doesn't work as expected since
it doesn't free memory.  But the inflation usecase really needs to get the
memory freed.  (Anonymous THP will also not get freed right away, but will
be freed eventually when all subpages are unmapped: whereas shmem THP
still stays in page cache.)

Split THP right away when doing partial hole punch, and if split fails
just clear the page so that read of the punched area will return zeroes.

Hugh Dickins adds:

Our earlier "team of pages" huge tmpfs implementation worked in the way
that Yang Shi proposes; and we have been using this patch to continue to
split the huge page when hole-punched or truncated, since converting over
to the compound page implementation.  Although huge tmpfs gives out huge
pages when available, if the user specifically asks to truncate or punch a
hole (perhaps to free memory, perhaps to reduce the memcg charge), then
the filesystem should do so as best it can, splitting the huge page.

That is not always possible: any additional reference to the huge page
prevents split_huge_page() from succeeding, so the result can be flaky.
But in practice it works successfully enough that we've not seen any
problem from that.

Add shmem_punch_compound() to encapsulate the decision of when a split is
needed, and doing the split if so.  Using this simplifies the flow in
shmem_undo_range(); and the first (trylock) pass does not need to do any
page clearing on failure, because the second pass will either succeed or
do that clearing.  Following the example of zero_user_segment() when
clearing a partial page, add flush_dcache_page() and set_page_dirty() when
clearing a hole - though I'm not certain that either is needed.

But: split_huge_page() would be sure to fail if shmem_undo_range()'s
pagevec holds further references to the huge page.  The easiest way to fix
that is for find_get_entries() to return early, as soon as it has put one
compound head or tail into the pagevec.  At first this felt like a hack;
but on examination, this convention better suits all its callers - or will
do, if the slight one-page-per-pagevec slowdown in shmem_unlock_mapping()
and shmem_seek_hole_data() is transformed into a 512-page-per-pagevec
speedup by checking for compound pages there.

	Signed-off-by: Hugh Dickins <hughd@google.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Cc: Yang Shi <yang.shi@linux.alibaba.com>
	Cc: Alexander Duyck <alexander.duyck@gmail.com>
	Cc: "Michael S. Tsirkin" <mst@redhat.com>
	Cc: David Hildenbrand <david@redhat.com>
	Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
	Cc: Matthew Wilcox <willy@infradead.org>
	Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2002261959020.10801@eggly.anvils
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 71725ed)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Hugh Dickins <hughd@google.com>
commit 0783ac9

Some optimizers don't notice that shmem_punch_compound() is always true
(PageTransCompound() being false) without CONFIG_TRANSPARENT_HUGEPAGE==y.

Use IS_ENABLED to help them to avoid the BUILD_BUG inside HPAGE_PMD_NR.

Fixes: 71725ed ("mm: huge tmpfs: try to split_huge_page() when punching hole")
	Reported-by: Randy Dunlap <rdunlap@infradead.org>
	Signed-off-by: Hugh Dickins <hughd@google.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Tested-by: Randy Dunlap <rdunlap@infradead.org>
	Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2004142339170.10035@eggly.anvils
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0783ac9)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit 61ef186

Patch series "Return head pages from find_*_entry", v2.

This patch series started out as part of the THP patch set, but it has
some nice effects along the way and it seems worth splitting it out and
submitting separately.

Currently find_get_entry() and find_lock_entry() return the page
corresponding to the requested index, but the first thing most callers do
is find the head page, which we just threw away.  As part of auditing all
the callers, I found some misuses of the APIs and some plain
inefficiencies that I've fixed.

The diffstat is unflattering, but I added more kernel-doc and a new wrapper.

This patch (of 8):

Provide this functionality from the swap cache.  It's useful for
more than just mincore().

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: William Kucharski <william.kucharski@oracle.com>
	Cc: Jani Nikula <jani.nikula@linux.intel.com>
	Cc: Alexey Dobriyan <adobriyan@gmail.com>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Chris Wilson <chris@chris-wilson.co.uk>
	Cc: Matthew Auld <matthew.auld@intel.com>
	Cc: Huang Ying <ying.huang@intel.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20200910183318.20139-2-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 61ef186)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit f5df863

The current code does not protect against swapoff of the underlying
swap device, so this is a bug fix as well as a worthwhile reduction in
code complexity.

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Cc: Alexey Dobriyan <adobriyan@gmail.com>
	Cc: Chris Wilson <chris@chris-wilson.co.uk>
	Cc: Huang Ying <ying.huang@intel.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Jani Nikula <jani.nikula@linux.intel.com>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Matthew Auld <matthew.auld@intel.com>
	Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-3-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f5df863)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit e6e8871

Instead of calling find_get_entry() for every page index, use an XArray
iterator to skip over NULL entries, and avoid calling get_page(),
because we only want the swap entries.

[willy@infradead.org: fix LTP soft lockups]
  Link: https://lkml.kernel.org/r/20200914165032.GS6583@casper.infradead.org

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Acked-by: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Alexey Dobriyan <adobriyan@gmail.com>
	Cc: Chris Wilson <chris@chris-wilson.co.uk>
	Cc: Huang Ying <ying.huang@intel.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Jani Nikula <jani.nikula@linux.intel.com>
	Cc: Matthew Auld <matthew.auld@intel.com>
	Cc: William Kucharski <william.kucharski@oracle.com>
	Cc: Qian Cai <cai@redhat.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-4-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e6e8871)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit 8cf8864

Avoid bumping the refcount on pages when we're only interested in the
swap entries.

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Acked-by: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Alexey Dobriyan <adobriyan@gmail.com>
	Cc: Chris Wilson <chris@chris-wilson.co.uk>
	Cc: Huang Ying <ying.huang@intel.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Jani Nikula <jani.nikula@linux.intel.com>
	Cc: Matthew Auld <matthew.auld@intel.com>
	Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-5-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8cf8864)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit 9dfc8ff

i915 does not want to see value entries.  Switch it to use
find_lock_page() instead, and remove the export of find_lock_entry().
Move find_lock_entry() and find_get_entry() to mm/internal.h to discourage
any future use.

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Acked-by: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Alexey Dobriyan <adobriyan@gmail.com>
	Cc: Chris Wilson <chris@chris-wilson.co.uk>
	Cc: Huang Ying <ying.huang@intel.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Jani Nikula <jani.nikula@linux.intel.com>
	Cc: Matthew Auld <matthew.auld@intel.com>
	Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-6-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 9dfc8ff)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit a6de4b4

There are only four callers remaining of find_get_entry().
get_shadow_from_swap_cache() only wants to see shadow entries and doesn't
care about which page is returned.  Push the find_subpage() call into
find_lock_entry(), find_get_incore_page() and pagecache_get_page().

[willy@infradead.org: fix oops]
  Link: https://lkml.kernel.org/r/20200914112738.GM6583@casper.infradead.org

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Cc: Alexey Dobriyan <adobriyan@gmail.com>
	Cc: Chris Wilson <chris@chris-wilson.co.uk>
	Cc: Huang Ying <ying.huang@intel.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Jani Nikula <jani.nikula@linux.intel.com>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Matthew Auld <matthew.auld@intel.com>
	Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-7-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a6de4b4)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit 63ec197

Convert shmem_getpage_gfp() (the only remaining caller of
find_lock_entry()) to cope with a head page being returned instead of
the subpage for the index.

[willy@infradead.org: fix BUG()s]
  Link: https://lore.kernel.org/linux-mm/20200912032042.GA6583@casper.infradead.org/

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Cc: Alexey Dobriyan <adobriyan@gmail.com>
	Cc: Chris Wilson <chris@chris-wilson.co.uk>
	Cc: Huang Ying <ying.huang@intel.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Jani Nikula <jani.nikula@linux.intel.com>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Matthew Auld <matthew.auld@intel.com>
	Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-8-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 63ec197)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit a8cf7f2

Add a new FGP_HEAD flag which avoids calling find_subpage() and add a
convenience wrapper for it.

	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Cc: Alexey Dobriyan <adobriyan@gmail.com>
	Cc: Chris Wilson <chris@chris-wilson.co.uk>
	Cc: Huang Ying <ying.huang@intel.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Jani Nikula <jani.nikula@linux.intel.com>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Matthew Auld <matthew.auld@intel.com>
	Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-9-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a8cf7f2)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
commit 89b4223

Changeset a8cf7f2 ("mm: add find_lock_head") renamed the
index parameter, but forgot to update the kernel-doc markups
accordingly.

Fixes: a8cf7f2 ("mm: add find_lock_head")
	Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
	Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/dce89b296a4f5f9f8f798d5e76b6736c14a916ac.1603791716.git.mchehab+huawei@kernel.org
	Signed-off-by: Jonathan Corbet <corbet@lwn.net>
(cherry picked from commit 89b4223)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit 6638380

The calculation of the end page index was incorrect, leading to a
regression of 70% when running stress-ng.

With this fix, we instead see a performance improvement of 3%.

Fixes: e6e8871 ("mm: optimise madvise WILLNEED")
	Reported-by: kernel test robot <rong.a.chen@intel.com>
	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
	Acked-by: Johannes Weiner <hannes@cmpxchg.org>
	Cc: William Kucharski <william.kucharski@oracle.com>
	Cc: Feng Tang <feng.tang@intel.com>
	Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Link: https://lkml.kernel.org/r/20201109134851.29692-1-willy@infradead.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6638380)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit c49f50d

Patch series "Overhaul multi-page lookups for THP", v4.

This THP prep patchset changes several page cache iteration APIs to only
return head pages.

 - It's only possible to tag head pages in the page cache, so only
   return head pages, not all their subpages.
 - Factor a lot of common code out of the various batch lookup routines
 - Add mapping_seek_hole_data()
 - Unify find_get_entries() and pagevec_lookup_entries()
 - Make find_get_entries only return head pages, like find_get_entry().

These are only loosely connected, but they seem to make sense together as
a series.

This patch (of 14):

Pagecache tags are used for dirty page writeback.  Since dirtiness is
tracked on a per-THP basis, we only want to return the head page rather
than each subpage of a tagged page.  All the filesystems which use huge
pages today are in-memory, so there are no tagged huge pages today.

Link: https://lkml.kernel.org/r/20201112212641.27837-2-willy@infradead.org
	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Reviewed-by: Jan Kara <jack@suse.cz>
	Reviewed-by: William Kucharski <william.kucharski@oracle.com>
	Reviewed-by: Christoph Hellwig <hch@lst.de>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Yang Shi <yang.shi@linux.alibaba.com>
	Cc: Dave Chinner <dchinner@redhat.com>
	Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c49f50d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit 96888e0

The comment shows that the reason for using find_get_entries() is now
stale; find_get_pages() will not return 0 if it hits a consecutive run of
swap entries, and I don't believe it has since 2011.  pagevec_lookup() is
a simpler function to use than find_get_pages(), so use it instead.

Link: https://lkml.kernel.org/r/20201112212641.27837-3-willy@infradead.org
	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Reviewed-by: Jan Kara <jack@suse.cz>
	Reviewed-by: William Kucharski <william.kucharski@oracle.com>
	Reviewed-by: Christoph Hellwig <hch@lst.de>
	Cc: Dave Chinner <dchinner@redhat.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
	Cc: Yang Shi <yang.shi@linux.alibaba.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 96888e0)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit 8c647dd

There's no need to get a reference to the page, just load the entry and
see if it's a shadow entry.

Link: https://lkml.kernel.org/r/20201112212641.27837-4-willy@infradead.org
	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Reviewed-by: Christoph Hellwig <hch@lst.de>
	Cc: Dave Chinner <dchinner@redhat.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Jan Kara <jack@suse.cz>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
	Cc: William Kucharski <william.kucharski@oracle.com>
	Cc: Yang Shi <yang.shi@linux.alibaba.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8c647dd)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Rik van Riel <riel@surriel.com>
commit 164cc4f

Patch series "mm,thp,shm: limit shmem THP alloc gfp_mask", v6.

The allocation flags of anonymous transparent huge pages can be controlled
through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can
help keep the system from getting bogged down in the page reclaim and
compaction code when many THPs are getting allocated simultaneously.

However, the gfp_mask for shmem THP allocations was not limited by those
configuration settings, and some workloads ended up with all CPUs stuck on
the LRU lock in the page reclaim code, trying to allocate dozens of THPs
simultaneously.

This patch applies the same configured limitation of THPs to shmem
hugepage allocations, to prevent that from happening.

This way a THP defrag setting of "never" or "defer+madvise" will result in
quick allocation failures without direct reclaim when no 2MB free pages
are available.

With this patch applied, THP allocations for tmpfs will be a little more
aggressive than today for files mmapped with MADV_HUGEPAGE, and a little
less aggressive for files that are not mmapped or mapped without that
flag.

This patch (of 4):

The allocation flags of anonymous transparent huge pages can be controlled
through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can
help keep the system from getting bogged down in the page reclaim and
compaction code when many THPs are getting allocated simultaneously.

However, the gfp_mask for shmem THP allocations was not limited by those
configuration settings, and some workloads ended up with all CPUs stuck on
the LRU lock in the page reclaim code, trying to allocate dozens of THPs
simultaneously.

This patch applies the same configured limitation of THPs to shmem
hugepage allocations, to prevent that from happening.

Controlling the gfp_mask of THP allocations through the knobs in sysfs
allows users to determine the balance between how aggressively the system
tries to allocate THPs at fault time, and how much the application may end
up stalling attempting those allocations.

This way a THP defrag setting of "never" or "defer+madvise" will result in
quick allocation failures without direct reclaim when no 2MB free pages
are available.

With this patch applied, THP allocations for tmpfs will be a little more
aggressive than today for files mmapped with MADV_HUGEPAGE, and a little
less aggressive for files that are not mmapped or mapped without that
flag.

Link: https://lkml.kernel.org/r/20201124194925.623931-1-riel@surriel.com
Link: https://lkml.kernel.org/r/20201124194925.623931-2-riel@surriel.com
	Signed-off-by: Rik van Riel <riel@surriel.com>
	Acked-by: Michal Hocko <mhocko@suse.com>
	Acked-by: Vlastimil Babka <vbabka@suse.cz>
	Cc: Xu Yu <xuyu@linux.alibaba.com>
	Cc: Mel Gorman <mgorman@suse.de>
	Cc: Andrea Arcangeli <aarcange@redhat.com>
	Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
	Cc: Hugh Dickins <hughd@google.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 164cc4f)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Rik van Riel <riel@surriel.com>
commit 78cc8cd

Matthew Wilcox pointed out that the i915 driver opportunistically
allocates tmpfs memory, but will happily reclaim some of its pool if no
memory is available.

Make sure the gfp mask used to opportunistically allocate a THP is always
at least as restrictive as the original gfp mask.

Link: https://lkml.kernel.org/r/20201124194925.623931-3-riel@surriel.com
	Signed-off-by: Rik van Riel <riel@surriel.com>
	Suggested-by: Matthew Wilcox <willy@infradead.org>
	Cc: Andrea Arcangeli <aarcange@redhat.com>
	Cc: Hugh Dickins <hughd@google.com>
	Cc: Mel Gorman <mgorman@suse.de>
	Cc: Michal Hocko <mhocko@suse.com>
	Cc: Vlastimil Babka <vbabka@suse.cz>
	Cc: Xu Yu <xuyu@linux.alibaba.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 78cc8cd)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Rik van Riel <riel@surriel.com>
commit 187df5d

Hugh pointed out that the gma500 driver uses shmem pages, but needs to
limit them to the DMA32 zone.  Ensure the allocations resulting from the
gfp_mask returned by limit_gfp_mask use the zone flags that were
originally passed to shmem_getpage_gfp.
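
The combined constraint from this patch and the previous one can be sketched
in userspace; the flag bit values below are illustrative stand-ins, not the
kernel's real definitions, and the function is a simplified model of
limit_gfp_mask(), not the exact upstream code.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values for a userspace model only. */
#define __GFP_IO        0x01u
#define __GFP_FS        0x02u
#define __GFP_RECLAIM   0x04u
#define GFP_ZONEMASK    0xf0u
#define __GFP_DMA32     0x10u

/*
 * Simplified model: the result may allow IO/FS/reclaim only if both
 * masks allow them (never less restrictive than the caller), while
 * zone modifiers such as DMA32 are taken only from the gfp mask
 * originally passed to shmem_getpage_gfp().
 */
static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
{
	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
	gfp_t zoneflags = limit_gfp & GFP_ZONEMASK;
	gfp_t result = huge_gfp & ~(allowflags | GFP_ZONEMASK);

	/* Allow IO/FS/reclaim only when both masks allow them. */
	result |= (huge_gfp & limit_gfp) & allowflags;
	/* Zone restrictions always follow the original caller. */
	result |= zoneflags;
	return result;
}
```

A gma500-style caller restricted to DMA32 and without __GFP_FS keeps both
restrictions in the resulting huge-page mask.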

Link: https://lkml.kernel.org/r/20210224121016.1314ed6d@imladris.surriel.com
	Signed-off-by: Rik van Riel <riel@surriel.com>
	Suggested-by: Hugh Dickins <hughd@google.com>
	Cc: Michal Hocko <mhocko@suse.com>
	Cc: Vlastimil Babka <vbabka@suse.cz>
	Cc: Xu Yu <xuyu@linux.alibaba.com>
	Cc: Mel Gorman <mgorman@suse.de>
	Cc: Andrea Arcangeli <aarcange@redhat.com>
	Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 187df5d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Hugh Dickins <hughd@google.com>
commit 2b5bbcb

There's a block of code in shmem_setattr() to add the inode to
shmem_unused_huge_shrink()'s shrinklist when lowering i_size: it dates
from before 5.7 changed truncation to do split_huge_page() for itself, and
should have been removed at that time.

I am over-stating that: split_huge_page() can fail (notably if there's an
extra reference to the page at that time), so there might be value in
retrying.  But there were already retries as truncation worked through the
tails, and this addition risks repeating unsuccessful retries
indefinitely: I'd rather remove it now, and work on reducing the chance of
split_huge_page() failures separately, if we need to.

Link: https://lkml.kernel.org/r/b73b3492-8822-18f9-83e2-938528cdde94@google.com
Fixes: 71725ed ("mm: huge tmpfs: try to split_huge_page() when punching hole")
	Signed-off-by: Hugh Dickins <hughd@google.com>
	Reviewed-by: Yang Shi <shy828301@gmail.com>
	Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
	Cc: Matthew Wilcox <willy@infradead.org>
	Cc: Miaohe Lin <linmiaohe@huawei.com>
	Cc: Michal Hocko <mhocko@suse.com>
	Cc: Mike Kravetz <mike.kravetz@oracle.com>
	Cc: Rik van Riel <riel@surriel.com>
	Cc: Shakeel Butt <shakeelb@google.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 2b5bbcb)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Hugh Dickins <hughd@google.com>
commit c852023
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.81.1.el8_10/c852023e.failed

shmem_huge_enabled() is about to be enhanced into shmem_is_huge(), so that
it can be used more widely throughout: before making functional changes,
shift it to its final position (to avoid forward declaration).

Link: https://lkml.kernel.org/r/16fec7b7-5c84-415a-8586-69d8bf6a6685@google.com
	Signed-off-by: Hugh Dickins <hughd@google.com>
	Reviewed-by: Yang Shi <shy828301@gmail.com>
	Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
	Cc: Matthew Wilcox <willy@infradead.org>
	Cc: Miaohe Lin <linmiaohe@huawei.com>
	Cc: Michal Hocko <mhocko@suse.com>
	Cc: Mike Kravetz <mike.kravetz@oracle.com>
	Cc: Rik van Riel <riel@surriel.com>
	Cc: Shakeel Butt <shakeelb@google.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c852023)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	mm/shmem.c
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Hugh Dickins <hughd@google.com>
commit acdd9f8

khugepaged's collapse_file() currently uses SGP_NOHUGE to tell
shmem_getpage() not to try allocating a huge page, in the very unlikely
event that a racing hole-punch removes the swapped or fallocated page as
soon as i_pages lock is dropped.

We want to consolidate shmem's huge decisions, removing SGP_HUGE and
SGP_NOHUGE; but cannot quite persuade ourselves that it's okay to regress
the protection in this case - Yang Shi points out that the huge page would
remain indefinitely, charged to root instead of the intended memcg.

collapse_file() should not even allocate a small page in this case: why
proceed if someone is punching a hole?  SGP_READ is almost the right flag
here, except that it optimizes away from a fallocated page, with NULL to
tell caller to fill with zeroes (like a hole); whereas collapse_file()'s
sequence relies on using a cache page.  Add SGP_NOALLOC just for this.

There are too many consecutive "if (page"s there in shmem_getpage_gfp():
group it better; and fix the outdated "bring it back from swap" comment.

Link: https://lkml.kernel.org/r/1355343b-acf-4653-ef79-6aee40214ac5@google.com
	Signed-off-by: Hugh Dickins <hughd@google.com>
	Reviewed-by: Yang Shi <shy828301@gmail.com>
	Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
	Cc: Matthew Wilcox <willy@infradead.org>
	Cc: Miaohe Lin <linmiaohe@huawei.com>
	Cc: Michal Hocko <mhocko@suse.com>
	Cc: Mike Kravetz <mike.kravetz@oracle.com>
	Cc: Rik van Riel <riel@surriel.com>
	Cc: Shakeel Butt <shakeelb@google.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit acdd9f8)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author T.J. Mercier <tjmercier@google.com>
commit 13ef742

The root memcg is onlined even when memcg is disabled.  When it's onlined
a 2 second periodic stat flush is started, but no stat flushing is
required when memcg is disabled because there can be no child memcgs.
Most calls to flush memcg stats are avoided when memcg is disabled as a
result of the mem_cgroup_disabled check added in 7d7ef0a ("mm: memcg:
restore subtree stats flushing"), but the periodic flushing started in
mem_cgroup_css_online is not.  Skip it.

Link: https://lkml.kernel.org/r/20240126211927.1171338-1-tjmercier@google.com
Fixes: aa48e47 ("memcg: infrastructure to flush memcg stats")
	Signed-off-by: T.J. Mercier <tjmercier@google.com>
	Acked-by: Shakeel Butt <shakeelb@google.com>
	Acked-by: Johannes Weiner <hannes@cmpxchg.org>
	Acked-by: Chris Li <chrisl@kernel.org>
	Reported-by: Minchan Kim <minchan@google.com>
	Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
	Cc: Michal Hocko <mhocko@kernel.org>
	Cc: Muchun Song <muchun.song@linux.dev>
	Cc: Roman Gushchin <roman.gushchin@linux.dev>
	Cc: Michal Koutn <mkoutny@suse.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 13ef742)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author T.J. Mercier <tjmercier@google.com>
commit 287d5fe
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.81.1.el8_10/287d5fed.failed

Before 0388536ac291 ("mm:vmscan: fix inaccurate reclaim during proactive
reclaim") we passed the number of pages for the reclaim request directly
to try_to_free_mem_cgroup_pages, which could lead to significant
overreclaim.  After 0388536 the number of pages was limited to a
maximum 32 (SWAP_CLUSTER_MAX) to reduce the amount of overreclaim.
However such a small batch size caused a regression in reclaim performance
due to many more reclaim start/stop cycles inside memory_reclaim.  The
restart cost is amortized over more pages with larger batch sizes, and
becomes a significant component of the runtime if the batch size is too
small.

Reclaim tries to balance nr_to_reclaim fidelity with fairness across nodes
and cgroups over which the pages are spread.  As such, the bigger the
request, the bigger the absolute overreclaim error.  Historic in-kernel
users of reclaim have used fixed, small sized requests to approach an
appropriate reclaim rate over time.  When we reclaim a user request of
arbitrary size, use decaying batch sizes to manage error while maintaining
reasonable throughput.

MGLRU enabled - memcg LRU used
root - full reclaim       pages/sec   time (sec)
pre-0388536ac291      :    68047        10.46
post-0388536ac291     :    13742        inf
(reclaim-reclaimed)/4 :    67352        10.51

MGLRU enabled - memcg LRU not used
/uid_0 - 1G reclaim       pages/sec   time (sec)  overreclaim (MiB)
pre-0388536ac291      :    258822       1.12            107.8
post-0388536ac291     :    105174       2.49            3.5
(reclaim-reclaimed)/4 :    233396       1.12            -7.4

MGLRU enabled - memcg LRU not used
/uid_0 - full reclaim     pages/sec   time (sec)
pre-0388536ac291      :    72334        7.09
post-0388536ac291     :    38105        14.45
(reclaim-reclaimed)/4 :    72914        6.96
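
The "(reclaim-reclaimed)/4" rows above correspond to a decaying batch size.
A minimal sketch of that idea (simplified, not the exact kernel code):

```c
#define SWAP_CLUSTER_MAX 32UL

/*
 * Each reclaim pass requests a quarter of what is still outstanding,
 * floored at SWAP_CLUSTER_MAX: large requests start with big batches
 * (amortizing restart cost) and converge with small ones (bounding
 * the absolute overreclaim error near the target).
 */
static unsigned long reclaim_batch(unsigned long nr_to_reclaim,
				   unsigned long nr_reclaimed)
{
	unsigned long remaining = nr_to_reclaim - nr_reclaimed;
	unsigned long batch = remaining / 4;

	return batch > SWAP_CLUSTER_MAX ? batch : SWAP_CLUSTER_MAX;
}
```

For a 1 GiB (262144-page) request the first batch is 65536 pages, decaying
toward the fixed 32-page floor as the request completes.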

[tjmercier@google.com: v4]
  Link: https://lkml.kernel.org/r/20240206175251.3364296-1-tjmercier@google.com
Link: https://lkml.kernel.org/r/20240202233855.1236422-1-tjmercier@google.com
Fixes: 0388536 ("mm:vmscan: fix inaccurate reclaim during proactive reclaim")
	Signed-off-by: T.J. Mercier <tjmercier@google.com>
	Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
	Acked-by: Johannes Weiner <hannes@cmpxchg.org>
	Reviewed-by: Michal Koutny <mkoutny@suse.com>
	Acked-by: Shakeel Butt <shakeelb@google.com>
	Acked-by: Michal Hocko <mhocko@suse.com>
	Cc: Roman Gushchin <roman.gushchin@linux.dev>
	Cc: Shakeel Butt <shakeelb@google.com>
	Cc: Muchun Song <songmuchun@bytedance.com>
	Cc: Efly Young <yangyifei03@kuaishou.com>
	Cc: Yu Zhao <yuzhao@google.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 287d5fe)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	mm/memcontrol.c
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Guenter Roeck <linux@roeck-us.net>
commit b1080c6

Two failure patterns are seen randomly when running slub_kunit tests with
CONFIG_SLAB_FREELIST_RANDOM and CONFIG_SLAB_FREELIST_HARDENED enabled.

Pattern 1:
     # test_clobber_zone: pass:1 fail:0 skip:0 total:1
     ok 1 test_clobber_zone
     # test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:72
     Expected 3 == slab_errors, but
         slab_errors == 0 (0x0)
     # test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:84
     Expected 2 == slab_errors, but
         slab_errors == 0 (0x0)
     # test_next_pointer: pass:0 fail:1 skip:0 total:1
     not ok 2 test_next_pointer

In this case, test_next_pointer() overwrites p[s->offset], but the data
at p[s->offset] is already 0x12.

Pattern 2:
     ok 1 test_clobber_zone
     # test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:72
     Expected 3 == slab_errors, but
         slab_errors == 2 (0x2)
     # test_next_pointer: pass:0 fail:1 skip:0 total:1
     not ok 2 test_next_pointer

In this case, p[s->offset] has a value other than 0x12, but one of the
expected failures is nevertheless missing.

Fix the problem by inverting the data, instead of writing a fixed value,
when corrupting the cache data structures.
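
The inversion idea can be shown in isolation; this is a hypothetical helper,
not code from lib/slub_kunit.c:

```c
/*
 * ~v always differs from v, so the corruption is detected regardless
 * of what the slab freelist already stored at that location.
 */
static unsigned char corrupt_byte(unsigned char v)
{
	return (unsigned char)~v;
}
```

Unlike writing the fixed value 0x12, which is a no-op whenever the location
already happens to hold 0x12, inversion guarantees a detectable change.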

Fixes: 1f9f78b ("mm/slub, kunit: add a KUnit test for SLUB debugging functionality")
	Cc: Oliver Glitta <glittao@gmail.com>
	Cc: Vlastimil Babka <vbabka@suse.cz>
CC: Daniel Latypov <dlatypov@google.com>
	Cc: Marco Elver <elver@google.com>
	Signed-off-by: Guenter Roeck <linux@roeck-us.net>
	Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
(cherry picked from commit b1080c6)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Claudio Imbrenda <imbrenda@linux.ibm.com>
commit 843c328

The function __storage_key_init_range() expects the end address to be
the first byte outside the range to be initialized. I.e. end - start
should be the size of the area to be initialized.

The current code works because __storage_key_init_range() will still loop
over every page in the range, but it is slower than using sske_frame().
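
The exclusive-end convention can be modeled in userspace; PAGE_SIZE here is
an illustrative constant, not the kernel's definition:

```c
#define PAGE_SIZE 4096UL

/*
 * Model of what __storage_key_init_range() expects: 'end' is the
 * first byte outside the range, so end - start is the size and the
 * page count is (end - start) / PAGE_SIZE.  Passing an inclusive
 * end undercounts and defeats the sske_frame() fast path.
 */
static unsigned long pages_in_range(unsigned long start, unsigned long end)
{
	return (end - start) / PAGE_SIZE;
}
```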

Fixes: 964c2c0 ("s390/mm: Clear huge page storage keys on enable_skey")
	Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
	Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20240416114220.28489-2-imbrenda@linux.ibm.com
	Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
(cherry picked from commit 843c328)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Claudio Imbrenda <imbrenda@linux.ibm.com>
commit 412050a

The function __storage_key_init_range() expects the end address to be
the first byte outside the range to be initialized. I.e. end - start
should be the size of the area to be initialized.

The current code works because __storage_key_init_range() will still loop
over every page in the range, but it is slower than using sske_frame().

Fixes: 3afdfca ("s390/mm: Clear skeys for newly mapped huge guest pmds")
	Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
	Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20240416114220.28489-3-imbrenda@linux.ibm.com
	Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
(cherry picked from commit 412050a)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
commit af64977

Since balancing mode was added in bda420b ("numa balancing: migrate
on fault among multiple bound nodes"), it was possible to set this mode
but it wouldn't be shown in /proc/<pid>/numa_maps since there was no
support for it in the mpol_to_str() helper.

Furthermore, because the balancing mode sets the MPOL_F_MORON flag, it
would be displayed as 'default' due a workaround introduced a few years
earlier in 8790c71 ("mm/mempolicy.c: fix mempolicy printing in
numa_maps").

To tidy this up we implement two changes:

Replace the MPOL_F_MORON check by pointer comparison against the
preferred_node_policy array.  By doing this we generalise the current
special casing and replace the incorrect 'default' with the correct 'bind'
for the mode.

Secondly, we add a string representation and corresponding handling for
the MPOL_F_NUMA_BALANCING flag.

With the two changes together we start showing the balancing flag when it
is set and therefore complete the fix.

The chosen representation separates multiple flags with vertical bars,
following what existed a long time ago, in kernel 2.6.25.  But since there
was no way to display multiple flags between then and now, this patch does
not change the format in practice.

Some /proc/<pid>/numa_maps output examples:

 555559580000 bind=balancing:0-1,3 file=...
 555585800000 bind=balancing|static:0,2 file=...
 555635240000 prefer=relative:0 file=
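
The vertical-bar flag formatting can be sketched as a small userspace model;
the flag bit values and the helper name are hypothetical, and only the flag
portion (not the surrounding "bind=...:nodes" mode and nodelist) is modeled:

```c
#include <string.h>

#define MPOL_F_STATIC_NODES   0x1u	/* illustrative values */
#define MPOL_F_RELATIVE_NODES 0x2u
#define MPOL_F_NUMA_BALANCING 0x4u

/* Join the names of set flags with '|', as in "balancing|static". */
static void mpol_flags_to_str(char *buf, size_t len, unsigned int flags)
{
	static const struct { unsigned int bit; const char *name; } tab[] = {
		{ MPOL_F_NUMA_BALANCING, "balancing" },
		{ MPOL_F_STATIC_NODES,   "static" },
		{ MPOL_F_RELATIVE_NODES, "relative" },
	};
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(tab) / sizeof(tab[0]); i++) {
		if (!(flags & tab[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, tab[i].name, len - strlen(buf) - 1);
	}
}
```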

Link: https://lkml.kernel.org/r/20240708075632.95857-1-tursulin@igalia.com
	Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: bda420b ("numa balancing: migrate on fault among multiple bound nodes")
References: 8790c71 ("mm/mempolicy.c: fix mempolicy printing in numa_maps")
	Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
	Cc: Mel Gorman <mgorman@suse.de>
	Cc: Peter Zijlstra <peterz@infradead.org>
	Cc: Ingo Molnar <mingo@redhat.com>
	Cc: Rik van Riel <riel@surriel.com>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
	Cc: Dave Hansen <dave.hansen@intel.com>
	Cc: Andi Kleen <ak@linux.intel.com>
	Cc: Michal Hocko <mhocko@suse.com>
	Cc: David Rientjes <rientjes@google.com>
	Cc: <stable@vger.kernel.org>	[5.12+]
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit af64977)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Samuel Holland <samuel.holland@sifive.com>
commit f75c235

Currently, kasan_init_sw_tags() is called before setup_per_cpu_areas(),
so per_cpu(prng_state, cpu) accesses the same address regardless of the
value of "cpu", and the same seed value gets copied to the percpu area
for every CPU. Fix this by moving the call to smp_prepare_boot_cpu(),
which is the first architecture hook after setup_per_cpu_areas().

Fixes: 3c9e3aa ("kasan: add tag related helper functions")
Fixes: 3f41b60 ("kasan: fix random seed generation for tag-based mode")
	Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
	Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20240814091005.969756-1-samuel.holland@sifive.com
	Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit f75c235)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Mike Rapoport (Microsoft) <rppt@kernel.org>
commit 33ea120

The CPA_ARRAY test always uses len[1] as numpages argument to
change_page_attr_set() although the addresses array is different each
iteration of the test loop.

Replace len[1] with len[i] to have numpages matching the addresses array.

Fixes: ecc729f ("x86/mm/cpa: Add ARRAY and PAGES_ARRAY selftests")
	Signed-off-by: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
	Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250126074733.1384926-2-rppt@kernel.org
(cherry picked from commit 33ea120)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Jann Horn <jannh@google.com>
commit 3ef938c

On the following path, flush_tlb_range() can be used for zapping normal
PMD entries (PMD entries that point to page tables) together with the PTE
entries in the pointed-to page table:

    collapse_pte_mapped_thp
      pmdp_collapse_flush
        flush_tlb_range

The arm64 version of flush_tlb_range() has a comment describing that it can
be used for page table removal, and does not use any last-level
invalidation optimizations. Fix the X86 version by making it behave the
same way.

Currently, X86 only uses this information for the following two purposes,
which I think means the issue doesn't have much impact:

 - In native_flush_tlb_multi() for checking if lazy TLB CPUs need to be
   IPI'd to avoid issues with speculative page table walks.
 - In Hyper-V TLB paravirtualization, again for lazy TLB stuff.

The patch "x86/mm: only invalidate final translations with INVLPGB" which
is currently under review (see
<https://lore.kernel.org/all/20241230175550.4046587-13-riel@surriel.com/>)
would probably be making the impact of this a lot worse.

Fixes: 016c4d9 ("x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range")
	Signed-off-by: Jann Horn <jannh@google.com>
	Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
	Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20250103-x86-collapse-flush-fix-v1-1-3c521856cfa6@google.com
(cherry picked from commit 3ef938c)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Ge Yang <yangge1116@126.com>
commit 67bab13
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.81.1.el8_10/67bab133.failed

Since the introduction of commit c77c0a8 ("mm/hugetlb: defer freeing
of huge pages if in non-task context"), which supports deferring the
freeing of hugetlb pages, the allocation of contiguous memory through
cma_alloc() may fail probabilistically.

In the CMA allocation process, if it is found that the CMA area is
occupied by in-use hugetlb folios, these in-use hugetlb folios need to be
migrated to another location.  When there are no available hugetlb folios
in the free hugetlb pool during the migration of in-use hugetlb folios,
new folios are allocated from the buddy system.  A temporary state is set
on the newly allocated folio.  Upon completion of the hugetlb folio
migration, the temporary state is transferred from the new folios to the
old folios.  Normally, when the old folios with the temporary state are
freed, it is directly released back to the buddy system.  However, due to
the deferred freeing of hugetlb pages, the PageBuddy() check fails,
ultimately leading to the failure of cma_alloc().

Here is a simplified call trace illustrating the process:
cma_alloc()
    ->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
        ->unmap_and_move_huge_page()
            ->folio_putback_hugetlb() // Free old folios
    ->test_pages_isolated()
        ->__test_page_isolated_in_pageblock()
             ->PageBuddy(page) // Check if the page is in buddy

To resolve this issue, we have implemented a function named
wait_for_freed_hugetlb_folios().  This function ensures that the hugetlb
folios are properly released back to the buddy system after their
migration is completed.  By invoking wait_for_freed_hugetlb_folios()
before calling PageBuddy(), we ensure that PageBuddy() will succeed.

Link: https://lkml.kernel.org/r/1739936804-18199-1-git-send-email-yangge1116@126.com
Fixes: c77c0a8 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
	Signed-off-by: Ge Yang <yangge1116@126.com>
	Reviewed-by: Muchun Song <muchun.song@linux.dev>
	Acked-by: David Hildenbrand <david@redhat.com>
	Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
	Cc: Barry Song <21cnbao@gmail.com>
	Cc: Oscar Salvador <osalvador@suse.de>
	Cc: <stable@vger.kernel.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 67bab13)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	include/linux/hugetlb.h
#	mm/hugetlb.c
#	mm/page_isolation.c
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Michal Hocko <mhocko@suse.com>
commit 9a5b183

28307d9 ("percpu: make pcpu_alloc() aware of current gfp context")
has fixed a reclaim recursion for scoped GFP_NOFS context.  It has done
that by avoiding taking pcpu_alloc_mutex.  This is a correct solution as
the worker context with full GFP_KERNEL allocation/reclaim power and which
is using the same lock cannot block the NOFS pcpu_alloc caller.

On the other hand this is a very conservative approach that could lead to
failures because pcpu_alloc lockless implementation is quite limited.

We have a bug report about premature failures when scsi array of 193
devices is scanned.  Sometimes (not consistently) the scanning aborts
because the iscsid daemon fails to create the queue for a random scsi
device during the scan.  iscsid itself is running with PR_SET_IO_FLUSHER
set so all allocations from this process context are GFP_NOIO.  This in
turn makes any pcpu_alloc lockless (without pcpu_alloc_mutex) which leads
to premature failures.

It has turned out that iscsid has worked around this by dropping
PR_SET_IO_FLUSHER (open-iscsi/open-iscsi#382) when
scanning host.  But we can do better in this case on the kernel side and
use pcpu_alloc_mutex for NOIO resp.  NOFS constrained allocation scopes
too.  We just need the WQ worker to never trigger IO/FS reclaim.  Achieve
that by enforcing scoped GFP_NOIO for the whole execution of
pcpu_balance_workfn (this will imply NOFS constrain as well).  This will
remove the dependency chain and preserve the full allocation power of the
pcpu_alloc call.

While at it make is_atomic really test for blockable allocations.

Link: https://lkml.kernel.org/r/20250206122633.167896-1-mhocko@kernel.org
Fixes: 28307d9 ("percpu: make pcpu_alloc() aware of current gfp context")
	Signed-off-by: Michal Hocko <mhocko@suse.com>
	Acked-by: Vlastimil Babka <vbabka@suse.cz>
	Cc: Dennis Zhou <dennis@kernel.org>
	Cc: Filipe David Manana <fdmanana@suse.com>
	Cc: Tejun Heo <tj@kernel.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 9a5b183)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Zhenhua Huang <quic_zhenhuah@quicinc.com>
commit 89f43e1

Hotplugged memory can be smaller than the original memory. For example,
on my target:

root@genericarmv8:~# cat /sys/kernel/debug/memblock/memory
   0: 0x0000000064005000..0x0000000064023fff    0 NOMAP
   1: 0x0000000064400000..0x00000000647fffff    0 NOMAP
   2: 0x0000000068000000..0x000000006fffffff    0 DRV_MNG
   3: 0x0000000088800000..0x0000000094ffefff    0 NONE
   4: 0x0000000094fff000..0x0000000094ffffff    0 NOMAP

max_pfn affects read_page_owner.  Therefore, the code should compare the
candidate value with the current one and keep the larger for max_pfn.

Fixes: 8fac67c ("arm64: mm: update max_pfn after memory hotplug")
	Cc: <stable@vger.kernel.org> # 6.1.x
	Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
	Acked-by: David Hildenbrand <david@redhat.com>
	Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250321070019.1271859-1-quic_zhenhuah@quicinc.com
	Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 89f43e1)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
commit a995199
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.81.1.el8_10/a9951993.failed

In the case of apply_to_existing_page_range(), apply_to_pte_range() is
reached with 'create' set to false.  When !create, the loop over the PTE
page table is broken.

apply_to_pte_range() will only move to the next PTE entry if 'create' is
true or if the current entry is not pte_none().

This means that the user of apply_to_existing_page_range() will not have
'fn' called for any entries after the first pte_none() in the PTE page
table.

Fix the loop logic in apply_to_pte_range().

There are no known runtime issues from this, but the fix is trivial enough
for stable@ even without a known buggy user.
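
A toy model of the loop fix, where 0 stands for a pte_none() slot (this is
not the kernel code):

```c
#include <stddef.h>

/*
 * The buggy loop advanced its PTE cursor only when it called fn, so
 * it stalled at the first empty slot and 'fn' never ran for entries
 * after it.  The fix advances unconditionally and merely skips the
 * empty slots, as modeled here.
 */
static void apply_to_existing(const int *ptes, size_t n,
			      void (*fn)(int pte, void *data), void *data)
{
	size_t i;

	for (i = 0; i < n; i++) {	/* advance unconditionally */
		if (ptes[i] != 0)
			fn(ptes[i], data);
	}
}

static void count_fn(int pte, void *data)
{
	(void)pte;
	++*(int *)data;
}
```

With entries {1, 0, 2, 3}, the fixed loop visits all three present entries
instead of stopping at the empty slot.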

Link: https://lkml.kernel.org/r/20250409094043.1629234-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: be1db47 ("mm/memory.c: add apply_to_existing_page_range() helper")
Cc: Daniel Axtens <dja@axtens.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit a995199)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	mm/memory.c
…ble()

jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Baoquan He <bhe@redhat.com>
commit 8c03ebd

Unlike fault_in_readable() or fault_in_writeable(), fault_in_safe_writeable()
increases the local variable 'start' page by page to loop until the whole
address range is handled.  However, it mistakenly calculates the size of
the handled range as 'uaddr - start'.

Fix it here.

Andreas said:

: In gfs2, fault_in_iov_iter_writeable() is used in
: gfs2_file_direct_read() and gfs2_file_read_iter(), so this potentially
: affects buffered as well as direct reads.  This bug could cause those
: gfs2 functions to spin in a loop.

Link: https://lkml.kernel.org/r/20250410035717.473207-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20250410035717.473207-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Fixes: fe673d3 ("mm: gup: make fault_in_safe_writeable() use fixup_user_fault()")
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Yanjun.Zhu <yanjun.zhu@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8c03ebd)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Kemeng Shi <shikemeng@huaweicloud.com>
commit 3f778ab

If multiple shmem_unuse() calls for different swap types run concurrently,
a dead loop can occur as follows:

shmem_unuse(typeA)               shmem_unuse(typeB)
 mutex_lock(&shmem_swaplist_mutex)
 list_for_each_entry_safe(info, next, ...)
  ...
  mutex_unlock(&shmem_swaplist_mutex)
  /* info->swapped may drop to 0 */
  shmem_unuse_inode(&info->vfs_inode, type)

                                  mutex_lock(&shmem_swaplist_mutex)
                                  list_for_each_entry(info, next, ...)
                                   if (!info->swapped)
                                    list_del_init(&info->swaplist)

                                  ...
                                  mutex_unlock(&shmem_swaplist_mutex)

  mutex_lock(&shmem_swaplist_mutex)
  /* iterate with offlist entry and encounter a dead loop */
  next = list_next_entry(info, swaplist);
  ...

Restart the iteration if the inode is already off shmem_swaplist list to
fix the issue.

Link: https://lkml.kernel.org/r/20250516170939.965736-4-shikemeng@huaweicloud.com
Fixes: b56a2d8 ("mm: rid swapoff of quadratic complexity")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 3f778ab)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Moshe Shemesh <moshe@nvidia.com>
commit 33afbfc

If the PCI channel goes offline, the driver should not wait for PCI reads
during the health dump and recovery flows. Each of these loops that read
PCI has a timeout, so the reads would fail anyway. However, during
recovery, waiting until the timeout may cause the pci error_detected()
callback to fail to meet the pci_dpc_recovered() wait timeout.

Fixes: b3bd076 ("net/mlx5: Report devlink health on FW fatal issues")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 33afbfc)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
cve CVE-2023-53297
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Min Li <lm0963hack@gmail.com>
commit 25e97f7

conn->chan_lock isn't acquired before calling l2cap_get_chan_by_scid();
if l2cap_get_chan_by_scid() returns NULL, a 'bad unlock balance'
is triggered.

Reported-by: syzbot+9519d6b5b79cf7787cf3@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000894f5f05f95e9f4d@google.com/
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
(cherry picked from commit 25e97f7)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
cve CVE-2025-39841
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author John Evans <evans1210144@gmail.com>
commit 9dba9a4

Fix a use-after-free window by correcting the buffer release sequence in
the deferred receive path. The code freed the RQ buffer first and only
then cleared the context pointer under the lock. Concurrent paths (e.g.,
ABTS and the repost path) also inspect and release the same pointer under
the lock, so the old order could lead to double-free/UAF.

Note that the repost path already uses the correct pattern: detach the
pointer under the lock, then free it after dropping the lock. The
deferred path should do the same.

Fixes: 472e146 ("scsi: lpfc: Correct upcalling nvmet_fc transport during io done downcall")
Cc: stable@vger.kernel.org
Signed-off-by: John Evans <evans1210144@gmail.com>
Link: https://lore.kernel.org/r/20250828044008.743-1-evans1210144@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9dba9a4)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
cve CVE-2025-39817
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Li Nan <linan122@huawei.com>
commit a6358f8

Observed on kernel 6.6 (present on master as well):

  BUG: KASAN: slab-out-of-bounds in memcmp+0x98/0xd0
  Call trace:
   kasan_check_range+0xe8/0x190
   __asan_loadN+0x1c/0x28
   memcmp+0x98/0xd0
   efivarfs_d_compare+0x68/0xd8
   __d_lookup_rcu_op_compare+0x178/0x218
   __d_lookup_rcu+0x1f8/0x228
   d_alloc_parallel+0x150/0x648
   lookup_open.isra.0+0x5f0/0x8d0
   open_last_lookups+0x264/0x828
   path_openat+0x130/0x3f8
   do_filp_open+0x114/0x248
   do_sys_openat2+0x340/0x3c0
   __arm64_sys_openat+0x120/0x1a0

If dentry->d_name.len < EFI_VARIABLE_GUID_LEN, 'guid' can become
negative, leading to an out-of-bounds access. The issue can be triggered
by parallel lookups using an invalid filename:

  T1			T2
  lookup_open
   ->lookup
    simple_lookup
     d_add
     // invalid dentry is added to hash list

			lookup_open
			 d_alloc_parallel
			  __d_lookup_rcu
			   __d_lookup_rcu_op_compare
			    hlist_bl_for_each_entry_rcu
			    // invalid dentry can be retrieved
			     ->d_compare
			      efivarfs_d_compare
			      // oob

Fix it by checking 'guid' before the comparison.

Fixes: da27a24 ("efivarfs: guid part of filenames are case-insensitive")
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit a6358f8)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
cve CVE-2023-53386
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Min Li <lm0963hack@gmail.com>
commit 3673952

Similar to commit c5d2b6f ("Bluetooth: Fix use-after-free in
hci_remove_ltk/hci_remove_irk"), we cannot access 'k' after the
kfree_rcu() call.

Fixes: d7d4168 ("Bluetooth: Fix Suspicious RCU usage warnings")
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
(cherry picked from commit 3673952)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4623
cve CVE-2022-50386
Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10
commit-author Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
commit 35fcbc4

This uses l2cap_chan_hold_unless_zero() after calling
__l2cap_get_chan_blah() to prevent the following trace:

Bluetooth: l2cap_core.c:static void l2cap_chan_destroy(struct kref
*kref)
Bluetooth: chan 0000000023c4974d
Bluetooth: parent 00000000ae861c08
==================================================================
BUG: KASAN: use-after-free in __mutex_waiter_is_first
kernel/locking/mutex.c:191 [inline]
BUG: KASAN: use-after-free in __mutex_lock_common
kernel/locking/mutex.c:671 [inline]
BUG: KASAN: use-after-free in __mutex_lock+0x278/0x400
kernel/locking/mutex.c:729
Read of size 8 at addr ffff888006a49b08 by task kworker/u3:2/389

Link: https://lore.kernel.org/lkml/20220622082716.478486-1-lee.jones@linaro.org
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
(cherry picked from commit 35fcbc4)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 567757
Number of commits in rpm: 102
Number of commits matched with upstream: 95 (93.14%)
Number of commits in upstream but not in rpm: 567662
Number of commits NOT found in upstream: 7 (6.86%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.81.1.el8_10 for kernel-4.18.0-553.81.1.el8_10
Clean Cherry Picks: 76 (80.00%)
Empty Cherry Picks: 18 (18.95%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.81.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual empty-commit failure reports are contained in the same directory.
The git message for each empty commit has the path of the failed cherry-pick.
File names are the first 8 characters of the upstream SHA.
@PlaidCat PlaidCat requested review from a team and Copilot October 30, 2025 18:33
@PlaidCat PlaidCat self-assigned this Oct 30, 2025

@bmastbergen bmastbergen left a comment

🥌

@PlaidCat PlaidCat merged commit 99b4f48 into rocky8_10 Oct 31, 2025
8 checks passed
@PlaidCat PlaidCat deleted the rocky8_10_rebuild branch October 31, 2025 12:18