@PlaidCat

General Process:

Checking Rebuild Commits for Potentially missing commits:

kernel-4.18.0-553.83.1.el8_10

[jmaple@devbox kernel-src-tree]$ cat ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 567757
Number of commits in rpm: 27
Number of commits matched with upstream: 19 (70.37%)
Number of commits in upstream but not in rpm: 567738
Number of commits NOT found in upstream: 8 (29.63%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.83.1.el8_10 for kernel-4.18.0-553.83.1.el8_10
Clean Cherry Picks: 8 (42.11%)
Empty Cherry Picks: 11 (57.89%)
_______________________________

__EMPTY COMMITS__________________________
e3b63e966cac0bf78aaa1efede1827a252815a1d mm: zswap: fix missing folio cleanup in writeback race path
c4abe6234246c75cdc43326415d9cff88b7cf06c s390/pci: Fix __pcilg_mio_inuser() inline assembly
503f1c72c31bbee21e669a08cf65c49e96d42755 i40e: fix Jumbo Frame support after iPXE boot
9969779d0803f5dcd4460ae7aca2bc3fd91bff12 Documentation/hw-vuln: Add VMSCAPE documentation
a508cec6e5215a3fbc7e73ae86a5c5602187934d x86/vmscape: Enumerate VMSCAPE bug
2f8f173413f1cbf52660d04df92d0069c4306d25 x86/vmscape: Add conditional IBPB mitigation
556c1ad666ad90c50ec8fccb930dd5046cfbecfb x86/vmscape: Enable the mitigation
6449f5baf9c78a7a442d64f4a61378a21c5db113 x86/bugs: Move cpu_bugs_smt_update() down
b7cc9887231526ca4fa89f3fa4119e47c2dc7b1e x86/vmscape: Warn when STIBP is disabled with SMT
8a68d64bb10334426834e8c273319601878e961e x86/vmscape: Add old Intel CPUs to affected list
2e488f13755ffbb60f307e991b27024716a33b29 fs: fix UAF/GPF bug in nilfs_mdt_destroy

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values
redhat/configs: Enable CONFIG_MITIGATION_VMSCAPE for x86_64
fanotify: add watchdog for permission events
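
The percentages in the summary above follow directly from the raw counts (19 of 27 changelog entries matched upstream; 8 clean and 11 empty cherry-picks out of those 19). A minimal sketch for cross-checking them is below; the helper names are illustrative only and are not part of the rebuild tooling.

```c
#include <stdio.h>

/* Hypothetical helper (not from the rebuild scripts): share of `part` in `total`, in percent. */
double percent_of(unsigned long part, unsigned long total)
{
    return total ? 100.0 * (double)part / (double)total : 0.0;
}

/* Recompute the ratios reported in rebuild.details.txt from the raw counts. */
void print_rebuild_ratios(void)
{
    const unsigned long in_rpm = 27, matched = 19, clean = 8, empty = 11;

    printf("matched with upstream: %.2f%%\n", percent_of(matched, in_rpm));
    printf("not found in upstream: %.2f%%\n", percent_of(in_rpm - matched, in_rpm));
    printf("clean cherry picks:    %.2f%%\n", percent_of(clean, matched));
    printf("empty cherry picks:    %.2f%%\n", percent_of(empty, matched));
}
```

Note that the cherry-pick percentages are relative to the 19 matched commits, not to the 27 changelog entries.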

BUILD

[jmaple@devbox code]$ egrep -B 5 -A 5 "\[TIMER\]|^Starting Build" $(ls -t kbuild* | head -n1)
/mnt/code/kernel-src-tree-build
Running make mrproper...
  CLEAN   scripts/basic
  CLEAN   scripts/kconfig
[TIMER]{MRPROPER}: 5s
x86_64 architecture detected, copying config
'configs/kernel-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky8_10_rebuild-d6bebad94beb"
Making olddefconfig
--
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
--
  LD [M]  sound/usb/usx2y/snd-usb-usx2y.ko
  LD [M]  sound/virtio/virtio_snd.ko
  LD [M]  sound/x86/snd-hdmi-lpe-audio.ko
  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1435s
Making Modules
  INSTALL arch/x86/crypto/blowfish-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx2.ko
  INSTALL arch/x86/crypto/camellia-x86_64.ko
--
  INSTALL sound/virtio/virtio_snd.ko
  INSTALL sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL sound/xen/snd_xen_front.ko
  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-rocky8_10_rebuild-d6bebad94beb+
[TIMER]{MODULES}: 11s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-rocky8_10_rebuild-d6bebad94beb+ arch/x86/boot/bzImage \
        System.map "/boot"
[TIMER]{INSTALL}: 28s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-rocky8_10_rebuild-d6bebad94beb+ and Index to 2
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 5s
[TIMER]{BUILD}: 1435s
[TIMER]{MODULES}: 11s
[TIMER]{INSTALL}: 28s
[TIMER]{TOTAL} 1484s
Rebooting in 10 seconds

KSelfTest

[jmaple@devbox code]$ ~/workspace/auto_kernel_history_rebuild/Rocky10/rocky10/code/get_kselftest_diff.sh
kselftest.4.18.0-rocky8_10_rebuild-baea35f64da5+.log
207
kselftest.4.18.0-rocky8_10_rebuild-99b4f48215a2+.log
207
kselftest.4.18.0-rocky8_10_rebuild-48e11f31ca38+.log
207
kselftest.4.18.0-rocky8_10_rebuild-d6bebad94beb+.log
207
Before: kselftest.4.18.0-rocky8_10_rebuild-48e11f31ca38+.log
After: kselftest.4.18.0-rocky8_10_rebuild-d6bebad94beb+.log
Diff:
No differences found.

jira LE-4704
cve CVE-2023-53178
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Domenico Cerasuolo <cerasuolodomenico@gmail.com>
commit 04fc781

The zswap writeback mechanism can cause a race condition resulting in
memory corruption, where a swapped out page gets swapped in with data that
was written to a different page.

The race unfolds like this:
1. a page with data A and swap offset X is stored in zswap
2. page A is removed off the LRU by zpool driver for writeback in
   zswap-shrink work, data for A is mapped by zpool driver
3. user space program faults and invalidates page entry A, offset X is
   considered free
4. kswapd stores page B at offset X in zswap (zswap could also be
   full, if so, page B would then be IOed to X, then skip step 5.)
5. entry A is replaced by B in tree->rbroot, this doesn't affect the
   local reference held by zswap-shrink work
6. zswap-shrink work writes back A at X, and frees zswap entry A
7. swapin of slot X brings A in memory instead of B

The fix:
Once the swap page cache has been allocated (case ZSWAP_SWAPCACHE_NEW),
zswap-shrink work just checks that the local zswap_entry reference is
still the same as the one in the tree.  If it's not the same, the entry
has been either invalidated or replaced; in both cases the writeback is
aborted because the local entry contains stale data.
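
The idea behind the check can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a fixed array stands in for the rbtree, there is no locking, and all `demo_` names are hypothetical rather than taken from mm/zswap.c.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NSLOTS 8   /* toy size, chosen for illustration */

struct demo_entry { int data; };

/* Toy stand-in for the tree: one entry pointer per swap offset. */
struct demo_tree { struct demo_entry *slot[DEMO_NSLOTS]; };

/*
 * Return true only if the entry currently stored at `offset` is the very
 * same object the writeback path took a reference to earlier. If the slot
 * was invalidated (NULL) or now holds a different entry, the local
 * reference is stale and the writeback must be aborted.
 */
bool demo_writeback_may_proceed(const struct demo_tree *tree, size_t offset,
                                const struct demo_entry *local)
{
    return offset < DEMO_NSLOTS && local != NULL && tree->slot[offset] == local;
}
```

In terms of the race above, step 5 replaces the slot contents with B, so the comparison against the local reference to A fails and step 6 is skipped.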

Reproducer:
I originally found this by running `stress` overnight to validate my work
on the zswap writeback mechanism; it manifested after hours on my test
machine.  The key to making it happen is having zswap writebacks, so
whatever setup pumps /sys/kernel/debug/zswap/written_back_pages should do
the trick.

In order to reproduce this faster on a VM, I set up a system with ~100M of
available memory and a 500M swap file; running `stress --vm 1
--vm-bytes 300000000 --vm-stride 4000` then makes it happen in a matter of
tens of minutes.  One can speed things up even more by swinging
/sys/module/zswap/parameters/max_pool_percent up and down between, say, 20
and 1; this makes it reproduce in tens of seconds.  It's crucial to set
`--vm-stride` to something other than 4096, otherwise `stress` won't
realize that memory has been corrupted because all pages would have the
same data.

Link: https://lkml.kernel.org/r/20230503151200.19707-1-cerasuolodomenico@gmail.com
	Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
	Acked-by: Johannes Weiner <hannes@cmpxchg.org>
	Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
	Cc: Dan Streetman <ddstreet@ieee.org>
	Cc: Johannes Weiner <hannes@cmpxchg.org>
	Cc: Minchan Kim <minchan@kernel.org>
	Cc: Nitin Gupta <ngupta@vflare.org>
	Cc: Seth Jennings <sjenning@redhat.com>
	Cc: Vitaly Wool <vitaly.wool@konsulko.com>
	Cc: <stable@vger.kernel.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 04fc781)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4704
cve CVE-2023-53178
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Yosry Ahmed <yosryahmed@google.com>
commit e3b63e9
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/e3b63e96.failed

In zswap_writeback_entry(), after we get a folio from
__read_swap_cache_async(), we grab the tree lock again to check that the
swap entry was not invalidated and recycled.  If it was, we delete the
folio we just added to the swap cache and exit.

However, __read_swap_cache_async() returns the folio locked when it is
newly allocated, which is always true for this path, and the folio is
ref'd.  Make sure to unlock and put the folio before returning.

This was discovered by code inspection, probably because this path handles
a race condition that should not happen often and because the bug would not
crash the system; it only strands the folio indefinitely.

Link: https://lkml.kernel.org/r/20240125085127.1327013-1-yosryahmed@google.com
Fixes: 04fc781 ("mm: fix zswap writeback race condition")
	Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
	Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
	Acked-by: Johannes Weiner <hannes@cmpxchg.org>
	Reviewed-by: Nhat Pham <nphamcs@gmail.com>
	Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
	Cc: <stable@vger.kernel.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit e3b63e9)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	mm/zswap.c
jira LE-4704
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Heiko Carstens <hca@linux.ibm.com>
commit c4abe62
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/c4abe623.failed

Use "a" constraint for the shift operand of the __pcilg_mio_inuser() inline
assembly. The used "d" constraint allows the compiler to use any general
purpose register for the shift operand, including register zero.

If register zero is used this may result in incorrect code generation:

 8f6:   a7 0a ff f8             ahi     %r0,-8
 8fa:   eb 32 00 00 00 0c       srlg    %r3,%r2,0  <----

If register zero is selected to contain the shift value, the srlg
instruction ignores the contents of the register and always shifts zero
bits. Therefore use the "a" constraint, which does not permit selecting
register zero.

Fixes: f058599 ("s390/pci: Fix s390_mmio_read/write with MIO")
	Cc: stable@vger.kernel.org
	Reported-by: Niklas Schnelle <schnelle@linux.ibm.com>
	Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
	Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
(cherry picked from commit c4abe62)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	arch/s390/pci/pci_mmio.c
jira LE-4704
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Easwar Hariharan <eahariha@linux.microsoft.com>
commit b35108a

secs_to_jiffies() is defined in hci_event.c and cannot be reused by
other call sites. Hoist it into the core code to allow conversion of the
~1150 usages of msecs_to_jiffies() that either:

 - use a multiplier value of 1000 or equivalently MSEC_PER_SEC, or
 - have timeouts that are denominated in seconds (i.e. end in 000)

It's implemented as a macro to allow usage in static initializers.

This will also allow conversion of yet more sites that use (sec * HZ)
directly, and improve their readability.
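
As a rough illustration of why a macro is useful here, the sketch below uses a stand-in tick rate and hypothetical `demo_` names. The real `HZ` depends on the kernel configuration, and the kernel's exact definition may differ from this one.

```c
/* Assumption for this sketch: the tick rate is a build-time constant; 1000 is only an example. */
#define DEMO_HZ 1000

/* Seconds to timer ticks. Being a macro, the result stays a constant expression. */
#define demo_secs_to_jiffies(secs) ((secs) * DEMO_HZ)

/*
 * Usable in a static initializer, which a function call would not be.
 * This replaces open-coded forms such as (5 * DEMO_HZ).
 */
const unsigned long demo_timeout_jiffies = demo_secs_to_jiffies(5);
```

With a function instead of a macro, the initializer above would not compile in C, since file-scope initializers must be constant expressions.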

	Suggested-by: Michael Kelley <mhklinux@outlook.com>
	Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
	Reviewed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Link: https://lore.kernel.org/all/20241030-open-coded-timeouts-v3-1-9ba123facf88@linux.microsoft.com
(cherry picked from commit b35108a)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4704
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Easwar Hariharan <eahariha@linux.microsoft.com>
commit bb2784d

While converting users of msecs_to_jiffies(), lkp reported that some range
checks would always be true because of the mismatch between the implied int
value of secs_to_jiffies() vs the unsigned long return value of the
msecs_to_jiffies() calls it was replacing.

Fix this by casting the secs_to_jiffies() input value to unsigned long.
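
The type mismatch can be made visible with C11 `_Generic`. This is a sketch with hypothetical `demo_` macros and an assumed tick rate of 1000; it mirrors the commit description (casting the input) and is not a copy of the kernel macro.

```c
/* Assumption for this sketch: example tick rate, not a real kernel setting. */
#define DEMO_HZ 1000

/* Before: an int argument times an int constant yields an int. */
#define demo_secs_to_jiffies_int(secs) ((secs) * DEMO_HZ)

/* After: casting the input first makes the whole expression unsigned long. */
#define demo_secs_to_jiffies_ul(secs)  ((unsigned long)(secs) * DEMO_HZ)

/* Evaluates to 1 if the expression has type unsigned long, 0 otherwise. */
#define DEMO_IS_UNSIGNED_LONG(expr) _Generic((expr), unsigned long: 1, default: 0)
```

Code that previously compared an `unsigned long` result from msecs_to_jiffies() against a bound sees the same type again once the cast is in place.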

Fixes: b35108a ("jiffies: Define secs_to_jiffies()")
	Reported-by: kernel test robot <lkp@intel.com>
	Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
	Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250130192701.99626-1-eahariha@linux.microsoft.com
Closes: https://lore.kernel.org/oe-kbuild-all/202501301334.NB6NszQR-lkp@intel.com/
(cherry picked from commit bb2784d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4704
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Manish Chopra <manishc@marvell.com>
commit 42510df

Reading statistics through a bond interface via sysfs causes the
bug and trace below: it triggers the bonding module to collect the
slave device statistics while holding a spinlock, and beneath that
the qede->qed driver statistics flow gets scheduled out due to the
usleep_range() used in the PTT acquire logic.

[ 3673.988874] Hardware name: HPE ProLiant DL365 Gen10 Plus/ProLiant DL365 Gen10 Plus, BIOS A42 10/29/2021
[ 3673.988878] Call Trace:
[ 3673.988891]  dump_stack_lvl+0x34/0x44
[ 3673.988908]  __schedule_bug.cold+0x47/0x53
[ 3673.988918]  __schedule+0x3fb/0x560
[ 3673.988929]  schedule+0x43/0xb0
[ 3673.988932]  schedule_hrtimeout_range_clock+0xbf/0x1b0
[ 3673.988937]  ? __hrtimer_init+0xc0/0xc0
[ 3673.988950]  usleep_range+0x5e/0x80
[ 3673.988955]  qed_ptt_acquire+0x2b/0xd0 [qed]
[ 3673.988981]  _qed_get_vport_stats+0x141/0x240 [qed]
[ 3673.989001]  qed_get_vport_stats+0x18/0x80 [qed]
[ 3673.989016]  qede_fill_by_demand_stats+0x37/0x400 [qede]
[ 3673.989028]  qede_get_stats64+0x19/0xe0 [qede]
[ 3673.989034]  dev_get_stats+0x5c/0xc0
[ 3673.989045]  netstat_show.constprop.0+0x52/0xb0
[ 3673.989055]  dev_attr_show+0x19/0x40
[ 3673.989065]  sysfs_kf_seq_show+0x9b/0xf0
[ 3673.989076]  seq_read_iter+0x120/0x4b0
[ 3673.989087]  new_sync_read+0x118/0x1a0
[ 3673.989095]  vfs_read+0xf3/0x180
[ 3673.989099]  ksys_read+0x5f/0xe0
[ 3673.989102]  do_syscall_64+0x3b/0x90
[ 3673.989109]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3673.989115] RIP: 0033:0x7f8467d0b082
[ 3673.989119] Code: c0 e9 b2 fe ff ff 50 48 8d 3d ca 05 08 00 e8 35 e7 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 3673.989121] RSP: 002b:00007ffffb21fd08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 3673.989127] RAX: ffffffffffffffda RBX: 000000000100eca0 RCX: 00007f8467d0b082
[ 3673.989128] RDX: 00000000000003ff RSI: 00007ffffb21fdc0 RDI: 0000000000000003
[ 3673.989130] RBP: 00007f8467b96028 R08: 0000000000000010 R09: 00007ffffb21ec00
[ 3673.989132] R10: 00007ffffb27b170 R11: 0000000000000246 R12: 00000000000000f0
[ 3673.989134] R13: 0000000000000003 R14: 00007f8467b92000 R15: 0000000000045a05
[ 3673.989139] CPU: 30 PID: 285188 Comm: read_all Kdump: loaded Tainted: G        W  OE

Fix this by collecting the statistics asynchronously from a periodic
delayed work scheduled at the default stats coalescing interval and
returning the most recent copy of the statistics from .ndo_get_stats64().
Also add the ability to configure/retrieve the stats coalescing interval
using the commands below:

ethtool -C ethx stats-block-usecs <val>
ethtool -c ethx

Fixes: 133fac0 ("qede: Add basic ethtool support")
	Cc: Sudarsana Kalluru <skalluru@marvell.com>
	Cc: David Miller <davem@davemloft.net>
	Signed-off-by: Manish Chopra <manishc@marvell.com>
Link: https://lore.kernel.org/r/20230605112600.48238-1-manishc@marvell.com
	Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit 42510df)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4704
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Kumar Kartikeya Dwivedi <memxor@gmail.com>
commit cb0f800

cpumap needs to set, clear, and test the lowest bit in skb pointer in
various places. To make these checks less noisy, add pointer friendly
bitop macros that also do some typechecking to sanitize the argument.

These wrap the non-atomic bitops __set_bit, __clear_bit, and test_bit
but for pointer arguments. Pointer's address has to be passed in and it
is treated as an unsigned long *, since width and representation of
pointer and unsigned long match on targets Linux supports. They are
prefixed with double underscore to indicate lack of atomicity.
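
The tagging technique can be sketched in portable C as follows. The `demo_` functions are hypothetical and go through `uintptr_t` instead of wrapping the kernel bitops, so this illustrates the idea rather than the kernel macros themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set bit `nr` in the pointer value stored at `addr` (non-atomic). */
void demo_ptr_set_bit(unsigned int nr, void **addr)
{
    *addr = (void *)((uintptr_t)*addr | ((uintptr_t)1 << nr));
}

/* Clear bit `nr` in the pointer value stored at `addr` (non-atomic). */
void demo_ptr_clear_bit(unsigned int nr, void **addr)
{
    *addr = (void *)((uintptr_t)*addr & ~((uintptr_t)1 << nr));
}

/* Test bit `nr` in the pointer value stored at `addr`. */
bool demo_ptr_test_bit(unsigned int nr, void *const *addr)
{
    return (((uintptr_t)*addr >> nr) & 1) != 0;
}
```

Using bit 0 as a flag relies on the pointed-to object being at least 2-byte aligned, so that the lowest bit of a valid pointer is always zero. The flag must be cleared before the pointer is dereferenced.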

	Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
	Signed-off-by: Alexei Starovoitov <ast@kernel.org>
	Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210702111825.491065-3-memxor@gmail.com
(cherry picked from commit cb0f800)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4704
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Luo Jie <quic_luoj@quicinc.com>
commit a256ae2

Add a helper for replacing the contents of bitfield in memory
with the specified value.

Even though a helper xxx_replace_bits() is available, it is not
well documented, and only reports errors at the run time, which
will not be helpful to catch possible overflow errors due to
incorrect parameter types used.

FIELD_MODIFY(REG_FIELD_C, &reg, c) is the wrapper to the code below.

	reg &= ~REG_FIELD_C;
	reg |= FIELD_PREP(REG_FIELD_C, c);
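
A self-contained approximation of the read-modify-write pattern is shown below. The `DEMO_` macros are hypothetical simplifications: they omit the compile-time checks the commit message refers to, and `DEMO_REG_FIELD_C` is an invented 4-bit field used only for the example.

```c
#include <stdint.h>

/* Lowest set bit of a contiguous mask, used as a multiplier in place of a shift count. */
#define DEMO_FIELD_LSB(mask)        ((mask) & ~((mask) << 1))

/* Move `val` into the position described by `mask`; excess bits are silently dropped here. */
#define DEMO_FIELD_PREP(mask, val)  (((uint32_t)(val) * DEMO_FIELD_LSB(mask)) & (mask))

/* Clear the field in *reg_p, then OR in the new value. */
#define DEMO_FIELD_MODIFY(mask, reg_p, val) \
    (*(reg_p) = (*(reg_p) & ~(mask)) | DEMO_FIELD_PREP(mask, val))

#define DEMO_REG_FIELD_C 0x000000F0u  /* hypothetical field occupying bits 7:4 */

/* Return `reg` with field C replaced by `c`; all other bits are preserved. */
uint32_t demo_update_field_c(uint32_t reg, uint32_t c)
{
    DEMO_FIELD_MODIFY(DEMO_REG_FIELD_C, &reg, c);
    return reg;
}
```

For example, `demo_update_field_c(0xFFFF, 0x3)` clears bits 7:4 and writes 3 there, leaving the other bits untouched.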

Yury: trim commit message, align backslashes.

	Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
	Signed-off-by: Yury Norov <yury.norov@gmail.com>
(cherry picked from commit a256ae2)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4704
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Ivan Vecera <ivecera@redhat.com>
commit 7d9f22b

Commit 3a2c6ce ("i40e: Add a check to see if MFS is set") added
a warning message that reports an unexpected size of the port's MFS (max
frame size) value. This message uses the local variable 'i' for the
port number, which is wrong.
In i40e_probe() this 'i' variable is used only to iterate over VSIs
to find the FDIR VSI:

<code>
...
/* if FDIR VSI was set up, start it now */
        for (i = 0; i < pf->num_alloc_vsi; i++) {
                if (pf->vsi[i] && pf->vsi[i]->type == I40E_VSI_FDIR) {
                        i40e_vsi_open(pf->vsi[i]);
                        break;
                }
        }
...
</code>

So the warning message uses, as the port number, the index of the FDIR
VSI if one exists, or pf->num_alloc_vsi if not.

Fix the message by using 'pf->hw.port' for the port number.

Fixes: 3a2c6ce ("i40e: Add a check to see if MFS is set")
	Signed-off-by: Ivan Vecera <ivecera@redhat.com>
	Reviewed-by: Simon Horman <horms@kernel.org>
	Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit 7d9f22b)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4704
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Erwan Velu <e.velu@criteo.com>
commit ef3c313

If the MFS is set below the default (0x2600), a warning message like
the following is reported:

	MFS for port 1 has been set below the default: 600

This message is a bit confusing, as the number shown here (600) is in
fact a hexadecimal number: 0x600 = 1536.

Without an explicit "0x" prefix, the message reads as if the MFS were
set to 600 bytes.

MFS values, like MTUs, are usually expressed in decimal.

This commit reports both current and default MFS values in decimal
so it's less confusing for end-users.

A typical warning message looks like the following:

	MFS for port 1 (1536) has been set below the default (9728)
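
The hexadecimal-versus-decimal confusion is easy to reproduce in isolation. The sketch below formats the sample message with `%u`; the function name and `DEMO_MFS_DEFAULT` are assumptions for illustration, and the driver's actual format string is not shown in this commit message.

```c
#include <stdio.h>

#define DEMO_MFS_DEFAULT 0x2600u  /* 9728 in decimal */

/*
 * Format the warning with both values in decimal. Printing the same value
 * with "%x" and no "0x" prefix would show 0x600 as "600", which reads like
 * 600 bytes.
 */
int demo_format_mfs_warning(char *buf, size_t len, unsigned int port,
                            unsigned int mfs)
{
    return snprintf(buf, len,
                    "MFS for port %u (%u) has been set below the default (%u)",
                    port, mfs, DEMO_MFS_DEFAULT);
}
```

Calling it with port 1 and an MFS of 0x600 yields the decimal values 1536 and 9728 quoted above.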

	Signed-off-by: Erwan Velu <e.velu@criteo.com>
	Reviewed-by: Simon Horman <horms@kernel.org>
	Tested-by: Tony Brelinski <tony.brelinski@intel.com>
	Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fixes: 3a2c6ce ("i40e: Add a check to see if MFS is set")
Link: https://lore.kernel.org/r/20240423182723.740401-3-anthony.l.nguyen@intel.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit ef3c313)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4704
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Jacob Keller <jacob.e.keller@intel.com>
commit 503f1c7
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/503f1c72.failed

The i40e hardware has multiple hardware settings which define the Maximum
Frame Size (MFS) of the physical port. The firmware has an AdminQ command
(0x0603) to configure the MFS, but the i40e Linux driver never issues this
command.

In most cases this is no problem, as the NVM default value has the device
configured for its maximum value of 9728. Unfortunately, recent versions of
the iPXE intelxl driver now issue the 0x0603 Set Mac Config command,
modifying the MFS and reducing it from its default value of 9728.

This occurred as part of iPXE commit 6871a7de705b ("[intelxl] Use admin
queue to set port MAC address and maximum frame size"), a prerequisite
change for supporting the E800 series hardware in iPXE. Both the E700 and
E800 firmware support the AdminQ command, and the iPXE code shares much of
the logic between the two device drivers.

The ice E800 Linux driver already issues the 0x0603 Set Mac Config command
early during probe, and is thus unaffected by the iPXE change.

Since commit 3a2c6ce ("i40e: Add a check to see if MFS is set"), the
i40e driver does check the I40E_PRTGL_SAH register, but it only logs a
warning message if its value is below the 9728 default. This register also
only covers received packets and not transmitted packets. A warning can
inform system administrators, but does not correct the issue. No
interactions from userspace cause the driver to write to PRTGL_SAH or issue
the 0x0603 AdminQ command. Only a GLOBR reset will restore the value to its
default value. There is no obvious method to trigger a GLOBR reset from
user space.

To fix this, introduce the i40e_aq_set_mac_config() function, similar to
the one from the ice driver. Call this during early probe to ensure that
the device configuration matches driver expectation. Unlike E800, the E700
firmware also has a bit to control whether the MAC should append CRC data.
It is on by default, but setting this bit to 0 would disable CRC. The
i40e implementation must set this bit to ensure CRC will be appended by the
MAC.

In addition to the AQ command, instead of just checking the I40E_PRTGL_SAH
register, update its value to the 9728 default and write it back. This
ensures that the hardware is in the expected state, regardless of whether
the iPXE (or any other early boot driver) has modified this state.

This is a better user experience, as we now fix the issues with larger MTU
instead of merely warning. It also aligns with the way the ice E800 series
driver works.

A final note: The Fixes tag provided here is not strictly accurate. The
issue occurs as a result of an external entity (the iPXE intelxl driver),
and this is not a regression specifically caused by the mentioned change.
However, I believe the original change to just warn about PRTGL_SAH being
too low was an insufficient fix.

Fixes: 3a2c6ce ("i40e: Add a check to see if MFS is set")
Link: ipxe/ipxe@6871a7d
	Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
	Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
	Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
	Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
	Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit 503f1c7)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	drivers/net/ethernet/intel/i40e/i40e_main.c
jira LE-4704
cve CVE-2025-40300
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit 9969779
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/9969779d.failed

VMSCAPE is a vulnerability that may allow a guest to influence the branch
prediction in host userspace, particularly affecting hypervisors like QEMU.

Add the documentation.

	Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
	Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
	Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
(cherry picked from commit 9969779)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	Documentation/admin-guide/hw-vuln/index.rst
jira LE-4704
cve CVE-2025-40300
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit a508cec
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/a508cec6.failed

The VMSCAPE vulnerability may allow a guest to cause Branch Target
Injection (BTI) in userspace hypervisors.

Kernels (both host and guest) have existing defenses against direct BTI
attacks from guests. There are also inter-process BTI mitigations which
prevent processes from attacking each other. However, the threat in this
case is to a userspace hypervisor within the same process as the attacker.

Userspace hypervisors have access to their own sensitive data like disk
encryption keys and also typically have access to all guest data. This
means guest userspace may use the hypervisor as a confused deputy to attack
sensitive guest kernel data. There are no existing mitigations for these
attacks.

Introduce X86_BUG_VMSCAPE for this vulnerability and set it on affected
Intel and AMD CPUs.

	Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
	Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
(cherry picked from commit a508cec)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	arch/x86/include/asm/cpufeatures.h
#	arch/x86/kernel/cpu/common.c
jira LE-4704
cve CVE-2025-40300
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit 2f8f173
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/2f8f1734.failed

VMSCAPE is a vulnerability that exploits insufficient branch predictor
isolation between a guest and a userspace hypervisor (like QEMU). Existing
mitigations already protect kernel/KVM from a malicious guest. Userspace
can additionally be protected by flushing the branch predictors after a
VMexit.

Since it is the userspace that consumes the poisoned branch predictors,
conditionally issue an IBPB after a VMexit and before returning to
userspace. Workloads that frequently switch between hypervisor and
userspace will incur the most overhead from the new IBPB.

This new IBPB is not integrated with the existing IBPB sites. For
instance, a task can use the existing speculation control prctl() to
get an IBPB at context switch time. With this implementation, the
IBPB is doubled up: one at context switch and another before running
userspace.

The intent is to integrate and optimize these cases post-embargo.

[ dhansen: elaborate on suboptimal IBPB solution ]

	Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
	Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
	Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
	Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
	Acked-by: Sean Christopherson <seanjc@google.com>
(cherry picked from commit 2f8f173)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	arch/x86/include/asm/cpufeatures.h
#	arch/x86/include/asm/entry-common.h
#	arch/x86/include/asm/nospec-branch.h
jira LE-4704
cve CVE-2025-40300
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit 556c1ad
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/556c1ad6.failed

Enable the previously added mitigation for VMscape. Add the cmdline
vmscape={off|ibpb|force} and sysfs reporting.

	Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
	Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
	Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
(cherry picked from commit 556c1ad)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	Documentation/admin-guide/kernel-parameters.txt
#	arch/x86/Kconfig
#	arch/x86/kernel/cpu/bugs.c
#	drivers/base/cpu.c
#	include/linux/cpu.h
jira LE-4704
cve CVE-2025-40300
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit 6449f5b
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/6449f5ba.failed

cpu_bugs_smt_update() uses global variables from different mitigations. For
SMT updates it can't currently use vmscape_mitigation that is defined after
it.

Since cpu_bugs_smt_update() depends on many other mitigations, move it
after all mitigations are defined. With that, it can use vmscape_mitigation
in a moment.

No functional change.

	Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
	Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
(cherry picked from commit 6449f5b)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	arch/x86/kernel/cpu/bugs.c
jira LE-4704
cve CVE-2025-40300
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit b7cc988
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/b7cc9887.failed

Cross-thread attacks are generally harder as they require the victim to be
co-located on a core. However, with VMSCAPE the adversary's targets belong to
the same guest execution and are therefore more likely to be co-located. In
particular, a thread that is currently executing the userspace hypervisor
(after the IBPB) may still be targeted by a guest execution from a sibling
thread.

Issue a warning about the potential risk, except when:

- SMT is disabled
- STIBP is enabled system-wide
- Intel eIBRS is enabled (which implies STIBP protection)

	Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
(cherry picked from commit b7cc988)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	arch/x86/kernel/cpu/bugs.c
jira LE-4704
cve CVE-2025-40300
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit 8a68d64
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/8a68d64b.failed

These old CPUs are not tested against VMSCAPE, but are likely vulnerable.

	Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
(cherry picked from commit 8a68d64)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	arch/x86/kernel/cpu/common.c
jira LE-4704
cve CVE-2022-50367
Rebuild_History Non-Buildable kernel-4.18.0-553.83.1.el8_10
commit-author Dongliang Mu <mudongliangabcd@gmail.com>
commit 2e488f1
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/2e488f13.failed

In alloc_inode(), inode_init_always() could return -ENOMEM if
security_inode_alloc() fails, which leaves inode->i_private
uninitialized. Then nilfs_is_metadata_file_inode() returns
true and nilfs_free_inode() wrongly calls nilfs_mdt_destroy(),
which frees the uninitialized inode->i_private
and leads to crashes (e.g., UAF/GPF).

Fix this by moving security_inode_alloc() just prior to
this_cpu_inc(nr_inodes).

Link: https://lkml.kernel.org/r/CAFcO6XOcf1Jj2SeGt=jJV59wmhESeSKpfR0omdFRq+J9nD1vfQ@mail.gmail.com
	Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
	Reported-by: Hao Sun <sunhao.th@gmail.com>
	Reported-by: Jiacheng Xu <stitch@zju.edu.cn>
	Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
	Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
	Cc: Al Viro <viro@zeniv.linux.org.uk>
	Cc: stable@vger.kernel.org
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 2e488f1)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	fs/inode.c
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 567757
Number of commits in rpm: 27
Number of commits matched with upstream: 19 (70.37%)
Number of commits in upstream but not in rpm: 567738
Number of commits NOT found in upstream: 8 (29.63%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.83.1.el8_10 for kernel-4.18.0-553.83.1.el8_10
Clean Cherry Picks: 8 (42.11%)
Empty Cherry Picks: 11 (57.89%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.83.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual empty-commit failures are stored in the same directory.
The git message for each empty commit gives the path to its failed cherry-pick.
File names are the first 8 characters of the upstream SHA.
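
The naming convention can be sketched as a small helper. The function name is hypothetical and the rebuild tooling may construct the path differently; only the resulting layout is taken from the commit messages above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "ciq/ciq_backports/<kernel>/<first 8 chars of sha>.failed" into buf.
 * Returns 0 on success, -1 if the SHA is too short or the buffer too small.
 */
int demo_failed_ref_path(char *buf, size_t len, const char *kernel,
                         const char *sha)
{
    int n;

    if (strlen(sha) < 8)
        return -1;

    /* "%.8s" copies at most the first 8 characters of the SHA. */
    n = snprintf(buf, len, "ciq/ciq_backports/%s/%.8s.failed", kernel, sha);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For upstream commit e3b63e966cac0bf78aaa1efede1827a252815a1d and kernel-4.18.0-553.83.1.el8_10, this gives the e3b63e96.failed path referenced in that commit's message.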
@PlaidCat PlaidCat requested a review from a team November 12, 2025 14:19
@PlaidCat PlaidCat self-assigned this Nov 12, 2025

@bmastbergen bmastbergen left a comment

🥌

@jdieter jdieter left a comment

🚢

@PlaidCat PlaidCat merged commit d6bebad into rocky8_10 Nov 12, 2025
2 checks passed
@PlaidCat PlaidCat deleted the rocky8_10_rebuild branch November 12, 2025 16:21