Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

12.1 ufs31 #1

Merged
merged 26 commits into from
Apr 21, 2022
Merged

12.1 ufs31 #1

merged 26 commits into from
Apr 21, 2022

Conversation

goodmeow
Copy link

No description provided.

goodmeow and others added 26 commits April 20, 2022 21:38
*** These codes will do nothing without turning on associated CONFIGs ***

This change contains the drivers to support the Turbo Write/Write Booster
feature and the Host Performance Booster feature on Samsung 256GB UFS 3.0
device KLUEG8UHDB-C2D1 with FW 0900.

Please be aware that JEDEC "official" support for these features is in
their UFS 3.1 spec JESD220E, and that is not 100% compatible with Samsung's
implementation for their UFS 2.1 and UFS 3.0 parts.

Reference:
   * mt_sm8250_H2.2.1_T1.1.0.patch, Samsung
   * UFS TURBO WRITE SPECIFICATION V1.0.1G Rev1, Samsung
   * UFS HPB SPECIFICATION V2.0.0G Rev1, Samsung
   * "13.4.17 Turbo Write", UFS 3.1 spec JESD220E, JEDEC

Change-Id: I2cd1e202f804b0b462459651df8f19e823a30c88
Signed-off-by: Qing Chang <a21550@motorola.com>
Reviewed-on: https://gerrit.mot.com/1489722
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Ling Jin <lingjin@motorola.com>
Submit-Approved: Jira Key
With Samsung's default configuration, tw_mode will be 1 (manual mode),
and we need to manually write to the following sysfs nodes to turn on
TW.

/sys/devices/platform/soc/1d84000.ufshc/ufstw_lu4/tw_enable
/sys/devices/platform/soc/1d84000.ufshc/ufstw_lu4/flush_during_hibern_enter

This change makes it possible that the above could be done "automatically"
during early driver initialization so that things are getting earsier.

Change-Id: I772c44cc52b3c4903103d444d686fc9ec488866f
Signed-off-by: Qing Chang <a21550@motorola.com>
Reviewed-on: https://gerrit.mot.com/1493425
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Qing Chang <qing@motorola.com>
Submit-Approved: Jira Key
Samsung recommended to block hibernation using pm_runtime_get_sync()
while TW is being turned on during power up.

Change-Id: I01b6b407dbdbea181bc7544b200e2009694ae830
Signed-off-by: Qing Chang <a21550@motorola.com>
Reviewed-on: https://gerrit.mot.com/1497772
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Qing Chang <qing@motorola.com>
Submit-Approved: Jira Key
This patch is from Samsung during joint validation for the Turbo
Write and Host Performance Booster features on UFS 3.0 parts.

If fTurboWriteBufferFlushDuringHibnerEnter flag is set
and the value of bAvailbleTurboWriteBufferSize is less than
UFSTW_AVAIL_BUF_SIZE_DISABLE_LPM(defined in ufstw.h),
LPM mode will be disabled on runtime suspend.
In addtion, if not runtime suspend(shutdown and system suspend),
fTurboWriteBufferFlushDuringHibnerEnter flag will be disabled in
suspend routines.

Change-Id: I344cb57f994a8893d400a1470eecb3e35fe0dd5d
Signed-off-by: Qing Chang <a21550@motorola.com>
Reviewed-on: https://gerrit.mot.com/1498481
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Qing Chang <qing@motorola.com>
Submit-Approved: Jira Key
This is caused by qcom change:
+++
2ceec83 Revert existing FBE changes for ICE FBE
+++
Signed-off-by: a7301c <a7301c@motorola.com>

Change-Id: I67de012e5ddf8cb4b74c46402422433eceaf6e28
Signed-off-by: a7301c <a7301c@motorola.com>
enable turbo write feature for hynix by skipping version checking. hynix
turbo write feature version is zero, so just skip checking version.

Change-Id: I1e1125bb99ca6410025ea9c5d458b8ec41b69cd8
Signed-off-by: yuedl1 <yuedl1@mt.com>
Reviewed-on: https://gerrit.mot.com/1499133
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Qing Chang <qing@motorola.com>
Reviewed-by: Shi-Yong Li <a22381@motorola.com>
Submit-Approved: Jira Key
when devfreq ufs, ufshcd wait doorbell clear timeout. some operations
maybe take time longer than 1s. such as unmap operation. so change
the timeout to 2s. error log as bellow:
01-07 09:50:34.508     0     0 E ufshcd-qcom 1d84000.ufshc: ufshcd_wait_for_doorbell_clr: timedout waiting for doorbell to clear (tm=0x0, tr=0x2000000)
01-07 09:50:34.508     0     0 E devfreq 1d84000.ufshc: dvfs failed with (-16) error

Change-Id: Ie33481603a6f96c710e0180bfdd61b7d88eb003b
Signed-off-by: yuedl1 <yuedl1@mt.com>
Reviewed-on: https://gerrit.mot.com/1554341
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Chuanchun Zhu <zhucc2@motorola.com>
Reviewed-by: Huosheng Liao <liaohs@motorola.com>
Reviewed-by: Shi-Yong Li <a22381@motorola.com>
Submit-Approved: Jira Key
Add HPB 1.0 driver from Toshiba Kioxia
Since it is a device controlled mode HPB,
Flag needs to be added to coexist with
Current Samsung HPB/TW driver

Reference:
	M1CCL04-118_HPB10_DeviceControlMode.zip KioXia

Change-Id: I46a31e0c2ff9b6da9d23638f398f2224436dc8b8
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1755653
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Wei Deng <dengwei1@motorola.com>
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
R introduces the new inline encrypted f2fs feature
KioXia HPB driver has compatible issue with QCOM ICE
The HPB READ 0xF8 command is not recognized by QCOM ICE
ICE then return all the encrypted data so F2FS checking will fail
Switching to READ_16 so QCOM ICE will decrypt the data in the fly

Change-Id: I063d0fa7fa3481b6d0f8c27488cba1e6927ca96a
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1798575
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
In the phones with KioXia 256G
READ_16 is used for large LBA
Current KioXia HPB driver altered the
READ_16 Address with different meaning
Thus filter it from HPB to unblock the testing

Change-Id: I217a23740d0252b1fad910a2665ed7abd2e06c97
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1801148
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
KioXia has two FW versions
Only 1001 version supports HPB

Change-Id: I556f365ae627919cddab48076368e7de2465fa67
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1803405
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Wang Wang <wangwang1@mt.com>
Submit-Approved: Jira Key
In Samsung mt_SM8250_DD_H261_T130.patch
HID/RINGBUFFER POC are removed

Change-Id: Ia9a8b5506b14815fc95c8bc9e174b90c71156eae
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1804333
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Wei Deng <dengwei1@motorola.com>
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
Integrate New Samsung UFS3.1 driver
Support HPB/TW/HID/RINGBUFFER

Change-Id: Iea9c787b49c563844859ffdc9a7bc7528268cdf3
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1773951
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Wei Deng <dengwei1@motorola.com>
Reviewed-by: Qing Chang <qing@motorola.com>
Submit-Approved: Jira Key
root cause from Samsung:
Interrupt context called blk_get_request when task_lock is used.
To avoid this issue, we added another block layer API that
Does not create io_context. i
io_context is related to task that creates IO,
so this change can prevent irrelevent io_context to be created.
Operating request is no problem without io_context as well.

Change-Id: Idecd9fcc7af1c49742a401c0b52511c930325422
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1820427
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
KioXia HPB driver did not handle the suspend/resume
Cases, if the phone and the UFS device enter suspend mode
While HPB driver still sends read buffer command
To UFS device, it may not get response from UFS device
And sometimes breaks the block layer elevator algorithm
Then all the other bio request could get blocked

Change-Id: I5b61c6c1ab0ffb1cf841cd35a80138dadda0d91b
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1825352
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
Samsung add the RQF_PM flag
To avoid the possible elevator break
In case of the Suspending rpm_status

Change-Id: I82a3a5cb5406a851ba7c7c3503b5819c2b002413
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1826131
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Wang Wang <wangwang1@mt.com>
Submit-Approved: Jira Key
Current block layer elevator may be stuck
If PM is in the suspending status. QCOM
Gave a fix in SR 05006893, their formal CR
Is 2852181

Change-Id: Ieb5bbfebefa2d497291263d900f2c8cc186fe8d7
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1852288
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
Samsung gave another HPB driver patch
To fix the stack overflow issue caused by
Recurssive add pre-request into the queue

Change-Id: I03e9f72219cd2a738d577e7adee4ab302ce120d8
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1835172
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
Samsung HPB driver has unblanced pm operations
Which cause the nr_pending negative value
This will have impact of runtime pm management

Change-Id: Ie21a02b3c92cb2495e0be9db0885f275ecfbcfb5
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1890611
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
During error scenarios, while error handler is
in middle of resetting the host controller, we
might get a line reset which may result isr handler
trying to access UFS register. But since controller
is in reset state, SW won't have access to controller
register which will lead to noc error.
Per qcom suggestion to pickup the change from 5.4 kernel

Change-Id: Idaf75f512ac1df18f777b192f7d85366360f724f
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1926918
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
Header file changes for Samsung
HPB_FS and the UFSFEATURE_31

Change-Id: I972820a59f5c70916f0f636796a8703c15ab482d
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1773953
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Wei Deng <dengwei1@motorola.com>
Reviewed-by: Qing Chang <qing@motorola.com>
Submit-Approved: Jira Key
@goodmeow goodmeow merged commit 8eace32 into arrow-12.1 Apr 21, 2022
goodmeow pushed a commit that referenced this pull request Apr 27, 2022
Syzbot recently found a number of issues related to incremental-fs
(see bug numbers below). All have to do with the fact that incr-fs
allows mounts of the same source and target multiple times.
This is a design decision and the user space component "Data Loader"
expects this to work for app re-install use case.
The mounting depth needs to be controlled, however, and only allowed
to be two levels deep. In case of more than two mount attempts the
driver needs to return an error.
In case of the issues listed below the common pattern is that the
reproducer calls:

mount("./file0", "./file0", "incremental-fs", 0, NULL)

many times and then invokes a file operation like chmod, setxattr,
or open on the ./file0. This causes a recursive call for all the
mounted instances, which eventually causes a stack overflow and
a kernel crash:

BUG: stack guard page was hit at ffffc90000c0fff8
kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP KASAN

This change also cleans up the mount error path to properly clean
allocated resources and call deactivate_locked_super(), which
causes the incfs_kill_sb() to be called, where the sb is freed.

Bug: 211066171
Bug: 213140206
Bug: 213215835
Bug: 211914587
Bug: 211213635
Bug: 213137376
Bug: 211161296

Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Change-Id: I08d9b545a2715423296bf4beb67bdbbed78d1be1
goodmeow pushed a commit that referenced this pull request Apr 27, 2022
commit ddbc84f3f595cf1fc8234a191193b5d20ad43938 upstream.

ZONE_MOVABLE uses the remaining memory in each node.  Its starting pfn
is also aligned to MAX_ORDER_NR_PAGES.  It is possible for the remaining
memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is
not enough room for ZONE_MOVABLE on that node.

Unfortunately this condition is not checked for.  This leads to
zone_movable_pfn[] getting set to a pfn greater than the last pfn in a
node.

calculate_node_totalpages() then sets zone->present_pages to be greater
than zone->spanned_pages which is invalid, as spanned_pages represents
the maximum number of pages in a zone assuming no holes.

Subsequently it is possible free_area_init_core() will observe a zone of
size zero with present pages.  In this case it will skip setting up the
zone, including the initialisation of free_lists[].

However populated_zone() checks zone->present_pages to see if a zone has
memory available.  This is used by iterators such as
walk_zones_in_node().  pagetypeinfo_showfree() uses this to walk the
free_list of each zone in each node, which are assumed to be initialised
due to the zone not being empty.

As free_area_init_core() never initialised the free_lists[] this results
in the following kernel crash when trying to read /proc/pagetypeinfo:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
  CPU: 0 PID: 456 Comm: cat Not tainted 5.16.0 #461
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
  RIP: 0010:pagetypeinfo_show+0x163/0x460
  Code: 9e 82 e8 80 57 0e 00 49 8b 06 b9 01 00 00 00 4c 39 f0 75 16 e9 65 02 00 00 48 83 c1 01 48 81 f9 a0 86 01 00 0f 84 48 02 00 00 <48> 8b 00 4c 39 f0 75 e7 48 c7 c2 80 a2 e2 82 48 c7 c6 79 ef e3 82
  RSP: 0018:ffffc90001c4bd10 EFLAGS: 00010003
  RAX: 0000000000000000 RBX: ffff88801105f638 RCX: 0000000000000001
  RDX: 0000000000000001 RSI: 000000000000068b RDI: ffff8880163dc68b
  RBP: ffffc90001c4bd90 R08: 0000000000000001 R09: ffff8880163dc67e
  R10: 656c6261766f6d6e R11: 6c6261766f6d6e55 R12: ffff88807ffb4a00
  R13: ffff88807ffb49f8 R14: ffff88807ffb4580 R15: ffff88807ffb3000
  FS:  00007f9c83eff5c0(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000013c8e000 CR4: 0000000000350ef0
  Call Trace:
   seq_read_iter+0x128/0x460
   proc_reg_read_iter+0x51/0x80
   new_sync_read+0x113/0x1a0
   vfs_read+0x136/0x1d0
   ksys_read+0x70/0xf0
   __x64_sys_read+0x1a/0x20
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix this by checking that the aligned zone_movable_pfn[] does not exceed
the end of the node, and if it does skip creating a movable zone on this
node.

Link: https://lkml.kernel.org/r/20220215025831.2113067-1-apopple@nvidia.com
Fixes: 2a1e274 ("Create the ZONE_MOVABLE zone")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
goodmeow pushed a commit that referenced this pull request Apr 27, 2022
[ Upstream commit de2ae403b4c0e79a3410e63bc448542fbb9f9bfc ]

is_xen_pmu() is taking the cpu number as parameter, but it is not using
it. Instead it just tests whether the Xen PMU initialization on the
current cpu did succeed. As this test is done by checking a percpu
pointer, preemption needs to be disabled in order to avoid switching
the cpu while doing the test. While resuming from suspend() this seems
not to be the case:

[   88.082751] ACPI: PM: Low-level resume complete
[   88.087933] ACPI: EC: EC started
[   88.091464] ACPI: PM: Restoring platform NVS memory
[   88.097166] xen_acpi_processor: Uploading Xen processor PM info
[   88.103850] Enabling non-boot CPUs ...
[   88.108128] installing Xen timer for CPU 1
[   88.112763] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/7138
[   88.122256] caller is is_xen_pmu+0x12/0x30
[   88.126937] CPU: 0 PID: 7138 Comm: systemd-sleep Tainted: G        W         5.16.13-2.fc32.qubes.x86_64 #1
[   88.137939] Hardware name: Star Labs StarBook/StarBook, BIOS 7.97 03/21/2022
[   88.145930] Call Trace:
[   88.148757]  <TASK>
[   88.151193]  dump_stack_lvl+0x48/0x5e
[   88.155381]  check_preemption_disabled+0xde/0xe0
[   88.160641]  is_xen_pmu+0x12/0x30
[   88.164441]  xen_smp_intr_init_pv+0x75/0x100

Fix that by replacing is_xen_pmu() by a simple boolean variable which
reflects the Xen PMU initialization state on cpu 0.

Modify xen_pmu_init() to return early in case it is being called for a
cpu other than cpu 0 and the boolean variable not being set.

Fixes: bf6dfb1 ("xen/PMU: PMU emulation code")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20220325142002.31789-1-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Jun 30, 2022
[ Upstream commit 3120aac6d0ecd9accf56894aeac0e265f74d3d5a ]

UDC driver should not touch gadget's driver internals, especially it
should not reset driver->bus. This wasn't harmful so far, but since
commit fc274c1e9973 ("USB: gadget: Add a new bus for gadgets") gadget
subsystem got it's own bus and messing with ->bus triggers the
following NULL pointer dereference:

dwc2 12480000.hsotg: bound driver g_ether
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 00000000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in: ...
CPU: 0 PID: 620 Comm: modprobe Not tainted 5.18.0-rc5-next-20220504 #11862
Hardware name: Samsung Exynos (Flattened Device Tree)
PC is at module_add_driver+0x44/0xe8
LR is at sysfs_do_create_link_sd+0x84/0xe0
...
Process modprobe (pid: 620, stack limit = 0x(ptrval))
...
 module_add_driver from bus_add_driver+0xf4/0x1e4
 bus_add_driver from driver_register+0x78/0x10c
 driver_register from usb_gadget_register_driver_owner+0x40/0xb4
 usb_gadget_register_driver_owner from do_one_initcall+0x44/0x1e0
 do_one_initcall from do_init_module+0x44/0x1c8
 do_init_module from load_module+0x19b8/0x1b9c
 load_module from sys_finit_module+0xdc/0xfc
 sys_finit_module from ret_fast_syscall+0x0/0x54
Exception stack(0xf1771fa8 to 0xf1771ff0)
...
dwc2 12480000.hsotg: new device is high-speed
---[ end trace 0000000000000000 ]---

Fix this by removing driver->bus entry reset.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20220505104618.22729-1-m.szyprowski@samsung.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Jun 30, 2022
[ Upstream commit 06c4da89c24e7023ea448cadf8e9daf06a0aae6e ]

Otherwise there may be race between module removal and the handling of
netlink command, which can lead to the oops as shown below:

  BUG: kernel NULL pointer dereference, address: 0000000000000098
  Oops: 0002 [#1] SMP PTI
  CPU: 1 PID: 31299 Comm: nbd-client Tainted: G            E     5.14.0-rc4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  RIP: 0010:down_write+0x1a/0x50
  Call Trace:
   start_creating+0x89/0x130
   debugfs_create_dir+0x1b/0x130
   nbd_start_device+0x13d/0x390 [nbd]
   nbd_genl_connect+0x42f/0x748 [nbd]
   genl_family_rcv_msg_doit.isra.0+0xec/0x150
   genl_rcv_msg+0xe5/0x1e0
   netlink_rcv_skb+0x55/0x100
   genl_rcv+0x29/0x40
   netlink_unicast+0x1a8/0x250
   netlink_sendmsg+0x21b/0x430
   ____sys_sendmsg+0x2a4/0x2d0
   ___sys_sendmsg+0x81/0xc0
   __sys_sendmsg+0x62/0xb0
   __x64_sys_sendmsg+0x1f/0x30
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  Modules linked in: nbd(E-)

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-2-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Jun 30, 2022
[ Upstream commit c55b2b983b0fa012942c3eb16384b2b722caa810 ]

When nbd module is being removing, nbd_alloc_config() may be
called concurrently by nbd_genl_connect(), although try_module_get()
will return false, but nbd_alloc_config() doesn't handle it.

The race may lead to the leak of nbd_config and its related
resources (e.g, recv_workq) and oops in nbd_read_stat() due
to the unload of nbd module as shown below:

  BUG: kernel NULL pointer dereference, address: 0000000000000040
  Oops: 0000 [#1] SMP PTI
  CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  Workqueue: knbd16-recv recv_work [nbd]
  RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
  Call Trace:
   recv_work+0x3b/0xb0 [nbd]
   process_one_work+0x1ed/0x390
   worker_thread+0x4a/0x3d0
   kthread+0x12a/0x150
   ret_from_fork+0x22/0x30

Fixing it by checking the return value of try_module_get()
in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
assign nbd->config only when nbd_alloc_config() succeeds to ensure
the value of nbd->config is binary (valid or NULL).

Also adding a debug message to check the reference counter
of nbd_config during module removal.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-3-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Jul 8, 2022
…nction for Clang CFI

blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".

[    0.000000][    T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
[    0.000000][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
[    0.000000][    T0] Hardware name: MT6873 (DT)
[    0.000000][    T0] Call trace:
[    0.000000][    T0]  dump_backtrace+0xfc/0x1dc
[    0.000000][    T0]  dump_stack_lvl+0xa8/0x11c
[    0.000000][    T0]  panic+0x194/0x464
[    0.000000][    T0]  __cfi_check_fail+0x54/0x58
[    0.000000][    T0]  __cfi_slowpath_diag+0x354/0x4b0
[    0.000000][    T0]  blake2s_update+0x14c/0x178
[    0.000000][    T0]  _extract_entropy+0xf4/0x29c
[    0.000000][    T0]  crng_initialize_primary+0x24/0x94
[    0.000000][    T0]  rand_initialize+0x2c/0x6c
[    0.000000][    T0]  start_kernel+0x2f8/0x65c
[    0.000000][    T0]  __primary_switched+0xc4/0x7be4
[    0.000000][    T0] Rebooting in 5 seconds..

Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.

In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.

Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in")
Link: ClangBuiltLinux/linux#1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

(cherry picked from commit d2a02e3c8bb6b347818518edff5a4b40ff52d6d8)
Change-Id: I3a6461e3f05f57cd898f8a4f46e73fb0214037ef
Signed-off-by: Eric Biggers <ebiggers@google.com>
goodmeow pushed a commit that referenced this pull request Jul 9, 2022
…nction for Clang CFI

blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".

[    0.000000][    T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
[    0.000000][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
[    0.000000][    T0] Hardware name: MT6873 (DT)
[    0.000000][    T0] Call trace:
[    0.000000][    T0]  dump_backtrace+0xfc/0x1dc
[    0.000000][    T0]  dump_stack_lvl+0xa8/0x11c
[    0.000000][    T0]  panic+0x194/0x464
[    0.000000][    T0]  __cfi_check_fail+0x54/0x58
[    0.000000][    T0]  __cfi_slowpath_diag+0x354/0x4b0
[    0.000000][    T0]  blake2s_update+0x14c/0x178
[    0.000000][    T0]  _extract_entropy+0xf4/0x29c
[    0.000000][    T0]  crng_initialize_primary+0x24/0x94
[    0.000000][    T0]  rand_initialize+0x2c/0x6c
[    0.000000][    T0]  start_kernel+0x2f8/0x65c
[    0.000000][    T0]  __primary_switched+0xc4/0x7be4
[    0.000000][    T0] Rebooting in 5 seconds..

Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.

In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.

Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in")
Link: ClangBuiltLinux/linux#1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

(cherry picked from commit d2a02e3c8bb6b347818518edff5a4b40ff52d6d8)
Change-Id: I3a6461e3f05f57cd898f8a4f46e73fb0214037ef
Signed-off-by: Eric Biggers <ebiggers@google.com>
goodmeow pushed a commit that referenced this pull request Aug 22, 2022
Mismatched configuration leading to kernel crash when powering down eeprom.
Conclusion taken by comparing the compiled dtbo (which is then decompiled)
and decompiled dtbo from MIUI V12.5.1.0.RKHMIXM.

  [    9.920081] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000023
  [    9.920090] Mem abort info:
  [    9.920092]   ESR = 0x96000005
  [    9.920095]   Exception class = DABT (current EL), IL = 32 bits
  [    9.920097]   SET = 0, FnV = 0
  [    9.920099]   EA = 0, S1PTW = 0
  [    9.920101] Data abort info:
  [    9.920103]   ISV = 0, ISS = 0x00000005
  [    9.920106]   CM = 0, WnR = 0
  [    9.920109] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000fd044d25
  [    9.920111] [0000000000000023] pgd=0000000000000000, pud=0000000000000000
  [    9.920116] Internal error: Oops: 96000005 [#1] PREEMPT SMP
  [    9.920119] Modules linked in: msm_11ad_proxy
  [    9.920123] Process android.hardwar (pid: 878, stack limit = 0x00000000c84556d2)
  [    9.920127] CPU: 4 PID: 878 Comm: android.hardwar Tainted: G S                4.19.113-lineageos-g74fb848f9-dirty #12
  [    9.920131] Hardware name: Qualcomm Technologies, Inc. xiaomi alioth (DT)
  [    9.920134] pstate: 80400005 (Nzcv daif +PAN -UAO)
  [    9.920143] pc : cam_sensor_util_power_down+0x1fc/0x51c
  [    9.920146] lr : cam_sensor_util_power_down+0xfc/0x51c
  [    9.920148] sp : ffffff800ea8ba30
  [    9.920150] x29: ffffff800ea8ba50 x28: fffffffadc938000
  [    9.920153] x27: 0000000000000000 x26: 0000000000000068
  [    9.920156] x25: ffffffa24975b305 x24: 0000000000000000
  [    9.920159] x23: ffffffa24975b28d x22: 0000000000000000
  [    9.920161] x21: fffffffadc938000 x20: ffffffa24975af59
  [    9.920164] x19: fffffffbe23ec490 x18: 000000000000006c
  [    9.920166] x17: 00000000000a51d8 x16: 0000000000000000
  [    9.920169] x15: 0000000000000000 x14: 000000000000005c
  [    9.920171] x13: ffffff800ea8ba30 x12: ffffff800ea8b5b8
  [    9.920174] x11: ffffff800ea8b590 x10: ffffff80ffffffe0
  [    9.920176] x9 : 0000000000000009 x8 : 0000000000000009
  [    9.920178] x7 : 0000000000000000 x6 : fffffffbffa84dc1
  [    9.920181] x5 : 0000000000000003 x4 : 0000000000000009
  [    9.920183] x3 : ffffffa24975af59 x2 : 00000000000007a9
  [    9.920186] x1 : ffffffa24975b28d x0 : 0000000000000020
  [    9.920189] Call trace:
  [    9.920192]  cam_sensor_util_power_down+0x1fc/0x51c
  [    9.920196]  cam_eeprom_power_down+0x28/0xd8
  [    9.920199]  cam_eeprom_driver_cmd+0x1708/0x178c
  [    9.920202]  cam_eeprom_subdev_ioctl+0x24/0x38
  [    9.920207]  subdev_do_ioctl_lock+0x738/0xb94
  [    9.920210]  video_usercopy+0x364/0x6ac
  [    9.920212]  subdev_ioctl+0x14/0x1c
  [    9.920215]  v4l2_ioctl+0x50/0x5c
  [    9.920219]  do_vfs_ioctl+0x694/0xf68
  [    9.920221]  __arm64_sys_ioctl+0x70/0x98
  [    9.920225]  el0_svc_common+0x98/0x148
  [    9.920228]  el0_svc_handler+0x60/0x78
  [    9.920231]  el0_svc+0x8/0xc
  [    9.920234] Code: 51002509 7100113f 54000482 8b0802c9 (39406929)
  [    9.920237] ---[ end trace 728273a38e93d148 ]---
  [    9.920246] Kernel panic - not syncing: Fatal exception

Signed-off-by: Albert I <kras@raphielgang.org>
Change-Id: I3c316b2be0656762cd46484bc41c181d5a53de19
goodmeow pushed a commit that referenced this pull request Aug 22, 2022
Our static-static calculation returns a failure if the public key is of
low order. We check for this when peers are added, and don't allow them
to be added if they're low order, except in the case where we haven't
yet been given a private key. In that case, we would defer the removal
of the peer until we're given a private key, since at that point we're
doing new static-static calculations which incur failures we can act on.
This meant, however, that we wound up removing peers rather late in the
configuration flow.

Syzkaller points out that peer_remove calls flush_workqueue, which in
turn might then wait for sending a handshake initiation to complete.
Since handshake initiation needs the static identity lock, holding the
static identity lock while calling peer_remove can result in a rare
deadlock. We have precisely this case in this situation of late-stage
peer removal based on an invalid public key. We can't drop the lock when
removing, because then incoming handshakes might interact with a bogus
static-static calculation.

While the band-aid patch for this would involve breaking up the peer
removal into two steps like wg_peer_remove_all does, in order to solve
the locking issue, there's actually a much more elegant way of fixing
this:

If the static-static calculation succeeds with one private key, it
*must* succeed with all others, because all 32-byte strings map to valid
private keys, thanks to clamping. That means we can get rid of this
silly dance and locking headaches of removing peers late in the
configuration flow, and instead just reject them early on, regardless of
whether the device has yet been assigned a private key. For the case
where the device doesn't yet have a private key, we safely use zeros
just for the purposes of checking for low order points by way of
checking the output of the calculation.

The following PoC will trigger the deadlock:

ip link add wg0 type wireguard
ip addr add 10.0.0.1/24 dev wg0
ip link set wg0 up
ping -f 10.0.0.2 &
while true; do
        wg set wg0 private-key /dev/null peer AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= allowed-ips 10.0.0.0/24 endpoint 10.0.0.3:1234
        wg set wg0 private-key <(echo AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=)
done

[    0.949105] ======================================================
[    0.949550] WARNING: possible circular locking dependency detected
[    0.950143] 5.5.0-debug+ #18 Not tainted
[    0.950431] ------------------------------------------------------
[    0.950959] wg/89 is trying to acquire lock:
[    0.951252] ffff8880333e2128 ((wq_completion)wg-kex-wg0){+.+.}, at: flush_workqueue+0xe3/0x12f0
[    0.951865]
[    0.951865] but task is already holding lock:
[    0.952280] ffff888032819bc0 (&wg->static_identity.lock){++++}, at: wg_set_device+0x95d/0xcc0
[    0.953011]
[    0.953011] which lock already depends on the new lock.
[    0.953011]
[    0.953651]
[    0.953651] the existing dependency chain (in reverse order) is:
[    0.954292]
[    0.954292] -> #2 (&wg->static_identity.lock){++++}:
[    0.954804]        lock_acquire+0x127/0x350
[    0.955133]        down_read+0x83/0x410
[    0.955428]        wg_noise_handshake_create_initiation+0x97/0x700
[    0.955885]        wg_packet_send_handshake_initiation+0x13a/0x280
[    0.956401]        wg_packet_handshake_send_worker+0x10/0x20
[    0.956841]        process_one_work+0x806/0x1500
[    0.957167]        worker_thread+0x8c/0xcb0
[    0.957549]        kthread+0x2ee/0x3b0
[    0.957792]        ret_from_fork+0x24/0x30
[    0.958234]
[    0.958234] -> #1 ((work_completion)(&peer->transmit_handshake_work)){+.+.}:
[    0.958808]        lock_acquire+0x127/0x350
[    0.959075]        process_one_work+0x7ab/0x1500
[    0.959369]        worker_thread+0x8c/0xcb0
[    0.959639]        kthread+0x2ee/0x3b0
[    0.959896]        ret_from_fork+0x24/0x30
[    0.960346]
[    0.960346] -> #0 ((wq_completion)wg-kex-wg0){+.+.}:
[    0.960945]        check_prev_add+0x167/0x1e20
[    0.961351]        __lock_acquire+0x2012/0x3170
[    0.961725]        lock_acquire+0x127/0x350
[    0.961990]        flush_workqueue+0x106/0x12f0
[    0.962280]        peer_remove_after_dead+0x160/0x220
[    0.962600]        wg_set_device+0xa24/0xcc0
[    0.962994]        genl_rcv_msg+0x52f/0xe90
[    0.963298]        netlink_rcv_skb+0x111/0x320
[    0.963618]        genl_rcv+0x1f/0x30
[    0.963853]        netlink_unicast+0x3f6/0x610
[    0.964245]        netlink_sendmsg+0x700/0xb80
[    0.964586]        __sys_sendto+0x1dd/0x2c0
[    0.964854]        __x64_sys_sendto+0xd8/0x1b0
[    0.965141]        do_syscall_64+0x90/0xd9a
[    0.965408]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[    0.965769]
[    0.965769] other info that might help us debug this:
[    0.965769]
[    0.966337] Chain exists of:
[    0.966337]   (wq_completion)wg-kex-wg0 --> (work_completion)(&peer->transmit_handshake_work) --> &wg->static_identity.lock
[    0.966337]
[    0.967417]  Possible unsafe locking scenario:
[    0.967417]
[    0.967836]        CPU0                    CPU1
[    0.968155]        ----                    ----
[    0.968497]   lock(&wg->static_identity.lock);
[    0.968779]                                lock((work_completion)(&peer->transmit_handshake_work));
[    0.969345]                                lock(&wg->static_identity.lock);
[    0.969809]   lock((wq_completion)wg-kex-wg0);
[    0.970146]
[    0.970146]  *** DEADLOCK ***
[    0.970146]
[    0.970531] 5 locks held by wg/89:
[    0.970908]  #0: ffffffff827433c8 (cb_lock){++++}, at: genl_rcv+0x10/0x30
[    0.971400]  #1: ffffffff82743480 (genl_mutex){+.+.}, at: genl_rcv_msg+0x642/0xe90
[    0.971924]  #2: ffffffff827160c0 (rtnl_mutex){+.+.}, at: wg_set_device+0x9f/0xcc0
[    0.972488]  #3: ffff888032819de0 (&wg->device_update_lock){+.+.}, at: wg_set_device+0xb0/0xcc0
[    0.973095]  xiaomi-sm8250-devs#4: ffff888032819bc0 (&wg->static_identity.lock){++++}, at: wg_set_device+0x95d/0xcc0
[    0.973653]
[    0.973653] stack backtrace:
[    0.973932] CPU: 1 PID: 89 Comm: wg Not tainted 5.5.0-debug+ #18
[    0.974476] Call Trace:
[    0.974638]  dump_stack+0x97/0xe0
[    0.974869]  check_noncircular+0x312/0x3e0
[    0.975132]  ? print_circular_bug+0x1f0/0x1f0
[    0.975410]  ? __kernel_text_address+0x9/0x30
[    0.975727]  ? unwind_get_return_address+0x51/0x90
[    0.976024]  check_prev_add+0x167/0x1e20
[    0.976367]  ? graph_lock+0x70/0x160
[    0.976682]  __lock_acquire+0x2012/0x3170
[    0.976998]  ? register_lock_class+0x1140/0x1140
[    0.977323]  lock_acquire+0x127/0x350
[    0.977627]  ? flush_workqueue+0xe3/0x12f0
[    0.977890]  flush_workqueue+0x106/0x12f0
[    0.978147]  ? flush_workqueue+0xe3/0x12f0
[    0.978410]  ? find_held_lock+0x2c/0x110
[    0.978662]  ? lock_downgrade+0x6e0/0x6e0
[    0.978919]  ? queue_rcu_work+0x60/0x60
[    0.979166]  ? netif_napi_del+0x151/0x3b0
[    0.979501]  ? peer_remove_after_dead+0x160/0x220
[    0.979871]  peer_remove_after_dead+0x160/0x220
[    0.980232]  wg_set_device+0xa24/0xcc0
[    0.980516]  ? deref_stack_reg+0x8e/0xc0
[    0.980801]  ? set_peer+0xe10/0xe10
[    0.981040]  ? __ww_mutex_check_waiters+0x150/0x150
[    0.981430]  ? __nla_validate_parse+0x163/0x270
[    0.981719]  ? genl_family_rcv_msg_attrs_parse+0x13f/0x310
[    0.982078]  genl_rcv_msg+0x52f/0xe90
[    0.982348]  ? genl_family_rcv_msg_attrs_parse+0x310/0x310
[    0.982690]  ? register_lock_class+0x1140/0x1140
[    0.983049]  netlink_rcv_skb+0x111/0x320
[    0.983298]  ? genl_family_rcv_msg_attrs_parse+0x310/0x310
[    0.983645]  ? netlink_ack+0x880/0x880
[    0.983888]  genl_rcv+0x1f/0x30
[    0.984168]  netlink_unicast+0x3f6/0x610
[    0.984443]  ? netlink_detachskb+0x60/0x60
[    0.984729]  ? find_held_lock+0x2c/0x110
[    0.984976]  netlink_sendmsg+0x700/0xb80
[    0.985220]  ? netlink_broadcast_filtered+0xa60/0xa60
[    0.985533]  __sys_sendto+0x1dd/0x2c0
[    0.985763]  ? __x64_sys_getpeername+0xb0/0xb0
[    0.986039]  ? sockfd_lookup_light+0x17/0x160
[    0.986397]  ? __sys_recvmsg+0x8c/0xf0
[    0.986711]  ? __sys_recvmsg_sock+0xd0/0xd0
[    0.987018]  __x64_sys_sendto+0xd8/0x1b0
[    0.987283]  ? lockdep_hardirqs_on+0x39b/0x5a0
[    0.987666]  do_syscall_64+0x90/0xd9a
[    0.987903]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[    0.988223] RIP: 0033:0x7fe77c12003e
[    0.988508] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 4
[    0.989666] RSP: 002b:00007fffada2ed58 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[    0.990137] RAX: ffffffffffffffda RBX: 00007fe77c159d48 RCX: 00007fe77c12003e
[    0.990583] RDX: 0000000000000040 RSI: 000055fd1d38e020 RDI: 0000000000000004
[    0.991091] RBP: 000055fd1d38e020 R08: 000055fd1cb63358 R09: 000000000000000c
[    0.991568] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000002c
[    0.992014] R13: 0000000000000004 R14: 000055fd1d38e020 R15: 0000000000000001

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ec31c2676a10e064878927b243fada8c2fb0c03c)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I860bfac72c98c8c9b26f4490b4f346dc67892f87
Signed-off-by: althafvly <althafvly@gmail.com>
goodmeow pushed a commit that referenced this pull request Aug 22, 2022
…rom table

Eric reported that syzkaller found a race of this variety:

CPU 1                                       CPU 2
-------------------------------------------|---------------------------------------
wg_index_hashtable_replace(old, ...)       |
  if (hlist_unhashed(&old->index_hash))    |
                                           | wg_index_hashtable_remove(old)
                                           |   hlist_del_init_rcu(&old->index_hash)
				           |     old->index_hash.pprev = NULL
  hlist_replace_rcu(&old->index_hash, ...) |
    *old->index_hash.pprev                 |

Syzbot wasn't actually able to reproduce this more than once or create a
reproducer, because the race window between checking "hlist_unhashed" and
calling "hlist_replace_rcu" is just so small. Adding an mdelay(5) or
similar there helps make this demonstrable using this simple script:

    #!/bin/bash
    set -ex
    trap 'kill $pid1; kill $pid2; ip link del wg0; ip link del wg1' EXIT
    ip link add wg0 type wireguard
    ip link add wg1 type wireguard
    wg set wg0 private-key <(wg genkey) listen-port 9999
    wg set wg1 private-key <(wg genkey) peer $(wg show wg0 public-key) endpoint 127.0.0.1:9999 persistent-keepalive 1
    wg set wg0 peer $(wg show wg1 public-key)
    ip link set wg0 up
    yes link set wg1 up | ip -force -batch - &
    pid1=$!
    yes link set wg1 down | ip -force -batch - &
    pid2=$!
    wait

The fundumental underlying problem is that we permit calls to wg_index_
hashtable_remove(handshake.entry) without requiring the caller to take
the handshake mutex that is intended to protect members of handshake
during mutations. This is consistently the case with calls to wg_index_
hashtable_insert(handshake.entry) and wg_index_hashtable_replace(
handshake.entry), but it's missing from a pertinent callsite of wg_
index_hashtable_remove(handshake.entry). So, this patch makes sure that
mutex is taken.

The original code was a little bit funky though, in the form of:

    remove(handshake.entry)
    lock(), memzero(handshake.some_members), unlock()
    remove(handshake.entry)

The original intention of that double removal pattern outside the lock
appears to be some attempt to prevent insertions that might happen while
locks are dropped during expensive crypto operations, but actually, all
callers of wg_index_hashtable_insert(handshake.entry) take the write
lock and then explicitly check handshake.state, as they should, which
the aforementioned memzero clears, which means an insertion should
already be impossible. And regardless, the original intention was
necessarily racy, since it wasn't guaranteed that something else would
run after the unlock() instead of after the remove(). So, from a
soundness perspective, it seems positive to remove what looks like a
hack at best.

The crash from both syzbot and from the script above is as follows:

  general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  CPU: 0 PID: 7395 Comm: kworker/0:3 Not tainted 5.9.0-rc4-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: wg-kex-wg1 wg_packet_handshake_receive_worker
  RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline]
  RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174
  Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5
  RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010
  RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000
  R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000
  R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500
  FS:  0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
  wg_noise_handshake_begin_session+0x752/0xc9a drivers/net/wireguard/noise.c:820
  wg_receive_handshake_packet drivers/net/wireguard/receive.c:183 [inline]
  wg_packet_handshake_receive_worker+0x33b/0x730 drivers/net/wireguard/receive.c:220
  process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
  worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
  kthread+0x3b5/0x4a0 kernel/kthread.c:292
  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9179ba31367bcf481c3c79b5f028c94faad9f30a)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5d1ca2bb77b61c654b2f56660a6b1c3f5fb2446f
Signed-off-by: althafvly <althafvly@gmail.com>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
As Wenqing Liu reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=216456

BUG: KASAN: use-after-free in recover_data+0x63ae/0x6ae0 [f2fs]
Read of size 4 at addr ffff8881464dcd80 by task mount/1013

CPU: 3 PID: 1013 Comm: mount Tainted: G        W          6.0.0-rc4 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Call Trace:
 dump_stack_lvl+0x45/0x5e
 print_report.cold+0xf3/0x68d
 kasan_report+0xa8/0x130
 recover_data+0x63ae/0x6ae0 [f2fs]
 f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs]
 f2fs_fill_super+0x4665/0x61e0 [f2fs]
 mount_bdev+0x2cf/0x3b0
 legacy_get_tree+0xed/0x1d0
 vfs_get_tree+0x81/0x2b0
 path_mount+0x47e/0x19d0
 do_mount+0xce/0xf0
 __x64_sys_mount+0x12c/0x1a0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause is: in fuzzed image, SSA table is corrupted: ofs_in_node
is larger than ADDRS_PER_PAGE(), result in out-of-range access on 4k-size
page.

- recover_data
 - do_recover_data
  - check_index_in_prev_nodes
   - f2fs_data_blkaddr

This patch adds sanity check on summary info in recovery and GC flow
in where the flows rely on them.

After patch:
[   29.310883] F2FS-fs (loop0): Inconsistent ofs_in_node:65286 in summary, ino:0, nid:6, max:1018

Cc: <stable@kernel.org>
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
It is possible that ino of dirent or orphan inode is corrupted in a
fuzzed image, occasionally, if corrupted ino is equal to meta ino:
meta_ino, node_ino or compress_ino, caller of f2fs_iget() from below
call paths will get meta inode directly, it's not allowed, let's
add sanity check to detect such cases.

case #1
- recover_dentry
 - __f2fs_find_entry
 - f2fs_iget_retry

case #2
- recover_orphan_inode
 - f2fs_iget_retry

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
There is bug on issue after atgc feature is enabled in
32bits platform as the following log:

F2FS-fs (dm-x): inconsistent rbtree, cur(3470333575168) next(3320009719808)
------------[ cut here ]------------
kernel BUG at fs/f2fs/gc.c:602!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
PC is at get_victim_by_default+0x13c0/0x1498
LR is at f2fs_check_rb_tree_consistence+0xc4/0xd4
....
[<c04d98b0>] (get_victim_by_default) from [<c04d4f44>] (f2fs_gc+0x220/0x6cc)
[<c04d4f44>] (f2fs_gc) from [<c04d4780>] (gc_thread_func+0x2ac/0x708)
[<c04d4780>] (gc_thread_func) from [<c015c774>] (kthread+0x1a8/0x1b4)
[<c015c774>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)

the reason is the 64bits key in struct rb_entry has __packed attibute
but has not in struct victim_entry, so the wrong key value got by
struct rb_entry in f2fs_check_rb_tree_consistence, the following are
the memory layouts of struct rb_entry and struct victim_entry in 32bits
platform:

struct rb_entry {
   [0] struct rb_node rb_node;
       union {
           struct {...};
  [12]     unsigned long long key;
       };
}
SIZE: 20

struct victim_entry {
   [0] struct rb_node rb_node;
       union {
           struct {...};
  [16]     struct victim_info vi;
       };
  [32] struct list_head list;
}
SIZE: 40

This patch fix the inconsistence layout of 64bits key between
struct rb_entry and struct victim_entry.

Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
Wei Chen reports a kernel bug as blew:

INFO: task syz-executor.0:29056 blocked for more than 143 seconds.
      Not tainted 5.15.0-rc5 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0  state:D stack:14632 pid:29056 ppid:  6574 flags:0x00000004
Call Trace:
 __schedule+0x4a1/0x1720
 schedule+0x36/0xe0
 rwsem_down_write_slowpath+0x322/0x7a0
 fscrypt_ioctl_set_policy+0x11f/0x2a0
 __f2fs_ioctl+0x1a9f/0x5780
 f2fs_ioctl+0x89/0x3a0
 __x64_sys_ioctl+0xe8/0x140
 do_syscall_64+0x34/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Eric did some investigation on this issue, quoted from reply of Eric:

"Well, the quality of this bug report has a lot to be desired (not on
upstream kernel, reproducer is full of totally irrelevant stuff, not
sent to the mailing list of the filesystem whose disk image is being
fuzzed, etc.).  But what is going on is that f2fs_empty_dir() doesn't
consider the case of a directory with an extremely large i_size on a
malicious disk image.

Specifically, the reproducer mounts an f2fs image with a directory
that has an i_size of 14814520042850357248, then calls
FS_IOC_SET_ENCRYPTION_POLICY on it.

That results in a call to f2fs_empty_dir() to check whether the
directory is empty.  f2fs_empty_dir() then iterates through all
3616826182336513 blocks the directory allegedly contains to check
whether any contain anything.  i_rwsem is held during this, so
anything else that tries to take it will hang."

In order to solve this issue, let's use f2fs_get_next_page_offset()
to speed up iteration by skipping holes for all below functions:
- f2fs_empty_dir
- f2fs_readdir
- find_in_level

The way why we can speed up iteration was described in
'commit 3cf4574 ("f2fs: introduce get_next_page_offset to speed
up SEEK_DATA")'.

Meanwhile, in f2fs_empty_dir(), let's use f2fs_find_data_page()
instead f2fs_get_lock_data_page(), due to i_rwsem was held in
caller of f2fs_empty_dir(), there shouldn't be any races, so it's
fine to not lock dentry page during lookuping dirents in the page.

Link: https://lore.kernel.org/lkml/536944df-a0ae-1dd8-148f-510b476e1347@kernel.org/T/
Reported-by: Wei Chen <harperchen1110@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
Now, we allow to use CMA pages for certain user space
allocations. One of them is anonymous page fault case.
To align the use case, we should also allow to use CMA
pages in swap-in cases. This could help mitigate OOM
on swap-in cases showing plenty of free CMA left.

logd.klogd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
CPU: 0 PID: 433 Comm: logd.klogd Tainted: G        W  OE     5.10.100-android13-0 #1

Call trace:
 dump_backtrace.cfi_jt+0x0/0x8
 show_stack+0x1c/0x2c
 dump_stack_lvl+0xc0/0x13c
 dump_header+0x54/0x238
 oom_kill_process+0xb0/0x158
 out_of_memory+0x17c/0x328
 __alloc_pages_slowpath+0x5c4/0x8d0
 __alloc_pages_nodemask+0x1bc/0x2e0
 __read_swap_cache_async+0xdc/0x370
 swap_vma_readahead+0x3b4/0x488
 swapin_readahead+0x3c/0x54
 do_swap_page+0x1e0/0xaa0
 handle_pte_fault+0x128/0x1e0
 handle_mm_fault+0x308/0x590
 do_page_fault+0x33c/0x478
 do_translation_fault+0x58/0x11c
 do_mem_abort+0x68/0x144
 el0_da+0x24/0x34
 el0_sync_handler+0xc4/0xec
 el0_sync+0x1c0/0x200
Mem-Info:
active_anon:0 inactive_anon:3222 isolated_anon:62
 active_file:232 inactive_file:428 isolated_file:0
 unevictable:37232 dirty:3 writeback:40
 slab_reclaimable:19943 slab_unreclaimable:281193
 mapped:37126 shmem:2815 pagetables:8981 bounce:0
 free:126007 free_pcp:223 free_cma:123062
Node 0 active_anon:16kB inactive_anon:13160kB active_file:292kB inactive_file:2000kB unevictable:148928kB isolated(anon):0kB isolated(file):0kB mapped:148308kB dirty:12kB writeback:164kB shmem:11260kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:20528kB shadow_call_stack:5200kB all_unreclaimable? no
DMA32 free:14128kB min:7572kB low:22636kB high:37700kB reserved_highatomic:4096KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1913856kB managed:1553276kB mlocked:0kB pagetables:1292kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:2520kB
lowmem_reserve[]: 0 0 0
Normal free:489888kB min:19808kB low:59220kB high:98632kB reserved_highatomic:36864KB active_anon:20kB inactive_anon:12168kB active_file:0kB inactive_file:1640kB unevictable:148928kB writepending:180kB present:4194304kB managed:4063392kB mlocked:148928kB pagetables:34632kB bounce:0kB free_pcp:1928kB local_pcp:0kB free_cma:489752kB
lowmem_reserve[]: 0 0 0
DMA32: 166*4kB (UME) 163*8kB (UMECH) 592*16kB (UMCH) 5*32kB (UC) 2*64kB (C) 2*128kB (C) 2*256kB (C) 1*512kB (C) 1*1024kB (C) 0*2048kB 0*4096kB = 14032kB
Normal: 969*4kB (C) 77*8kB (C) 40*16kB (C) 17*32kB (C) 5*64kB (C) 1*128kB (C) 2*256kB (C) 1*512kB (C) 0*1024kB 0*2048kB 118*4096kB (C) = 490476kB
40220 total pagecache pages
30 pages in swap cache
Swap cache stats: add 2634625, delete 2635304, find 160621/2963954
Free swap  = 1473788kB
Total swap = 2097148kB

Bug: 229822798
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: Ia0bb6f72e52f77f26062e1769bfd92e831f07cab
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
There were many order-3 fail allocation report while VM had lots of
*reclaimable* memory.

17353.434071] kworker/u16:4 invoked oom-killer: gfp_mask=0x6160c0(GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC), nodemask=(null), order=3, oom_score_adj=0
[17353.434079] kworker/u16:4 cpuset=/ mems_allowed=0
[17353.434086] CPU: 6 PID: 30045 Comm: kworker/u16:4 Tainted: G S      WC O      4.19.95-g8137b6ce669e-ab6554412 #1
[17353.434089] Hardware name: Google Inc. MSM sm7250 v2 Bramble DVT (DT)
[17353.434194] Workqueue: iparepwq95 __typeid__ZTSFiP44ipa_disable_force_clear_datapath_req_msg_v01E_global_addr [ipa3]
[17353.434197] Call trace:
[17353.434206] __typeid__ZTSFjP11task_structPK11user_regsetE_global_addr+0x14/0x18
[17353.434210] dump_stack+0xbc/0xf8
[17353.434217] dump_header+0xc8/0x250
[17353.434220] oom_kill_process+0x130/0x538
[17353.434222] out_of_memory+0x320/0x444
[17353.434226] __alloc_pages_nodemask+0x1124/0x13b4
[17353.434314] ipa3_alloc_rx_pkt_page+0x64/0x1a8 [ipa3]
[17353.434403] ipa3_wq_page_repl+0x78/0x1a4 [ipa3]
[17353.434407] process_one_work+0x3a8/0x6e4
[17353.434410] worker_thread+0x394/0x820
[17353.434413] kthread+0x19c/0x1ac
[17353.434417] ret_from_fork+0x10/0x18
[17353.434419] Mem-Info:
[17353.434424] active_anon:357378 inactive_anon:119141 isolated_anon:13\x0a active_file:97495 inactive_file:122151 isolated_file:22\x0a unevictable:49750 dirty:3553 writeback:0 unstable:0\x0a slab_reclaimable:30018 slab_unreclaimable:73884\x0a mapped:259586 shmem:27580 pagetables:39581 bounce:0\x0a free:17710 free_pcp:301 free_cma:0
[17353.434433] Node 0 active_anon:1429512kB inactive_anon:476564kB active_file:389980kB inactive_file:488604kB unevictable:199000kB isolated(anon):52kB isolated(file):88kB mapped:1038344kB dirty:14212kB writeback:0kB shmem:110320kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[17353.434439] Normal free:70840kB min:9172kB low:43900kB high:49484kB active_anon:1429284kB inactive_anon:476336kB active_file:389980kB inactive_file:488604kB unevictable:199000kB writepending:14212kB present:5764280kB managed:5584928kB mlocked:199000kB kernel_stack:92656kB shadow_call_stack:5792kB pagetables:158324kB bounce:0kB free_pcp:1204kB local_pcp:108kB free_cma:0kB
[17353.434441] lowmem_reserve[]: 0 0
[17353.434444] Normal: 8956*4kB (UMEH) 2726*8kB (UH) 751*16kB (UH) 33*32kB (H) 7*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71152kB
[17353.434451] 300317 total pagecache pages
[17353.434454] 4228 pages in swap cache
[17353.434456] Swap cache stats: add 20710158, delete 20707317, find 1014864/9891370
[17353.434459] Free swap  = 103732kB
[17353.434460] Total swap = 2097148kB
[17353.434462] 1441070 pages RAM
[17353.434465] 0 pages HighMem/MovableOnly
[17353.434466] 44838 pages reserved
[17353.434469] 73728 pages cma reserved

When we saw the trace, compaction finished with COMPACT_COMPLETE(iow, it
already did full scanning a zone but failed to create order-3 allocation)
so should_compact_retry returns "false".

           <...>-30045 [006] .... 17353.433704: reclaim_retry_zone: node=0 zone=Normal   order=3 reclaimable=696132 available=713920 min_wmark=2293 no_progress_loops=0 wmark_check=0
           <...>-30045 [006] .... 17353.433706: compact_retry: order=3 priority=COMPACT_PRIO_SYNC_FULL compaction_result=failed retries=0 max_retries=16 should_retry=0

If we see previous trace, we could see compaction is hard to find free pages
in the zone so free scanner of compaction moves fast toward migration scanner
and finally, they(migration scanner and free page scanner) crossed over.

           <...>-30045 [006] .... 17353.427026: mm_compaction_isolate_freepages: range=(0x144c00 ~ 0x145000) nr_scanned=784 nr_taken=0
           <...>-30045 [006] .... 17353.427037: mm_compaction_isolate_freepages: range=(0x144800 ~ 0x144c00) nr_scanned=1019 nr_taken=0
           <...>-30045 [006] .... 17353.427049: mm_compaction_isolate_freepages: range=(0x144400 ~ 0x144800) nr_scanned=880 nr_taken=1
           <...>-30045 [006] .... 17353.427061: mm_compaction_isolate_freepages: range=(0x144000 ~ 0x144400) nr_scanned=869 nr_taken=0
           <...>-30045 [006] .... 17353.427212: mm_compaction_isolate_freepages: range=(0x140c00 ~ 0x141000) nr_scanned=1016 nr_taken=0
..
..
           <...>-30045 [006] .... 17353.433696: mm_compaction_finished: node=0 zone=Normal   order=3 ret=complete
           <...>-30045 [006] .... 17353.433698: mm_compaction_end: zone_start=0x80600 migrate_pfn=0xc9400 free_pfn=0xc9500 zone_end=0x200000, mode=sync status=complete

If we see previous trace to see reclaim activities, we could see
it was not hard to reclaim memory.

           <...>-30045 [006] .... 17353.413941: mm_vmscan_direct_reclaim_begin: order=3 may_writepage=1 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC classzone_idx=0
           <...>-30045 [006] d..1 17353.413946: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=8 nr_scanned=8 nr_skipped=0 nr_taken=8 lru=inactive_anon
           <...>-30045 [006] .... 17353.413958: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=8 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=8 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.413960: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] d..1 17353.413965: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=22 nr_scanned=22 nr_skipped=0 nr_taken=22 lru=inactive_file
           <...>-30045 [006] .... 17353.413979: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=22 nr_reclaimed=22 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.413979: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122195 inactive=122195 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE
           <...>-30045 [006] .... 17353.413980: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414134: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414135: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] d..1 17353.414141: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=29 nr_scanned=29 nr_skipped=0 nr_taken=29 lru=inactive_anon
           <...>-30045 [006] .... 17353.414170: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=29 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=29 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.414170: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119107 inactive=119107 total_active=357385 active=357385 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] d..1 17353.414176: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=active_anon
           <...>-30045 [006] .... 17353.414206: mm_vmscan_lru_shrink_active: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=32 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
           <...>-30045 [006] d..1 17353.414212: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=inactive_file
           <...>-30045 [006] .... 17353.414225: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.414225: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122131 inactive=122131 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE
           <...>-30045 [006] d..1 17353.414228: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=16 nr_scanned=16 nr_skipped=0 nr_taken=16 lru=inactive_file
           <...>-30045 [006] .... 17353.414235: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=16 nr_reclaimed=16 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.414235: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122115 inactive=122115 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE
           <...>-30045 [006] .... 17353.414236: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119139 inactive=119139 total_active=357353 active=357353 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414320: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414321: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414339: mm_vmscan_direct_reclaim_end: nr_reclaimed=70

Based on that, we could assume that if reclaimer has reclaimed more pages,
compaction could find free pages easily so free scanner of compaction were
not moved fast like that. That means it wouldn't fail for non-costly high-order
allocation.

What this patch does is if the order is non-costly high order allocation,
it will keep trying migration with reclaiming if system has enough
reclaimable memory.

Bug: 159909686
Bug: 156785617
Bug: 158449887
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ib5ac2644d06cfca88fcc3cff93d2201058c4d488
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
The warning was caused by it holds mons_lock than holds
devfreq_list_lock when probe and hold devfreq_list_lock than holds
mons_lock when start governor. But as devfreq framework design, the
deadlock won't happen. The device just can start the governor has been
registered when devfreq_add_device and if the devices wants to set the
governor to the device when devfreq_add_governor, the device should use
the governor as default governor. They are conflict, so it never happens.

         : ======================================================
         : WARNING: possible circular locking dependency detected
         : 5.10.43-android12-9-00011-g4be155626125-dirty #1 Not tainted
         : ------------------------------------------------------
         : init/1 is trying to acquire lock:
 ffffff88117aa230 (&cpu_grp->mons_lock){+.+.}-{3: 3}, at: start_hwmon+0x74/0x3f4 [arm_memlat_mon]
         : but task is already holding lock:
 ffffffc01275d7c0 (devfreq_list_lock){+.+.}-{3: 3}, at: devfreq_add_device+0x48c/0x654
         : which lock already depends on the new lock.
         : the existing dependency chain (in reverse order) is:
         : -> #2 (devfreq_list_lock){+.+.}-{3:3}:
         : lock_acquire+0x10c/0x380
         : __mutex_lock_common+0xe4/0xf10
         : mutex_lock_nested+0x54/0x64
         : devfreq_add_governor+0x40/0x2e4
         : register_memlat+0x68/0xd0 [governor_memlat]
         : arm_memlat_mon_driver_probe+0x2b4/0x510 [arm_memlat_mon]
.....
.....
         : -> #1 (state_lock){+.+.}-{3:3}:
         : lock_acquire+0x10c/0x380
         : __mutex_lock_common+0xe4/0xf10
         : mutex_lock_nested+0x54/0x64
         : register_memlat+0x3c/0xd0 [governor_memlat]
         : arm_memlat_mon_driver_probe+0x2b4/0x510 [arm_memlat_mon]
....
....
         : -> #0 (&cpu_grp->mons_lock){+.+.}-{3:3}:
         : validate_chain+0xa54/0x29d8
         : __lock_acquire+0x764/0xdfc
         : lock_acquire+0x10c/0x380
         : __mutex_lock_common+0xe4/0xf10
         : mutex_lock_nested+0x54/0x64
         : start_hwmon+0x74/0x3f4 [arm_memlat_mon]
         : devfreq_memlat_ev_handler+0x334/0x438 [governor_memlat]
         : devfreq_add_device+0x4d0/0x654
...
...
         : other info that might help us debug this:
         : Chain exists of:
         : &cpu_grp->mons_lock --> state_lock --> devfreq_list_lock
         : Possible unsafe locking scenario:

Bug: 194230346
Test: boot and check the warnings
Change-Id: I15abbaaae792108471b19f2c713da9ebbf9f507d
Signed-off-by: Kyle Lin <kylelin@google.com>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
…nction for Clang CFI

blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".

[    0.000000][    T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
[    0.000000][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
[    0.000000][    T0] Hardware name: MT6873 (DT)
[    0.000000][    T0] Call trace:
[    0.000000][    T0]  dump_backtrace+0xfc/0x1dc
[    0.000000][    T0]  dump_stack_lvl+0xa8/0x11c
[    0.000000][    T0]  panic+0x194/0x464
[    0.000000][    T0]  __cfi_check_fail+0x54/0x58
[    0.000000][    T0]  __cfi_slowpath_diag+0x354/0x4b0
[    0.000000][    T0]  blake2s_update+0x14c/0x178
[    0.000000][    T0]  _extract_entropy+0xf4/0x29c
[    0.000000][    T0]  crng_initialize_primary+0x24/0x94
[    0.000000][    T0]  rand_initialize+0x2c/0x6c
[    0.000000][    T0]  start_kernel+0x2f8/0x65c
[    0.000000][    T0]  __primary_switched+0xc4/0x7be4
[    0.000000][    T0] Rebooting in 5 seconds..

Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.

In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.

Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in")
Link: ClangBuiltLinux/linux#1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

(cherry picked from commit d2a02e3c8bb6b347818518edff5a4b40ff52d6d8)
Change-Id: I3a6461e3f05f57cd898f8a4f46e73fb0214037ef
Signed-off-by: Eric Biggers <ebiggers@google.com>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
A recent clang 16 update has introduced segfault when compiling kernel
using the `-polly-invariant-load-hoisting` flag and breaks compilation
when kernel is compiled with full LTO.
This is due to the flag being ignorant about the errors during SCoP
verification and does not take the errors into account as per a recent
issue opened at [LLVM], causing polly to do segfault and the compiler
to print the following backtrace:

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: clang -Wp,-MD,drivers/md/.md.o.d -nostdinc -isystem /tmp/cirrus-ci-build/toolchains/clang/lib/clang/16.0.0/include -I../arch/arm64/include -I./arch/arm64/include/generated -I../include -I./include -I../arch/arm64/include/uapi -I./arch/arm64/include/generated/uapi -I../include/uapi -I./include/generated/uapi -include ../include/linux/kconfig.h -include ../include/linux/compiler_types.h -I../drivers/md -Idrivers/md -D__KERNEL__ -mlittle-endian -DKASAN_SHADOW_SCALE_SHIFT=3 -Qunused-arguments -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Werror=return-type -Wno-format-security -std=gnu89 --target=aarch64-linux-gnu --prefix=/tmp/cirrus-ci-build/Kernel/../toolchains/clang/bin/aarch64-linux-gnu- --gcc-toolchain=/tmp/cirrus-ci-build/toolchains/clang -Wno-misleading-indentation -Wno-bool-operation -Werror=unknown-warning-option -Wno-unsequenced -opaque-pointers -fno-PIE -mgeneral-regs-only -DCONFIG_AS_LSE=1 -fno-asynchronous-unwind-tables -Wno-psabi -DKASAN_SHADOW_SCALE_SHIFT=3 -fno-delete-null-pointer-checks -Wno-frame-address -Wno-int-in-bool-context -Wno-address-of-packed-member -O3 -march=armv8.1-a+crypto+fp16+rcpc -mtune=cortex-a53 -mllvm -polly -mllvm -polly-ast-use-context -mllvm -polly-detect-keep-going -mllvm -polly-invariant-load-hoisting -mllvm -polly-run-inliner -mllvm -polly-vectorizer=stripmine -mllvm -polly-loopfusion-greedy=1 -mllvm -polly-reschedule=1 -mllvm -polly-postopts=1 -fstack-protector-strong --target=aarch64-linux-gnu --gcc-toolchain=/tmp/cirrus-ci-build/toolchains/clang -meabi gnu -Wno-format-invalid-specifier -Wno-gnu -Wno-duplicate-decl-specifier -Wno-asm-operand-widths -Wno-initializer-overrides -Wno-tautological-constant-out-of-range-compare -Wno-tautological-compare -mno-global-merge -Wno-void-ptr-dereference -Wno-unused-but-set-variable -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -Wvla -flto -fwhole-program-vtables -fvisibility=hidden -Wdeclaration-after-statement -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow -fno-stack-check -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -fmacro-prefix-map=../= -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -Wno-pointer-to-enum-cast -Wno-unaligned-access -DKBUILD_BASENAME=\"md\" -DKBUILD_MODNAME=\"md_mod\" -D__KBUILD_MODNAME=kmod_md_mod -c -o drivers/md/md.o ../drivers/md/md.c
1.	<eof> parser at end of file
2.	Optimizer
  CC      drivers/media/platform/msm/camera_v2/camera/camera.o
  AR      drivers/media/pci/intel/ipu3/built-in.a
  CC      drivers/md/dm-linear.o
 #0 0x0000559d3527073f (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a7073f)
 #1 0x0000559d352705bf (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a705bf)
 #2 0x0000559d3523b198 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a3b198)
 #3 0x0000559d3523b33e (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a3b33e)
 xiaomi-sm8250-devs#4 0x00007f339dc3ea00 (/usr/lib/libc.so.6+0x38a00)
 xiaomi-sm8250-devs#5 0x0000559d35affccf (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x42ffccf)
 xiaomi-sm8250-devs#6 0x0000559d35b01710 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4301710)
 #7 0x0000559d35b01a12 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4301a12)
 #8 0x0000559d35b09a9e (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4309a9e)
 #9 0x0000559d35b14707 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4314707)
clang-16: error: clang frontend command failed with exit code 139 (use -v to see invocation)
Neutron clang version 16.0.0 (https://github.com/llvm/llvm-project.git 598f5275c16049b1e1b5bc934cbde447a82d485e)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /tmp/cirrus-ci-build/Kernel/../toolchains/clang/bin

From the very nature of `-polly-detect-keep-going`, it can be concluded
that this is a potentially unsafe flag and instead of using it conditionally
for CLANG < 16, just remove it all together for all. This should allow polly
to run SCoP verifications as intended.

Issue [LLVM]: llvm/llvm-project#58484 (comment)

Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: NotZeetaa <rodrigo2005contente@gmail.com>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
commit b31cb5a6eb7a48b0a7bfdf06832b1fd5088d8c79 upstream.

When disabling quotas we are deleting the quota root from the list
fs_info->dirty_cowonly_roots without taking the lock that protects it,
which is struct btrfs_fs_info::trans_lock. This unsynchronized list
manipulation may cause chaos if there's another concurrent manipulation
of this list, such as when adding a root to it with
ctree.c:add_root_to_dirty_list().

This can result in all sorts of weird failures caused by a race, such as
the following crash:

  [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI
  [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G        W          6.4.0-rc6-btrfs-next-134+ #1
  [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs]
  [337571.279928] Code: 85 38 06 00 (...)
  [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206
  [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000
  [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070
  [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b
  [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600
  [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48
  [337571.281723] FS:  00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000
  [337571.281950] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0
  [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [337571.282874] Call Trace:
  [337571.283101]  <TASK>
  [337571.283327]  ? __die_body+0x1b/0x60
  [337571.283570]  ? die_addr+0x39/0x60
  [337571.283796]  ? exc_general_protection+0x22e/0x430
  [337571.284022]  ? asm_exc_general_protection+0x22/0x30
  [337571.284251]  ? commit_cowonly_roots+0x11f/0x250 [btrfs]
  [337571.284531]  btrfs_commit_transaction+0x42e/0xf90 [btrfs]
  [337571.284803]  ? _raw_spin_unlock+0x15/0x30
  [337571.285031]  ? release_extent_buffer+0x103/0x130 [btrfs]
  [337571.285305]  reset_balance_state+0x152/0x1b0 [btrfs]
  [337571.285578]  btrfs_balance+0xa50/0x11e0 [btrfs]
  [337571.285864]  ? __kmem_cache_alloc_node+0x14a/0x410
  [337571.286086]  btrfs_ioctl+0x249a/0x3320 [btrfs]
  [337571.286358]  ? mod_objcg_state+0xd2/0x360
  [337571.286577]  ? refill_obj_stock+0xb0/0x160
  [337571.286798]  ? seq_release+0x25/0x30
  [337571.287016]  ? __rseq_handle_notify_resume+0x3ba/0x4b0
  [337571.287235]  ? percpu_counter_add_batch+0x2e/0xa0
  [337571.287455]  ? __x64_sys_ioctl+0x88/0xc0
  [337571.287675]  __x64_sys_ioctl+0x88/0xc0
  [337571.287901]  do_syscall_64+0x38/0x90
  [337571.288126]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
  [337571.288352] RIP: 0033:0x7f478aaffe9b

So fix this by locking struct btrfs_fs_info::trans_lock before deleting
the quota root from that list.

Fixes: bed92ea ("Btrfs: qgroup implementation and prototypes")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
[ Upstream commit 2aaa8a15de73874847d62eb595c6683bface80fd ]

With some IPv6 Ext Hdr (RPL, SRv6, etc.), we can send a packet that
has the link-local address as src and dst IP and will be forwarded to
an external IP in the IPv6 Ext Hdr.

For example, the script below generates a packet whose src IP is the
link-local address and dst is updated to 11::.

  # for f in $(find /proc/sys/net/ -name *seg6_enabled*); do echo 1 > $f; done
  # python3
  >>> from socket import *
  >>> from scapy.all import *
  >>>
  >>> SRC_ADDR = DST_ADDR = "fe80::5054:ff:fe12:3456"
  >>>
  >>> pkt = IPv6(src=SRC_ADDR, dst=DST_ADDR)
  >>> pkt /= IPv6ExtHdrSegmentRouting(type=4, addresses=["11::", "22::"], segleft=1)
  >>>
  >>> sk = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW)
  >>> sk.sendto(bytes(pkt), (DST_ADDR, 0))

For such a packet, we call ip6_route_input() to look up a route for the
next destination in these three functions depending on the header type.

  * ipv6_rthdr_rcv()
  * ipv6_rpl_srh_rcv()
  * ipv6_srh_rcv()

If no route is found, ip6_null_entry is set to skb, and the following
dst_input(skb) calls ip6_pkt_drop().

Finally, in icmp6_dev(), we dereference skb_rt6_info(skb)->rt6i_idev->dev
as the input device is the loopback interface.  Then, we have to check if
skb_rt6_info(skb)->rt6i_idev is NULL or not to avoid NULL pointer deref
for ip6_null_entry.

BUG: kernel NULL pointer dereference, address: 0000000000000000
 PF: supervisor read access in kernel mode
 PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 157 Comm: python3 Not tainted 6.4.0-11996-gb121d614371c #35
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
FS:  00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
 <IRQ>
 ip6_pkt_drop (net/ipv6/route.c:4513)
 ipv6_rthdr_rcv (net/ipv6/exthdrs.c:640 net/ipv6/exthdrs.c:686)
 ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
 ip6_input_finish (./include/linux/rcupdate.h:781 net/ipv6/ip6_input.c:483)
 __netif_receive_skb_one_core (net/core/dev.c:5455)
 process_backlog (./include/linux/rcupdate.h:781 net/core/dev.c:5895)
 __napi_poll (net/core/dev.c:6460)
 net_rx_action (net/core/dev.c:6529 net/core/dev.c:6660)
 __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554)
 do_softirq (kernel/softirq.c:454 kernel/softirq.c:441)
 </IRQ>
 <TASK>
 __local_bh_enable_ip (kernel/softirq.c:381)
 __dev_queue_xmit (net/core/dev.c:4231)
 ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:135)
 rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
 sock_sendmsg (net/socket.c:725 net/socket.c:748)
 __sys_sendto (net/socket.c:2134)
 __x64_sys_sendto (net/socket.c:2146 net/socket.c:2142 net/socket.c:2142)
 do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f9dc751baea
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffe98712c38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffe98712cf8 RCX: 00007f9dc751baea
RDX: 0000000000000060 RSI: 00007f9dc6460b90 RDI: 0000000000000003
RBP: 00007f9dc56e8be0 R08: 00007ffe98712d70 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f9dc6af5d1b
 </TASK>
Modules linked in:
CR2: 0000000000000000
 ---[ end trace 0000000000000000 ]---
RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
FS:  00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled

Fixes: 4832c30 ("net: ipv6: put host and anycast routes on device with address")
Reported-by: Wang Yufen <wangyufen@huawei.com>
Closes: https://lore.kernel.org/netdev/c41403a9-c2f6-3b7e-0c96-e1901e605cd0@huawei.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
[ Upstream commit 0dd37d6dd33a9c23351e6115ae8cdac7863bc7de ]

We've run into the case that the balancer tries to balance a migration
disabled task and trigger the warning in set_task_cpu() like below:

 ------------[ cut here ]------------
 WARNING: CPU: 7 PID: 0 at kernel/sched/core.c:3115 set_task_cpu+0x188/0x240
 Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT nf_reject_ipv4 <...snip>
 CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G           O       6.1.0-rc4+ #1
 Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B221.01 12/09/2021
 pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : set_task_cpu+0x188/0x240
 lr : load_balance+0x5d0/0xc60
 sp : ffff80000803bc70
 x29: ffff80000803bc70 x28: ffff004089e190e8 x27: ffff004089e19040
 x26: ffff007effcabc38 x25: 0000000000000000 x24: 0000000000000001
 x23: ffff80000803be84 x22: 000000000000000c x21: ffffb093e79e2a78
 x20: 000000000000000c x19: ffff004089e19040 x18: 0000000000000000
 x17: 0000000000001fad x16: 0000000000000030 x15: 0000000000000000
 x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000000
 x11: 0000000000000001 x10: 0000000000000400 x9 : ffffb093e4cee530
 x8 : 00000000fffffffe x7 : 0000000000ce168a x6 : 000000000000013e
 x5 : 00000000ffffffe1 x4 : 0000000000000001 x3 : 0000000000000b2a
 x2 : 0000000000000b2a x1 : ffffb093e6d6c510 x0 : 0000000000000001
 Call trace:
  set_task_cpu+0x188/0x240
  load_balance+0x5d0/0xc60
  rebalance_domains+0x26c/0x380
  _nohz_idle_balance.isra.0+0x1e0/0x370
  run_rebalance_domains+0x6c/0x80
  __do_softirq+0x128/0x3d8
  ____do_softirq+0x18/0x24
  call_on_irq_stack+0x2c/0x38
  do_softirq_own_stack+0x24/0x3c
  __irq_exit_rcu+0xcc/0xf4
  irq_exit_rcu+0x18/0x24
  el1_interrupt+0x4c/0xe4
  el1h_64_irq_handler+0x18/0x2c
  el1h_64_irq+0x74/0x78
  arch_cpu_idle+0x18/0x4c
  default_idle_call+0x58/0x194
  do_idle+0x244/0x2b0
  cpu_startup_entry+0x30/0x3c
  secondary_start_kernel+0x14c/0x190
  __secondary_switched+0xb0/0xb4
 ---[ end trace 0000000000000000 ]---

Further investigation shows that the warning is superfluous, the migration
disabled task is just going to be migrated to its current running CPU.
This is because that on load balance if the dst_cpu is not allowed by the
task, we'll re-select a new_dst_cpu as a candidate. If no task can be
balanced to dst_cpu we'll try to balance the task to the new_dst_cpu
instead. In this case when the migration disabled task is not on CPU it
only allows to run on its current CPU, load balance will select its
current CPU as new_dst_cpu and later triggers the warning above.

The new_dst_cpu is chosen from the env->dst_grpmask. Currently it
contains CPUs in sched_group_span() and if we have overlapped groups it's
possible to run into this case. This patch makes env->dst_grpmask of
group_balance_mask() which exclude any CPUs from the busiest group and
solve the issue. For balancing in a domain with no overlapped groups
the behaviour keeps same as before.

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230530082507.10444-1-yangyicong@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
[ Upstream commit 314c82841602a111c04a7210c21dc77e0d560242 ]

Can be called via nft set element list iteration, which may acquire
rcu and/or bh read lock (depends on set type).

BUG: sleeping function called from invalid context at net/netfilter/nf_tables_api.c:3353
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1232, name: nft
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by nft/1232:
 #0: ffff8881180e3ea8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid
 #1: ffffffff83f5f540 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire
Call Trace:
 nft_chain_validate
 nft_lookup_validate_setelem
 nft_pipapo_walk
 nft_lookup_validate
 nft_chain_validate
 nft_immediate_validate
 nft_chain_validate
 nf_tables_validate
 nf_tables_abort

No choice but to move it to nf_tables_validate().

Fixes: 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
[ Upstream commit 30e0191b16e8a58e4620fa3e2839ddc7b9d4281c ]

skbuff: skb_under_panic: text:ffffffff88771f69 len:56 put:-4
 head:ffff88805f86a800 data:ffff887f5f86a850 tail:0x88 end:0x2c0 dev:pim6reg
 ------------[ cut here ]------------
 kernel BUG at net/core/skbuff.c:192!
 invalid opcode: 0000 [#1] PREEMPT SMP KASAN
 CPU: 2 PID: 22968 Comm: kworker/2:11 Not tainted 6.5.0-rc3-00044-g0a8db05b571a #236
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:skb_panic+0x152/0x1d0
 Call Trace:
  <TASK>
  skb_push+0xc4/0xe0
  ip6mr_cache_report+0xd69/0x19b0
  reg_vif_xmit+0x406/0x690
  dev_hard_start_xmit+0x17e/0x6e0
  __dev_queue_xmit+0x2d6a/0x3d20
  vlan_dev_hard_start_xmit+0x3ab/0x5c0
  dev_hard_start_xmit+0x17e/0x6e0
  __dev_queue_xmit+0x2d6a/0x3d20
  neigh_connected_output+0x3ed/0x570
  ip6_finish_output2+0x5b5/0x1950
  ip6_finish_output+0x693/0x11c0
  ip6_output+0x24b/0x880
  NF_HOOK.constprop.0+0xfd/0x530
  ndisc_send_skb+0x9db/0x1400
  ndisc_send_rs+0x12a/0x6c0
  addrconf_dad_completed+0x3c9/0xea0
  addrconf_dad_work+0x849/0x1420
  process_one_work+0xa22/0x16e0
  worker_thread+0x679/0x10c0
  ret_from_fork+0x28/0x60
  ret_from_fork_asm+0x11/0x20

When setup a vlan device on dev pim6reg, DAD ns packet may sent on reg_vif_xmit().
reg_vif_xmit()
    ip6mr_cache_report()
        skb_push(skb, -skb_network_offset(pkt));//skb_network_offset(pkt) is 4
And skb_push declared as:
	void *skb_push(struct sk_buff *skb, unsigned int len);
		skb->data -= len;
		//0xffff88805f86a84c - 0xfffffffc = 0xffff887f5f86a850
skb->data is set to 0xffff887f5f86a850, which is invalid mem addr, lead to skb_push() fails.

Fixes: 14fb64e ("[IPV6] MROUTE: Support PIM-SM (SSM).")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
[ Upstream commit 3c40eb8145325b0f5b93b8a169146078cb2c49d6 ]

The call to mmc_request_done() can schedule, so it must not be called
from irq context. Wake the irq thread if it needs to be called, and let
its existing logic do its work.

Fixes the following kernel bug, which appears when running an RT patched
kernel on the AmLogic Meson AXG A113X SoC:
[   11.111407] BUG: scheduling while atomic: kworker/0:1H/75/0x00010001
[   11.111438] Modules linked in:
[   11.111451] CPU: 0 PID: 75 Comm: kworker/0:1H Not tainted 6.4.0-rc3-rt2-rtx-00081-gfd07f41ed6b4-dirty #1
[   11.111461] Hardware name: RTX AXG A113X Linux Platform Board (DT)
[   11.111469] Workqueue: kblockd blk_mq_run_work_fn
[   11.111492] Call trace:
[   11.111497]  dump_backtrace+0xac/0xe8
[   11.111510]  show_stack+0x18/0x28
[   11.111518]  dump_stack_lvl+0x48/0x60
[   11.111530]  dump_stack+0x18/0x24
[   11.111537]  __schedule_bug+0x4c/0x68
[   11.111548]  __schedule+0x80/0x574
[   11.111558]  schedule_loop+0x2c/0x50
[   11.111567]  schedule_rtlock+0x14/0x20
[   11.111576]  rtlock_slowlock_locked+0x468/0x730
[   11.111587]  rt_spin_lock+0x40/0x64
[   11.111596]  __wake_up_common_lock+0x5c/0xc4
[   11.111610]  __wake_up+0x18/0x24
[   11.111620]  mmc_blk_mq_req_done+0x68/0x138
[   11.111633]  mmc_request_done+0x104/0x118
[   11.111644]  meson_mmc_request_done+0x38/0x48
[   11.111654]  meson_mmc_irq+0x128/0x1f0
[   11.111663]  __handle_irq_event_percpu+0x70/0x114
[   11.111674]  handle_irq_event_percpu+0x18/0x4c
[   11.111683]  handle_irq_event+0x80/0xb8
[   11.111691]  handle_fasteoi_irq+0xa4/0x120
[   11.111704]  handle_irq_desc+0x20/0x38
[   11.111712]  generic_handle_domain_irq+0x1c/0x28
[   11.111721]  gic_handle_irq+0x8c/0xa8
[   11.111735]  call_on_irq_stack+0x24/0x4c
[   11.111746]  do_interrupt_handler+0x88/0x94
[   11.111757]  el1_interrupt+0x34/0x64
[   11.111769]  el1h_64_irq_handler+0x18/0x24
[   11.111779]  el1h_64_irq+0x64/0x68
[   11.111786]  __add_wait_queue+0x0/0x4c
[   11.111795]  mmc_blk_rw_wait+0x84/0x118
[   11.111804]  mmc_blk_mq_issue_rq+0x5c4/0x654
[   11.111814]  mmc_mq_queue_rq+0x194/0x214
[   11.111822]  blk_mq_dispatch_rq_list+0x3ac/0x528
[   11.111834]  __blk_mq_sched_dispatch_requests+0x340/0x4d0
[   11.111847]  blk_mq_sched_dispatch_requests+0x38/0x70
[   11.111858]  blk_mq_run_work_fn+0x3c/0x70
[   11.111865]  process_one_work+0x17c/0x1f0
[   11.111876]  worker_thread+0x1d4/0x26c
[   11.111885]  kthread+0xe4/0xf4
[   11.111894]  ret_from_fork+0x10/0x20

Fixes: 51c5d84 ("MMC: meson: initial support for GX platforms")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Hundebøll <martin@geanix.com>
Link: https://lore.kernel.org/r/20230607082713.517157-1-martin@geanix.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
commit 4f3175979e62de3b929bfa54a0db4b87d36257a7 upstream.

With hardened usercopy enabled (CONFIG_HARDENED_USERCOPY=y), using the
/proc/powerpc/rtas/firmware_update interface to prepare a system
firmware update yields a BUG():

  kernel BUG at mm/usercopy.c:102!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 0 PID: 2232 Comm: dd Not tainted 6.5.0-rc3+ #2
  Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 of:IBM,FW860.50 (SV860_146) hv:phyp pSeries
  NIP:  c0000000005991d0 LR: c0000000005991cc CTR: 0000000000000000
  REGS: c0000000148c76a0 TRAP: 0700   Not tainted  (6.5.0-rc3+)
  MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002242  XER: 0000000c
  CFAR: c0000000001fbd34 IRQMASK: 0
  [ ... GPRs omitted ... ]
  NIP usercopy_abort+0xa0/0xb0
  LR  usercopy_abort+0x9c/0xb0
  Call Trace:
    usercopy_abort+0x9c/0xb0 (unreliable)
    __check_heap_object+0x1b4/0x1d0
    __check_object_size+0x2d0/0x380
    rtas_flash_write+0xe4/0x250
    proc_reg_write+0xfc/0x160
    vfs_write+0xfc/0x4e0
    ksys_write+0x90/0x160
    system_call_exception+0x178/0x320
    system_call_common+0x160/0x2c4

The blocks of the firmware image are copied directly from user memory
to objects allocated from flash_block_cache, so flash_block_cache must
be created using kmem_cache_create_usercopy() to mark it safe for user
access.

Fixes: 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[mpe: Trim and indent oops]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230810-rtas-flash-vs-hardened-usercopy-v2-1-dcf63793a938@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
[ Upstream commit 00374d9b6d9f932802b55181be9831aa948e5b7c ]

Normally, x->replay_esn and x->preplay_esn should be allocated at
xfrm_alloc_replay_state_esn(...) in xfrm_state_construct(...), hence the
xfrm_update_ae_params(...) is okay to update them. However, the current
implementation of xfrm_new_ae(...) allows a malicious user to directly
dereference a NULL pointer and crash the kernel like below.

BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 8253067 P4D 8253067 PUD 8e0e067 PMD 0
Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 98 Comm: poc.npd Not tainted 6.4.0-rc7-00072-gdad9774deaf1 #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.o4
RIP: 0010:memcpy_orig+0xad/0x140
Code: e8 4c 89 5f e0 48 8d 7f e0 73 d2 83 c2 20 48 29 d6 48 29 d7 83 fa 10 72 34 4c 8b 06 4c 8b 4e 08 c
RSP: 0018:ffff888008f57658 EFLAGS: 00000202
RAX: 0000000000000000 RBX: ffff888008bd0000 RCX: ffffffff8238e571
RDX: 0000000000000018 RSI: ffff888007f64844 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888008f57818
R13: ffff888007f64aa4 R14: 0000000000000000 R15: 0000000000000000
FS:  00000000014013c0(0000) GS:ffff88806d600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000054d8000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 ? __die+0x1f/0x70
 ? page_fault_oops+0x1e8/0x500
 ? __pfx_is_prefetch.constprop.0+0x10/0x10
 ? __pfx_page_fault_oops+0x10/0x10
 ? _raw_spin_unlock_irqrestore+0x11/0x40
 ? fixup_exception+0x36/0x460
 ? _raw_spin_unlock_irqrestore+0x11/0x40
 ? exc_page_fault+0x5e/0xc0
 ? asm_exc_page_fault+0x26/0x30
 ? xfrm_update_ae_params+0xd1/0x260
 ? memcpy_orig+0xad/0x140
 ? __pfx__raw_spin_lock_bh+0x10/0x10
 xfrm_update_ae_params+0xe7/0x260
 xfrm_new_ae+0x298/0x4e0
 ? __pfx_xfrm_new_ae+0x10/0x10
 ? __pfx_xfrm_new_ae+0x10/0x10
 xfrm_user_rcv_msg+0x25a/0x410
 ? __pfx_xfrm_user_rcv_msg+0x10/0x10
 ? __alloc_skb+0xcf/0x210
 ? stack_trace_save+0x90/0xd0
 ? filter_irq_stacks+0x1c/0x70
 ? __stack_depot_save+0x39/0x4e0
 ? __kasan_slab_free+0x10a/0x190
 ? kmem_cache_free+0x9c/0x340
 ? netlink_recvmsg+0x23c/0x660
 ? sock_recvmsg+0xeb/0xf0
 ? __sys_recvfrom+0x13c/0x1f0
 ? __x64_sys_recvfrom+0x71/0x90
 ? do_syscall_64+0x3f/0x90
 ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
 ? copyout+0x3e/0x50
 netlink_rcv_skb+0xd6/0x210
 ? __pfx_xfrm_user_rcv_msg+0x10/0x10
 ? __pfx_netlink_rcv_skb+0x10/0x10
 ? __pfx_sock_has_perm+0x10/0x10
 ? mutex_lock+0x8d/0xe0
 ? __pfx_mutex_lock+0x10/0x10
 xfrm_netlink_rcv+0x44/0x50
 netlink_unicast+0x36f/0x4c0
 ? __pfx_netlink_unicast+0x10/0x10
 ? netlink_recvmsg+0x500/0x660
 netlink_sendmsg+0x3b7/0x700

This Null-ptr-deref bug is assigned CVE-2023-3772. And this commit
adds additional NULL check in xfrm_update_ae_params to fix the NPD.

Fixes: d8647b7 ("xfrm: Add user interface for esn and big anti-replay windows")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
[ Upstream commit b616be6b97688f2f2bd7c4a47ab32f27f94fb2a9 ]

One missing check in virtio_net_hdr_to_skb() allowed
syzbot to crash kernels again [1]

Do not allow gso_size to be set to GSO_BY_FRAGS (0xffff),
because this magic value is used by the kernel.

[1]
general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
CPU: 0 PID: 5039 Comm: syz-executor401 Not tainted 6.5.0-rc5-next-20230809-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
RIP: 0010:skb_segment+0x1a52/0x3ef0 net/core/skbuff.c:4500
Code: 00 00 00 e9 ab eb ff ff e8 6b 96 5d f9 48 8b 84 24 00 01 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e ea 21 00 00 48 8b 84 24 00 01
RSP: 0018:ffffc90003d3f1c8 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 000000000001fffe RCX: 0000000000000000
RDX: 000000000000000e RSI: ffffffff882a3115 RDI: 0000000000000070
RBP: ffffc90003d3f378 R08: 0000000000000005 R09: 000000000000ffff
R10: 000000000000ffff R11: 5ee4a93e456187d6 R12: 000000000001ffc6
R13: dffffc0000000000 R14: 0000000000000008 R15: 000000000000ffff
FS: 00005555563f2380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020020000 CR3: 000000001626d000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
udp6_ufo_fragment+0x9d2/0xd50 net/ipv6/udp_offload.c:109
ipv6_gso_segment+0x5c4/0x17b0 net/ipv6/ip6_offload.c:120
skb_mac_gso_segment+0x292/0x610 net/core/gso.c:53
__skb_gso_segment+0x339/0x710 net/core/gso.c:124
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x3a5/0xf10 net/core/dev.c:3625
__dev_queue_xmit+0x8f0/0x3d60 net/core/dev.c:4329
dev_queue_xmit include/linux/netdevice.h:3082 [inline]
packet_xmit+0x257/0x380 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3087 [inline]
packet_sendmsg+0x24c7/0x5570 net/packet/af_packet.c:3119
sock_sendmsg_nosec net/socket.c:727 [inline]
sock_sendmsg+0xd9/0x180 net/socket.c:750
____sys_sendmsg+0x6ac/0x940 net/socket.c:2496
___sys_sendmsg+0x135/0x1d0 net/socket.c:2550
__sys_sendmsg+0x117/0x1e0 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff27cdb34d9

Fixes: 3953c46 ("sk_buff: allow segmenting based on frag sizes")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230816142158.1779798-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
[ Upstream commit c1f848f12103920ca165758aedb1c10904e193e1 ]

When the tdm lane mask is computed, the driver currently fills the 1st lane
before moving on to the next. If the stream has less channels than the
lanes can accommodate, slots will be disabled on the last lanes.

Unfortunately, the HW distribute channels in a different way. It distribute
channels in pair on each lanes before moving on the next slots.

This difference leads to problems if a device has an interface with more
than 1 lane and with more than 2 slots per lane.

For example: a playback interface with 2 lanes and 4 slots each (total 8
slots - zero based numbering)
- Playing a 8ch stream:
  - All slots activated by the driver
  - channel #2 will be played on lane #1 - slot #0 following HW placement
- Playing a 4ch stream:
  - Lane #1 disabled by the driver
  - channel #2 will be played on lane #0 - slot #2

This behaviour is obviously not desirable.

Change the way slots are activated on the TDM lanes to follow what the HW
does and make sure each channel always get mapped to the same slot/lane.

Fixes: 1a11d88 ("ASoC: meson: add tdm formatter base driver")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20230809171931.1244502-1-jbrunet@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
Bing-Jhong Billy Jheng reported null-ptr-deref in unix_stream_sendpage()
with detailed analysis and a nice repro.

unix_stream_sendpage() tries to add data to the last skb in the peer's
recv queue without locking the queue.

If the peer's FD is passed to another socket and the socket's FD is
passed to the peer, there is a loop between them.  If we close both
sockets without receiving FD, the sockets will be cleaned up by garbage
collection.

The garbage collection iterates such sockets and unlinks skb with
FD from the socket's receive queue under the queue's lock.

So, there is a race where unix_stream_sendpage() could access an skb
locklessly that is being released by garbage collection, resulting in
use-after-free.

To avoid the issue, unix_stream_sendpage() must lock the peer's recv
queue.

Note the issue does not exist in 6.5+ thanks to the recent sendpage()
refactoring.

This patch is originally written by Linus Torvalds.

BUG: unable to handle page fault for address: ffff988004dd6870
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
PREEMPT SMP PTI
CPU: 4 PID: 297 Comm: garbage_uaf Not tainted 6.1.46 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:kmem_cache_alloc_node+0xa2/0x1e0
Code: c0 0f 84 32 01 00 00 41 83 fd ff 74 10 48 8b 00 48 c1 e8 3a 41 39 c5 0f 85 1c 01 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 40 <49> 8b 1c 06 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 74 a1 41 8b 44
RSP: 0018:ffffc9000079fac0 EFLAGS: 00000246
RAX: 0000000000000070 RBX: 0000000000000005 RCX: 000000000001a284
RDX: 000000000001a244 RSI: 0000000000400cc0 RDI: 000000000002eee0
RBP: 0000000000400cc0 R08: 0000000000400cc0 R09: 0000000000000003
R10: 0000000000000001 R11: 0000000000000000 R12: ffff888003970f00
R13: 00000000ffffffff R14: ffff988004dd6800 R15: 00000000000000e8
FS:  00007f174d6f3600(0000) GS:ffff88807db00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff988004dd6870 CR3: 00000000092be000 CR4: 00000000007506e0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die_body.cold+0x1a/0x1f
 ? page_fault_oops+0xa9/0x1e0
 ? fixup_exception+0x1d/0x310
 ? exc_page_fault+0xa8/0x150
 ? asm_exc_page_fault+0x22/0x30
 ? kmem_cache_alloc_node+0xa2/0x1e0
 ? __alloc_skb+0x16c/0x1e0
 __alloc_skb+0x16c/0x1e0
 alloc_skb_with_frags+0x48/0x1e0
 sock_alloc_send_pskb+0x234/0x270
 unix_stream_sendmsg+0x1f5/0x690
 sock_sendmsg+0x5d/0x60
 ____sys_sendmsg+0x210/0x260
 ___sys_sendmsg+0x83/0xd0
 ? kmem_cache_alloc+0xc6/0x1c0
 ? avc_disable+0x20/0x20
 ? percpu_counter_add_batch+0x53/0xc0
 ? alloc_empty_file+0x5d/0xb0
 ? alloc_file+0x91/0x170
 ? alloc_file_pseudo+0x94/0x100
 ? __fget_light+0x9f/0x120
 __sys_sendmsg+0x54/0xa0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x69/0xd3
RIP: 0033:0x7f174d639a7d
Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 8a c1 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 de c1 f4 ff 48
RSP: 002b:00007ffcb563ea50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f174d639a7d
RDX: 0000000000000000 RSI: 00007ffcb563eab0 RDI: 0000000000000007
RBP: 00007ffcb563eb10 R08: 0000000000000000 R09: 00000000ffffffff
R10: 00000000004040a0 R11: 0000000000000293 R12: 00007ffcb563ec28
R13: 0000000000401398 R14: 0000000000403e00 R15: 00007f174d72c000
 </TASK>

Fixes: 869e7c6 ("net: af_unix: implement stream sendpage support")
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Reviewed-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
[ Upstream commit b888c510f7b3d64ca75fc0f43b4a4bd1a611312f ]

If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting
PTP related workqueues.

In this way we can fix this:
 BUG: unable to handle page fault for address: ffffc9000440b6f8
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0
 Oops: 0000 [#1] PREEMPT SMP
 [...]
 Workqueue: events igb_ptp_overflow_check
 RIP: 0010:igb_rd32+0x1f/0x60
 [...]
 Call Trace:
  igb_ptp_read_82580+0x20/0x50
  timecounter_read+0x15/0x60
  igb_ptp_overflow_check+0x1a/0x50
  process_one_work+0x1cb/0x3c0
  worker_thread+0x53/0x3f0
  ? rescuer_thread+0x370/0x370
  kthread+0x142/0x160
  ? kthread_associate_blkcg+0xc0/0xc0
  ret_from_fork+0x1f/0x30

Fixes: 1f6e817 ("igb: Prevent dropped Tx timestamps via work items and interrupts.")
Fixes: d339b13 ("igb: add PTP Hardware Clock code")
Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821171927.2203644-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
goodmeow pushed a commit that referenced this pull request Sep 7, 2023
commit eac27a41ab641de074655d2932fc7f8cdb446881 upstream.

If received skb in batadv_v_elp_packet_recv or batadv_v_ogm_packet_recv
is either cloned or non linearized then its data buffer will be
reallocated by batadv_check_management_packet when skb_cow or
skb_linearize get called. Thus geting ethernet header address inside
skb data buffer before batadv_check_management_packet had any chance to
reallocate it could lead to the following kernel panic:

  Unable to handle kernel paging request at virtual address ffffff8020ab069a
  Mem abort info:
    ESR = 0x96000007
    EC = 0x25: DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    FSC = 0x07: level 3 translation fault
  Data abort info:
    ISV = 0, ISS = 0x00000007
    CM = 0, WnR = 0
  swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000040f45000
  [ffffff8020ab069a] pgd=180000007fffa003, p4d=180000007fffa003, pud=180000007fffa003, pmd=180000007fefe003, pte=0068000020ab0706
  Internal error: Oops: 96000007 [#1] SMP
  Modules linked in: ahci_mvebu libahci_platform libahci dvb_usb_af9035 dvb_usb_dib0700 dib0070 dib7000m dibx000_common ath11k_pci ath10k_pci ath10k_core mwl8k_new nf_nat_sip nf_conntrack_sip xhci_plat_hcd xhci_hcd nf_nat_pptp nf_conntrack_pptp at24 sbsa_gwdt
  CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.15.42-00066-g3242268d425c-dirty #550
  Hardware name: A8k (DT)
  pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : batadv_is_my_mac+0x60/0xc0
  lr : batadv_v_ogm_packet_recv+0x98/0x5d0
  sp : ffffff8000183820
  x29: ffffff8000183820 x28: 0000000000000001 x27: ffffff8014f9af00
  x26: 0000000000000000 x25: 0000000000000543 x24: 0000000000000003
  x23: ffffff8020ab0580 x22: 0000000000000110 x21: ffffff80168ae880
  x20: 0000000000000000 x19: ffffff800b561000 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 00dc098924ae0032
  x14: 0f0405433e0054b0 x13: ffffffff00000080 x12: 0000004000000001
  x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
  x8 : 0000000000000000 x7 : ffffffc076dae000 x6 : ffffff8000183700
  x5 : ffffffc00955e698 x4 : ffffff80168ae000 x3 : ffffff80059cf000
  x2 : ffffff800b561000 x1 : ffffff8020ab0696 x0 : ffffff80168ae880
  Call trace:
   batadv_is_my_mac+0x60/0xc0
   batadv_v_ogm_packet_recv+0x98/0x5d0
   batadv_batman_skb_recv+0x1b8/0x244
   __netif_receive_skb_core.isra.0+0x440/0xc74
   __netif_receive_skb_one_core+0x14/0x20
   netif_receive_skb+0x68/0x140
   br_pass_frame_up+0x70/0x80
   br_handle_frame_finish+0x108/0x284
   br_handle_frame+0x190/0x250
   __netif_receive_skb_core.isra.0+0x240/0xc74
   __netif_receive_skb_list_core+0x6c/0x90
   netif_receive_skb_list_internal+0x1f4/0x310
   napi_complete_done+0x64/0x1d0
   gro_cell_poll+0x7c/0xa0
   __napi_poll+0x34/0x174
   net_rx_action+0xf8/0x2a0
   _stext+0x12c/0x2ac
   run_ksoftirqd+0x4c/0x7c
   smpboot_thread_fn+0x120/0x210
   kthread+0x140/0x150
   ret_from_fork+0x10/0x20
  Code: f9403844 eb03009f 54fffee1 f94

Thus ethernet header address should only be fetched after
batadv_check_management_packet has been called.

Fixes: 0da0035 ("batman-adv: OGMv2 - add basic infrastructure")
Cc: stable@vger.kernel.org
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant