Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge/sound upstream 20230313 (v6.3-rc1 - try 2) #4241

Closed

Conversation

ujfalusi
Copy link
Collaborator

Another try...

picked https://lore.kernel.org/lkml/20230312200731.599706-3-masahiroy@kernel.org/ on top to test in CI also.

rddunlap and others added 30 commits March 1, 2023 19:32
REGMAP is a hidden (not user visible) symbol. Users cannot set it
directly thru "make *config", so drivers should select it instead of
depending on it if they need it.

Consistently using "select" or "depends on" can also help reduce
Kconfig circular dependency issues.

Therefore, change the use of "depends on REGMAP" to "select REGMAP".

Fixes: b474303 ("thermal: add Intel BXT WhiskeyCove PMIC thermal driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The of_iomap() function returns NULL if it fails.  It never returns
error pointers.  Fix the check accordingly.

Fixes: 6286bbb ("cpufreq: apple-soc: Add new driver to control Apple SoC CPU P-states")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The argument to do_div() is a 32-bit integer, and it was read from a
32-bit register so there is no point in doing a 64-bit division on it.

On 32-bit arm, do_div() causes a compile-time warning here:

    include/asm-generic/div64.h:238:22: error: passing argument 1 of '__div64_32' from incompatible pointer type [-Werror=incompatible-pointer-types]
      238 |   __rem = __div64_32(&(n), __base); \
          |                      ^~~~
          |                      |
          |                      unsigned int *
    drivers/power/supply/qcom_battmgr.c:1130:4: note: in expansion of macro 'do_div'
     1130 |    do_div(battmgr->status.percent, 100);

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit f05f62d ("s390/vmem: get rid of memory segment list")
reshuffled the call to vmem_add_mapping() in __segment_load(), which now
overwrites rc after it was set to contain the segment type code.

As result, __segment_load() will now always return 0 on success, which
corresponds to the segment type code SEG_TYPE_SW, i.e. a writeable
segment. This results in a kernel crash when loading a read-only segment
as dcssblk block device, and trying to write to it.

Instead of reshuffling code again, make sure to return the segment type
on success, and also describe this rather delicate and unexpected logic
in the function comment. Also initialize new segtype variable with
invalid value, to prevent possible future confusion.

Fixes: f05f62d ("s390/vmem: get rid of memory segment list")
Cc: <stable@vger.kernel.org> # 5.9+
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Keep the config S390 select list sorted.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
…it/cel/linux

Pull nfsd fix from Chuck Lever:

 - Make new GSS Kerberos Kunit tests work on non-x86 platforms

* tag 'nfsd-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Properly terminate test case arrays
  SUNRPC: Let Kunit tests run with some enctypes compiled out
Applications need to be able to program the SBI implementation specific
or custom firmware events in addition to the standard firmware events.
Remove a check in the driver that prohibits the programming of the custom
firmware events.

Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230208074314.3661406-1-mchitale@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The xas_for_each loops added into fs/cifs/file.c need to go round again if
indicated by xas_retry().

Fixes: b8713c4 ("cifs: Add some helper functions")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix an uninitialised variable introduced in cifs.

Fixes: 3d78fe7 ("cifs: Build the RDMA SGE list directly from an iterator")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-rdma@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
match_address function matches the scope id for ipv6 addresses,
but cifs_match_ipaddr (which is another function used for comparison)
does not use scope id. Doing so with this change.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We have two pieces of code that does pretty much the same
comparison. This change reuses cifs_match_ipaddr within
match_address.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
cifs_write_back_from_locked_folio() should return the number of bytes read,
but returns the result of ->async_writev(), which will be 0 on success.  As
it happens, this doesn't prevent cifs_writepages_region() from working as
it will then examine and ignore the pages that are no longer dirty rather
than just skipping over them.

Fixes: d08089f ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix the loop check in netfs_extract_user_to_sg() for extraction from
user-backed iterators to do the body if npages > 0, not if npages < 0
(which it can never be).

This isn't currently used by cifs, which only ever extracts data from BVEC,
KVEC and XARRAY iterators at this level, user-backed iterators having being
decanted into BVEC iterators at a higher level to accommodate the work
being done in a kernel thread.

Found by smatch:
	fs/netfs/iterator.c:139 netfs_extract_user_to_sg() warn: unsigned 'npages' is never less than zero.

Fixes: 0185846 ("netfs: Add a function to extract an iterator into a scatterlist")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302261115.P3TQi1ZO-lkp@intel.com/
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/Y/yYnAhoAYDBKixX@kili
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Signed-off-by: Steve French <stfrench@microsoft.com>
Do not map STATUS_OBJECT_NAME_INVALID to -EREMOTE under non-DFS
shares, or 'nodfs' mounts or CONFIG_CIFS_DFS_UPCALL=n builds.
Otherwise, in the slow path, get a referral to figure out whether it
is an actual DFS link.

This could be simply reproduced under a non-DFS share by running the
following

  $ mount.cifs //srv/share /mnt -o ...
  $ cat /mnt/$(printf '\U110000')
  cat: '/mnt/'$'\364\220\200\200': Object is remote

Fixes: c877ce4 ("cifs: reduce roundtrips on create/qinfo requests")
CC: stable@vger.kernel.org # 6.2
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Make sure to get an up-to-date TCP_Server_Info::nr_targets value prior
to waiting the server to be reconnected in cifs_reconnect_tcon().  It
is set in cifs_tcp_ses_needs_reconnect() and protected by
TCP_Server_Info::srv_lock.

Create a new cifs_wait_for_server_reconnect() helper that can be used
by both SMB2+ and CIFS reconnect code.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When __cifs_readv() and __cifs_writev() extract pages from a user-backed
iterator into a BVEC-type iterator, they set ->bv_need_unpin to note
whether they need to unpin the pages later.  However, in both cases they
examine the BVEC-type iterator and not the source iterator - and so
bv_need_unpin doesn't get set and the pages are leaked.

I think this may be responsible for the generic/208 xfstest failing
occasionally with:

	WARNING: CPU: 0 PID: 3064 at mm/gup.c:218 try_grab_page+0x65/0x100
	RIP: 0010:try_grab_page+0x65/0x100
	follow_page_pte+0x1a7/0x570
	__get_user_pages+0x1a2/0x650
	__gup_longterm_locked+0xdc/0xb50
	internal_get_user_pages_fast+0x17f/0x310
	pin_user_pages_fast+0x46/0x60
	iov_iter_extract_pages+0xc9/0x510
	? __kmalloc_large_node+0xb1/0x120
	? __kmalloc_node+0xbe/0x130
	netfs_extract_user_iter+0xbf/0x200 [netfs]
	__cifs_writev+0x150/0x330 [cifs]
	vfs_write+0x2a8/0x3c0
	ksys_pwrite64+0x65/0xa0

with the page refcount going negative.  This is less unlikely than it seems
because the page is being pinned, not simply got, and so the refcount
increased by 1024 each time, and so only needs to be called around ~2097152
for the refcount to go negative.

Further, the test program (aio-dio-invalidate-failure) uses a 32MiB static
buffer and all the PTEs covering it refer to the same page because it's
never written to.

The warning in try_grab_page():

	if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
		return -ENOMEM;

then trips and prevents us ever using the page again for DIO at least.

Fixes: d08089f ("cifs: Change the I/O paths to use an iterator rather than a page list")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Link: https://lore.kernel.org/r/CAH2r5mvaTsJ---n=265a4zqRA7pP+o4MJ36WCQUS6oPrOij8cw@mail.gmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Increase COMMAND_LINE_SIZE as the current default value is too low
for syzbot kernel command line.

There has been considerable discussion on this patch that has led to a
larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all
ports.  That's not quite done yet, but it's gotten far enough we're
confident this is not a uABI change so this is safe.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Link: https://lore.kernel.org/r/20210316193420.904-1-alex@ghiti.fr
[Palmer: it's not uabi]
Link: https://lore.kernel.org/linux-riscv/874b8076-b0d1-4aaa-bcd8-05d523060152@app.fastmail.com/#t
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Apple iMac11,2 (mid 2010) also with Radeon HD-4670 that has the same
issue as iMac10,1 (late 2009) where the internal eDP panel stays dark on
driver load.  This patch treats iMac11,2 the same as iMac10,1,
so the eDP panel stays active.

Additional steps:
Kernel boot parameter radeon.nomodeset=0 required to keep the eDP
panel active.

This patch is an extension of
commit 564d8a2 ("drm/radeon: Fix eDP for single-display iMac10,1 (v2)")
Link: https://lore.kernel.org/all/lsq.1507553064.833262317@decadent.org.uk/
Signed-off-by: Mark Hawrylak <mark.hawrylak@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
…used variable

Fixes following warnings:
warning: no previous prototype for 'umc_v8_10_convert_error_address'
warning: variable 'channel_index' set but not used

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This will let us pass the kms_hdr.bpc_switch IGT
test.

The reason the bpc restriction was required is
historical. At one point in time we were not falling
back to a lower bpc when we didn't have enough
bandwidth for the maximum bpc reported by a display.
This meant that we couldn't enable some high refresh
modes unless we limitted the bpc.

Starting with this patch the issue is fixed:
commit cbd14ae ("drm/amd/display: Fix
incorrectly pruned modes with deep color")

This patch implemented a fallback mechanism if mode
validation failed at the max bpc. This means users
now automatically get all modes that can be supported
by at least 6 bpc. The driver will enable the mode
with the highest possible bpc that is supported by
the display.

v2:
 - explain why this is no longer needed (Michel)
 - refer to commit that fixed bpc fallback (Michel)

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Vitaly.Prosyak@amd.com
Cc: Joshua Ashton <joshua@froggi.es>
Cc: dri-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: Michel Dänzer <michel.daenzer@mailbox.org>
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A mistake has been made in the BIOS for some ASICs with NBIO 7.5.1
where some NBIO registers aren't properly setup.

Ensure that they're set during initialization.

Tested-by: Richard Gong <richard.gong@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
[Why]
Variable adev->crtc_irq.num_types was initialized as the value of
adev->mode_info.num_crtc at early_init stage, later at hw_init stage,
the num_crtc changed due to the display pipe harvest on some SKUs,
but the num_types was not updated accordingly, that cause below error
in gpu recover.

  *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :3
  *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :3
  *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :3
  *ERROR* amdgpu_dm_set_pflip_irq_state: crtc is NULL at id :3
  *ERROR* amdgpu_dm_set_pflip_irq_state: crtc is NULL at id :3
  *ERROR* amdgpu_dm_set_pflip_irq_state: crtc is NULL at id :3
  *ERROR* amdgpu_dm_set_pflip_irq_state: crtc is NULL at id :3
  *ERROR* amdgpu_dm_set_vupdate_irq_state: crtc is NULL at id :3
  *ERROR* amdgpu_dm_set_vupdate_irq_state: crtc is NULL at id :3
  *ERROR* amdgpu_dm_set_vupdate_irq_state: crtc is NULL at id :3

[How]
Defer the initialization of num_types to eliminate the error logs.

Signed-off-by: tiancyin <tianci.yin@amd.com>
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
building with gcc and W=1 reports
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:81:29: error: variable
  ‘ring’ set but not used [-Werror=unused-but-set-variable]
   81 |         struct amdgpu_ring *ring;
      |                             ^~~~

ring is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The call trace occurs when the amdgpu is removed after
the mode1 reset. During mode1 reset, from suspend to resume,
there is no need to reinitialize the ta firmware buffer
which caused the bo pin_count increase redundantly.

[  489.885525] Call Trace:
[  489.885525]  <TASK>
[  489.885526]  amdttm_bo_put+0x34/0x50 [amdttm]
[  489.885529]  amdgpu_bo_free_kernel+0xe8/0x130 [amdgpu]
[  489.885620]  psp_free_shared_bufs+0xb7/0x150 [amdgpu]
[  489.885720]  psp_hw_fini+0xce/0x170 [amdgpu]
[  489.885815]  amdgpu_device_fini_hw+0x2ff/0x413 [amdgpu]
[  489.885960]  ? blocking_notifier_chain_unregister+0x56/0xb0
[  489.885962]  amdgpu_driver_unload_kms+0x51/0x60 [amdgpu]
[  489.886049]  amdgpu_pci_remove+0x5a/0x140 [amdgpu]
[  489.886132]  ? __pm_runtime_resume+0x60/0x90
[  489.886134]  pci_device_remove+0x3e/0xb0
[  489.886135]  __device_release_driver+0x1ab/0x2a0
[  489.886137]  driver_detach+0xf3/0x140
[  489.886138]  bus_remove_driver+0x6c/0xf0
[  489.886140]  driver_unregister+0x31/0x60
[  489.886141]  pci_unregister_driver+0x40/0x90
[  489.886142]  amdgpu_exit+0x15/0x451 [amdgpu]

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: longlyao <Longlong.Yao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Needs to set the default value of the LTTPR timeout after resume.

[How]
Set the default (3.2ms) timeout at resuming if the sink supports
LTTPR

Reviewed-by: Jerry Zuo <Jerry.Zuo@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Ryan Lin <tsung-hua.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
When PTEBufferSizeInRequests is zero, UBSAN reports the following
warning because dml_log2 returns an unexpected negative value:

  shift exponent 4294966273 is too large for 32-bit type 'int'

[HOW]

In the case PTEBufferSizeInRequests is zero, skip the dml_log2() and
assign the result directly.

Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 4f1b5e7.

[Why & How]
Original change causes a regression. Revert
until fix is available.

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
More branch devices are able to support Freesync
over PCon so include them in the list of supporting devices.

[how]
Add more compatible PCon devices in the whitelist
for Freesync over Pcon.

Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Sung Joon Kim <sungjoon.kim@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is used to fix following compilation issue with legacy gcc
error: ‘for’ loop initial declarations are only allowed in C99 mode
	for (int i = 0; i < adev->vcn.num_vcn_inst; ++i) {

Signed-off-by: bobzhou <bob.zhou@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
…oller registers

MT7621 SoC provides a system controller node for accessing to some registers.
Add a phandle in this node to avoid using MIPS related arch operations and
includes in watchdog driver code.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230214103936.1061078-2-sergio.paracuellos@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
ujfalusi and others added 3 commits March 13, 2023 10:17
…0313

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
KERNELRELEASE does not need to match the package version in changelog.
Rather, it conventially matches what is called 'ABINAME', which is a
part of the binary package names.

Both are the same by default, but the former might be overridden by
KDEB_PKGVERSION. In this case, the resulting package would not boot
because /lib/modules/$(uname -r) does not point the module directory.

Partially revert 3ab18a6 ("kbuild: deb-pkg: improve the usability
of source package").

Reported-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Fixes: 3ab18a6 ("kbuild: deb-pkg: improve the usability of source package")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
@ujfalusi ujfalusi changed the title Merge/sound upstream 20230313 Merge/sound upstream 20230313 (v6.3-rc1 - try 2) Mar 13, 2023
@ujfalusi
Copy link
Collaborator Author

Replaces #4223

@ujfalusi
Copy link
Collaborator Author

@plbossart, looks like nvme SSD fail on ADLP and the SDW resume issues...

The good news is that I can reproduce now locally after:

sudo pacman -S gcc-11
cd ~/bin
ln -s $(which gcc-11) gcc
ln -s $(which g++-11) g++

and rebuild the kernel with the same config.
the two logs (aplay -i -Dhw:0,0 -fdat /dev/zero):
resume_gcc11.txt
resume_gcc12.txt

There is certainly different sequence of events, the trigger start is at line 195 in both logs but in case of gcc-11 we have a trigger stop coming in instead of the stream to recover and play.

The machine is Dell SKU Number: 0A3E

@plbossart
Copy link
Member

plbossart commented Mar 13, 2023

@ujfalusi this Dell SKU 0A3E is used with IPC3 firmware with the production key, yes? I think I have one of these.

That would clearly point us to some sort of stack or memory corruption, exposed by differences in how the code is generated/optimized.

@ujfalusi
Copy link
Collaborator Author

@plbossart, yes, production key, IPC3.
6.3-rc2 is affected, 6.2 is working with gcc-11, but we did needed to bump the compiler version already for 6.2, so the root of the issue might be older and in 6.2.
Let's see what bisect will throw out and speculate after that.

@plbossart
Copy link
Member

I can also reproduce the issue @ujfalusi on this PR, with gcc-11 installed. Different model (TGLU_SKU0A32_SDCA) but same problem. This is good because it means it's not a random platform-specific issue.

@plbossart
Copy link
Member

[  304.800728] snd_sof:ipc3_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x60040000: GLB_STREAM_MSG: TRIG_START
[  304.836579] PM: suspend exit

<<< TRIGGER STOP AFTER 2s - sounds like xrun immediately after resume?

[  306.524775] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 0
[  306.524788] snd_sof:ipc3_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x60050000: GLB_STREAM_MSG: TRIG_STOP

@plbossart
Copy link
Member

6.1 and 6.2 work fine with GCC-11, 6.3-rc1 is broken. More bisect needed - possibly leading to nonsensical results...

@plbossart
Copy link
Member

who said it'd be an easy one? git bisect is broken for this release, lots of commits will not boot or handle USB dongles that I need.

git checkout v6.2
git bisect good
git checkout v6.3-rc1
git bisect bad
...
git bisect skip v6.2..307e14c039063f0c9bd7a18a7add8f940580dcc9
git bisect skip v6.2..07db5e247ab5858439b14dd7cc1fe538b9efcf32
git bisect skip v6.2..982818426a0ffaf93b0621826ed39a84be3d7d62
git bisect skip v6.2..3ee88df1b063962e39d7798ccc3b18fd10cea813

man.

@plbossart
Copy link
Member

no luck @ujfalusi, I had to skip most of the -rc1 commits since the device would not boot. see log
bisect.log

I also tried with audio-only patches in PR #4244, nothing problematic seen in local tests.

The only reproducible point is to add the log in #4243, this works for me but don't ask me why. @bardliao maybe you can check if you see anything wrong with the dai->suspended flag?

Not an easy one to crack.

@ujfalusi
Copy link
Collaborator Author

@plbossart, I got it down to a single commit, but it does not help either :(

$ git bisect log   
git bisect start
# status: waiting for both good and bad commits
# bad: [eeac8ede17557680855031c6f305ece2378af326] Linux 6.3-rc2
git bisect bad eeac8ede17557680855031c6f305ece2378af326
# status: waiting for good commit(s), bad commit known
# good: [c9c3395d5e3dcc6daee66c6908354d47bf98cb0c] Linux 6.2
git bisect good c9c3395d5e3dcc6daee66c6908354d47bf98cb0c
# bad: [307e14c039063f0c9bd7a18a7add8f940580dcc9] Merge tag '6.3-rc-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
git bisect bad 307e14c039063f0c9bd7a18a7add8f940580dcc9
# good: [36289a03bcd3aabdf66de75cb6d1b4ee15726438] Merge tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect good 36289a03bcd3aabdf66de75cb6d1b4ee15726438
# good: [1a30a6b25f263686dbf2028d56041ac012b10dcb] wifi: brcmfmac: p2p: Introduce generic flexible array frame member
git bisect good 1a30a6b25f263686dbf2028d56041ac012b10dcb
# bad: [13e574b4941ee1931f8c70f33c3011f74e5fbd30] Merge tag 'spi-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
git bisect bad 13e574b4941ee1931f8c70f33c3011f74e5fbd30
# good: [7933b90b42896f5b6596e6a829bb31c5121fc2a9] Merge branch 'for-linus' into for-next
git bisect good 7933b90b42896f5b6596e6a829bb31c5121fc2a9
# good: [01bb11ad828b320749764fa93ad078db20d08a9e] sched/topology: fix KASAN warning in hop_cmp()
git bisect good 01bb11ad828b320749764fa93ad078db20d08a9e
# good: [4d4266e3fd321fadb628ce02de641b129522c39c] page_pool: add a comment explaining the fragment counter usage
git bisect good 4d4266e3fd321fadb628ce02de641b129522c39c
# good: [937ca916bf4de427242fb78105a0089af0dd43b3] MAINTAINERS: Remove file reference for Broadcom Broadband SoC HS SPI driver entry
git bisect good 937ca916bf4de427242fb78105a0089af0dd43b3
# good: [7f62cb8861190e7cc1018ff37597fc49b2eaafa8] regulator: max597x: Align for simple_mfd_i2c driver
git bisect good 7f62cb8861190e7cc1018ff37597fc49b2eaafa8
# bad: [064d7dcf51a82b480e953a15cca47e5df0426502] Merge tag 'sound-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 064d7dcf51a82b480e953a15cca47e5df0426502
# good: [d1b4b4117f890978c23c8e84a58a6c4c713fb400] MAINTAINERS: Switch maintenance for mcr20a driver over
git bisect good d1b4b4117f890978c23c8e84a58a6c4c713fb400
# good: [b60417a9f2b890a8094477b2204d4f73c535725e] selftest: fib_tests: Always cleanup before exit
git bisect good b60417a9f2b890a8094477b2204d4f73c535725e
# good: [d1fabc68f8e0541d41657096dc713cb01775652d] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good d1fabc68f8e0541d41657096dc713cb01775652d
# good: [f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc] bpf: add missing header file include
git bisect good f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc
# first bad commit: [064d7dcf51a82b480e953a15cca47e5df0426502] Merge tag 'sound-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

@ujfalusi
Copy link
Collaborator Author

No matter how hard I try, I always end up with that merge commit to blame, these are the changes behind: the-blame.txt

Curiously it includes single SDW core change, which looks too familiar:
43f1a7f ("soundwire: stream: Add specific prep/deprep commands to port_prep callback")
Reverting it and (again) makes suspend/resume working.

This still does not make any sense as this is a NOP in our machines.

@plbossart
Copy link
Member

@ujfalusi The only common thing I can find is that something happens in the prepare step.

My PR #4243 adds a log and changes the timing and I would guess that 43f1a7f ("soundwire: stream: Add specific prep/deprep commands to port_prep callback") does the same. It's a nop in all cases but enough to make the test pass.

My guess is that the tests fail because we end-up NOT resending the IPC to the DSP on prepare (invoked by the ALSA/ASoC core on resume), and that causes an xrun and and error.

The only odd thing that I found is that the order is not exactly the same for hw_params, prepare and trigger, but I didn't see a smoking gun here.

hw_params:
     snd_soc_link_hw_params
     snd_soc_dai_hw_params (codec then cpu)
     snd_soc_pcm_component_hw_params

prepare
    snd_soc_link_prepare
    snd_soc_pcm_component_prepare
    snd_soc_pcm_dai_prepare

trigger
     snd_soc_link_trigger
     snd_soc_pcm_component_trigger
    snd_soc_pcm_dai_trigger

@ujfalusi
Copy link
Collaborator Author

Closing, new merge created: #4256

@ujfalusi ujfalusi closed this Mar 17, 2023
@ujfalusi ujfalusi deleted the merge/sound-upstream-20230313 branch March 21, 2023 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.