Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] [MTL] set pipeline state returned error when running RTC AEC on secondary core #8024

Closed
yongzhi1 opened this issue Aug 9, 2023 · 12 comments
Assignees
Labels
bug Something isn't working as expected Echo Reference Applies to echo reference IPC timeout IPC timeout observed MTL Applies to Meteor Lake platform multicore Issues observed when not only core#0 is used. multicore-2cores Issues observed when 2 cores are used. P1 Blocker bugs or important features

Comments

@yongzhi1
Copy link
Contributor

yongzhi1 commented Aug 9, 2023

Describe the bug

With exact same test environment as described in #8022, now update topology to run AEC capture on secondary core (using core 1 as an example):

sof-mtl-max98357a-rt5682-ssp2-ssp0

To Reproduce

  1. Start spk playback:
aplay -Dhw:0,1 -c2 -r48000 -fS16_LE /home/LR.wav -vv
  1. Start spk feedback capture:
arecord -Dhw:0,29 /tmp/tmp.wav -c2 -r48000 -fs32_le -vv
  1. Start AEC capture:
arecord -Dhw:0,27 /tmp/mock_aec.wav -c4 -r48000 -fs32_le -vv -d20

It failed at step 3 for IPC time out:

7.445,208] <err> ipc: failed to wait schedule thread
[00:00:07.445,223] <err> ipc: ipc4: failed to send delayed reply
[00:00:07.491,650] <inf> ipc: ipc_cmd: rx	: 0x13000002|0x1
[00:00:07.491,705] <err> ipc: ipc: invalid status 3 for RESET
[00:00:07.491,716] <err> ipc: failed to process msg 19 status 7
[00:00:07.491,728] <err> ipc: ipc4: FW_GEN_MSG failed with err 7
[00:00:07.521,030] <inf> ipc: ipc_cmd: rx	: 0x46060004|0x70004
[00:00:07.521,546] <inf> ipc: ipc_cmd: rx	: 0x12040000|0x0
[00:00:07.521,595] <err> ipc: ipc_comp_free(): comp id: 262149 state is 3 cannot be freed
[00:00:07.521,608] <err> ipc: ipc_pipeline_free(): module free () failed
[00:00:07.521,620] <err> ipc: ipc4: FW_GEN_MSG failed with err 12

Reproduction Rate
100%

Expected behavior
Step 3 should work as good as Step 1 & 2.

Impact
Same results are observed with Google AEC lib instead of CONFIG_GOOGLE_RTC_AUDIO_PROCESSING_MOCK.

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
  2. Name of the topology file
  3. Name of the platform(s) on which the bug is observed.
    • Platform: {MTL/Rex}

mtrace-core1.txt
dmesg-mock-aec-core1.txt

@yongzhi1 yongzhi1 added bug Something isn't working as expected MTL Applies to Meteor Lake platform multicore Issues observed when not only core#0 is used. labels Aug 9, 2023
@yongzhi1 yongzhi1 changed the title [BUG] [MTL] failed to run RTC AEC capture on secondary core [BUG] [MTL] set pipeline state returned error when running RTC AEC on secondary core Aug 11, 2023
@yongzhi1
Copy link
Contributor Author

yongzhi1 commented Aug 11, 2023

Cherry-picked #8000 to get new mtrace for more insight:
mtrace-echo-core1.txt

The issue is not reproducible if google-rtc-aec.18.1 being removed from the connection:

sof-mtl-max98357a-rt5682-ssp2-ssp0-modified

cc @abonislawski @lgirdwood @RanderWang

@RanderWang
Copy link
Collaborator

RanderWang commented Aug 14, 2023

We have observed various IPC timeout with multi-core even with normal playback, especially if we disable mtrace output.
#7774 (comment)

@mengdonglin mengdonglin added IPC timeout IPC timeout observed P1 Blocker bugs or important features labels Aug 14, 2023
@yongzhi1
Copy link
Contributor Author

yongzhi1 commented Aug 14, 2023

The error is slightly different than #7774 though, in this case:

  1. The 300ms timed out happens in ipc_wait_for_compound_msg(), driver side reported IPC4_FAILURE for SOF_IPC4_GLB_SET_PIPELINE_STATE:
[   82.420025] sof-audio-pci-intel-mtl 0000:00:1f.3: FW reported error: 6 - Unknown error while processing the request
[   82.431704] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc error for msg 0x13000004|0x1
  1. Only two cores are involved for the test.

@yongzhi1 yongzhi1 added the multicore-2cores Issues observed when 2 cores are used. label Aug 14, 2023
@yongzhi1
Copy link
Contributor Author

cc @serhiy-katsyuba-intel

@serhiy-katsyuba-intel serhiy-katsyuba-intel self-assigned this Aug 18, 2023
@mengdonglin mengdonglin added the duplicate This issue or pull request already exists label Aug 21, 2023
@mengdonglin
Copy link
Collaborator

This is one kinds of IPC timed out tracked by #7774 with multi-core , seems to be a duplicate .

@yongzhi1
Copy link
Contributor Author

This is one kinds of IPC timed out tracked by #7774 with multi-core , seems to be a duplicate .

If I remove google-rtc-aec.18.1 from the pipeline, then the issue is not reproducible, otherwise it's 100%, so there seems to be a very specific reason.

@abonislawski
Copy link
Member

Please verify if this will work with IMR context restore disabled (or just before first suspend)

@serhiy-katsyuba-intel
Copy link
Contributor

The issue is no longer reproducible due to another issue. Now (with kernel from August 21) running arecord on AEC capture fails on driver side, driver does not even send any IPC to firmware:

localhost-rev1 ~ # arecord -Dhw:0,27 /tmp/mock_aec.wav -c4 -r48000 -fs32_le -vv -d20
Recording WAVE '/tmp/mock_aec.wav' : Signed 32 bit Little Endian, Rate 48000 Hz, Channels 4
arecord: set_params:1416: Unable to install hw params:
ACCESS:  RW_INTERLEAVED
FORMAT:  S32_LE
SUBFORMAT:  STD
SAMPLE_BITS: 32
FRAME_BITS: 128
CHANNELS: 4
RATE: 48000
PERIOD_TIME: (21333 21334)
PERIOD_SIZE: 1024
PERIOD_BYTES: 16384
PERIODS: 4
BUFFER_TIME: (85333 85334)
BUFFER_SIZE: 4096
BUFFER_BYTES: 65536
TICK_TIME: 0

In dmesg:

[  929.217962] sof-audio-pci-intel-mtl 0000:00:1f.3: Init input audio formats for google-rtc-aec.18.1
[  929.217964] sof-audio-pci-intel-mtl 0000:00:1f.3: Pin index #0: 48000Hz, 16bit (ch_map 0xffffff10 ch_cfg 1 interleaving_style 0 fmt_cfg 0x1002) buffer size 192
[  929.217966] sof-audio-pci-intel-mtl 0000:00:1f.3: RTC_AEC (UUID: B780A0A6-269F-466F-B477-23DFA05AF758): No mod_cfg available for CPC lookup in the firmware file's manifest (ibs/obs: 192/192)
[  929.236913] sof-audio-pci-intel-mtl 0000:00:1f.3: google-rtc-aec.18.1: ibs / obs / cpc: 192 / 192 / 0
[  929.236914] sof-audio-pci-intel-mtl 0000:00:1f.3: input pin 1 format not found for google-rtc-aec.18.1
[  929.247315] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to prepare widget google-rtc-aec.18.1
[  929.256941] sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to prepare connected widgets
[  929.265691] sof-audio-pci-intel-mtl 0000:00:1f.3: error: failed widget list set up for pcm 27 dir 1
[  929.275799] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_params on 0000:00:1f.3: -22
[  929.287718]  DMIC0 RTC AEC: ASoC: error at __soc_pcm_hw_params on DMIC0 RTC AEC: -22
[  929.296374]  DMIC0 RTC AEC: ASoC: error at dpcm_fe_dai_hw_params on DMIC0 RTC AEC: -22

Maybe we have missing Input Pin 1 format description in manifest? Somethere on start in dmesg (note, Pin index #0 is present 2 times but no Pin index # 1):

[    7.055654] sof-audio-pci-intel-mtl 0000:00:1f.3: widget google-rtc-aec.18.1: is_pages: 1
[    7.055656] sof-audio-pci-intel-mtl 0000:00:1f.3: Input audio formats for google-rtc-aec.18.1
[    7.055657] sof-audio-pci-intel-mtl 0000:00:1f.3: Pin index #0: 48000Hz, 16bit (ch_map 0xffffff10 ch_cfg 1 interleaving_style 0 fmt_cfg 0x1002) buffer size 192
[    7.055658] sof-audio-pci-intel-mtl 0000:00:1f.3: Pin index #0: 48000Hz, 16bit (ch_map 0xffffff10 ch_cfg 1 interleaving_style 0 fmt_cfg 0x1002) buffer size 192
[    7.055660] sof-audio-pci-intel-mtl 0000:00:1f.3: Output audio formats for google-rtc-aec.18.1
[    7.055661] sof-audio-pci-intel-mtl 0000:00:1f.3: Pin index #0: 48000Hz, 16bit (ch_map 0xffffff10 ch_cfg 1 interleaving_style 0 fmt_cfg 0x1002) buffer size 192

Tested with firmware build from mtl-005-drop-stable branch with enabled

CONFIG_COMP_GOOGLE_RTC_AUDIO_PROCESSING=y
CONFIG_GOOGLE_RTC_AUDIO_PROCESSING_MOCK=y

with this rimage manifest commit RanderWang/rimage@bbdbf09
and topology from PR build #8050.

@RanderWang , @yongzhi1 , can you check if AEC component input pin 1 description is correct (is present) in manifest? Or maybe you have any idea why recent driver is complaining for missing input pin 1 format of AEC?

@yongzhi1
Copy link
Contributor Author

yongzhi1 commented Aug 22, 2023

Hi, @serhiy-katsyuba-intel , the input pin 1 is defined here:

RanderWang/rimage@bbdbf09#diff-7d97bf1f90e10dc1f06355450d648d5dcd03b8010aaad7101077fe54611012d6R503

For the ACE use case, playback needs to be started first, followed by EchoRef, then "DMIC0 RTC AEC" at last, did you try in that order? thanks,

@RanderWang
Copy link
Collaborator

Thanks! yong. the PR was thesofproject/rimage#125, but too many requirement can't be supported and it was only a example for AEC, so closed it. Now chrome team are working on it so @yongzhi1 you can submit one.

@serhiy-katsyuba-intel
Copy link
Contributor

The problem is caused by one of the components was created with extension.r.proc_domain == COMP_PROCESSING_DOMAIN_DP.

DP scheduling is NOT supported on mtl-005-drop-stable branch. Sadly, there is no clear error message when trying to use DP scheduling but just crash. That crash resulted in IPC timeout.

@serhiy-katsyuba-intel
Copy link
Contributor

The problem is in mtl.toml. domain_types = "1" is specified for RTC_AEC component. That parameter selects type of processing: 0 for LL, 1 for DP. Modifying domain_types to "0" solves the problem.

cc: @RanderWang , @abonislawski .

@mengdonglin mengdonglin removed the duplicate This issue or pull request already exists label Aug 29, 2023
@mengdonglin mengdonglin added the Echo Reference Applies to echo reference label Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected Echo Reference Applies to echo reference IPC timeout IPC timeout observed MTL Applies to Meteor Lake platform multicore Issues observed when not only core#0 is used. multicore-2cores Issues observed when 2 cores are used. P1 Blocker bugs or important features
Projects
None yet
Development

No branches or pull requests

5 participants