[BUG] need to rate-limit 'nothing to copy' traces when mixer source is paused. #4672

Closed
plbossart opened this issue Aug 24, 2021 · 18 comments
Labels: area:SOF logging, bug (Something isn't working as expected), P2 (Critical bugs or normal features)

@plbossart (Member)

Describe the bug

When we use a mixer, pausing while using a single source causes a flood of trace messages:

[    16189998.887917] (        1237.760376) c0 dw-dma                 src/drivers/dw/dma.c:1069 INFO dw_dma_free_data_size() size is 0!
[    16190010.710833] (          11.822916) c0 dai          1.2            src/audio/dai.c:957  WARN dai_copy(): nothing to copy
[    16191249.200368] (        1238.489502) c0 dw-dma                 src/drivers/dw/dma.c:1069 INFO dw_dma_free_data_size() size is 0!
[    16191261.127450] (          11.927083) c0 dai          1.2            src/audio/dai.c:957  WARN dai_copy(): nothing to copy
[    16192498.991985] (        1237.864502) c0 dw-dma                 src/drivers/dw/dma.c:1069 INFO dw_dma_free_data_size() size is 0!
[    16192510.814901] (          11.822916) c0 dai          1.2            src/audio/dai.c:957  WARN dai_copy(): nothing to copy
[    16193749.200268] (        1238.385376) c0 dw-dma                 src/drivers/dw/dma.c:1069 INFO dw_dma_free_data_size() size is 0!

This is because we don't pause the mixer but keep it active.

To Reproduce

1. Use a topology with a mixer, e.g. cavs-nocodec.
2. aplay -Dhw:0,0 -c2 -r48000 -fS16_LE /dev/zero -vv -i
3. Press the space bar.
4. In another terminal, run sof-logger.

Reproduction Rate
100%

Expected behavior
rate-limit the trace messages in that case. There's nothing to copy because the sources are paused.

Impact
annoyance

@plbossart plbossart added the bug Something isn't working as expected label Aug 24, 2021
@bkokoszx bkokoszx self-assigned this Aug 31, 2021
@lgirdwood (Member)

@lyakh @keyonjie did one of you work on a PR that updated this message to dbg level, or a similar fix?

@lgirdwood lgirdwood added this to the v2.1 milestone Feb 9, 2022
@lgirdwood lgirdwood added the P3 Low-impact bugs or features label Feb 9, 2022
@kv2019i kv2019i removed this from the v2.1 milestone Feb 15, 2023
@marc-hb (Collaborator) commented Jun 9, 2023

Still happening with Zephyr
https://sof-ci.01.org/sofpr/PR7763/build9190/devicetest/index.html?model=TGLU_RVP_SDW_IPC4ZPH&testcase=multiple-pause-resume-
#7763

then this test failed with [ 74.224340] <err> dma_dw_common: xrun detected

[   74.211185] <err> dma_dw_common: xrun detected
[   74.211198] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.212185] <err> dma_dw_common: xrun detected
[   74.212220] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.213185] <err> dma_dw_common: xrun detected
[   74.213208] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.214185] <err> dma_dw_common: xrun detected
[   74.214198] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.215185] <err> dma_dw_common: xrun detected
[   74.215218] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.216186] <err> dma_dw_common: xrun detected
[   74.216208] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.217185] <err> dma_dw_common: xrun detected
[   74.217198] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.218185] <err> dma_dw_common: xrun detected
[   74.218220] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.219185] <err> dma_dw_common: xrun detected
[   74.219208] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.220185] <err> dma_dw_common: xrun detected
[   74.220198] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.221185] <err> dma_dw_common: xrun detected
[   74.221220] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.222185] <err> dma_dw_common: xrun detected
[   74.222208] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.223185] <err> dma_dw_common: xrun detected
[   74.223198] <wrn> dai_comp: comp:3 0x40003 dai_zephyr_copy(): nothing to copy
[   74.223251] <inf> ipc: rx	: 0x13000003|0x1
[   74.223286] <inf> pipe: pipe:2 0x0 pipe trigger cmd 2
[   74.223308] <inf> pipe: pipe:3 0x0 pipe trigger cmd 2
[   74.224253] <inf> ll_schedule: task complete 0xbe0b8bc0 0x20180U
[   74.224273] <inf> ll_schedule: num_tasks 2 total_num_tasks 2
[   74.224340] <err> dma_dw_common: xrun detected
[   74.224360] <inf> ll_schedule: task complete 0xbe0b94c0 0x20180U
[   74.224370] <inf> ll_schedule: num_tasks 1 total_num_tasks 1
[   74.224395] <inf> ll_schedule: zephyr_domain_unregister domain->type 1 domain->clk 4
[   74.230456] <inf> ipc: rx	: 0x13000002|0x1
[   74.230480] <inf> pipe: pipe:3 0x0 pipe trigger cmd 0
[   74.230646] <inf> pipe: pipe:2 0x0 pipe trigger cmd 0
[   74.230681] <wrn> copier: comp:2 0x40002 dai is not ready
[   74.231325] <inf> ipc: rx	: 0x46000002|0x3
[   74.231821] <inf> ipc: rx	: 0x12020000|0x0
[   74.231850] <inf> dma: dma_put(), dma = 0x9e093540, sref = 0
[   74.232636] <inf> ipc: rx	: 0x12030000|0x0
[   74.232870] <inf> dma: dma_put(), dma = 0x9e0934a0, sref = Terminated

@lgirdwood (Member)

@kv2019i any API in Zephyr to rate limit logs?

@lgirdwood lgirdwood added this to the TBD milestone Jul 4, 2023
@lyakh (Collaborator) commented Jul 5, 2023

> @kv2019i any API in Zephyr to rate limit logs?

@lgirdwood both SOF and Zephyr have means to drop logging entries in case of flooding; SOF can also merge "similar" messages in such situations, but I don't think either has a special call to mark individual logs for rate-limiting. I was looking for that recently too and haven't found anything.
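
For illustration only: in the absence of such an API, a per-statement rate limit can be hand-rolled around SOF's existing comp_warn() helper. The macro below is a hypothetical sketch, not existing SOF or Zephyr code.

```c
#include <sof/audio/component.h>	/* comp_warn(), struct comp_dev */

/* Hypothetical helper: emit the warning only once every 'interval'
 * invocations of this particular statement.  Each expansion gets its
 * own block-scope counter, so only the abusive call site is throttled
 * and other log statements are unaffected.
 */
#define COMP_WARN_RATELIMITED(dev, interval, msg)			\
	do {								\
		static unsigned int _rl_count;				\
		if ((_rl_count++ % (interval)) == 0)			\
			comp_warn(dev, msg);				\
	} while (0)

/* e.g. in dai_copy():
 *	COMP_WARN_RATELIMITED(dev, 1000, "dai_copy(): nothing to copy");
 */
```

One caveat with such a sketch: the counter is per call site, not per stream, so two pipelines sharing the same code path would share the budget.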

@lgirdwood (Member)

> @kv2019i any API in Zephyr to rate limit logs?
>
> @lgirdwood both SOF and Zephyr have means to drop logging entries in case of flooding; SOF can also merge "similar" messages in such situations, but I don't think either has a special call to mark individual logs for rate-limiting. I was looking for that recently too and haven't found anything.

Need to use Zephyr methods, SOF logging will go away.

@marc-hb (Collaborator) commented Jul 5, 2023

> @kv2019i any API in Zephyr to rate limit logs?

I had a look and I could not find anything either, just "drop newest" vs "drop oldest" when full.
https://docs.zephyrproject.org/latest/services/logging/index.html#log-message-allocation
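
Side note: the closest thing Zephyr does offer is per-module level filtering via LOG_MODULE_REGISTER(), which filters by severity, not by rate. A minimal sketch follows; the module name and the demotion to LOG_DBG() are illustrative, not a proposed patch.

```c
#include <stddef.h>
#include <zephyr/logging/log.h>

/* Compile-time ceiling for this module only; messages above the
 * configured level are compiled out, everything else is unthrottled. */
LOG_MODULE_REGISTER(copy_example, CONFIG_LOG_DEFAULT_LEVEL);

static void copy_cycle(size_t avail)
{
	if (avail == 0) {
		/* Demoted to debug: visible in debug builds, silent otherwise. */
		LOG_DBG("nothing to copy");
		return;
	}
	LOG_INF("copying %u bytes", (unsigned int)avail);
}
```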

> I don't think either has a special call to mark individual logs for rate-limiting

I don't see when someone would want to rate-limit one specific log statement while letting others flood the logs. EDIT: you could want a higher threshold for some statements versus others, see the example in #5597. Except this is not really an example, because it mentions only one statement, so there's a chance that changing global thresholds is still enough even in 5997.

The old sof-logger throttling is smart enough to automatically throttle only abusive log statements, leaving other logs unaffected. I'd expect something similar from Zephyr (if there were anything).

> Need to use Zephyr methods, SOF logging will go away.

I don't see how one could affect the other.

@marc-hb (Collaborator) commented Dec 12, 2023

Tentative fix submitted, buried in monster PR #8571

@andyross (Contributor)

My changes were predicated on the DP scheduler; I didn't realize there were other circumstances that tripped over this too. So they won't fix this case in particular, even though it's the same warning. Maybe we should just remove the warning entirely if it's known to be benign in other situations? Buffer over/underflow is already an error condition, so I don't see why this warning is needed.

(DP hits this due to asynchrony in the pipeline updates: the DP component sinks data synchronously in the source pipeline, but the output doesn't show up right away, so things downstream yell that they have nothing to do until the next time the pipeline is scheduled. But... that's precisely the design of the DP scheduler.)
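
A minimal sketch of the removal/demotion being discussed here, for illustration only (this is not the actual patch that was later merged):

```c
#include <sof/audio/component.h>	/* comp_dbg(), struct comp_dev */

/* Illustrative copy-path fragment: a starved cycle is expected when an
 * upstream component is paused or runs at a different cadence (DP), so
 * it is reported at debug level and treated as "try again next cycle".
 */
static int copy_one_period(struct comp_dev *dev, unsigned int avail_bytes)
{
	if (avail_bytes == 0) {
		comp_dbg(dev, "nothing to copy");	/* was comp_warn() */
		return 0;	/* not an error condition */
	}

	/* ... do the real copy here ... */
	return 0;
}
```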

andyross added a commit to andyross/sof that referenced this issue Dec 13, 2023
The DAI emits a flood of warnings when presented with empty buffers at
copy time.  That's not really a reasonable warning condition.  There
are multiple situations where components upstream may be paused or
asynchronous, leading to starvation in any given cycle.

Earlier code has hit this with paused components, where the log
messages are merely annoying.

One new situation is that when using the DP scheduler, updates are
async and may happen at a different cadence than the pipeline the DAI
is on; the upstream component will be presented with data in a (for
example) 1ms pipeline tick, but then send it to a different component
(echo cancellation, say) that batches it up into larger buffers (10ms)
and releases it downstream only at the slower cadence.

In that situation the flood of messages is being emitted during an
active stream, and tends to cause glitches all by itself after a few
seconds (and even where it doesn't, it floods the Zephyr log backend
to the extent that literally every message is dropped).

(I don't know that all such warnings are removed by this patch.  These
are only the ones I've seen in practice.)

Fixes thesofproject#4672

Signed-off-by: Andy Ross <andyross@google.com>
marc-hb pushed a commit to marc-hb/sof that referenced this issue Dec 13, 2023
Trying to reproduce DSP panic thesofproject#8621

 -------

[commit message body identical to the commit above]

Fixes thesofproject#4672

Signed-off-by: Andy Ross <andyross@google.com>
(cherry picked from commit 514576e)
@kv2019i kv2019i added P2 Critical bugs or normal features and removed P3 Low-impact bugs or features labels Dec 15, 2023
@kv2019i kv2019i modified the milestones: TBD, v2.9 Dec 15, 2023
@kv2019i (Collaborator) commented Dec 15, 2023

I think this is a bug. If the mixer itself is active, it should continue to produce data to the DAI if the DAI is running. I'll bump priority to P2 and assign to v2.9.

@marcinszkudlinski (Contributor) commented Dec 15, 2023

I'm still not comfortable with silencing those messages; they have been very useful for me recently.

What do you think about this solution (a sketch follows below)?

Cycles with no data may naturally happen at pipeline startup, so silence those.
BUT - once the first data portion arrives, any further "no data" cycle means a glitch.

  • introduce a status flag - "no data seen yet"; set it at startup and clear it at the first cycle with data
  • send not a warning but an error log in case of a "no data" cycle after startup (i.e. when "no data seen yet" is cleared and there's still no data)
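
A minimal sketch of this proposal, with hypothetical names (this is not existing SOF code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-stream state for the "no data seen yet" flag. */
struct starvation_state {
	bool data_seen;	/* set at the first cycle that carries data */
};

/* Call once per copy cycle; reset 'data_seen' to false on (re)start. */
static void check_starvation(struct starvation_state *st, size_t avail)
{
	if (avail > 0) {
		st->data_seen = true;	/* startup phase is over */
		return;
	}

	if (!st->data_seen)
		return;	/* empty cycles at startup are normal: stay silent */

	/* Empty cycle after data has already flowed: treat it as a glitch
	 * and report an error, e.g. comp_err(dev, "..."), rather than a
	 * per-cycle warning. */
}
```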

@lgirdwood (Member)

@kv2019i your fix is now merged, good to close?

@kv2019i (Collaborator) commented Dec 20, 2023

@lgirdwood I think what @plbossart described here originally is actually a bug, but when I tried to reproduce it today, I could not. Even without my recent fix PR (#8649), pausing PCMs with a mixer caused no flood of "nothing to copy" messages. So unless @plbossart disagrees, I think this can be closed. The original functional problem has been solved, and the "nothing to copy" messages are now limited to debug builds only.

@plbossart (Member, Author)

I am not sure what to make of all the comments.

This is a 2.5-year-old bug; I am not even sure if this was reported in the context of IPC4.

On one side we have our trusted @kv2019i, who cannot reproduce the error; on the other side there is no consensus, with a remark from @marcinszkudlinski and a reference to Google AEC which cannot possibly be related.

I would err on the side of keeping this open until there's consensus.

@kv2019i (Collaborator) commented Jan 12, 2024

@marcinszkudlinski Are you good to close this now? We have kept the warning logs in the code, but they are behind a build option (#8649). In any case, I think all known problems (that people can reproduce) have been fixed, so I don't want to keep this open for v2.9 if it's not clear that anything is broken or needs to be done.
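
For readers following along, gating a warning behind a build option usually looks like the sketch below; the CONFIG_ symbol here is a made-up placeholder, see #8649 for the actual change.

```c
#include <sof/audio/component.h>	/* comp_warn(), comp_dbg() */

/* CONFIG_PIPELINE_STARVATION_WARNINGS is a placeholder name; the real
 * Kconfig option introduced by the referenced PR may differ. */
#ifdef CONFIG_PIPELINE_STARVATION_WARNINGS
#define starvation_warn(dev, ...)	comp_warn(dev, __VA_ARGS__)
#else
#define starvation_warn(dev, ...)	comp_dbg(dev, __VA_ARGS__)
#endif

/* Usage in the copy path:
 *	starvation_warn(dev, "dai_zephyr_copy(): nothing to copy");
 */
```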

@marcinszkudlinski (Contributor)

for me - ok to close

@kv2019i kv2019i closed this as completed Jan 15, 2024
@marc-hb (Collaborator) commented Jan 18, 2024

The mtrace spam still makes logs unusable.

Try opening
https://sof-ci.01.org/sofpr/PR8754/build1979/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=multiple-pause-resume-50, then click on the "mtrace" tab. The logs are so big that it takes at least 10-15 seconds just to open that tab, probably longer if you don't have a high-speed connection.

[ 2782.210465] <inf> pipe: pipeline_trigger: pipe:2 0x0 pipe trigger cmd 2
[ 2782.211160] <inf> ll_schedule: zephyr_ll_task_done: task complete 0xa0117cc0 0x20210U
[ 2782.211178] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 1 total_num_tasks 3
[ 2782.211186] <inf> ll_schedule: zephyr_domain_unregister: zephyr_domain_unregister domain->type 1 domain->clk 3
[ 2782.221346] <inf> ipc: ipc_cmd: rx	: 0x13000004|0x1
[ 2782.221371] <inf> pipe: pipeline_trigger: pipe:2 0x0 pipe trigger cmd 8
[ 2782.221380] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0117cc0 0x20210U priority 0 flags 0x0
[ 2782.221398] <inf> ll_schedule: zephyr_domain_register: zephyr_domain_register domain->type 1 domain->clk 3 domain->ticks_per_ms 38400 period 1000
[ 2782.222170] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.223161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.224160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.225161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.226160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.227160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.228160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.229160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.230160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.231158] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.231463] <inf> pipe: pipeline_trigger: pipe:1 0x0 pipe trigger cmd 8
[ 2782.231488] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0118dc0 0x20210U priority 0 flags 0x0
[ 2782.232158] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.233161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.234161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.235161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.236160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.237160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.238160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.239161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.240160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.241158] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.241546] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 8
[ 2782.241571] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0119240 0x20210U priority 0 flags 0x0
[ 2782.242231] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.242280] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_off: intel_adsp_gpdma_power_off: dma dma@7c000 power off
[ 2782.242320] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_on: intel_adsp_gpdma_power_on: dma dma@7c000 initialized
[ 2782.242363] <inf> dai_intel_dmic: dai_dmic_update_bits: dai_dmic_update_bits base 10000, reg 0, mask 6000000, value 4000000
[ 2782.242370] <inf> dai_intel_dmic: dai_dmic_update_bits: dai_dmic_update_bits base 10000, reg 1000, mask 10000, value 0

@marc-hb marc-hb reopened this Jan 18, 2024
@kv2019i (Collaborator) commented Jan 18, 2024

@marc-hb Please open a different bug for this. This log comes from a different component and in a different case, so we would be reusing the same bug ID for multiple issues here. The logs in the latest comment are expected, but we can debate the volume.

@marc-hb (Collaborator) commented Jan 18, 2024

Opened new issue #8761.

@marc-hb marc-hb closed this as completed Jan 18, 2024