[BUG][Multi-core] MTL RVP hangs when core1 & core2 are used simultaneously #7705

Closed
RanderWang opened this issue May 30, 2023 · 14 comments
Labels
  • bug: Something isn't working as expected
  • MTL: Applies to Meteor Lake platform
  • multicore: Issues observed when not only core#0 is used.
  • multicore-3cores: Issues observed when 3 cores are used.
  • P1: Blocker bugs or important features

Comments

@RanderWang
Collaborator

RanderWang commented May 30, 2023

Describe the bug
The MTL RVP hangs when core1 and core2 are used simultaneously. Recovering the board requires a cold boot.

To Reproduce
Play back two streams at the same time, one on core1 and one on core2.

Reproduction Rate
100%

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
    • Kernel: main branch : 0035b02b10f1ef6417d76f469c04572a3af5b9a3
    • SOF: 005-drop-stable
  2. Name of the topology file
    • Topology: development/sof-mtl-nocodec-multicore.tplg
  3. Name of the platform(s) on which the bug is observed.
    • Platform: MTL RVP B0

We created internal bug 15013565662 and asked the SOC team for help.

@RanderWang added the bug, MTL, and blocked ("progress blocked by something else, applies to either feature or bug") labels on May 30, 2023
@abonislawski
Member

Same on main branch?

@mengdonglin added the P1 and multicore labels on May 30, 2023
@mengdonglin changed the title from "[BUG][IPC4] MTL RVP hangs when core1 & core2 are used simultaneously" to "[BUG][Multi-core] MTL RVP hangs when core1 & core2 are used simultaneously" on May 30, 2023
@abonislawski
Member

@RanderWang we checked this today with our validation team and got good results when core1 & core2 are used simultaneously.
It looks like the driver is not programming something correctly for this scenario?

@RanderWang
Collaborator Author

> @RanderWang checked this today with our validation and we have good results when core1 & core2 are used simultaneously. Looks like driver is not programming something correctly for this scenario?

Thanks! As you know, the driver doesn't touch core1 & core2, which are managed by the FW. The SOC team is checking the scan dump. I will share the findings with you.

@RanderWang
Collaborator Author

@abonislawski We got a reply from the SOC team, and you are in the loop. Do you have any ideas? Thanks!

@RanderWang
Collaborator Author

The scan dump shows that DSP0 and DSP2 are issuing exclusive reads to L2SRAM address 0x400ED940. However, the transaction is not seen at the fabric-to-L2SRAM interface.

Question: are these valid program counters: DSP0 = 0x4006 600C, DSP2 = 0x400E CE82?

@abonislawski
Member

abonislawski commented Jun 6, 2023

@RanderWang it looks fine. To see exactly what is at that address, we need to check the files from this FW build.
On my local build (releases/mtl/v5.0.1), 0x400ED940 = log_buffer, so as a quick check you can try a build without logs (CONFIG_LOG=n) and see whether anything changes. The address may map to something else in the build that was actually tested, though.
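As a rough illustration of the "check what is at that address" step, the sketch below looks up which symbol contains a given address. It does not read a real FW build: the table is a hand-written stand-in, and only the log_buffer start address (0x400ED940, from a local releases/mtl/v5.0.1 build) comes from this thread. The symbol sizes, the other entry, and the names `SYMBOLS` and `find_symbol` are assumptions made for the example.

```python
import bisect

# Hypothetical symbol table as (start_address, size_in_bytes, name).
# Only the log_buffer start address is taken from the thread; its size and
# the second entry are invented purely for illustration.
SYMBOLS = sorted([
    (0x400ED940, 0x2000, "log_buffer"),             # size is assumed
    (0x40100000, 0x0100, "hypothetical_symbol_a"),  # invented entry
])


def find_symbol(addr, symbols=SYMBOLS):
    """Return (name, offset) for the symbol containing addr, or None."""
    starts = [start for start, _size, _name in symbols]
    # Index of the last symbol whose start address is <= addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, name = symbols[i]
    if addr < start + size:
        return name, addr - start
    return None  # addr falls in a gap between symbols


if __name__ == "__main__":
    print(find_symbol(0x400ED940))
```

With a real build, the table would come from that build's own symbol listing, which is why the result can differ between branches.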

@RanderWang
Collaborator Author

@abonislawski yes, in this scan dump the address belongs to log_buffer. I disabled logging, but it had no effect on the 005 branch. I tested the main branch last week and, to my surprise, the bug does not happen on the latest main branch!

@RanderWang
Collaborator Author

@abonislawski Can you explain "program counter: DSP0 = 0x4006 600C, DSP2 = 0x400E CE82"? As far as I know, DSP addresses are based at 0xa0002000, so why are these 0x400xxxxx? Thanks!

@marcinszkudlinski
Contributor

0xAxxxxxxx is the cached alias
0x4xxxxxxx is the uncached alias
Same SRAM, just accessed through the L1 cache or directly
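
To make the aliasing concrete, here is a small sketch that converts the addresses from this thread between the two views. It assumes the uncached window starts at 0x40000000, the cached window starts at 0xA0000000, and each window spans 512 MiB; the base addresses are inferred from the 0x400xxxxx and 0xa0002000 values quoted above, and the window size is an assumption, not something stated in the thread.

```python
# ASSUMPTIONS (not confirmed by the thread): window bases and window size.
UNCACHED_BASE = 0x40000000
CACHED_BASE = 0xA0000000
WINDOW_SIZE = 0x20000000  # assumed 512 MiB per alias window


def _offset(addr, base):
    """Return addr's offset inside the window at base, or raise if outside."""
    if not base <= addr < base + WINDOW_SIZE:
        raise ValueError(f"{addr:#010x} is not inside the window at {base:#010x}")
    return addr - base


def to_cached(addr):
    """Map an uncached-alias address to the cached alias of the same SRAM."""
    return CACHED_BASE + _offset(addr, UNCACHED_BASE)


def to_uncached(addr):
    """Map a cached-alias address to the uncached alias of the same SRAM."""
    return UNCACHED_BASE + _offset(addr, CACHED_BASE)


if __name__ == "__main__":
    reported = [
        ("log_buffer", 0x400ED940),
        ("DSP0 PC", 0x4006600C),
        ("DSP2 PC", 0x400ECE82),
    ]
    for name, addr in reported:
        print(f"{name}: uncached {addr:#010x} -> cached {to_cached(addr):#010x}")
```

Under these assumptions, both aliases share the same offset into SRAM, so an address reported as 0x400xxxxx and one expected at 0xA00xxxxx can refer to the same location.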

@RanderWang
Collaborator Author

@abonislawski no issue on the latest main branch. The issue was fixed or worked around by e2d0d1:

 kernel: allow for arch specific processing within busy loops

Give architectures that need it the ability to perform special checks
while e.g. waiting for a spinlock to become available.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>

This is consistent with what our SOC team reported: DSP0 and DSP2 were issuing exclusive reads to L2SRAM address 0x400ED940 (log_buffer).
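
For readers unfamiliar with the idea in that commit message, the following toy model shows a busy loop that calls a hook between failed lock attempts. It is a single-threaded Python simulation, not Zephyr code: `ToySpinLock`, `spin_lock`, and the `relax` parameter are hypothetical names, and the sketch makes no claim about what the real hook does on MTL or why it avoids the hang.

```python
class ToySpinLock:
    """Minimal stand-in for a spinlock; not thread-safe, illustration only."""

    def __init__(self):
        self.locked = False

    def try_lock(self):
        """Model an atomic test-and-set: succeed only if the lock is free."""
        if self.locked:
            return False
        self.locked = True
        return True

    def unlock(self):
        self.locked = False


def default_relax():
    """Default hook: nothing extra happens between attempts."""


def spin_lock(lock, relax=default_relax, max_spins=1_000_000):
    """Spin until the lock is taken, calling relax() after each failed attempt.

    Returns the number of failed attempts before the lock was acquired.
    """
    spins = 0
    while not lock.try_lock():
        relax()  # the per-architecture processing would go here
        spins += 1
        if spins >= max_spins:
            raise TimeoutError("lock was never released")
    return spins


if __name__ == "__main__":
    lock = ToySpinLock()
    lock.try_lock()  # pretend another core already holds the lock
    calls = []

    def relax_and_simulate_owner():
        # Simulate the other core releasing the lock during the third wait.
        calls.append(1)
        if len(calls) == 3:
            lock.unlock()

    print(spin_lock(lock, relax_and_simulate_owner))
```

The point of the hook is that the waiting core does something other than immediately retry the lock access, which gives an architecture room to insert whatever check or delay it needs.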

@mengdonglin removed the blocked label on Jun 8, 2023
@mengdonglin
Collaborator

@abonislawski no issue on the latest main branch. The issue was fixed or worked around by e2d0d1:

 kernel: allow for arch specific processing within busy loops

Give architectures that need it the ability to perform special checks
while e.g. waiting for a spinlock to become available.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>

This is consistent with what our SOC team reported: DSP0 and DSP2 were issuing exclusive reads to L2SRAM address 0x400ED940 (log_buffer).

@wszypelt

@nashif the problem is not on the FW side. Please assign the right person to this issue.

@mengdonglin added the multicore-3cores label on Jun 21, 2023
@wszypelt

Fixed with the latest Zephyr version. @RanderWang, can you check?

@RanderWang
Collaborator Author

fixed
