Nested interrupts crash on Xtensa architecture #3400

Closed
zephyrbot opened this issue Mar 22, 2017 · 7 comments
Labels: area: Xtensa, bug, priority: medium
Milestone: v1.8.0

@zephyrbot
Collaborator

zephyrbot commented Mar 22, 2017

Reported by Andreas Lenz:

Nested interrupts cause a crash when the second interrupt arrives during the first few lines of the "dispatch_c_isr" macro. The high-level interrupt is handled fine, but returning to the lower-level interrupt causes the crash.
Nested interrupts otherwise work fine as long as the second interrupt does not arrive during those few lines.

This is the last working point before the crash:

Thread #1 <main> (Suspended : Breakpoint)
	_xt_medint2_exit() at xtensa_vectors.S:1,092
	_xt_lowint1() at xtensa_vectors.S:989

And the next thing to happen:

Thread #1 <main> (Suspended : Signal : SIGTRAP:Trace/breakpoint trap)
	_DoubleExceptionVector() at xtensa_vectors.S:443
	0x4014a4c6	

(Imported from Jira ZEP-1955)

@zephyrbot
Collaborator Author

by Mazen NEIFER:

Can you please clarify what you mean by "the first few lines"?

{code:none}
rsr a2, INTENABLE
rsr a3, INTERRUPT
movi a4, \mask
and a2, a2, a3
and a2, a2, a4
beqz a2, 9f /* nothing to do */
{code}
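
For readers less familiar with Xtensa assembly, the quoted lines appear to compute which interrupts are both enabled and pending at the level being dispatched, and to skip dispatch when none are. A rough Python model of that logic follows; the function name `pending_at_level` and the sample register values are made up for illustration and do not come from the Zephyr sources.

```python
def pending_at_level(intenable, interrupt, mask):
    """Mirror the quoted sequence: enabled AND pending AND level mask."""
    a2 = intenable & interrupt  # and a2, a2, a3
    a2 &= mask                  # and a2, a2, a4
    return a2                   # 0 corresponds to taking the `beqz a2, 9f` branch


# Made-up register values, purely for illustration.
INTENABLE = 0b1110
INTERRUPT = 0b0110
LEVEL_MASK = 0b0100

print(bin(pending_at_level(INTENABLE, INTERRUPT, LEVEL_MASK)))
```

A non-zero result means at least one interrupt at this level still needs servicing; zero means the macro has nothing to do.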

@zephyrbot
Collaborator Author

by Andrew Boie:

Mazen NEIFER who is your question directed to?

@zephyrbot
Collaborator Author

by Andreas Lenz:

There seem to be 10 to 15 critical CPU cycles but it's difficult to narrow down further. It will take some time to find out the precise range.

@zephyrbot
Collaborator Author

by Andreas Lenz:

It could not be reproduced after replacing the kernel logging call8 with call4 (two places).
Mazen, please see also my email from 1.12.2016

@zephyrbot
Collaborator Author

by Mazen NEIFER:

I can understand that call4 may hide the issue, but it does not fix it.

If this issue happens, it means there is a problem somewhere in the code that is triggered in special cases. Sooner or later it will happen even with call4.

According to is_rm.pdf, in the section describing call4 and call8, the only difference between the two is that call8 will save more registers. This is the caller's decision, not the callee's, so my code can decide which registers to save depending on the need. I agree that call8 is not strictly required here, but if it lets us see the issue, we should keep it until the issue is fixed. Later we can optimize and use call4.
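
As a rough illustration of that difference: under the Xtensa windowed ABI, callN rotates the register window by N registers, so the caller's a0 to a(N-1) are the ones left behind and preserved across the call. The sketch below only models that index arithmetic; the helper names are hypothetical.

```python
# Number of registers the window rotates by for each call flavour.
WINDOW_INCREMENT = {"call4": 4, "call8": 8, "call12": 12}


def callee_register(caller_reg, call):
    """Map a caller register index to the callee's index, or None if hidden."""
    n = WINDOW_INCREMENT[call]
    if caller_reg < n:
        return None  # a0..a(n-1) stay with the caller and are preserved
    return caller_reg - n


def preserved_registers(call):
    """Caller registers that stay out of the callee's view."""
    return ["a%d" % i for i in range(WINDOW_INCREMENT[call])]


print(preserved_registers("call4"))
print(preserved_registers("call8"))
```

The longer list for call8 is what "saves more registers" amounts to, and it is also the set that has to be written out to the stack if a window overflow occurs.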

For now, I need to learn how to reproduce this issue. Can you please tell me which core you are using and which test/code?

@zephyrbot
Collaborator Author

by Andreas Lenz:

Mazen NEIFER, the issue appeared in our test environment, which uses level 2 test interrupts and level 1 timer interrupts. With certain settings, I could reproduce the crash 100% of the time. With call4 it's gone for sure. It might be that it's just hidden, but I can't reproduce it. If you need more details or have any ideas to try out, please contact me by PM.

@zephyrbot
Collaborator Author

by Mazen NEIFER:

Thanks Andreas for the analysis you provided by mail.
You are indeed right and I've committed a fix as you suggested.
Please review https://gerrit.zephyrproject.org/r/#/c/13177/1

The issue was that the interrupt stack frame only allocates space for
4 registers. This means that if a window overflow happens, only 4
registers can be saved, so the interrupt handler must not call
functions with anything other than call4. If this rule is not honored,
the spilled registers overwrite other context information and corrupt
the stack.

The fix consists of using call4 to call the event logger functions,
which is also more efficient, as the interrupt handler does not need
to save more than 4 registers when these functions are called.
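
The failure mode described above can be sketched with a toy model. This is a deliberate simplification, not the real Xtensa stack layout: it assumes a save area of 4 words sitting directly next to other interrupt context, and that a window overflow writes 4, 8, or 12 registers depending on the call flavour. All names and the `CONTEXT_WORDS` size are hypothetical.

```python
# Registers written out by a window overflow for each call flavour.
SPILL_COUNT = {"call4": 4, "call8": 8, "call12": 12}

RESERVED_SLOTS = 4  # save space in the interrupt frame, per the description above
CONTEXT_WORDS = 8   # arbitrary amount of neighbouring context in this toy layout


def spill(call):
    """Return the toy memory image after one window overflow."""
    memory = ["free"] * RESERVED_SLOTS + ["context"] * CONTEXT_WORDS
    for i in range(SPILL_COUNT[call]):
        memory[i] = "a%d" % i  # a spilled register overwrites whatever was there
    return memory


def clobbered_context_words(call):
    """Count how many context words the spill destroyed."""
    return CONTEXT_WORDS - spill(call).count("context")


for flavour in ("call4", "call8"):
    print(flavour, clobbered_context_words(flavour))
```

In this model call4 stays within the reserved slots while call8 runs past them, which matches the explanation that only call4 is safe with a frame sized for 4 registers.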

zephyrbot added the priority: medium, area: Xtensa, and bug labels Sep 23, 2017
zephyrbot added this to the v1.8.0 milestone Sep 23, 2017