-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cpu/cortexm_common: replace irq_restore by __set_PRIMASK for stm32l152re #11919
Conversation
I'd expect that With that in mind: Wouldn't it be much better to have |
I agree, I had though of it when first making the change. The main reason I decided to push a intermediate change was that That being said, how would you have header only functions in this case? |
+1
I'd say it should forward declare them as The forward declaration will allow the compiler to verify that the architecture specific implementation has the correct signature. Also, it will allow to have the doxygen doc written once at the generic header, and not for every implementation. |
I do not have the hardware to test, sadly. Otherwise I'd ACK |
@jia200x the maintainer hardware access wiki says you have a nucleo-l152re, would you mind testing? |
Question, I do not see the
I think it was "loss of the content of r0" that, had the consequence to crash when jumping. Maybe I am wrong and missing the right pointer. If my previous statement is correct and that |
I think I copied the wrong file disassembly, I posted to correct one now. |
Sure! |
Talking to @cladmi IRL he suggested I mention more details on the possible explanation I have for this bug, even if still very unclear... The faulty scenario So from debugging output at some point after wake-up the
Hints to cause Looking around in In the case of
This lead me to believe that for some reason when the branch instruction is present it is executing a corrupted pre-fetch buffer instruction, or in other terms an instruction that isn't present. For some reason this only happens when the HCLK and FCLK stays enabled in sleep mode. This might have something to do with different wake-up times since the clock is always enabled for the core?? I wasn't able to find many details of what happens on wake-up, and what could be different when HCLK stays enabled. In conclusion, I don't really know the reason why the hard fault occurs, why jumping just after |
I tried the given test:
From the sources I think they would advice to put 3 NOPs as workaround, but as gut feeling I can understand how With the references the pull request gives enough context on where to start looking again if this re-appears in the future so think it is fine to merge even without understanding why it works (we did not know when merging the @STMicroelectronics may know about it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested this PR and confirm that it fixes the described problem. I also agree with @cladmi that we can merge this even if we don't fully understand why it fixes the problem.
I found an unrelated change. Please squash it directly and I'll ACK when this is done.
cpu/cortexm_common/include/cpu.h
Outdated
@@ -165,16 +165,16 @@ static inline void cortexm_sleep(int deep) | |||
else { | |||
SCB->SCR &= ~(SCB_SCR_SLEEPDEEP_Msk); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this change is unrelated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.
48a7930
to
d075e55
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now, ACK
And go! |
@aabadie Thanks for the review! It's great to having |
stm32l152re is oddly specific - should this affect all stm32l1x then? |
No, see #8518, #11820, #11919 and #8024 for more details. EDIT: at lesast not present on stm32l151 #8518 (comment) |
Contribution description
This PR was split from #11830.
When #7385 was introduced when going to sleep (i.e. calling
__WFI()
),irq_enable
was changed toirq_restore
and that brokestm32l152re
.#8518 fixed the issue by introducing a
__NOP()
after__WFI()
. This fixed the issue until #11159 where the call tocortexm_sleep
was changed and now__NOP()
didn't fix the issue but instead somehow triggered it.After debugging in #11820 it was discovered that the changes in #7385 didn't actually break the code, but broke the code only when
DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP
where enabled. By default openocd sets these bits after anexamine-end
event. This is done by default for all stm32 boards.https://github.com/ntfreak/openocd/blob/263deb3802a515ba8155b6c59146f0f539de4e43/tcl/target/stm32l1.cfg#L93-L100
This could be fixed by 510ffce so those bits are set only when
gdb
is launched. No need to enable debugging in other cases. When debugging this issue was still present.Multiple changes/fixes where attempted in #11820 to fix the issue. Non of them sufficiently consistent or un-intrusive.
As for all I can tell, and as stated in #8518, the issue was the loss of the content of
r0
when jumping toirq_restore
which resulted in a corruptedpc
trying to execute instructions out of the flash region.Since the problem was the branch, with 5d96127
irq_restore
was replaced by__set_PRIMASK(state);
which inlines the function call avoiding the jump and the whole issue all together.The disassembly:
Original disassembly:
Testing procedure
In master this hardfaults:
make -C tests/xtimer_drift BOARD=nucleo-l152re -j3 flash term
For the same command, verify the change robustness:
to be safe run
python dist/tools/compile_and_test_for_board/compile_and_test_for_board.py --jobs 4 . nucleo-l152re
, I got the following tests fails (expected fails)Issues/PRs references
Fixes #11820
See also #8518 & #8024