Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stm32152re: hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() #14015

Closed
fjmolinas opened this issue May 4, 2020 · 1 comment · Fixed by #20149
Closed

stm32152re: hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() #14015

fjmolinas opened this issue May 4, 2020 · 1 comment · Fixed by #20149
Labels
Area: cpu Area: CPU/MCU ports Area: pm Area: (Low) power management Platform: ARM Platform: This PR/issue effects ARM-based platforms State: don't stale State: Tell state-bot to ignore this issue Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@fjmolinas
Copy link
Contributor

Description

This issue wants to document a recurrent issue that has been seen on stm32l152re platforms.

The history so far:

When #7385 was introduced when going to sleep (i.e. calling __WFI()), irq_enable was changed to irq_restore and that broke stm32l152re.

#8518 fixed the issue by introducing a __NOP() after __WFI(). This fixed the issue until #11159 where the call to cortexm_sleep was changed and now __NOP() didn't fix the issue but instead somehow triggered it.

#11159 re-introduced the issue by changing the way pm_set_lowest() was called since pm_set() was now implemented for STM32L1. A single __NOP() did not fix the issue anymore.

In #11820 it was discovered that the changes in #7385 didn't actually break the code, but broke the code only when DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP where enabled. By default openocd sets these bits after an examine-end event. This is done by default for all stm32 boards.

In #11919 Since the problem was the branch, with 5d96127 irq_restore was replaced by __set_PRIMASK(state); which inlines the function call avoiding the jump and the whole issue all together.

#13999 inline the implementation of irq_restore so the fix in #11919 will be removed.

The faulty scenario

So from debugging output at some point after wake-up the pc gets corrupted and un-reachable instructions gets executed. This only happens when DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP are set. An example for the debugging output is here:

2019-07-09 17:38:58,314 - INFO # Attempting to reconstruct state for debugging...
2019-07-09 17:38:58,315 - INFO # In GDB:
2019-07-09 17:38:58,316 - INFO #   set $pc=0x7822d5e
2019-07-09 17:38:58,317 - INFO #   frame 0
2019-07-09 17:38:58,317 - INFO #   bt
2019-07-09 17:38:58,318 - INFO # 
2019-07-09 17:38:58,319 - INFO # ISR stack overflowed by at least 16 bytes.

Hints to cause

Looking around in stm32 and cortex-m3 erratas and datasheet. I found a mention of a similar issue with stm32f4 in this ERRATA section 2.1.3. In this errata there are some hints as issue that happen the WFE/WFI are placed at 4 byte alignment and problems with the pref-etch buffer. Although this is a differtent cpu (cortex-m4), it made me snoop around the pref-etch buffer and made me think a similar issue might be happening on cortex-m3

In the case of cortex-m3 the pref-etch buffer can fetch two 32bits instructions or 4 16bits instructions but only in sequential code execution. In our code the PRFTEN and ACC64 are enabled so we are reading 64 bits at a time. It stm32l1xxx reference manual it is stated:

When the code is not sequential (branch), the instruction may not be present neither in the
current instruction line used nor in the prefetched instruction line. In this case, the penalty in
terms of number of cycles is at least equal to the number of Wait States.

This lead me to believe that for some reason when the branch instruction is present it is executing a corrupted pre-fetch buffer instruction, or in other terms an instruction that isn't present. For some reason this only happens when the HCLK and FCLK stays enabled in sleep mode. This might have something to do with different wake-up times since the clock is always enabled for the core?? I wasn't able to find many details of what happens on wake-up, and what could be different when HCLK stays enabled.

Steps to reproduce the issue

This issue does not show up currently in master unless a single __NOP is added after https://github.com/fjmolinas/RIOT/blob/e7a1b40cde17dc5f407c9b3884a2603ab656ac7e/cpu/cortexm_common/include/cpu.h#L172. This may change depending on current master, since for a while a single __NOP() fixed the issue.

Expected results

No crash ever..

Actual results

Has crashed in the past.

Possible FIXES

If the issue shows up again 3 __NOP could fix the issue.

@fjmolinas fjmolinas added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Platform: ARM Platform: This PR/issue effects ARM-based platforms Area: pm Area: (Low) power management Area: cpu Area: CPU/MCU ports State: don't stale State: Tell state-bot to ignore this issue labels May 4, 2020
fjmolinas added a commit to fjmolinas/RIOT that referenced this issue May 4, 2020
__set_PRIMASK(state) had been directly inlined to avoid a hardfault that
occured when branching after waking up from sleep with DBG_STANDBY,
DBG_STOP or DBG_SLEEP set in DBG_CR.

The hardfault occured when returning from the branch to irq_restore,
since the function is now inlined the branch does not happend either.

See RIOT-OS#14015 for more details.
fjmolinas added a commit to fjmolinas/RIOT that referenced this issue May 4, 2020
__set_PRIMASK(state) had been directly inlined to avoid a hardfault that
occured when branching after waking up from sleep with DBG_STANDBY,
DBG_STOP or DBG_SLEEP set in DBG_CR.

The hardfault occured when returning from the branch to irq_restore,
since the function is now inlined the branch does not happend either.

See RIOT-OS#14015 for more details.
fjmolinas added a commit to fjmolinas/RIOT that referenced this issue May 5, 2020
__set_PRIMASK(state) had been directly inlined to avoid a hardfault that
occured when branching after waking up from sleep with DBG_STANDBY,
DBG_STOP or DBG_SLEEP set in DBG_CR.

The hardfault occured when returning from the branch to irq_restore,
since the function is now inlined the branch does not happend either.

See RIOT-OS#14015 for more details.
fjmolinas added a commit to fjmolinas/RIOT that referenced this issue May 7, 2020
__set_PRIMASK(state) had been directly inlined to avoid a hardfault that
occured when branching after waking up from sleep with DBG_STANDBY,
DBG_STOP or DBG_SLEEP set in DBG_CR.

The hardfault occured when returning from the branch to irq_restore,
since the function is now inlined the branch does not happen either.

See RIOT-OS#14015 for more details.
fjmolinas added a commit to fjmolinas/RIOT that referenced this issue May 7, 2020
__set_PRIMASK(state) had been directly inlined to avoid a hardfault that
occured when branching after waking up from sleep with DBG_STANDBY,
DBG_STOP or DBG_SLEEP set in DBG_CR.

The hardfault occured when returning from the branch to irq_restore,
since the function is now inlined the branch does not happen either.

Refer to RIOT-OS#14015 for more details.
fjmolinas added a commit to fjmolinas/RIOT that referenced this issue May 12, 2020
__set_PRIMASK(state) had been directly inlined to avoid a hardfault that
occured when branching after waking up from sleep with DBG_STANDBY,
DBG_STOP or DBG_SLEEP set in DBG_CR.

The hardfault occured when returning from the branch to irq_restore,
since the function is now inlined the branch does not happen either.

Refer to RIOT-OS#14015 for more details.
@miri64 miri64 added this to the Release 2020.07 milestone Jul 6, 2020
bergzand pushed a commit to bergzand/RIOT that referenced this issue Jul 15, 2020
__set_PRIMASK(state) had been directly inlined to avoid a hardfault that
occured when branching after waking up from sleep with DBG_STANDBY,
DBG_STOP or DBG_SLEEP set in DBG_CR.

The hardfault occured when returning from the branch to irq_restore,
since the function is now inlined the branch does not happen either.

Refer to RIOT-OS#14015 for more details.
jia200x pushed a commit to jia200x/RIOT that referenced this issue Mar 9, 2021
__set_PRIMASK(state) had been directly inlined to avoid a hardfault that
occured when branching after waking up from sleep with DBG_STANDBY,
DBG_STOP or DBG_SLEEP set in DBG_CR.

The hardfault occured when returning from the branch to irq_restore,
since the function is now inlined the branch does not happen either.

Refer to RIOT-OS#14015 for more details.
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
@cbiffle
Copy link

cbiffle commented Jul 17, 2023

Random bystander here: it is possible you're hitting the same STM32 WFI bug that I've been hitting. I've got a workaround you might try.

Short version: put an isb after your wfi.

Long version: https://cliffle.com/blog/stm32-wfi-bug/

Hope that helps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports Area: pm Area: (Low) power management Platform: ARM Platform: This PR/issue effects ARM-based platforms State: don't stale State: Tell state-bot to ignore this issue Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants