Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/stm32f1: CPU hangs after wake-up from STOP power mode #13918

Closed
jue89 opened this issue Apr 21, 2020 · 31 comments · Fixed by #20149
Closed

cpu/stm32f1: CPU hangs after wake-up from STOP power mode #13918

jue89 opened this issue Apr 21, 2020 · 31 comments · Fixed by #20149
Assignees
Labels
Area: cpu Area: CPU/MCU ports Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@jue89
Copy link
Contributor

jue89 commented Apr 21, 2020

Description

I am experiencing odd behavior with the STM32F103C8 CPU on current master. I am running the application tests/periph_pm on the board bluepill. After the RTC woke up the board from STM32_PM_STOP, it hangs.

After digging around I found out that the problem is solved by introducing two NOPs before re-initializing the clocks:

diff --git a/cpu/stm32_common/periph/pm.c b/cpu/stm32_common/periph/pm.c
index 4b4875ebf..b51b9d817 100644
--- a/cpu/stm32_common/periph/pm.c
+++ b/cpu/stm32_common/periph/pm.c
@@ -135,6 +136,8 @@ void pm_set(unsigned mode)
 
     if (deep) {
         /* Re-init clock after STOP */
+        __asm__("nop\n\t");
+        __asm__("nop\n\t");
         stmclk_init_sysclk();
     }
 }

Steps to reproduce the issue

Flash tests/periph_pm on the board bluepill:

Expected results

(The first unblock is required to release the block issued by the UART behind stdio_uart.)

cd tests/periph_pm
make BOARD=bluepill flash term
unblock 1
2020-04-21 18:44:05,125 #  unblock 1
2020-04-21 18:44:05,126 # Unblocking power mode 1.
unblock_rtc 1 2
2020-04-21 18:44:07,139 #  unblock_rtc 1 2
2020-04-21 18:44:07,140 # Unblocking power mode 1 for 2 seconds.
(... wait at least 2 secs)
help
2020-04-21 18:44:10,816 #  help
2020-04-21 18:44:10,817 # Command              Description
2020-04-21 18:44:10,817 # ---------------------------------------
2020-04-21 18:44:10,818 # off                  turn off
2020-04-21 18:44:10,818 # reboot               reboot
2020-04-21 18:44:10,819 # block                block power mode
2020-04-21 18:44:10,820 # set                  set power mode
2020-04-21 18:44:10,821 # unblock              unblock power mode
2020-04-21 18:44:10,822 # unblock_rtc          temporary unblock power mode

Actual results

cd tests/periph_pm
make BOARD=bluepill flash term
unblock 1
2020-04-21 18:44:05,125 #  unblock 1
2020-04-21 18:44:05,126 # Unblocking power mode 1.
unblock_rtc 1 2
2020-04-21 18:44:07,139 #  unblock_rtc 1 2
2020-04-21 18:44:07,140 # Unblocking power mode 1 for 2 seconds.
(... wait at least 2 secs)
help
(... the system is dead)

Versions

Operating System Environment
----------------------------
         Operating System: "Manjaro Linux" 
                   Kernel: Linux 5.4.33-3-MANJARO x86_64 unknown
             System shell: GNU bash, version 5.0.16(1)-release (x86_64-pc-linux-gnu)
             make's shell: GNU bash, version 5.0.16(1)-release (x86_64-pc-linux-gnu)

Installed compiler toolchains
-----------------------------
               native gcc: gcc (Arch Linux 9.3.0-1) 9.3.0
        arm-none-eabi-gcc: arm-none-eabi-gcc (Arch Repository) 9.3.0
                  avr-gcc: missing
         mips-mti-elf-gcc: missing
               msp430-gcc: missing
     riscv-none-embed-gcc: missing
     xtensa-esp32-elf-gcc: missing
   xtensa-esp8266-elf-gcc: missing
                    clang: clang version 9.0.1 

Installed compiler libs
-----------------------
     arm-none-eabi-newlib: "3.3.0"
      mips-mti-elf-newlib: missing
  riscv-none-embed-newlib: missing
  xtensa-esp32-elf-newlib: missing
xtensa-esp8266-elf-newlib: missing
                 avr-libc: missing (missing)

Installed development tools
---------------------------
                   ccache: missing
                    cmake: cmake version 3.17.1
                 cppcheck: missing
                  doxygen: 1.8.17
                      git: git version 2.26.1
                     make: GNU Make 4.3
                  openocd: Open On-Chip Debugger 0.10.0
                   python: Python 3.8.2
                  python2: Python 2.7.17
                  python3: Python 3.8.2
                   flake8: error: /usr/bin/python3: No module named flake8
               coccinelle: missing
@jue89
Copy link
Contributor Author

jue89 commented Apr 23, 2020

@leandrolanzieri sry for contacting you directly!
In another PR you mentioned that you own a bluepill. May I ask you to reproduce this issue? I'm trying to find find out whats going wrong but it feels like ghost busting ;) I am curious if another build system produces a binary that is affected by the hanging-problem as described above.

@leandrolanzieri
Copy link
Contributor

@leandrolanzieri sry for contacting you directly!

No worries!

In another PR you mentioned that you own a bluepill. May I ask you to reproduce this issue? I'm trying to find find out whats going wrong but it feels like ghost busting ;) I am curious if another build system produces a binary that is affected by the hanging-problem as described above.

I can successfully reproduce the error, but sadly adding the nops does not solve it for me.

help
2020-04-23 16:50:26,555 # help
2020-04-23 16:50:26,572 # Command              Description
2020-04-23 16:50:26,574 # ---------------------------------------
2020-04-23 16:50:26,576 # off                  turn off
2020-04-23 16:50:26,576 # reboot               reboot
2020-04-23 16:50:26,577 # block                block power mode
2020-04-23 16:50:26,577 # set                  set power mode
2020-04-23 16:50:26,578 # unblock              unblock power mode
2020-04-23 16:50:26,582 # unblock_rtc          temporary unblock power mode
reboot
2020-04-23 16:50:30,800 #  reboot
2020-04-23 16:50:30,801 # CPU will reboot.
2020-04-23 16:50:31,374 # �main(): This is RIOT! (Version: 2020.07-devel-180-g34191)
2020-04-23 16:50:31,380 # This application allows you to test the CPU power management.
2020-04-23 16:50:31,394 # The available power modes are 0 - 1. Lower-numbered power modes
2020-04-23 16:50:31,397 # save more power, but may require an event/interrupt to wake up
2020-04-23 16:50:31,397 # the CPU. Reset the CPU if needed.
> unblock 1
2020-04-23 16:50:33,624 #  unblock 1
2020-04-23 16:50:33,625 # Unblocking power mode 1.
> unblock_rtc 1 2
2020-04-23 16:50:37,955 #  unblock_rtc 1 2
2020-04-23 16:50:37,959 # Unblocking power mode 1 for 2 seconds.
> help
help
help

@kaspar030
Copy link
Contributor

IIRC we've had this issue before. I also remember adding nops for stm32f1. But my memory is cloudy...

@jue89
Copy link
Contributor Author

jue89 commented Apr 23, 2020

I can successfully reproduce the error, but sadly adding the nops does not solve it for me.

Can you add a few more nops? Let's say maybe 10? It really smells like some synchronization error.

@leandrolanzieri
Copy link
Contributor

I can successfully reproduce the error, but sadly adding the nops does not solve it for me.

Can you add a few more nops? Let's say maybe 10?

Still does not come back :-/

@jue89
Copy link
Contributor Author

jue89 commented Apr 23, 2020

Okay. Thanks for testing :)

@fjmolinas
Copy link
Contributor

IIRC we've had this issue before. I also remember adding nops for stm32f1. But my memory is cloudy...

I was done for stm321l52re nucleo-l152re

@fjmolinas
Copy link
Contributor

fjmolinas commented Apr 24, 2020

I can't reproduce with a nucleo-f103rb. @jue89 could you try if the issue is still present after a cold boot? Flash, the board, unplug/plug and replicate your testing procedure?

@fjmolinas
Copy link
Contributor

Also just to make sure we have the same environment, can you BUILD_IN_DOCKER?

@fjmolinas fjmolinas added Area: cpu Area: CPU/MCU ports Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) labels Apr 24, 2020
@fjmolinas
Copy link
Contributor

This ERRATA has some mentions of wakeup issues under specific conditions, could you take a look if one of them applies?

@jue89
Copy link
Contributor Author

jue89 commented Apr 29, 2020

@fjmolinas Thanks for you response!

Also just to make sure we have the same environment, can you BUILD_IN_DOCKER?

Done that. The problem still persists.

I can't reproduce with a nucleo-f103rb. @jue89 could you try if the issue is still present after a cold boot? Flash, the board, unplug/plug and replicate your testing procedure?

The cold boot seems to fix the problem :) Do you have an idea why the CPU hangs after being in a freshly flashed state? Have you observed something similar on STM CPUs?

@fjmolinas
Copy link
Contributor

I had the same issue with stm32l152re. Can you check if its under the same conditions as described in #11830 (remove DBGMCU_CR_DBG_%), maybe the workaround in #11919 would work for this cpu as well?

If its of any use I think this comment #11919 (comment) resumed what I had found out for stm32l152re/

Let me know if the workaround in #11919, I could go back to the suggestion in #11919 (comment). I remember implementing it but having many compiler issues that blocked me, and then I forgot about that.

@jue89
Copy link
Contributor Author

jue89 commented Apr 29, 2020

This following modifications of the openocd configs fixed the issue. (But I haven't tested if debugging still works, yet.)

diff --git a/boards/common/blxxxpill/dist/openocd-128kib.cfg b/boards/common/blxxxpill/dist/openocd-128kib.cfg
index ccaf60b77..d40a29f05 100644
--- a/boards/common/blxxxpill/dist/openocd-128kib.cfg
+++ b/boards/common/blxxxpill/dist/openocd-128kib.cfg
@@ -20,3 +20,31 @@ set FLASH_SIZE 0x20000
 source [find target/stm32f1x.cfg]
 
 $_TARGETNAME configure -rtos auto
+
+# We override openocd defualt configuration so DEBUG in sleep mode is
+# not enabled by default
+$_TARGETNAME configure -event examine-end {
+    # Stop watchdog counters during halt
+       # DBGMCU_APB1_FZ |= DBG_IWDG_STOP | DBG_WWDG_STOP
+       mmw 0xE0042008 0x00001800 0
+}
+
+# Add gdb-attach/gdb-detach events, DEBUG in sleep modes in now only
+# enabled when a gdb session is started.
+$_TARGETNAME configure -event gdb-attach {
+       # DBGMCU_CR |= DBG_STANDBY | DBG_STOP | DBG_SLEEP
+       mmw 0xE0042004 0x00000007 0
+
+       # Stop watchdog counters during halt
+       # DBGMCU_APB1_FZ |= DBG_IWDG_STOP | DBG_WWDG_STOP
+       mmw 0xE0042008 0x00001800 0
+}
+
+$_TARGETNAME configure -event gdb-detach  {
+       # DBGMCU_CR &= ~(DBG_STANDBY | DBG_STOP | DBG_SLEEP)
+       mmw 0xE0042004 0x00000000 0x00000007
+
+       # Stop watchdog counters during halt diagnoized
+       # DBGMCU_APB1_FZ &= ~(DBG_IWDG_STOP | DBG_WWDG_STOP)
+       mmw 0xE0042008 0x00000000 0x00001800
+}
diff --git a/boards/common/blxxxpill/dist/openocd.cfg b/boards/common/blxxxpill/dist/openocd.cfg
index dd5f28645..dc0f55ad0 100644
--- a/boards/common/blxxxpill/dist/openocd.cfg
+++ b/boards/common/blxxxpill/dist/openocd.cfg
@@ -17,3 +17,31 @@ set CONNECT_UNDER_RESET 1
 source [find target/stm32f1x.cfg]
 
 $_TARGETNAME configure -rtos auto
+
+# We override openocd defualt configuration so DEBUG in sleep mode is
+# not enabled by default
+$_TARGETNAME configure -event examine-end {
+    # Stop watchdog counters during halt
+       # DBGMCU_APB1_FZ |= DBG_IWDG_STOP | DBG_WWDG_STOP
+       mmw 0xE0042008 0x00001800 0
+}
+
+# Add gdb-attach/gdb-detach events, DEBUG in sleep modes in now only
+# enabled when a gdb session is started.
+$_TARGETNAME configure -event gdb-attach {
+       # DBGMCU_CR |= DBG_STANDBY | DBG_STOP | DBG_SLEEP
+       mmw 0xE0042004 0x00000007 0
+
+       # Stop watchdog counters during halt
+       # DBGMCU_APB1_FZ |= DBG_IWDG_STOP | DBG_WWDG_STOP
+       mmw 0xE0042008 0x00001800 0
+}
+
+$_TARGETNAME configure -event gdb-detach  {
+       # DBGMCU_CR &= ~(DBG_STANDBY | DBG_STOP | DBG_SLEEP)
+       mmw 0xE0042004 0x00000000 0x00000007
+
+       # Stop watchdog counters during halt diagnoized
+       # DBGMCU_APB1_FZ &= ~(DBG_IWDG_STOP | DBG_WWDG_STOP)
+       mmw 0xE0042008 0x00000000 0x00001800
+}

The workaround you mentioned leads to a hardfault:

diff --git a/cpu/cortexm_common/include/cpu.h b/cpu/cortexm_common/include/cpu.h
index d79d65495..056ba3aa1 100644
--- a/cpu/cortexm_common/include/cpu.h
+++ b/cpu/cortexm_common/include/cpu.h
@@ -170,7 +170,7 @@ static inline void cortexm_sleep(int deep)
     unsigned state = irq_disable();
     __DSB();
     __WFI();
-#if defined(CPU_MODEL_STM32L152RE)
+#if defined(CPU_MODEL_STM32L152RE) || defined(CPU_MODEL_STM32F103CB) || defined(CPU_MODEL_STM32F103C8)
     /* STM32L152RE crashes if branching to irq_restore(state). See #11830. */
     __set_PRIMASK(state);
 #else
2020-04-29 22:56:11,360 # main(): This is RIOT! (Version: 2020.04-devel-2355-g919249)
2020-04-29 22:56:11,363 # This application allows you to test the CPU power management.
2020-04-29 22:56:11,365 # The available power modes are 0 - 1. Lower-numbered power modes
2020-04-29 22:56:11,366 # save more power, but may require an event/interrupt to wake up
2020-04-29 22:56:11,368 # the CPU. Reset the CPU if needed.
2020-04-29 22:56:11,368 # mode 0 blockers: 1 
2020-04-29 22:56:11,369 # mode 1 blockers: 2 
2020-04-29 22:56:11,370 # Lowest allowed mode: 2
> pm unblock 1
2020-04-29 22:56:21,699 #  pm unblock 1
2020-04-29 22:56:21,701 # Unblocking power mode 1.
> pm show
2020-04-29 22:56:25,227 #  pm show
2020-04-29 22:56:25,228 # mode 0 blockers: 1 
2020-04-29 22:56:25,230 # mode 1 blockers: 1 
2020-04-29 22:56:25,231 # Lowest allowed mode: 2
> unblock_rtc 1 2
2020-04-29 22:56:33,609 #  unblock_rtc 1 2
2020-04-29 22:56:33,625 # Unblocking power mode 1 for 2 seconds.
> 2020-04-29 22:56:36,539 #  
2020-04-29 22:56:36,541 # Context before hardfault:
2020-04-29 22:56:36,542 #    r0: 0x00000001
2020-04-29 22:56:36,542 #    r1: 0xd1a0c000
2020-04-29 22:56:36,544 #    r2: 0x03034682
2020-04-29 22:56:36,546 #    r3: 0x00000014
2020-04-29 22:56:36,548 #   r12: 0x00000000
2020-04-29 22:56:36,549 #    lr: 0x00000006
2020-04-29 22:56:36,551 #    pc: 0x00000006
2020-04-29 22:56:36,552 #   psr: 0x20000000
2020-04-29 22:56:36,553 # 
2020-04-29 22:56:36,554 # FSR/FAR:
2020-04-29 22:56:36,555 #  CFSR: 0x00020000
2020-04-29 22:56:36,557 #  HFSR: 0x40000000
2020-04-29 22:56:36,558 #  DFSR: 0x00000008
2020-04-29 22:56:36,560 #  AFSR: 0x00000000
2020-04-29 22:56:36,561 # Misc
2020-04-29 22:56:36,564 # EXC_RET: 0xfffffffd
2020-04-29 22:56:36,568 # Attempting to reconstruct state for debugging...
2020-04-29 22:56:36,568 # In GDB:
2020-04-29 22:56:36,569 #   set $pc=0x6
2020-04-29 22:56:36,569 #   frame 0
2020-04-29 22:56:36,569 #   bt
2020-04-29 22:56:36,569 # 
2020-04-29 22:56:36,571 # ISR stack overflowed by at least 16 bytes.

@fjmolinas
Copy link
Contributor

This following modifications of the openocd configs fixed the issue. (But I haven't tested if debugging still works, yet.)

Ok, so the issues are somehow similar. Does inserting __NOP() before the __WFI() fix the issue as well? Can you try from 1 to 4 __NOP()?

@jue89
Copy link
Contributor Author

jue89 commented Apr 30, 2020

Ok, so the issues are somehow similar. Does inserting __NOP() before the __WFI() fix the issue as well? Can you try from 1 to 4 __NOP()?

I tried this with the build environment showed above and with BUILD_IN_DOCKER=1.

The following patch did the trick:

diff --git a/cpu/cortexm_common/include/cpu.h b/cpu/cortexm_common/include/cpu.h
index d79d65495..73d396f41 100644
--- a/cpu/cortexm_common/include/cpu.h
+++ b/cpu/cortexm_common/include/cpu.h
@@ -169,6 +169,8 @@ static inline void cortexm_sleep(int deep)
     /* ensure that all memory accesses have completed and trigger sleeping */
     unsigned state = irq_disable();
     __DSB();
+    __NOP();
+    __NOP();
     __WFI();
 #if defined(CPU_MODEL_STM32L152RE)
     /* STM32L152RE crashes if branching to irq_restore(state). See #11830. */

1, 3 and 4 __NOP() don't work.

@fjmolinas
Copy link
Contributor

Ok, so the issues are somehow similar. Does inserting __NOP() before the __WFI() fix the issue as well? Can you try from 1 to 4 __NOP()?

I tried this with the build environment showed above and with BUILD_IN_DOCKER=1.

Ups, I'm sorry I actually meant after the __WFI(), sorry for the confusion. So what I'm trying to figure out if it might have to do with the "hints" I had in #11919 (comment).

Otherwise I snooped around in forums and reference manual today, but I have nothing very clear to contribute ATM. But I saw that TI for one of their applications added _NOP after WFIas well..

@jue89
Copy link
Contributor Author

jue89 commented Apr 30, 2020

This actually works:

diff --git a/cpu/cortexm_common/include/cpu.h b/cpu/cortexm_common/include/cpu.h
index d79d65495..fe696c15d 100644
--- a/cpu/cortexm_common/include/cpu.h
+++ b/cpu/cortexm_common/include/cpu.h
@@ -170,6 +170,7 @@ static inline void cortexm_sleep(int deep)
     unsigned state = irq_disable();
     __DSB();
     __WFI();
+    __NOP();
 #if defined(CPU_MODEL_STM32L152RE)
     /* STM32L152RE crashes if branching to irq_restore(state). See #11830. */
     __set_PRIMASK(state);

@fjmolinas
Copy link
Contributor

This actually works:

Aha!, so it seems to be the same kind of issue...

@maribu
Copy link
Member

maribu commented May 4, 2020

@jue89: Can you check if the issue is still present with #13999 applied? Maybe it is similar to the STM32L152RE that a branch just after __WFI() is causing the issue and inlining irq_restore() solves it?

@jue89
Copy link
Contributor Author

jue89 commented May 4, 2020

@jue89: Can you check if the issue is still present with #13999 applied? Maybe it is similar to the STM32L152RE that a branch just after __WFI() is causing the issue and inlining irq_restore() solves it?

Yip! It fixes this issue :)

@maribu
Copy link
Member

maribu commented May 4, 2020

Cool. Even more reason to get it in :-)

Thanks for the quick reply!

@fjmolinas
Copy link
Contributor

@jue89: Can you check if the issue is still present with #13999 applied? Maybe it is similar to the STM32L152RE that a branch just after __WFI() is causing the issue and inlining irq_restore() solves it?

Yip! It fixes this issue :)

Huh, thats strange how is it that the changes tin #13999 fixed the issue but not #13918 (comment)?? Anyway if it fixes it it still good!

@maribu
Copy link
Member

maribu commented May 5, 2020

Huh, thats strange how is it that the changes tin #13999 fixed the issue but not #13918 (comment)??

Indeed. Sorry, I wasn't following the discussion here closely enough. It would be nice to understand what exactly triggers the problem so that we can ensure it does not pop up again.

@jue89: Can you check again to rule out human error? Can you temporarily add an #error "test" below the #if defined(CPU_MODEL_STM32L152RE) || defined(CPU_MODEL_STM32F103CB) || defined(CPU_MODEL_STM32F103C8) to verify that correct code gets indeed compiled?

@jue89
Copy link
Contributor Author

jue89 commented May 6, 2020

@jue89: Can you check again to rule out human error? Can you temporarily add an #error "test" below the #if defined(CPU_MODEL_STM32L152RE) || defined(CPU_MODEL_STM32F103CB) || defined(CPU_MODEL_STM32F103C8) to verify that correct code gets indeed compiled?

Sure. Current master:

tests/periph_pm hangs
jue@bart ~/P/R/t/periph_pm $ make BOARD=bluepill flash term
2020-05-06 12:58:24,959 #  �main(): This is RIOT! (Version: 2020.07-devel-446-g8ffd34)
2020-05-06 12:58:24,961 # This application allows you to test the CPU power management.
2020-05-06 12:58:24,963 # The available power modes are 0 - 1. Lower-numbered power modes
2020-05-06 12:58:24,964 # save more power, but may require an event/interrupt to wake up
2020-05-06 12:58:24,964 # the CPU. Reset the CPU if needed.
2020-05-06 12:58:24,965 # mode 0 blockers: 1 
2020-05-06 12:58:24,965 # mode 1 blockers: 2 
2020-05-06 12:58:24,965 # Lowest allowed mode: 2
> pm unblock 1
2020-05-06 12:58:29,575 #  pm unblock 1
2020-05-06 12:58:29,576 # Unblocking power mode 1.
> unblock_rtc 1 2
2020-05-06 12:58:35,983 #  unblock_rtc 1 2
2020-05-06 12:58:35,985 # Unblocking power mode 1 for 2 seconds.
> help
help
help
help
help

With this patch applied (checked with #error inside the if statement):

diff --git a/cpu/cortexm_common/include/cpu.h b/cpu/cortexm_common/include/cpu.h
index d79d65495..9dae3e104 100644
--- a/cpu/cortexm_common/include/cpu.h
+++ b/cpu/cortexm_common/include/cpu.h
@@ -170,7 +170,7 @@ static inline void cortexm_sleep(int deep)
     unsigned state = irq_disable();
     __DSB();
     __WFI();
-#if defined(CPU_MODEL_STM32L152RE)
+#if defined(CPU_MODEL_STM32L152RE) || defined(CPU_MODEL_STM32F103CB) || defined(CPU_MODEL_STM32F103C8)
     /* STM32L152RE crashes if branching to irq_restore(state). See #11830. */
     __set_PRIMASK(state);
 #else
tests/periph_pm hardfaults
jue@bart ~/P/R/t/periph_pm $ make BOARD=bluepill flash term
2020-05-06 13:02:05,700 # main(): This is RIOT! (Version: 2020.07-devel-446-g8ffd34)
2020-05-06 13:02:05,703 # This application allows you to test the CPU power management.
2020-05-06 13:02:05,705 # The available power modes are 0 - 1. Lower-numbered power modes
2020-05-06 13:02:05,707 # save more power, but may require an event/interrupt to wake up
2020-05-06 13:02:05,708 # the CPU. Reset the CPU if needed.
2020-05-06 13:02:05,709 # mode 0 blockers: 1 
2020-05-06 13:02:05,710 # mode 1 blockers: 2 
2020-05-06 13:02:05,710 # Lowest allowed mode: 2
> pm unblock 1
2020-05-06 13:02:11,703 #  pm unblock 1
2020-05-06 13:02:11,704 # Unblocking power mode 1.
> unblock_rtc 1 2
2020-05-06 13:02:15,554 #  unblock_rtc 1 2
2020-05-06 13:02:15,555 # Unblocking power mode 1 for 2 seconds.
> help
help
help
he2020-05-06 13:02:18,478 #  
l2020-05-06 13:02:18,527 # Context before hardfault:
2020-05-06 13:02:18,528 #    r0: 0x00000001
2020-05-06 13:02:18,529 #    r1: 0xd5a0c000
2020-05-06 13:02:18,530 #    r2: 0x03035682
2020-05-06 13:02:18,531 #    r3: 0x00000014
2020-05-06 13:02:18,532 #   r12: 0x00000000
2020-05-06 13:02:18,532 #    lr: 0x00000006
2020-05-06 13:02:18,532 #    pc: 0x00000006
2020-05-06 13:02:18,533 #   psr: 0x20000000
2020-05-06 13:02:18,533 # 
2020-05-06 13:02:18,533 # FSR/FAR:
2020-05-06 13:02:18,534 #  CFSR: 0x00020000
2020-05-06 13:02:18,534 #  HFSR: 0x40000000
2020-05-06 13:02:18,534 #  DFSR: 0x00000008
2020-05-06 13:02:18,534 #  AFSR: 0x00000000
2020-05-06 13:02:18,534 # Misc
2020-05-06 13:02:18,534 # EXC_RET: 0xfffffffd
2020-05-06 13:02:18,535 # Attempting to reconstruct state for debugging...
2020-05-06 13:02:18,535 # In GDB:
2020-05-06 13:02:18,535 #   set $pc=0x6
2020-05-06 13:02:18,535 #   frame 0
2020-05-06 13:02:18,535 #   bt
2020-05-06 13:02:18,535 # 
2020-05-06 13:02:18,536 # ISR stack overflowed by at least 16 bytes.

@fjmolinas
Copy link
Contributor

In that case does it make sense to extend #14015 to also mention this platform, and if the issue shows up again consider adding 3 x _NOP(). I think it would be better to investigate again, but last time I properly investigated this bug for stm32l152re I spent 1 week on that, and I don't have that time ATM.

So if #13999 fixes the issue, I propose to just take that and as I said extend #14015 to mention this platform, since they seem to follow the same pattern.

@maribu
Copy link
Member

maribu commented May 7, 2020

I agree with @fjmolinas that we should not spend too much time investigating the issue if #13999 can solve it. Maybe it will never pop up again.

But in case it does: @jue89, could you post the dissassembly of cortexm_sleep() in the three cases? This might be just the crucial information needed to understand the issue if it will pop up in the future again.

@jue89
Copy link
Contributor Author

jue89 commented May 8, 2020

But in case it does: @jue89, could you post the dissassembly of cortexm_sleep() in the three cases? This might be just the crucial information needed to understand the issue if it will pop up in the future again.

Sure. cortexm_sleep() was in-lined into pm_set(). I'm posting this assembly.

pm_set() on current master (8ffd34d) still hangs
08000670 <pm_set>:
#define PWR_CR_REG     PWR->CR
#define PWR_WUP_REG    PWR->CSR
#endif

void pm_set(unsigned mode)
{
 8000670:       b510            push    {r4, lr}
    int deep;

    switch (mode) {
 8000672:       b198            cbz     r0, 800069c <pm_set+0x2c>
 8000674:       2801            cmp     r0, #1
 8000676:       d023            beq.n   80006c0 <pm_set+0x50>
            /* Set SLEEPDEEP bit of system control block */
            deep = 1;
            break;
#endif
        default:
            deep = 0;
 8000678:       2400            movs    r4, #0
{
    if (deep) {
        SCB->SCR |=  (SCB_SCR_SLEEPDEEP_Msk);
    }
    else {
        SCB->SCR &= ~(SCB_SCR_SLEEPDEEP_Msk);
 800067a:       4a17            ldr     r2, [pc, #92]   ; (80006d8 <pm_set+0x68>)
 800067c:       6913            ldr     r3, [r2, #16]
 800067e:       f023 0304       bic.w   r3, r3, #4
 8000682:       6113            str     r3, [r2, #16]
    }

    /* ensure that all memory accesses have completed and trigger sleeping */
    unsigned state = irq_disable();
 8000684:       f7ff ffc6       bl      8000614 <irq_disable>
  \details Acts as a special kind of Data Memory Barrier.
           It completes when all explicit memory accesses before this instruction complete.
 */
__STATIC_FORCEINLINE void __DSB(void)
{
  __ASM volatile ("dsb 0xF":::"memory");
 8000688:       f3bf 8f4f       dsb     sy
    __DSB();
    __WFI();
 800068c:       bf30            wfi
#if defined(CPU_MODEL_STM32L152RE)
    /* STM32L152RE crashes if branching to irq_restore(state). See #11830. */
    __set_PRIMASK(state);
#else
    irq_restore(state);
 800068e:       f7ff ffc9       bl      8000624 <irq_restore>
            break;
    }

    cortexm_sleep(deep);

    if (deep) {
 8000692:       b1fc            cbz     r4, 80006d4 <pm_set+0x64>
        /* Re-init clock after STOP */
        stmclk_init_sysclk();
    }
}
 8000694:       e8bd 4010       ldmia.w sp!, {r4, lr}
        stmclk_init_sysclk();
 8000698:       f000 bb6a       b.w     8000d70 <stmclk_init_sysclk>
            PWR_CR_REG &= ~(PM_STOP_CONFIG | PM_STANDBY_CONFIG);
 800069c:       4b0f            ldr     r3, [pc, #60]   ; (80006dc <pm_set+0x6c>)
 800069e:       681a            ldr     r2, [r3, #0]
 80006a0:       f022 020f       bic.w   r2, r2, #15
 80006a4:       601a            str     r2, [r3, #0]
            PWR_CR_REG |= PM_STANDBY_CONFIG;
 80006a6:       681a            ldr     r2, [r3, #0]
 80006a8:       f042 020e       orr.w   r2, r2, #14
 80006ac:       601a            str     r2, [r3, #0]
            PWR_WUP_REG |= PM_EWUP_CONFIG;
 80006ae:       685a            ldr     r2, [r3, #4]
 80006b0:       605a            str     r2, [r3, #4]
        SCB->SCR |=  (SCB_SCR_SLEEPDEEP_Msk);
 80006b2:       4a09            ldr     r2, [pc, #36]   ; (80006d8 <pm_set+0x68>)
 80006b4:       2401            movs    r4, #1
 80006b6:       6913            ldr     r3, [r2, #16]
 80006b8:       f043 0304       orr.w   r3, r3, #4
 80006bc:       6113            str     r3, [r2, #16]
 80006be:       e7e1            b.n     8000684 <pm_set+0x14>
            PWR_CR_REG &= ~(PM_STOP_CONFIG | PM_STANDBY_CONFIG);
 80006c0:       4b06            ldr     r3, [pc, #24]   ; (80006dc <pm_set+0x6c>)
 80006c2:       681a            ldr     r2, [r3, #0]
 80006c4:       f022 020f       bic.w   r2, r2, #15
 80006c8:       601a            str     r2, [r3, #0]
            PWR_CR_REG |= PM_STOP_CONFIG;
 80006ca:       681a            ldr     r2, [r3, #0]
 80006cc:       f042 0201       orr.w   r2, r2, #1
 80006d0:       601a            str     r2, [r3, #0]
    if (deep) {
 80006d2:       e7ee            b.n     80006b2 <pm_set+0x42>
}
 80006d4:       bd10            pop     {r4, pc}
 80006d6:       bf00            nop
 80006d8:       e000ed00        and     lr, r0, r0, lsl #26
 80006dc:       40007000        andmi   r7, r0, r0

And irq_restore():

08000624 <irq_restore>:
  \details Assigns the given value to the Priority Mask Register.
  \param [in]    priMask  Priority Mask
 */
__STATIC_FORCEINLINE void __set_PRIMASK(uint32_t priMask)
{
  __ASM volatile ("MSR primask, %0" : : "r" (priMask) : "memory");
 8000624:       f380 8810       msr     PRIMASK, r0
 * @brief Restore the state of the IRQ flags
 */
void irq_restore(unsigned int state)
{
    __set_PRIMASK(state);
}
 8000628:       4770            bx      lr
pm_set() with the modified #ifdef for the in-lined irq_restore() hardfaults
08000670 <pm_set>:
#define PWR_CR_REG     PWR->CR
#define PWR_WUP_REG    PWR->CSR
#endif

void pm_set(unsigned mode)
{
 8000670:       b510            push    {r4, lr}
    int deep;

    switch (mode) {
 8000672:       b198            cbz     r0, 800069c <pm_set+0x2c>
 8000674:       2801            cmp     r0, #1
 8000676:       d023            beq.n   80006c0 <pm_set+0x50>
            /* Set SLEEPDEEP bit of system control block */
            deep = 1;
            break;
#endif
        default:
            deep = 0;
 8000678:       2400            movs    r4, #0
{
    if (deep) {
        SCB->SCR |=  (SCB_SCR_SLEEPDEEP_Msk);
    }
    else {
        SCB->SCR &= ~(SCB_SCR_SLEEPDEEP_Msk);
 800067a:       4a17            ldr     r2, [pc, #92]   ; (80006d8 <pm_set+0x68>)
 800067c:       6913            ldr     r3, [r2, #16]
 800067e:       f023 0304       bic.w   r3, r3, #4
 8000682:       6113            str     r3, [r2, #16]
    }

    /* ensure that all memory accesses have completed and trigger sleeping */
    unsigned state = irq_disable();
 8000684:       f7ff ffc6       bl      8000614 <irq_disable>
  \details Acts as a special kind of Data Memory Barrier.
           It completes when all explicit memory accesses before this instruction complete.
 */
__STATIC_FORCEINLINE void __DSB(void)
{
  __ASM volatile ("dsb 0xF":::"memory");
 8000688:       f3bf 8f4f       dsb     sy
    __DSB();
    __WFI();
 800068c:       bf30            wfi
  __ASM volatile ("MSR primask, %0" : : "r" (priMask) : "memory");
 800068e:       f380 8810       msr     PRIMASK, r0
            break;
    }

    cortexm_sleep(deep);

    if (deep) {
 8000692:       b1fc            cbz     r4, 80006d4 <pm_set+0x64>
        /* Re-init clock after STOP */
        stmclk_init_sysclk();
    }
}
8000694:       e8bd 4010       ldmia.w sp!, {r4, lr}
        stmclk_init_sysclk();
 8000698:       f000 bb6a       b.w     8000d70 <stmclk_init_sysclk>
            PWR_CR_REG &= ~(PM_STOP_CONFIG | PM_STANDBY_CONFIG);
 800069c:       4b0f            ldr     r3, [pc, #60]   ; (80006dc <pm_set+0x6c>)
 800069e:       681a            ldr     r2, [r3, #0]
 80006a0:       f022 020f       bic.w   r2, r2, #15
 80006a4:       601a            str     r2, [r3, #0]
            PWR_CR_REG |= PM_STANDBY_CONFIG;
 80006a6:       681a            ldr     r2, [r3, #0]
 80006a8:       f042 020e       orr.w   r2, r2, #14
 80006ac:       601a            str     r2, [r3, #0]
            PWR_WUP_REG |= PM_EWUP_CONFIG;
 80006ae:       685a            ldr     r2, [r3, #4]
 80006b0:       605a            str     r2, [r3, #4]
        SCB->SCR |=  (SCB_SCR_SLEEPDEEP_Msk);
 80006b2:       4a09            ldr     r2, [pc, #36]   ; (80006d8 <pm_set+0x68>)
 80006b4:       2401            movs    r4, #1
 80006b6:       6913            ldr     r3, [r2, #16]
 80006b8:       f043 0304       orr.w   r3, r3, #4
 80006bc:       6113            str     r3, [r2, #16]
 80006be:       e7e1            b.n     8000684 <pm_set+0x14>
            PWR_CR_REG &= ~(PM_STOP_CONFIG | PM_STANDBY_CONFIG);
 80006c0:       4b06            ldr     r3, [pc, #24]   ; (80006dc <pm_set+0x6c>)
 80006c2:       681a            ldr     r2, [r3, #0]
 80006c4:       f022 020f       bic.w   r2, r2, #15
 80006c8:       601a            str     r2, [r3, #0]
            PWR_CR_REG |= PM_STOP_CONFIG;
 80006ca:       681a            ldr     r2, [r3, #0]
 80006cc:       f042 0201       orr.w   r2, r2, #1
 80006d0:       601a            str     r2, [r3, #0]
    if (deep) {
 80006d2:       e7ee            b.n     80006b2 <pm_set+0x42>
}
 80006d4:       bd10            pop     {r4, pc}
 80006d6:       bf00            nop
 80006d8:       e000ed00        and     lr, r0, r0, lsl #26
 80006dc:       40007000        andmi   r7, r0, r0

Both differ only at the in-linded irq_restore():

diff

pm_set() from #13999 still works (HEAD: 9012c6c)
 8000658 <pm_set>:

void pm_set(unsigned mode)
{
    int deep;

    switch (mode) {
 8000658:       b190            cbz     r0, 8000680 <pm_set+0x28>
 800065a:       2801            cmp     r0, #1
 800065c:       d022            beq.n   80006a4 <pm_set+0x4c>
{
    if (deep) {
        SCB->SCR |=  (SCB_SCR_SLEEPDEEP_Msk);
    }
    else {
        SCB->SCR &= ~(SCB_SCR_SLEEPDEEP_Msk);
 800065e:       4a17            ldr     r2, [pc, #92]   ; (80006bc <pm_set+0x64>)
 8000660:       6913            ldr     r3, [r2, #16]
 8000662:       f023 0304       bic.w   r3, r3, #4
 8000666:       6113            str     r3, [r2, #16]
            /* Set SLEEPDEEP bit of system control block */
            deep = 1;
            break;
#endif
        default:
            deep = 0;
 8000668:       2300            movs    r3, #0
  __ASM volatile ("MRS %0, primask" : "=r" (result) :: "memory");
 800066a:       f3ef 8210       mrs     r2, PRIMASK
  __ASM volatile ("cpsid i" : : : "memory");
 800066e:       b672            cpsid   i
  \details Acts as a special kind of Data Memory Barrier.
           It completes when all explicit memory accesses before this instruction complete.
 */
__STATIC_FORCEINLINE void __DSB(void)
{
  __ASM volatile ("dsb 0xF":::"memory");
 8000670:       f3bf 8f4f       dsb     sy
    }

    /* ensure that all memory accesses have completed and trigger sleeping */
    unsigned state = irq_disable();
    __DSB();
    __WFI();
 8000674:       bf30            wfi
  __ASM volatile ("MSR primask, %0" : : "r" (priMask) : "memory");
 8000676:       f382 8810       msr     PRIMASK, r2
            break;
    }

    cortexm_sleep(deep);

    if (deep) {
 800067a:       b1eb            cbz     r3, 80006b8 <pm_set+0x60>
        /* Re-init clock after STOP */
        stmclk_init_sysclk();
 800067c:       f000 bb60       b.w     8000d40 <stmclk_init_sysclk>
            PWR_CR_REG &= ~(PM_STOP_CONFIG | PM_STANDBY_CONFIG);
 8000680:       4b0f            ldr     r3, [pc, #60]   ; (80006c0 <pm_set+0x68>)
 8000682:       681a            ldr     r2, [r3, #0]
 8000684:       f022 020f       bic.w   r2, r2, #15
 8000688:       601a            str     r2, [r3, #0]
            PWR_CR_REG |= PM_STANDBY_CONFIG;
 800068a:       681a            ldr     r2, [r3, #0]
 800068c:       f042 020e       orr.w   r2, r2, #14
 8000690:       601a            str     r2, [r3, #0]
            PWR_WUP_REG |= PM_EWUP_CONFIG;
 8000692:       685a            ldr     r2, [r3, #4]
 8000694:       605a            str     r2, [r3, #4]
        SCB->SCR |=  (SCB_SCR_SLEEPDEEP_Msk);
 8000696:       4a09            ldr     r2, [pc, #36]   ; (80006bc <pm_set+0x64>)
 8000698:       6913            ldr     r3, [r2, #16]
 800069a:       f043 0304       orr.w   r3, r3, #4
 800069e:       6113            str     r3, [r2, #16]
 80006a0:       2301            movs    r3, #1
 80006a2:       e7e2            b.n     800066a <pm_set+0x12>
            PWR_CR_REG &= ~(PM_STOP_CONFIG | PM_STANDBY_CONFIG);
 80006a4:       4b06            ldr     r3, [pc, #24]   ; (80006c0 <pm_set+0x68>)
 80006a6:       681a            ldr     r2, [r3, #0]
 80006a8:       f022 020f       bic.w   r2, r2, #15
 80006ac:       601a            str     r2, [r3, #0]
            PWR_CR_REG |= PM_STOP_CONFIG;
 80006ae:       681a            ldr     r2, [r3, #0]
 80006b0:       f042 0201       orr.w   r2, r2, #1
 80006b4:       601a            str     r2, [r3, #0]
    if (deep) {
 80006b6:       e7ee            b.n     8000696 <pm_set+0x3e>
    }
}
 80006b8:       4770            bx      lr
 80006ba:       bf00            nop
 80006bc:       e000ed00        and     lr, r0, r0, lsl #26
 80006c0:       40007000        andmi   r7, r0, r0

I hope I caught everything interesting!

@jue89
Copy link
Contributor Author

jue89 commented May 8, 2020

Hmm, the two non-working cases restore the interrupts using: msr PRIMASK, r0.
The working case uses msr PRIMASK, r2.

I think I read something about R0 being corrupted related to power saving modes?! But I cannot find the issue that mentioned that.

@maribu
Copy link
Member

maribu commented Jul 6, 2020

I think I read something about R0 being corrupted related to power saving modes?! But I cannot find the issue that mentioned that.

I will try to confirm this sometime this week. If that is indeed the cause, adding an

__asm__ volatile ( : : : "r0", "memory");

before entering the power saving mode should tell the compiler to either not use r0 across the power saving mode, or back up its contents before the sleep to restore them afterwards. We should not rely on r0 not containing important data by chance if it can get corrupted during sleep.

@maribu
Copy link
Member

maribu commented Aug 4, 2020

I tried this to check if the corruption of r0 is indeed the issue:

diff --git a/cpu/cortexm_common/include/cpu.h b/cpu/cortexm_common/include/cpu.h
index ca895972e5..d3889203b0 100644
--- a/cpu/cortexm_common/include/cpu.h
+++ b/cpu/cortexm_common/include/cpu.h
@@ -166,11 +166,18 @@ static inline void cortexm_sleep(int deep)
         SCB->SCR &= ~(SCB_SCR_SLEEPDEEP_Msk);
     }
 
+    register unsigned r0 __asm__ ("r0") = 0x42424242;
+
     /* ensure that all memory accesses have completed and trigger sleeping */
     unsigned state = irq_disable();
     __DSB();
     __WFI();
     irq_restore(state);
+
+    __asm__ volatile ("" : "+r"(r0));
+    if (r0 != 0x42424242) {
+        printf("Bug! r0 = 0x%x\n", r0);
+    }
 }
 
 /**

So I first us an explicit register variable in r0 before the sleep. After the sleep, I have a fake inline assembly block taking that variable as input and output, so that the compiler must assume its value was changed (and it cannot optimize the test out). Finally, I test if the value in the variable is indeed the value I previously stored in it.

The resulting assembly shows that the compiler indeed did what I intended. (And did not back up r0 to stack and restore it):

0800066c <pm_set>:
 800066c:       b510            push    {r4, lr}
 800066e:       b1e8            cbz     r0, 80006ac <pm_set+0x40>
 8000670:       2801            cmp     r0, #1
 8000672:       d02d            beq.n   80006d0 <pm_set+0x64>
 8000674:       2400            movs    r4, #0
 8000676:       4a1c            ldr     r2, [pc, #112]  ; (80006e8 <pm_set+0x7c>)
 8000678:       6913            ldr     r3, [r2, #16]
 800067a:       f023 0304       bic.w   r3, r3, #4
 800067e:       6113            str     r3, [r2, #16]
 8000680:       f04f 3042       mov.w   r0, #1111638594 ; 0x42424242
 8000684:       f3ef 8310       mrs     r3, PRIMASK
 8000688:       b672            cpsid   i
 800068a:       f3bf 8f4f       dsb     sy
 800068e:       bf30            wfi
 8000690:       f383 8810       msr     PRIMASK, r3
 8000694:       f1b0 3f42       cmp.w   r0, #1111638594 ; 0x42424242
 8000698:       4601            mov     r1, r0
 800069a:       d002            beq.n   80006a2 <pm_set+0x36>
 800069c:       4813            ldr     r0, [pc, #76]   ; (80006ec <pm_set+0x80>)
 800069e:       f001 fe8b       bl      80023b8 <iprintf>
[...]

I was unable to trigger the test. So no idea, why this failed previously :-/

@aabadie aabadie added the Platform: ARM Platform: This PR/issue effects ARM-based platforms label May 20, 2021
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
@cbiffle
Copy link

cbiffle commented Jul 17, 2023

(I just posted a comment to #14015 with a workaround that may fix this, fwiw.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants