
Implement safemode.py #7577


Merged
merged 7 commits into from
Feb 16, 2023


dhalbert
Collaborator

@dhalbert dhalbert commented Feb 13, 2023

Fixes #5956.
Fixes #2694.

  • Add safemode.py, which is executed first, instead of boot.py and/or code.py, but only if the board has been reset due to going into safe mode.
  • safemode.py has no access to USB. Its output is not written to boot_out.txt, unlike boot.py, to minimize possible problems. There is a special safemode which is entered if there are errors in safemode.py.
  • If safemode.py exits normally, the safe mode will remain in effect, and boot.py and code.py will not be executed. The safe mode reason will be reported in the console as usual.
  • safemode.py can also do a microcontroller.reset(), which will clear the safe mode and restart normally, as if the reset button was pressed. It can also do a deep sleep.
  • Added supervisor.SafeModeReason enum class, and supervisor.runtime.safe_mode_reason, which returns one of those enums.
  • (EDIT) Safe mode messages were shortened. An instruction about how to exit safe mode is now always printed.
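
A minimal sketch of a safemode.py that follows the behavior listed above. The helper name `should_clear_safe_mode` and the choice of which reasons to recover from are illustrative assumptions, not part of this PR; `BROWNOUT` is taken from the CircuitPython documentation rather than from this thread. The `sys.implementation` guard is only there so the decision logic can be tried off-device.

```python
# safemode.py (sketch, not the PR's reference implementation)
import sys

ON_CIRCUITPY = sys.implementation.name == "circuitpython"


def should_clear_safe_mode(reason, handled_reasons):
    """Return True if `reason` is one this script chooses to recover from."""
    return reason in handled_reasons


if ON_CIRCUITPY:
    import microcontroller
    import supervisor

    reason = supervisor.runtime.safe_mode_reason
    # Assumed policy: only recover from brownouts and hard faults.
    handled = (
        supervisor.SafeModeReason.BROWNOUT,
        supervisor.SafeModeReason.HARD_FAULT,
    )
    if should_clear_safe_mode(reason, handled):
        # Clears the safe mode and restarts normally (boot.py, then code.py).
        microcontroller.reset()
    # Exiting normally leaves the safe mode in effect.
```

An allow-list like this means any reason that is not named explicitly falls through and stays in safe mode.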

Internal changes in support of the above:

  • Safemode enum values now all start with SAFE_MODE_.... NO_SAFE_MODE is now SAFE_MODE_NONE (I always did a double-take on NO_SAFE_MODE).
  • Some safemode enums were also renamed or combined, since they are now exposed to the end user. For example: MANUAL and USER are now both just USER. The Nordic-specific safemode is now a general one for third-party SDK fatal errors. All hardware faults are now HARD_FAULT. HEAP_OVERWRITTEN is now STACK_OVERFLOW because that describes what happened.
  • The USER safe mode messages were all changed to second person instead of passive voice for clarity.
  • supervisor/message/default.h was obsolete and unused; removed.
  • CIRCUITPY_SAFEMODE_PY is now a compiler flag, and it is on by default. Some tiny ports may not be able to use it, though I think it will fit.

@anecdata
Member

anecdata commented Feb 13, 2023

Will the supervisor.SafeModeReason persist into boot.py and code.py?

Will microcontroller.ResetReason change as a result of entering safe mode before boot.py, it's almost like a reset to enter safe mode then boot.py (?)

Will supervisor.RunReason continue to show as STARTUP?

Will alarm.sleep_memory survive the transition from safemode.py to boot.py (and to code.py)? ...edit: suspect not since the only way to get to boot.py would be a microcontroller.reset()

This is a great feature - it will save me so much downtime and manual resets. Thanks, Dan!

@dhalbert
Collaborator Author

> Will the safe mode reason persist into boot.py and code.py?

You can't enter boot.py or code.py if you are in safe mode, so that is moot. But if you get into the REPL, then supervisor.runtime.safe_mode_reason is preserved.

> Will microcontroller.ResetReason change as a result of entering safe mode before boot.py, it's almost like a reset to enter safe mode then boot.py (?)
>
> Will supervisor.RunReason continue to show as STARTUP?

ResetReason and RunReason are unchanged. Safemode reasons are derived from some of these, but they don't interact.

> Will alarm.sleep_memory survive the transition from safemode.py to boot.py (and to code.py)?

If alarm.sleep_memory survives microcontroller.reset(), then yes. Again, you can't get directly from safemode.py to boot.py or code.py. You can only get there by clearing the safe mode, either with a microcontroller.reset() or an alarm.exit_and_deep_sleep().

The point of safemode.py is to recover from catastrophe programmatically. If you use safemode.py, code.py should realize that it might have been restarted due to clearing a safe mode and then having a reset happen. It should make few assumptions about state.

In many cases, safemode.py probably will just check what kind of safemode happened, and if it's one it wants to handle, it will just call microcontroller.reset(). Examples are batteries running low or a hard fault in network code. In the battery case, you'd want to check the battery voltage at the top of code.py and, if it's low (but you expect it to charge), not do anything power hungry, and instead sleep for a while and check again later. In the network code case, you probably don't know where you left off, so you need to start from scratch.
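
For the battery case described above, a rough sketch of the check at the top of code.py could look like the following. Everything here is an assumption for illustration: `read_volts` is a hypothetical callable (how the voltage is measured is board-specific and not covered in this thread), and the threshold and wait time are made-up values.

```python
# Top of code.py (sketch): wait until the battery has recovered
import time

LOW_BATTERY_VOLTS = 3.4  # assumed threshold; depends on the battery and board
RECHECK_SECONDS = 600    # assumed wait between checks


def wait_for_battery(read_volts, sleep=time.sleep,
                     low=LOW_BATTERY_VOLTS, recheck=RECHECK_SECONDS):
    """Block until read_volts() reports at least `low` volts.

    Returns how many times it had to wait. `sleep` is injectable so the
    loop can be tested, or swapped for a lower-power sleep on-device.
    """
    waits = 0
    while read_volts() < low:
        # Do nothing power hungry; just wait and check again later.
        sleep(recheck)
        waits += 1
    return waits
```

Only after this returns would code.py go on to start the power-hungry work.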

@anecdata
Member

anecdata commented Feb 14, 2023

Thanks, Dan. I currently track reset and reload reasons and stats, using alarm.sleep_memory to carry data from boot.py to code.py, so I'm mainly trying to figure out when and how to know that a safe mode occurred and persist some data about it, and count it only once - maybe set a flag on disk in safemode.py, and clear it in boot.py (after the microcontroller.reset()). I still want stats on safe modes so that causes aren't masked and could still be tracked down.

@dhalbert
Collaborator Author

> maybe set a flag on disk in safemode.py, and clear it in boot.py (after the microcontroller.reset()).

You might be able to use SleepMemory for this, even without alarms. Or nvm might work, if you don't get into some tight loop that writes flash too often. I have thought about some kind of nvm-ish thing that's just a block of RAM (special in a hardware sense or not) that's preserved over resets. On some ports that's how SleepMemory is implemented, but SleepMemory could also be a set of special registers.

@anecdata
Member

anecdata commented Feb 14, 2023

I don't think sleep_memory survives reset, at least on espressif currently, and it's not available in all ports (e.g., raspberrypi). But yes, nvm sounds good as a viable alternative to CIRCUITPY for infrequent occurrences, thanks.

edit: ah, but yes alarm.exit_and_deep_sleep() could probably be used with alarm.sleep_memory instead of microcontroller.reset, saving the flash write on ports with alarm.
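
One way the counting idea discussed above might be sketched: a small saturating counter kept in a byte-addressable store such as microcontroller.nvm or alarm.sleep_memory. The function name, the two-byte layout, and the `MAGIC` marker value are all assumptions made for illustration. The marker is there because the store may hold arbitrary bytes before first use.

```python
MAGIC = 0xA5  # assumed marker byte meaning "counter has been initialised"


def record_safe_mode(store, base=0):
    """Increment a one-byte saturating counter at store[base + 1].

    `store` is any object indexable by int that holds byte values,
    e.g. microcontroller.nvm, alarm.sleep_memory, or a bytearray.
    Returns the updated count (capped at 255).
    """
    if store[base] != MAGIC:
        # First use: the contents are unknown, so start from zero.
        store[base] = MAGIC
        store[base + 1] = 0
    count = min(store[base + 1] + 1, 255)
    store[base + 1] = count  # on nvm, this is a flash write
    return count
```

In safemode.py this could be called as `record_safe_mode(microcontroller.nvm)` just before `microcontroller.reset()`, keeping in mind that `microcontroller.nvm` is `None` on boards without it and that every call on nvm costs a flash write.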

@dhalbert dhalbert requested review from tannewt and jepler February 14, 2023 04:08
@dhalbert
Collaborator Author

While trying to undo safemode.py on a board this morning, I realized that my safemode.py was catching USER safemode, which is when you press the reset button during boot to deliberately go into safe mode.

One solution to this is to not run safemode.py if it's a USER-initiated safe mode. The other is to assume that safemode.py is going to let that safe mode pass through. Any opinions?

@RetiredWizard

RetiredWizard commented Feb 14, 2023

If I'm triggering a user safe mode I assume it's because I can't get to the REPL any other way. Could a casually written safemode.py block that?

@dhalbert
Collaborator Author

> If I'm triggering a user safe mode I assume it's because I can't get to the REPL any other way. Could a casually written safemode.py block that?

Yes, if it did a microcontroller.reset() without checking if supervisor.runtime.safe_mode_reason == supervisor.SafeModeReason.USER.

@anecdata
Member

anecdata commented Feb 14, 2023

There's also supervisor.SafeModeReason.PROGRAMMATIC (from microcontroller.on_next_reset(microcontroller.RunMode.SAFE_MODE)?) that could be used in code to bypass filtering of supervisor.SafeModeReason.USER. But I don't have a strong opinion about whether to filter USER.

@anecdata
Member

anecdata commented Feb 14, 2023

I've been testing safemode.py and everything has been working as expected. It can write to nvm or the filesystem to retain some info about the safemode, do a microcontroller.reset() to proceed with the boot.py and code.py sequence. For those intermittent safe mode occurrences, it will keep the device running. Very nice!

edit: found out unintentionally that the "special safemode which is entered if there are errors in safemode.py" works ;-)

@RetiredWizard

If safemode.py can write a new boot.py file... being able to modify storage.remount options by using the "safe mode" button sounds like it could be useful. Perhaps a case for a supervisor.set_next_boot_file feature 😁

@anecdata
Member

anecdata commented Feb 14, 2023

I'm using storage.remount() to write a safemode_out.txt file. I think you could just as easily re-write boot.py.

edit: I'm not sure it's even necessary to use storage.remount() in safemode.py or boot.py, since USB isn't up yet
later: it does seem to be necessary
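
A sketch of the safemode_out.txt logging mentioned above, assuming the file lives at the root of CIRCUITPY. The function name and the line format are made up for illustration. The remount call is only made on CircuitPython, so the file-writing part can be tried on a desktop Python as well.

```python
import sys


def log_safe_mode(reason, path="/safemode_out.txt"):
    """Append one line describing the safe mode reason to `path`."""
    if sys.implementation.name == "circuitpython":
        import storage
        # Make the filesystem writable by code; per the note above,
        # this does seem to be necessary even in safemode.py.
        storage.remount("/", readonly=False)
    with open(path, "a") as f:
        f.write("safe mode: {}\n".format(reason))
```

On a board this might be called as `log_safe_mode(supervisor.runtime.safe_mode_reason)` before deciding whether to reset.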

@dhalbert
Collaborator Author

> One solution to this is to not run safemode.py if it's a USER-initiated safe mode.

I implemented this on the latest commit, because I believe the expectation would be that user-initiated safemode should prevent anything from running, so you can do code repair. Also documented this.

The README.rst could use more work in the long run to make it more comprehensive. It sometimes takes a "this is different than MicroPython" viewpoint, and should probably stand on its own. (That's true of other places in the doc as well.) Things like documenting boot.py, code.py, and safemode.py could be moved to a separate section.

Member

@tannewt tannewt left a comment

This looks really good. Thank you! Just a couple comments on getting it enabled everywhere.

@dhalbert
Collaborator Author

@tannewt ok it fits on all the enabled builds now!

@dhalbert dhalbert requested review from tannewt and removed request for jepler February 15, 2023 23:39
Member

@tannewt tannewt left a comment

Thank you! Looks like only M0 without external flash won't have it.

@dhalbert dhalbert merged commit bbadc00 into adafruit:main Feb 16, 2023
@dhalbert dhalbert deleted the safemode-py branch February 16, 2023 19:15
@ghost

ghost commented Feb 16, 2023

This is the type of creature comfort that might not be as appreciated later because it solves a problem too effectively.

So I just wanted to say thanks while it's still fresh, @dhalbert.


Successfully merging this pull request may close these issues.

Safe Mode: mechanism for user code to recover without manual intervention
Stuck in Safe Mode after Battery depletion
4 participants