@ineskhou ineskhou commented Jan 27, 2026

Command Loss Write only on first boot

Command Loss Timer Simplification and Safe Mode Boot Improvements

Description

This PR simplifies the command loss timer implementation and ensures that boot loops and fatal errors reliably boot into safe mode with proper radio configuration and load switch management.

Key Changes

  1. Removed file persistence for command loss timer: The command loss timer no longer saves the last command time to a file. The timer is now in-memory only, resetting on each boot. This simplifies the code and avoids filesystem issues during boot loops.

  2. Safe mode sequence runs on boot: When the system boots into safe mode (either from persistent state or unintended reboot), the safe mode sequence (enter_safe.seq) now runs automatically to:

    • Turn off all load switches (faces 0-5, payload power/battery)
    • It does NOT touch the radio; all radio configuration is handled during startup

  3. Watchdog stop behavior split:

    • Port-based stops (from FatalHandler or AuthenticationRouter command loss): Do NOT call prepareForReboot, ensuring the next boot is classified as unintended and boots into safe mode
    • Ground command STOP_WATCHDOG: Still calls prepareForReboot, marking the reboot as intentional
  4. Unintended reboot path enhancement: When an unintended reboot is detected (boot loop, fatal error, watchdog timeout), the system now:

    • Enters safe mode with SYSTEM_FAULT reason
    • Runs the safe mode sequence to reset radio parameters and turn off load switches
    • This path is also taken after a command loss timeout, so the radio is reset in that case as well
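A minimal sketch of the watchdog stop split described in item 3, assuming the clean shutdown marker is a single persisted boolean. `RebootFlagStore`, `WatchdogStopSketch`, `StopSource`, and `classifyBoot` are hypothetical names for illustration only; setting the flag stands in for whatever prepareForReboot actually does in the component.

```cpp
// Hypothetical stand-in for the persisted "clean shutdown" marker.
struct RebootFlagStore {
    bool cleanShutdown = false;
};

// Where the stop request came from (illustrative, not the real port/command names).
enum class StopSource { GroundCommand, Port };

class WatchdogStopSketch {
  public:
    explicit WatchdogStopSketch(RebootFlagStore& store) : m_store(store) {}

    void stop(StopSource source) {
        // Only the ground command marks the upcoming reboot as intentional.
        // Port-based stops (fatal handler, command loss) leave the flag unset.
        if (source == StopSource::GroundCommand) {
            m_store.cleanShutdown = true;  // stands in for prepareForReboot
        }
        m_running = false;  // the watchdog is no longer serviced, so a reset follows
    }

    bool running() const { return m_running; }

  private:
    RebootFlagStore& m_store;
    bool m_running = true;
};

enum class BootKind { Intended, Unintended };

// On the next boot, read the flag once and clear it so a later crash
// is not mistaken for an intentional reboot.
BootKind classifyBoot(RebootFlagStore& store) {
    const BootKind kind = store.cleanShutdown ? BootKind::Intended : BootKind::Unintended;
    store.cleanShutdown = false;
    return kind;
}
```

With this shape, a stop requested through a port always leads to a boot classified as unintended, which is what routes the system into safe mode.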

Rationale

Previously, if the system was stuck in an assert/boot loop, the command loss timer file persistence could cause issues. The new approach relies on:

  • Boot loop detection via the watchdog/clean shutdown flag mechanism
  • Automatic safe mode entry on unintended reboots
  • Safe mode sequence execution to reset radio and power state

This ensures that boot loops will stop the watchdog, reboot, and automatically enter safe mode with known-good radio parameters and load switches off, preventing transmission before the 45-minute delay period.
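The boot-time decision this rationale relies on can be sketched as a small pure function. This is an illustration under assumptions, not the ModeManager code: `planBoot`, `BootPlan`, `Mode`, and the `SafeModeReason` values here are hypothetical, and giving an unintended reboot priority over the persisted mode is an assumption the description does not spell out.

```cpp
enum class Mode { Normal, Safe };

// None means "leave any previously persisted reason untouched" (assumption).
enum class SafeModeReason { None, SystemFault };

struct BootPlan {
    Mode mode;                 // mode to start in
    SafeModeReason newReason;  // reason to record on this boot, if any
    bool runSafeSequence;      // whether enter_safe.seq should be run
};

// Decide what to do at boot from the persisted mode and the clean shutdown flag.
BootPlan planBoot(Mode persistedMode, bool cleanShutdown) {
    if (!cleanShutdown) {
        // Boot loop, fatal error, or watchdog timeout: treat as a system fault.
        return {Mode::Safe, SafeModeReason::SystemFault, true};
    }
    if (persistedMode == Mode::Safe) {
        // Intentional reboot while already in safe mode: stay there and
        // re-run the sequence so the load switches end up off.
        return {Mode::Safe, SafeModeReason::None, true};
    }
    return {Mode::Normal, SafeModeReason::None, false};
}
```

Keeping the decision free of side effects makes each boot path easy to cover in a unit test before wiring it to the sequencer.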

Related Issues/Tickets

Helps with #304 and #287

How Has This Been Tested?

  • Unit tests
  • Integration tests
  • Z Tests
  • Manual testing (describe steps)

Screenshots / Recordings (if applicable)

Helps with #304

Further Notes / Considerations

  • Load switch redundancy: ModeManager still calls turnOffNonCriticalComponents() when entering safe mode (via ports), and the sequence also sends TURN_OFF commands. Both mechanisms run, ensuring redundancy but potentially causing duplicate commands. Consider removing port-based turn-off if sequence-based approach is preferred.

  • Command loss timer reset: With file persistence removed, the command loss timer resets on every boot. This means "command loss with boots in between" scenarios will restart the timer from zero on each boot, rather than accumulating across boots. This is an intentional simplification.

  • 45-minute delay: The safe mode sequence enforces a 45-minute delay before enabling LoRa transmit. This ensures that if the system boots into safe mode at startup, it will not transmit before the required quiescence period.
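For the command loss timer note above, a rough sketch of an in-memory-only timer. `CommandLossTimerSketch` and its methods are hypothetical names, and the sketch assumes a monotonic seconds-since-boot clock is passed in by the caller; the real component's time source and timeout value are not shown in this PR description.

```cpp
#include <cstdint>

// In-memory command loss timer: nothing is written to the filesystem,
// so constructing a new instance after a reboot starts the count from zero.
class CommandLossTimerSketch {
  public:
    explicit CommandLossTimerSketch(std::uint64_t timeoutSeconds)
        : m_timeoutSeconds(timeoutSeconds) {}

    // Called once after boot with the current monotonic time.
    void start(std::uint64_t nowSeconds) { m_lastCommandSeconds = nowSeconds; }

    // Called whenever an authenticated command arrives.
    void onCommandReceived(std::uint64_t nowSeconds) { m_lastCommandSeconds = nowSeconds; }

    // True once the timeout has elapsed since the last command (or since start).
    // Assumes nowSeconds never goes backwards.
    bool expired(std::uint64_t nowSeconds) const {
        return nowSeconds - m_lastCommandSeconds >= m_timeoutSeconds;
    }

  private:
    std::uint64_t m_timeoutSeconds;
    std::uint64_t m_lastCommandSeconds = 0;
};
```

Because the last command time lives only in the object, silence that spans several short boots never accumulates, which is the trade-off the note calls out.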

Checklist

  • Written detailed sdd with requirements, channels, ports, commands, telemetry defined and correctly formatted and spelled
  • Have written relevant integration tests and have documented them in the sdd
  • Have done a code review with
  • Have tested this PR on every supported board with correct board definitions


@Mikefly123 Mikefly123 left a comment

I think these changes look good to me, but calling this new boolean first_boot is confusing. In my mind it implies that this is somehow a persistent variable for the first time the satellite ever boots (after being deployed), but really it just means the first time this component is called after a boot up.

Consider changing the variable name to something like in_startup or cold_start.

I would also like to note that although this behavior helps a lot with making it so we don't instantly trigger command loss after being turned off for a few days, we still probably want to delay the initial setup of the command loss file until a minute or two after startup, to protect a little "handling time" when we are doing integration with the dispenser.
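One possible shape for that startup delay, as a sketch only: the timer stays disarmed until a grace period has passed since boot. `GracePeriodTimerSketch`, its methods, and the example values are hypothetical and not taken from this PR.

```cpp
#include <cstdint>

// Command loss timer that only arms after a grace period since boot,
// leaving some handling time before command loss can trigger.
class GracePeriodTimerSketch {
  public:
    GracePeriodTimerSketch(std::uint64_t graceSeconds, std::uint64_t timeoutSeconds)
        : m_graceSeconds(graceSeconds), m_timeoutSeconds(timeoutSeconds) {}

    // Called periodically with seconds since boot (assumed monotonic).
    void tick(std::uint64_t uptimeSeconds) {
        if (!m_armed && uptimeSeconds >= m_graceSeconds) {
            m_armed = true;
            m_lastCommandSeconds = uptimeSeconds;  // start counting from arming time
        }
    }

    // Commands received before arming are ignored by the timer.
    void onCommandReceived(std::uint64_t uptimeSeconds) {
        if (m_armed) {
            m_lastCommandSeconds = uptimeSeconds;
        }
    }

    bool armed() const { return m_armed; }

    // Never reports expiry while still inside the grace period.
    bool expired(std::uint64_t uptimeSeconds) const {
        return m_armed && (uptimeSeconds - m_lastCommandSeconds >= m_timeoutSeconds);
    }

  private:
    std::uint64_t m_graceSeconds;
    std::uint64_t m_timeoutSeconds;
    std::uint64_t m_lastCommandSeconds = 0;
    bool m_armed = false;
};
```

For example, with a 120 second grace period and a 600 second timeout, the earliest possible expiry would be 720 seconds after boot.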

@Mikefly123 Mikefly123 moved this to In review in V1.X.X Jan 28, 2026
@ineskhou ineskhou changed the title make the writing only happen on first boot Command Loss Time Does not propogate across boots Jan 28, 2026
@ineskhou ineskhou moved this from In review to In progress in V1.X.X Jan 28, 2026
@Mikefly123 Mikefly123 self-requested a review January 28, 2026 22:18
// (e.g., to reset radio parameters and enforce any transmit delay policy)
this->log_WARNING_HI_UnintendedRebootDetected();
this->enterSafeMode(Components::SafeModeReason::SYSTEM_FAULT);
this->runSafeModeSequence();

I think we have a potential issue here, where calling the SafeModeSequence on an unintended reboot will cause the safe mode sequencer to conflict with the primary command sequencer (which may be running the startup sequence with almost exactly the same timing as the radio_enter_safe.seq).

@ineskhou ineskhou requested a review from Mikefly123 January 28, 2026 23:45
@ineskhou ineskhou moved this from In progress to In review in V1.X.X Jan 28, 2026

@Mikefly123 Mikefly123 left a comment

LGTM! Good to see the simplification of the logic here

@ineskhou ineskhou merged commit 962b3c9 into main Jan 29, 2026
3 checks passed
@ineskhou ineskhou deleted the command-loss-fix branch January 29, 2026 04:05
@github-project-automation github-project-automation bot moved this from In review to Done in V1.X.X Jan 29, 2026
@hrfarmer hrfarmer mentioned this pull request Jan 31, 2026
8 tasks