Harden backup/restore and cleanup workflows against data-loss scenarios #22

@BadgerOps

Summary

Backup/restore and cleanup operations can lead to data loss or inconsistent recovery if executed during active traffic without stronger guardrails and verification.

Why this matters

  • Maintenance endpoints can delete or mutate large datasets.
  • The current backup/restore approach is file-copy based, so an operational mistake (for example, copying or restoring while writes are in flight) can leave a partial or stale recovery state.
  • This is a data integrity and recoverability risk.
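To illustrate the file-copy concern: SQLite ships an online backup API that takes a consistent snapshot even while other connections are writing, which a plain file copy does not guarantee. The sketch below uses Python's standard `sqlite3.Connection.backup`. The function name `snapshot_sqlite` and both paths are hypothetical and do not come from this repo; whether the project should adopt this approach is an open design question.

```python
import sqlite3


def snapshot_sqlite(src_path: str, dest_path: str) -> None:
    """Copy a SQLite database to dest_path via the online backup API.

    Hypothetical helper for illustration; not part of the existing codebase.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies pages under SQLite's own locking, so the
        # destination is a consistent snapshot rather than a torn file copy.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

Usage would look like `snapshot_sqlite("app.db", "app-backup.db")`, where both file names are placeholders.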

Repo evidence

  • Maintenance endpoints: backend/routers/maintenance.py
  • Cleanup behavior: backend/services/data_aging.py

Scope

  • Add preflight safety checks before destructive cleanup/restore operations.
  • Add backup integrity verification and restore validation workflow.
  • Add explicit operator guidance/runbook for backup, restore, and rollback.
  • Improve observability/audit details for data-changing maintenance actions.
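As a starting point for the preflight item above, here is a minimal sketch of a guard that a destructive endpoint could call before doing any work. Everything in it is an assumption for discussion: the names `PreflightResult` and `preflight_destructive_op`, the confirmation-token idea, the one-hour backup freshness threshold, and the `active_requests` signal are not taken from `backend/routers/maintenance.py`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PreflightResult:
    ok: bool
    reasons: list[str] = field(default_factory=list)


def preflight_destructive_op(
    *,
    confirm_token: str,
    expected_token: str,
    backup_age_seconds: float | None,
    max_backup_age_seconds: float = 3600.0,  # assumed threshold
    active_requests: int = 0,
) -> PreflightResult:
    """Collect every reason a cleanup/restore should be refused."""
    reasons: list[str] = []
    # Require the operator to echo back an explicit confirmation value.
    if confirm_token != expected_token:
        reasons.append("confirmation token mismatch")
    # Refuse to delete or overwrite data without a recent backup.
    if backup_age_seconds is None:
        reasons.append("no backup found")
    elif backup_age_seconds > max_backup_age_seconds:
        reasons.append("latest backup is too old")
    # Avoid running while traffic is still in flight.
    if active_requests > 0:
        reasons.append("requests still in flight")
    return PreflightResult(ok=not reasons, reasons=reasons)
```

Returning all failing reasons at once, rather than stopping at the first, would also give the audit log a complete record of why an operation was blocked.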

Acceptance criteria

  • Backup artifacts can be validated before restore.
  • Restore workflow includes a verified post-restore health/data check.
  • Cleanup/restore operations have guardrails that reduce accidental destructive execution.
  • Docs include a tested recovery drill procedure.
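For the first criterion, one possible shape of a "validate before restore" step is sketched below. It combines an optional SHA-256 comparison with SQLite's built-in `PRAGMA integrity_check`, opening the artifact read-only so validation never modifies it. The helper names `sha256_of_file` and `verify_sqlite_backup` are hypothetical, and storing a checksum alongside each backup is an assumption, not current behavior.

```python
from __future__ import annotations

import hashlib
import sqlite3
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large backups do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sqlite_backup(path: str, expected_sha256: str | None = None) -> list[str]:
    """Return a list of problems; an empty list means the artifact passed."""
    problems: list[str] = []
    if expected_sha256 is not None and sha256_of_file(path) != expected_sha256:
        problems.append("checksum mismatch")
    # mode=ro opens the artifact read-only and fails if the file is missing.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        problems.append(f"cannot open backup: {exc}")
        return problems
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        if rows != [("ok",)]:
            problems.extend(str(row[0]) for row in rows)
    except sqlite3.DatabaseError as exc:
        problems.append(f"not a valid SQLite database: {exc}")
    finally:
        conn.close()
    return problems
```

A restore workflow could refuse to proceed unless this returns an empty list, and then run the same check again on the restored database as part of the post-restore validation.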

Out of scope

  • Replacing SQLite with managed database backups in this issue.

Metadata

Assignees

No one assigned

    Labels

    architecture: Architecture-level changes or design debt
    backend: Backend API/data-layer scope
    enhancement: New feature or request
    high-priority: High-impact item to schedule soon
    reliability: Availability, performance, or operational resilience
