Open
Labels
- architecture: Architecture-level changes or design debt
- backend: Backend API/data-layer scope
- enhancement: New feature or request
- high-priority: High-impact item to schedule soon
- reliability: Availability, performance, or operational resilience
Description
Summary
Backup/restore and cleanup operations can lead to data loss or inconsistent recovery if executed during active traffic without stronger guardrails and verification.
Why this matters
- Maintenance endpoints can delete or mutate large datasets.
- Current backup/restore approach is file-copy based; operational mistakes can produce partial or stale recovery states.
- This is a data integrity and recoverability risk.
Repo evidence
- Maintenance endpoints: backend/routers/maintenance.py
- Cleanup behavior: backend/services/data_aging.py
Scope
- Add preflight safety checks before destructive cleanup/restore operations.
- Add backup integrity verification and restore validation workflow.
- Add explicit operator guidance/runbook for backup, restore, and rollback.
- Improve observability/audit details for data-changing maintenance actions.
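As one possible shape for the preflight safety check, the sketch below refuses a destructive action unless the operator retypes the action name, the service is in a maintenance window, and a recent backup exists. Everything here is hypothetical and not taken from the repo: the `preflight_destructive` function, the `PreflightResult` type, the `maintenance_mode` flag, and the one-hour default for `max_backup_age_s` are assumptions for illustration only.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PreflightResult:
    """Outcome of a preflight check (hypothetical type, not from the repo)."""
    allowed: bool
    reasons: List[str] = field(default_factory=list)


def preflight_destructive(
    action: str,
    confirm: str,
    backup_mtime: Optional[float],
    maintenance_mode: bool,
    now: Optional[float] = None,
    max_backup_age_s: float = 3600.0,  # assumed threshold, tune per deployment
) -> PreflightResult:
    """Collect every reason a destructive action should be refused."""
    now = time.time() if now is None else now
    reasons: List[str] = []

    # Guardrail 1: the operator must retype the action name explicitly.
    if confirm != action:
        reasons.append(f"confirmation does not match action '{action}'")

    # Guardrail 2: do not run against active traffic.
    if not maintenance_mode:
        reasons.append("service is not in maintenance mode")

    # Guardrail 3: a recent backup must exist before anything is deleted.
    if backup_mtime is None:
        reasons.append("no backup found")
    elif now - backup_mtime > max_backup_age_s:
        reasons.append("latest backup is older than the allowed age")

    return PreflightResult(allowed=not reasons, reasons=reasons)
```

Returning all reasons rather than failing on the first one gives the operator a complete picture in a single attempt, and the same list could be written to the audit log for each attempted maintenance action.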
Acceptance criteria
- Backup artifacts can be validated before restore.
- Restore workflow includes a verified post-restore health/data check.
- Cleanup/restore operations have guardrails that reduce accidental destructive execution.
- Docs include a tested recovery drill procedure.
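The first two criteria could be covered by a single validation routine, run once on the backup artifact before restore and again on the live database afterwards. The sketch below is a minimal illustration using Python's standard `sqlite3` and `hashlib` modules. The function names, the optional `expected_sha256` argument, and the `required_tables` argument are assumptions, not existing code in the repo, and it assumes the backup artifact is a plain SQLite database file.

```python
import hashlib
import sqlite3
from pathlib import Path
from typing import Iterable, List, Optional


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_sqlite_file(
    path: str,
    expected_sha256: Optional[str] = None,
    required_tables: Iterable[str] = (),
) -> List[str]:
    """Return a list of problems; an empty list means the file passed."""
    p = Path(path)
    if not p.is_file():
        return [f"missing file: {p}"]

    problems: List[str] = []
    if expected_sha256 is not None and sha256_of(p) != expected_sha256:
        problems.append("checksum mismatch")

    try:
        # Open read-only so validation can never modify the artifact.
        conn = sqlite3.connect(p.resolve().as_uri() + "?mode=ro", uri=True)
        try:
            rows = [r[0] for r in conn.execute("PRAGMA integrity_check")]
            if rows != ["ok"]:
                problems.extend(f"integrity_check: {r}" for r in rows)
            present = {
                r[0]
                for r in conn.execute(
                    "SELECT name FROM sqlite_master WHERE type = 'table'"
                )
            }
            for table in required_tables:
                if table not in present:
                    problems.append(f"missing table: {table}")
        finally:
            conn.close()
    except sqlite3.Error as exc:
        # Raised, for example, when the file is not a SQLite database at all.
        problems.append(f"sqlite error: {exc}")
    return problems
```

A recovery drill could call `validate_sqlite_file` on the backup, perform the restore, then call it again on the restored database with the tables the application depends on, failing the drill if either call returns a non-empty list.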
Out of scope
- Replacing SQLite with managed database backups in this issue.