Stack Lifecycle & Backups

Purpose. Host-level view of how containers are started, updated, and backed up. Use this to understand infra and operations; for schedules and script details see CRON.md and SCRIPTS.md.

Lifecycle overview

```
Boot → systemd starts stacks → containers run
         ↓
Watchtower (scheduled) → pulls new images, recreates labeled containers
         ↓
Cron → health checks, config/data backups, audit, prune
```

Stacks start in dependency order via systemd. Infrastructure stacks (graylog, fluentbit, monitoring, watchtower) are grouped under docker-compose-infra.target. Application stacks use a drop-in override (after-infra.conf) to wait for infrastructure before starting. See scripts/templates/ for the target and override files.
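As an illustration, a drop-in of this shape would give an app stack the ordering described above (a sketch only; the actual `after-infra.conf` in `scripts/templates/` is authoritative):

```ini
# Hypothetical drop-in for an app stack, e.g.
# /etc/systemd/system/docker-compose@<stack>.service.d/after-infra.conf
[Unit]
# Wait for the observability stacks grouped under the infra target
After=docker-compose-infra.target
Wants=docker-compose-infra.target
```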


1. Stack lifecycle (systemd)

| What | How |
| --- | --- |
| Start/stop | `sudo systemctl start docker-compose@<stack>` / `sudo systemctl stop docker-compose@<stack>` |
| Enable at boot | `sudo systemctl enable docker-compose@<stack>` |
| Logs | `sudo journalctl -u docker-compose@<stack> -f` |
- Template: `scripts/templates/docker-compose@.service`. Installed by `scripts/setup.py --install`. Each stack is one instance (e.g. `docker-compose@graylog`).
- Boot ordering: `docker-compose-infra.target` groups observability stacks (graylog, fluentbit, monitoring, watchtower). App stacks use the `docker-compose-app-override.conf` drop-in to start after infrastructure. Installed automatically by `scripts/setup.py --install` for all detected app stacks.
- Working directory: the service runs `docker compose up -d` from the stack directory. Production path: `/opt/docker/<stack>/`.
- Restart policy: containers use `restart: unless-stopped` so they recover from crashes without systemd involvement.
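The template unit behind these commands might look roughly like this (a sketch; `scripts/templates/docker-compose@.service` is the real file, and its exact directives may differ):

```ini
# Plausible shape of docker-compose@.service; %i expands to the stack name.
[Unit]
Description=Docker Compose stack %i
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/docker/%i
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down

[Install]
WantedBy=multi-user.target
```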

2. Image updates (Watchtower)

Watchtower runs as a container and updates only containers that have the opt-in label. It does not start or stop stacks; it recreates containers within already-running stacks.

| Aspect | Detail |
| --- | --- |
| Schedule | Default: 3am every Monday (cron-style in container). Runs after the Sunday full backup completes. See `watchtower/README.md`. |
| Opt-in | Label: `com.centurylinklabs.watchtower.enable=true`. Stacks without it are never updated by Watchtower. |
| Behavior | Polls the registry, pulls the new image, stops the container, creates a new container with the same compose config. |
| Manual run | `docker exec watchtower /watchtower --run-once` (or `--run-once --debug` for a dry run). |
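For reference, opting a service into Watchtower is just a compose label; a hypothetical service might look like:

```yaml
# Illustrative compose service with the Watchtower opt-in label;
# the service and image names are placeholders.
services:
  app:
    image: example/app:latest
    restart: unless-stopped
    labels:
      - com.centurylinklabs.watchtower.enable=true
```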

Relationship to backups: Weekly full backup (config + data) runs Sunday 3am. Watchtower runs Monday 3am — one full day after backup, ensuring backups always capture pre-update state.

Rollback: No automatic rollback. Pin the previous image tag in docker-compose.yml and systemctl restart docker-compose@<stack>. See watchtower README for details.
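The pinning step can be sketched as follows; the directory layout, image names, and tags are placeholders, and the edit can just as well be done by hand:

```shell
# Demonstrate pinning an image tag with sed, in a throwaway directory
# (in production the file would be /opt/docker/<stack>/docker-compose.yml).
stack_dir=$(mktemp -d)
printf 'services:\n  app:\n    image: example/app:latest\n' > "$stack_dir/docker-compose.yml"
# Pin to the last known-good tag instead of :latest
sed -i 's|example/app:latest|example/app:1.2.3|' "$stack_dir/docker-compose.yml"
grep 'image:' "$stack_dir/docker-compose.yml"
# Then recreate the containers: sudo systemctl restart docker-compose@<stack>
```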


3. Backups

Backups are driven by cron, not by compose or Watchtower. They read the filesystem under DOCKER_ROOT; they do not parse docker-compose.yml or volume definitions.

What gets backed up

| Type | Scope | Method | Schedule |
| --- | --- | --- | --- |
| Config | Entire `DOCKER_ROOT` (all stack dirs), excluding `*/data/*`, logs, cache, `.git` | Tarball per run: `configs-YYYYMMDD_HHMMSS.tar.gz` | Daily 02:00 |
| Data | Only directories named `data` under each stack: `DOCKER_ROOT/*/data` | rsync into `data-YYYY-MM-DD/<stack_name>/` | Sunday 03:00 |

Config backup uses exclude patterns derived from DOCKER_ROOT/.gitignore when present, plus .env and */data/* always. So secrets (.env) and other gitignored paths are not in the tarball. Included: docker-compose.yml, .env.example, READMEs, and other tracked-style files under each stack dir.
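In spirit, the config backup reduces to a tar invocation like the sketch below (backup.py is the real implementation and additionally merges in `.gitignore` patterns; the stack contents here are fabricated for the demo):

```shell
# Reproduce the config backup's exclude behavior in throwaway directories
# (stand-ins for /opt/docker and /backup/docker).
DOCKER_ROOT=$(mktemp -d)
BACKUP_ROOT=$(mktemp -d)
mkdir -p "$DOCKER_ROOT/mystack/data"
echo 'secret=1'     > "$DOCKER_ROOT/mystack/.env"
echo 'services: {}' > "$DOCKER_ROOT/mystack/docker-compose.yml"
echo 'binary blob'  > "$DOCKER_ROOT/mystack/data/blob.bin"
stamp=$(date +%Y%m%d_%H%M%S)
# .env and */data/* never make it into the tarball
tar czf "$BACKUP_ROOT/configs-$stamp.tar.gz" \
    --exclude='*/data/*' --exclude='.env' \
    -C "$DOCKER_ROOT" .
tar tzf "$BACKUP_ROOT/configs-$stamp.tar.gz"
```

The listing shows `docker-compose.yml` but neither `.env` nor anything under `data/`.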

Data backup includes only `*/data` directories. Bind mounts that use a path not named `data` (e.g. `./state`, `./config`, or custom paths) are not part of the data backup; they may still sit under `DOCKER_ROOT` and thus be included in the config tarball as part of the directory tree.

Where and retention

| Item | Default | Env override |
| --- | --- | --- |
| Backup root | `/backup/docker` | `BACKUP_ROOT` |
| Docker root | `/opt/docker` | `DOCKER_ROOT` (also used by cron) |
| Retention | 7 days | `BACKUP_RETENTION_DAYS` |

Old config tarballs and old data-* directories are pruned after each backup run.

Verification and replication

| Feature | Description | Method |
| --- | --- | --- |
| Verification | Integrity check using SHA256 checksums + tar test | `--verify` flag |
| Replication | Offsite backup to TrueNAS via rsync over SSH | `--replicate` flag |

Verification computes SHA256 checksums for all backups:

- Config backups: checksum saved to a `.sha256` file alongside the tarball. Tar integrity validated with a test listing.
- Data backups: per-file checksums saved in `MANIFEST.sha256` within the backup directory.
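A minimal version of the config-backup verification might look like this (file names are placeholders; backup.py's exact commands may differ):

```shell
# Write a .sha256 next to a (fabricated) config tarball, then verify both
# the checksum and the archive itself.
work=$(mktemp -d)
echo 'services: {}' > "$work/docker-compose.yml"
tar czf "$work/configs-20240101_020000.tar.gz" -C "$work" docker-compose.yml
cd "$work"
sha256sum configs-20240101_020000.tar.gz > configs-20240101_020000.tar.gz.sha256
# Verification: checksum match plus a tar test listing
sha256sum -c configs-20240101_020000.tar.gz.sha256
tar tzf configs-20240101_020000.tar.gz > /dev/null && echo 'archive OK'
```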

Replication uses rsync to copy backups to TrueNAS:

- Remote paths: `~/configs/` for config tarballs, `~/data/` for data directories
- Authentication: SSH key at `~/.ssh/truenas_backup` (user: `docker-backup`)
- Retention: 1 month (managed by TrueNAS snapshots)
- Pre-flight check: verifies SSH key, host reachability, auth, and remote dirs before attempting replication

Check TrueNAS availability manually:

```shell
./scripts/backup.py --check-truenas --human
```

Environment variables:

- `TRUENAS_HOST` (default: `<TRUENAS_HOST>`)
- `TRUENAS_USER` (default: `docker-backup`)

Commands

```shell
# Config only (what runs daily)
./scripts/backup.py --verify --replicate --push-metrics --send-log

# Config + data (what runs weekly)
./scripts/backup.py --data --verify --replicate --push-metrics --send-log

# List existing backups
./scripts/backup.py --list --human

# Manual verification and replication
./scripts/backup.py --verify --replicate --human
```

The exit code is 0 only if all backup steps succeed. Metrics and logs are sent when `--push-metrics` and `--send-log` are used (as in cron).


4. Log retention and replication

Logs are managed via logrotate (local retention) and rsync (offsite replication to TrueNAS).

Log paths

| Path | Content | Local retention |
| --- | --- | --- |
| `/var/log/ct-controller/*.log` | Script logs (alerting, observability-health) | 7 days, compressed |
| `/var/log/docker-audit.json` | Weekly audit report | 7 copies, compressed |
| `/var/log/docker-host.json` | Daily host report | 7 copies, compressed |

Logrotate

Config: /etc/logrotate.d/ct-controller (installed by setup.py --install)

- Daily rotation for all logs
- 7 days retention
- Compressed with `delaycompress`
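Assuming standard logrotate syntax, the installed config plausibly resembles the following (the file placed by `setup.py --install` is authoritative):

```
# Plausible shape of /etc/logrotate.d/ct-controller
/var/log/ct-controller/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}
```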

Replication to TrueNAS

Logs are replicated daily alongside backups via backup.py --replicate-logs:

- `/var/log/ct-controller/` → `truenas.local:~/logs/ct-controller/`
- `/var/log/docker-*.json` → `truenas.local:~/logs/`
- Retention: 1 month (managed by TrueNAS snapshots)

5. Scheduled operations (week at a glance)

| When | What |
| --- | --- |
| Every 5 min | Container health check → Pushgateway |
| Every 10 min | Observability stack health → alerts if Graylog/Prometheus/Grafana/Fluent Bit down |
| Daily 02:00 | Config backup, verify, replicate backups + logs to TrueNAS, cleanup |
| Sunday 03:00 | Full backup (config + data), verify, replicate backups + logs to TrueNAS, cleanup |
| Monday 03:00 | Watchtower update run (default schedule) |
| Saturday 04:00 | `docker system prune -f` |
| Monday 06:00 | Full audit → `/var/log/docker-audit.json` |
| Daily 06:30 | Host report → `/var/log/docker-host.json` |

Cron file: scripts/templates/docker-maintenance.cron, installed to /etc/cron.d/docker-maintenance by scripts/setup.py --install. Prefix for script paths in cron: /opt/docker/scripts/.


6. Paths reference

| Path | Purpose |
| --- | --- |
| `/opt/docker` | Stack roots (production). One dir per stack, each with `docker-compose.yml`, `.env`, etc. |
| `/opt/docker/scripts` | Scripts and templates in production (clone/copy of repo `scripts/`). |
| `/backup/docker` | Config tarballs and `data-*` trees. |
| `/var/log/docker-audit.json` | Latest audit output (ports, images, validation). |
| `/var/log/docker-host.json` | Latest host report (hardware, Docker, systemd). |
| `/var/log/ct-controller/` | Resilient log files from scripts (see OBSERVABILITY.md). |

7. Recovery (high level)

- Restore config: extract the desired `configs-*.tar.gz` over `DOCKER_ROOT` (or a single stack subdir). Fix ownership if needed (`docker-services:docker-services`). Restart the stack.
- Restore data: copy from the chosen `data-YYYY-MM-DD/<stack_name>/` back to the stack's data directory (or the paths the stack actually uses). Restart the stack.
- Bad Watchtower update: pin the previous image in `docker-compose.yml`, then `systemctl restart docker-compose@<stack>`. Optionally exclude that service from Watchtower by removing the label.
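A config restore can be sketched as follows, using throwaway directories in place of the production paths:

```shell
# Restore-config walkthrough (production paths would be
# /backup/docker and /opt/docker; names here are fabricated).
BACKUP_ROOT=$(mktemp -d)
DOCKER_ROOT=$(mktemp -d)
# Fabricate a prior config backup:
src=$(mktemp -d)
mkdir -p "$src/mystack"
echo 'services: {}' > "$src/mystack/docker-compose.yml"
tar czf "$BACKUP_ROOT/configs-20240101_020000.tar.gz" -C "$src" .
# Extract the chosen tarball over DOCKER_ROOT
# (append './<stack>' to restore a single stack subdir only):
tar xzf "$BACKUP_ROOT/configs-20240101_020000.tar.gz" -C "$DOCKER_ROOT"
ls "$DOCKER_ROOT/mystack"
# Then fix ownership and restart, e.g.:
#   sudo chown -R docker-services:docker-services /opt/docker/<stack>
#   sudo systemctl restart docker-compose@<stack>
```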

Related docs