Purpose. Host-level view of how containers are started, updated, and backed up. Use this to understand infra and operations; for schedules and script details see CRON.md and SCRIPTS.md.
```
Boot → systemd starts stacks → containers run
        ↓
Watchtower (scheduled) → pulls new images, recreates labeled containers
        ↓
Cron → health checks, config/data backups, audit, prune
```
Stacks start in dependency order via systemd. Infrastructure stacks (graylog, fluentbit, monitoring, watchtower) are grouped under docker-compose-infra.target. Application stacks use a drop-in override (after-infra.conf) to wait for infrastructure before starting. See scripts/templates/ for the target and override files.
| What | How |
|---|---|
| Start/stop | `sudo systemctl start docker-compose@<stack>` / `sudo systemctl stop docker-compose@<stack>` |
| Enable at boot | `sudo systemctl enable docker-compose@<stack>` |
| Logs | `sudo journalctl -u docker-compose@<stack> -f` |
- Template: `scripts/templates/docker-compose@.service`, installed by `scripts/setup.py --install`. Each stack is one instance (e.g. `docker-compose@graylog`).
- Boot ordering: `docker-compose-infra.target` groups observability stacks (graylog, fluentbit, monitoring, watchtower). App stacks use the `docker-compose-app-override.conf` drop-in to start after infrastructure; it is installed automatically by `scripts/setup.py --install` for all detected app stacks.
- Working directory: the service runs `docker compose up -d` from the stack directory. Production path: `/opt/docker/<stack>/`.
- Restart policy: containers use `restart: unless-stopped` so they recover from crashes without systemd involvement.
Watchtower runs as a container and updates only containers that have the opt-in label. It does not start or stop stacks; it recreates containers within already-running stacks.
| Aspect | Detail |
|---|---|
| Schedule | Default: Monday 03:00 (cron-style schedule inside the container), one day after the Sunday full backup completes. See watchtower/README.md. |
| Opt-in | Label `com.centurylinklabs.watchtower.enable=true`. Stacks without it are never updated by Watchtower. |
| Behavior | Polls the registry, pulls the new image, stops the container, and recreates it with the same compose config. |
| Manual run | `docker exec watchtower /watchtower --run-once` (or `--run-once --debug` for a dry run). |
Relationship to backups: Weekly full backup (config + data) runs Sunday 3am. Watchtower runs Monday 3am — one full day after backup, ensuring backups always capture pre-update state.
Rollback: No automatic rollback. Pin the previous image tag in `docker-compose.yml` and `systemctl restart docker-compose@<stack>`. See the watchtower README for details.
Backups are driven by cron, not by compose or Watchtower. They read the filesystem under DOCKER_ROOT; they do not parse docker-compose.yml or volume definitions.
| Type | Scope | Method | Schedule |
|---|---|---|---|
| Config | Entire `DOCKER_ROOT` (all stack dirs), excluding `*/data/*`, logs, cache, `.git` | Tarball per run: `configs-YYYYMMDD_HHMMSS.tar.gz` | Daily 02:00 |
| Data | Only directories named `data` under each stack: `DOCKER_ROOT/*/data` | rsync into `data-YYYY-MM-DD/<stack_name>/` | Sunday 03:00 |
Config backup uses exclude patterns derived from `DOCKER_ROOT/.gitignore` when present, plus `.env` and `*/data/*` always, so secrets (`.env`) and other gitignored paths never land in the tarball. Included: `docker-compose.yml`, `.env.example`, READMEs, and other tracked-style files under each stack dir.
Data backup includes only `*/data` directories. Bind mounts at paths not named `data` (e.g. `./state`, `./config`, or custom paths) are not part of the data backup; they may still sit under `DOCKER_ROOT` and thus be included in the config tarball as part of the directory tree.
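The two scopes above can be sketched with plain `tar` and a shell glob. This is illustrative only: the real implementation is `scripts/backup.py`, the function names are invented, and `cp -a` stands in for the rsync step to keep the sketch dependency-free.

```shell
# Sketch only: approximates the two backup scopes described above.
backup_config() {   # tarball of everything under the root except data/logs/.git/.env
    docker_root=$1; backup_root=$2
    stamp=$(date +%Y%m%d_%H%M%S)
    tar -czf "$backup_root/configs-$stamp.tar.gz" \
        --exclude='*/data/*' --exclude='*.log' --exclude='.git' --exclude='.env' \
        -C "$docker_root" .
}

backup_data() {     # only directories literally named "data", one subdir per stack
    docker_root=$1; backup_root=$2
    day=$(date +%Y-%m-%d)
    for d in "$docker_root"/*/data; do
        [ -d "$d" ] || continue
        stack=$(basename "$(dirname "$d")")
        mkdir -p "$backup_root/data-$day/$stack"
        cp -a "$d/." "$backup_root/data-$day/$stack/"   # real script uses rsync
    done
}
```

Note that GNU tar's `--exclude` patterns are unanchored, so `.env` excludes the secrets file in every stack dir while leaving `.env.example` in the archive.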
| Item | Default | Env override |
|---|---|---|
| Backup root | `/backup/docker` | `BACKUP_ROOT` |
| Docker root | `/opt/docker` | `DOCKER_ROOT` (also used by cron) |
| Retention | 7 days | `BACKUP_RETENTION_DAYS` |
Old config tarballs and old data-* directories are pruned after each backup run.
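The pruning step could be sketched with `find` age checks — an assumption about mechanism, not a description of the actual script:

```shell
# Sketch: prune config tarballs and data-* trees older than the retention window.
prune_backups() {
    backup_root=$1
    days=${2:-7}   # BACKUP_RETENTION_DAYS
    # Old config tarballs
    find "$backup_root" -maxdepth 1 -name 'configs-*.tar.gz' -mtime "+$days" -delete
    # Old weekly data trees
    find "$backup_root" -maxdepth 1 -type d -name 'data-*' -mtime "+$days" \
        -exec rm -rf {} +
}
```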
| Feature | Description | Method |
|---|---|---|
| Verification | Integrity check using SHA256 checksums + tar test | `--verify` flag |
| Replication | Offsite backup to TrueNAS via rsync over SSH | `--replicate` flag |
Verification computes SHA256 checksums for all backups:
- Config backups: checksum saved to a `.sha256` file alongside the tarball; tar integrity validated with a test listing.
- Data backups: per-file checksums saved in `MANIFEST.sha256` within the backup directory.
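In coreutils terms, the config-backup checks amount to roughly the following — a sketch with invented function names; the script's actual file handling may differ:

```shell
# Sketch: write and verify a .sha256 file next to a config tarball.
write_checksum() {
    ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

verify_config() {
    dir=$(dirname "$1"); name=$(basename "$1")
    ( cd "$dir" && sha256sum -c "$name.sha256" >/dev/null ) \
        && tar -tzf "$1" >/dev/null   # "test listing" = archive readable end to end
}
```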
Replication uses rsync to copy backups to TrueNAS:
- Remote paths: `~/configs/` for config tarballs, `~/data/` for data directories
- Authentication: SSH key at `~/.ssh/truenas_backup` (user: `docker-backup`)
- Retention: 1 month (managed by TrueNAS snapshots)
- Pre-flight check: verifies SSH key, host reachability, auth, and remote dirs before attempting replication
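A pre-flight check along those lines might look like this — a sketch using standard OpenSSH options (`BatchMode`, `ConnectTimeout`); the function name is invented and the key path and remote dirs follow the conventions above:

```shell
# Sketch: fail fast before replication if SSH access is not going to work.
preflight_truenas() {
    key=$1; host=$2; user=$3
    [ -r "$key" ] || { echo "SSH key missing or unreadable: $key" >&2; return 1; }
    # BatchMode prevents password prompts; ConnectTimeout bounds unreachable hosts.
    ssh -i "$key" -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" \
        'mkdir -p ~/configs ~/data ~/logs' \
        || { echo "cannot reach $user@$host" >&2; return 1; }
}
```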
Check TrueNAS availability manually:

```shell
./scripts/backup.py --check-truenas --human
```

Environment variables: `TRUENAS_HOST` (default: `<TRUENAS_HOST>`), `TRUENAS_USER` (default: `docker-backup`).
```shell
# Config only (what runs daily)
./scripts/backup.py --verify --replicate --push-metrics --send-log

# Config + data (what runs weekly)
./scripts/backup.py --data --verify --replicate --push-metrics --send-log

# List existing backups
./scripts/backup.py --list --human

# Manual verification and replication
./scripts/backup.py --verify --replicate --human
```

Exit code is 0 only if all backup steps succeed. Metrics and logs are sent when `--push-metrics` and `--send-log` are used (as in cron).
Logs are managed via logrotate (local retention) and rsync (offsite replication to TrueNAS).
| Path | Content | Local retention |
|---|---|---|
| `/var/log/ct-controller/*.log` | Script logs (alerting, observability-health) | 7 days, compressed |
| `/var/log/docker-audit.json` | Weekly audit report | 7 copies, compressed |
| `/var/log/docker-host.json` | Daily host report | 7 copies, compressed |
Config: `/etc/logrotate.d/ct-controller` (installed by `setup.py --install`)
- Daily rotation for all logs
- 7-day retention
- Compressed (with `delaycompress`)
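Given those settings, the installed stanza is presumably along these lines — illustrative; `missingok` and `notifempty` are common additions assumed here, and `/etc/logrotate.d/ct-controller` on the host is authoritative:

```
/var/log/ct-controller/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}
```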
Logs are replicated daily alongside backups via `backup.py --replicate-logs`:
- `/var/log/ct-controller/` → `truenas.local:~/logs/ct-controller/`
- `/var/log/docker-*.json` → `truenas.local:~/logs/`
- Retention: 1 month (managed by TrueNAS snapshots)
| When | What |
|---|---|
| Every 5 min | Container health check → Pushgateway |
| Every 10 min | Observability stack health → alerts if Graylog/Prometheus/Grafana/Fluent Bit down |
| Daily 02:00 | Config backup, verify, replicate backups + logs to TrueNAS, cleanup |
| Sunday 03:00 | Full backup (config + data), verify, replicate backups + logs to TrueNAS, cleanup |
| Monday 03:00 | Watchtower update run (default schedule) |
| Saturday 04:00 | `docker system prune -f` |
| Monday 06:00 | Full audit → `/var/log/docker-audit.json` |
| Daily 06:30 | Host report → `/var/log/docker-host.json` |
Cron file: `scripts/templates/docker-maintenance.cron`, installed to `/etc/cron.d/docker-maintenance` by `scripts/setup.py --install`. Prefix for script paths in cron: `/opt/docker/scripts/`.
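Translating the schedule into `/etc/cron.d` syntax (which includes a user field), the backup and prune entries presumably resemble these — illustrative lines built from the times and flags above, not a copy of the real file, and the other jobs are omitted:

```
# Illustrative only; see scripts/templates/docker-maintenance.cron for the real file.
0 2 * * *  root  /opt/docker/scripts/backup.py --verify --replicate --push-metrics --send-log
0 3 * * 0  root  /opt/docker/scripts/backup.py --data --verify --replicate --push-metrics --send-log
0 4 * * 6  root  docker system prune -f
```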
| Path | Purpose |
|---|---|
| `/opt/docker` | Stack roots (production). One dir per stack, each with `docker-compose.yml`, `.env`, etc. |
| `/opt/docker/scripts` | Scripts and templates in production (clone/copy of repo `scripts/`). |
| `/backup/docker` | Config tarballs and `data-*` trees. |
| `/var/log/docker-audit.json` | Latest audit output (ports, images, validation). |
| `/var/log/docker-host.json` | Latest host report (hardware, Docker, systemd). |
| `/var/log/ct-controller/` | Resilient log files from scripts (see OBSERVABILITY.md). |
- Restore config: Extract the desired `configs-*.tar.gz` over `DOCKER_ROOT` (or a single stack subdir). Fix ownership if needed (`docker-services:docker-services`). Restart the stack.
- Restore data: Copy from the chosen `data-YYYY-MM-DD/<stack_name>/` back to the stack's `data` directory (or the paths the stack actually uses). Restart the stack.
- Bad Watchtower update: Pin the previous image in `docker-compose.yml`, then `systemctl restart docker-compose@<stack>`. Optionally exclude that service from Watchtower by removing the label.
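The single-stack config restore can be sketched as follows — a sketch assuming the tarball's member paths are relative (`./<stack>/…`), which is what a `tar -C <root> .` backup produces; the function name is invented:

```shell
# Sketch: restore one stack's config from a tarball created relative to DOCKER_ROOT.
restore_stack_config() {
    tarball=$1; docker_root=$2; stack=$3
    tar -xzf "$tarball" -C "$docker_root" "./$stack"
    # Then, if ownership was lost:
    # chown -R docker-services:docker-services "$docker_root/$stack"
}
```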
- CRON.md — Cron schedule and log destinations
- OBSERVABILITY.md — Logging, metrics, alerting
- STACKS.md — Stack list and ports
- watchtower/README.md — Watchtower config and rollback