Standardized Docker container management for a homelab environment. Containers run as systemd services, with unified observability (logs + metrics) and automated lifecycle management.
- Systemd-native lifecycle — Each stack is a systemd service, enabling boot ordering, dependency management, and standard `systemctl` commands
- Centralized observability — All logs flow to Graylog; all metrics flow to Prometheus; Grafana provides unified dashboards
- Opt-in automation — Watchtower updates only labeled containers on a controlled schedule
- Explicit resource limits — Every container declares memory/CPU caps to prevent runaway usage
- Health-first orchestration — Services use healthchecks to gate dependent startups
- Programmatic validation — Python-based tooling for structured parsing, validation, and reporting
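These principles translate directly into each stack's compose file. A minimal sketch of a compliant service — names, images, ports, and limit values are illustrative, not taken from any actual stack; field names follow the Compose file format:

```yaml
services:
  myapp:
    image: ghcr.io/example/myapp:1.2.3   # pinned tag, not :latest
    container_name: myapp
    restart: unless-stopped
    mem_limit: 512m                      # explicit resource caps
    cpus: "0.50"
    labels:
      - "com.centurylinklabs.watchtower.enable=true"  # opt in to updates
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3
```

Dependent services can then gate their startup on this container via `depends_on` with `condition: service_healthy`.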
```
┌──────────────────────────────────────────────────────────────────────┐
│ Host                                                                 │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │                     Application Containers                     │  │
│  │  graylog   pihole   unifi   homeassistant   openclaw           │  │
│  └──────┬──────────────────────────┬──────────────────────────────┘  │
│         │ stdout/stderr            │ metrics                         │
│         ▼                          ▼                                 │
│   ┌────────────┐             ┌──────────┐                            │
│   │ Fluent Bit │             │ cAdvisor │                            │
│   └─────┬──────┘             └────┬─────┘                            │
│         │ GELF                    │ scrape                           │
│         ▼                          ▼                                 │
│   ┌──────────┐              ┌────────────┐                           │
│   │ Graylog  │              │ Prometheus │                           │
│   └─────┬────┘              └─────┬──────┘                           │
│         │                         │                                  │
│         └────────────┬────────────┘                                  │
│                      ▼                                               │
│                ┌─────────┐                                           │
│                │ Grafana │  ← dashboards + alerting                  │
│                └─────────┘                                           │
│                                                                      │
│  Lifecycle:  Systemd (boot) + Watchtower (image updates)             │
│  Validation: Python scripts → JSON reports                           │
└──────────────────────────────────────────────────────────────────────┘
```
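The "scrape" edge from cAdvisor to Prometheus corresponds to a scrape job in the Prometheus configuration. An illustrative fragment — the job name and target address are assumptions, not taken from the actual monitoring stack:

```yaml
scrape_configs:
  - job_name: cadvisor              # container metrics, per the diagram above
    static_configs:
      - targets: ["cadvisor:8080"]  # reachable over the shared monitoring_net
```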
| Stack | Purpose | Ports |
|---|---|---|
| graylog | Log aggregation (MongoDB + OpenSearch + Graylog) | 9000, 514, 1514, 12201 |
| monitoring | Metrics pipeline (Prometheus + cAdvisor + Pushgateway + Grafana) | 3000, 9090, 9091 |
| fluentbit | Log shipper — tails Docker logs, ships to Graylog | — |
| watchtower | Automated container image updates | — |
| homeassistant | Home automation platform | 8123 (host) |
| pihole | DNS sinkhole and ad blocker | 53, 8053 (host) |
| unifi | UniFi network controller | 8443, 8080, 3478 |
| openclaw | AI agent gateway | 18789 |
```
/opt/docker/                      # Production deployment path
├── <stack>/
│   ├── docker-compose.yml        # Stack definition
│   ├── .env                      # Secrets (not in git)
│   ├── .env.example              # Template for .env
│   ├── README.md                 # Stack-specific docs
│   └── data/                     # Persistent volumes
└── scripts/                      # Python validation & management tools
    ├── validate.py               # Stack validation
    ├── audit.py                  # Full infrastructure audit
    ├── healthcheck.py            # Container health checks
    ├── backup.py                 # Backup management
    ├── setup.py                  # Prerequisites and installation
    ├── host.py                   # Host system information
    ├── lib/                      # Core library modules
    └── templates/                # Systemd & cron templates
```
This repository mirrors the structure at `/opt/docker/` on the target host.
Compose files are validated for healthchecks, restart policies, and resource limits (all ERROR), plus `container_name` and the Watchtower label (both WARNING). The full rule list and severities are in docs/STANDARDS.md. Run `./scripts/validate.py` to check stacks.
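As a rough sketch of the kind of checks described above — a hypothetical simplification, not the real validate.py logic; it assumes the compose file has already been parsed into a dict (e.g. with PyYAML's `yaml.safe_load`):

```python
# Illustrative simplification of the validation rules; severities mirror
# the ERROR/WARNING split described above.
def check_service(name: str, svc: dict) -> list[dict]:
    """Return rule findings for one service of a parsed docker-compose.yml."""
    findings = []
    if "healthcheck" not in svc:
        findings.append({"service": name, "rule": "healthcheck", "severity": "ERROR"})
    if svc.get("restart") not in ("unless-stopped", "always", "on-failure"):
        findings.append({"service": name, "rule": "restart-policy", "severity": "ERROR"})
    if "mem_limit" not in svc and "deploy" not in svc:
        findings.append({"service": name, "rule": "resource-limits", "severity": "ERROR"})
    if "container_name" not in svc:
        findings.append({"service": name, "rule": "container-name", "severity": "WARNING"})
    if "watchtower" not in str(svc.get("labels", "")):
        findings.append({"service": name, "rule": "watchtower-label", "severity": "WARNING"})
    return findings

def check_compose(compose: dict) -> list[dict]:
    """Flatten findings across all services in the compose file."""
    return [f for name, svc in compose.get("services", {}).items()
            for f in check_service(name, svc)]
```

A service that declares everything except a healthcheck would yield a single ERROR finding for the `healthcheck` rule.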
```
# Validate all stacks (JSON output)
./scripts/validate.py

# Human-readable output
./scripts/validate.py --human

# Validate specific stack
./scripts/validate.py graylog

# Full infrastructure audit
./scripts/audit.py --summary

# Port conflict check
./scripts/audit.py --ports

# Image version audit
./scripts/audit.py --images --human
```

```
# Start/stop/restart
sudo systemctl start docker-compose@<stack>
sudo systemctl stop docker-compose@<stack>
sudo systemctl restart docker-compose@<stack>

# Enable at boot
sudo systemctl enable docker-compose@<stack>

# View logs
sudo journalctl -u docker-compose@<stack> -f
```

```
# Check all containers (JSON)
./scripts/healthcheck.py

# Human-readable with failures only
./scripts/healthcheck.py --human --quiet

# With metrics push
./scripts/healthcheck.py --push-metrics --send-log
```

```
# Backup configurations
./scripts/backup.py

# Backup with data
./scripts/backup.py --data

# List existing backups
./scripts/backup.py --list --human
```

```
# Full system report (JSON)
./scripts/host.py

# Human-readable
./scripts/host.py --human

# Specific sections
./scripts/host.py --hardware   # CPU, memory, disk
./scripts/host.py --docker     # Docker daemon info
./scripts/host.py --services   # Systemd compose services
```

```
# Check prerequisites
./scripts/setup.py

# Install/fix issues
./scripts/setup.py --install
```

Reference: docs/OBSERVABILITY.md.
| What | Where |
|---|---|
| Logs | Graylog UI (`:9000`) or `docker logs <container>` |
| Metrics | Grafana (`:3000`) or Prometheus (`:9090`) |
| Script metrics | Pushgateway (`:9091`) |
| Container stats | `docker stats` |
Send logs and metrics from scripts, external apps, or ad-hoc debugging sessions.
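The examples below use the bundled library or curl. For a standalone Python sender, a hedged sketch using only the standard library — `build_gelf` and `send_gelf` are illustrative helpers, not part of scripts/lib; field names follow GELF 1.1, where custom fields must be prefixed with `_`:

```python
import json
import socket
import urllib.request

def build_gelf(short_message: str, host: str = None, **extra) -> dict:
    """Build a GELF 1.1 payload; extra fields get the required '_' prefix."""
    msg = {
        "version": "1.1",
        "host": host or socket.gethostname(),
        "short_message": short_message,
    }
    msg.update({f"_{k}": v for k, v in extra.items()})
    return msg

def send_gelf(payload: dict, url: str = "http://localhost:12201/gelf") -> None:
    """POST the payload to Graylog's GELF HTTP input (port 12201 above)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)

# Example payload:
# build_gelf("backup finished", job="backup", duration_ms=1500)
```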
```
# Using Python library
from scripts.lib.observability import log_info

log_info("Operation completed", facility="myapp", duration_ms=150)
```

```
# Direct curl to GELF HTTP
curl -X POST -H "Content-Type: application/json" \
  -d '{"version":"1.1","host":"myhost","short_message":"Hello"}' \
  http://localhost:12201/gelf
```

```
# Using Python library
from scripts.lib.observability import metric_gauge

metric_gauge("myapp_items", 42, labels={"env": "prod"})
```

```
# Direct curl
echo 'myapp_items 42' | curl --data-binary @- http://localhost:9091/metrics/job/myapp
```

```
# 1. Check prerequisites
./scripts/setup.py

# 2. Install/configure (run fixes)
./scripts/setup.py --install

# 3. Create shared network
docker network create monitoring_net

# 4. Deploy stacks in dependency order
sudo systemctl enable --now docker-compose@graylog
sudo systemctl enable --now docker-compose@fluentbit
sudo systemctl enable --now docker-compose@monitoring
sudo systemctl enable --now docker-compose@watchtower
# ... then application stacks

# 5. Validate
./scripts/validate.py --human
```

See each stack's README for specific setup instructions.
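The `docker-compose@<stack>` services used above imply a systemd template unit. A plausible sketch of what such a unit looks like — the actual template ships in scripts/templates/ and may differ in details:

```ini
# /etc/systemd/system/docker-compose@.service (illustrative sketch)
[Unit]
Description=Docker Compose stack: %i
Requires=docker.service
After=docker.service network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/docker/%i
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
TimeoutStartSec=300

[Install]
WantedBy=multi-user.target
```

The `%i` specifier expands to the instance name after `@`, so one template serves every stack directory under /opt/docker/.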
- `.env` files contain secrets — never commit them (see `.gitignore`)
- `.env` permissions should be `600`
- Containers needing the Docker socket (`/var/run/docker.sock`) are explicitly documented
- Resource limits prevent denial-of-service from runaway containers
The project uses a dedicated service account for file ownership and a group-based access model for operators.
| User/Group | Purpose |
|---|---|
| `docker-services` | Service account that owns `/opt/docker`. System user (no login shell). |
| `docker` | Docker daemon group. Required to run `docker` commands. |
Operator access: Add your user to both groups to manage the project without sudo:
```
# Add user to required groups
sudo usermod -aG docker-services $USER
sudo usermod -aG docker $USER

# Apply (or log out and back in)
newgrp docker-services
```

Directory permissions: `/opt/docker` must have group write and setgid:
| Permission | Purpose |
|---|---|
| `g+w` | Group members can create/modify files |
| `g+s` (setgid) | New files inherit the `docker-services` group |
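Both bits can be checked programmatically from a raw `st_mode` value. A small sketch using the standard library — an illustrative helper, not part of the shipped scripts:

```python
import os
import stat

def mode_has_group_write_and_setgid(mode: int) -> bool:
    """Check the g+w and setgid bits on a raw st_mode value."""
    return bool(mode & stat.S_IWGRP) and bool(mode & stat.S_ISGID)

def path_ok(path: str) -> bool:
    """Apply the check to a real path, e.g. /opt/docker."""
    return mode_has_group_write_and_setgid(os.stat(path).st_mode)
```

For example, a directory with mode `2775` passes, while plain `0755` fails on both counts.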
Fix permissions if needed:
```
sudo chmod -R g+w /opt/docker
sudo find /opt/docker -type d -exec chmod g+s {} \;
```

Or use setup.py:

```
sudo ./scripts/setup.py --fix
```

Verify access:
```
# Should show docker-services and docker in groups
id $USER

# Should be able to create files without sudo
touch /opt/docker/test && rm /opt/docker/test
```
1. Create stack directory with required files:

   ```
   mkdir -p myapp
   touch myapp/docker-compose.yml myapp/.env.example myapp/README.md
   ```

2. Edit `docker-compose.yml` to meet the required standards (healthcheck, restart policy, resource limits, labels)

3. Create `.env` from `.env.example`

4. Create data directories:

   ```
   sudo mkdir -p /opt/docker/myapp/data
   sudo chown -R docker-services:docker-services /opt/docker/myapp
   ```

5. Enable and start:

   ```
   sudo systemctl enable --now docker-compose@myapp
   ```

6. Validate:

   ```
   ./scripts/validate.py myapp
   ```
Scheduled maintenance is installed via `./scripts/setup.py --install`. Full schedule and commands: docs/CRON.md.
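As an illustration of the shape of those jobs — the times and flag choices here are hypothetical; the authoritative schedule is docs/CRON.md:

```
# Illustrative crontab entries only — see docs/CRON.md for the real schedule
0 3 * * *   /opt/docker/scripts/healthcheck.py --push-metrics --send-log
0 4 * * 0   /opt/docker/scripts/backup.py --data
```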
Evergreen reference: docs/*.md (one doc per topic).
- docs/LIFECYCLE.md — stack lifecycle, Watchtower, backups (host-level ops)
- docs/OBSERVABILITY.md — logging, alerting, health endpoints
- docs/STANDARDS.md — validation rules
- docs/SCRIPTS.md — CLI reference
- docs/CRON.md — scheduled jobs
- docs/STACKS.md — stack list and ports
- scripts/ — Python tooling
- scripts/templates/ — systemd and cron templates
- Stack-specific READMEs in each stack directory