ct-controller

Standardized Docker container management for a homelab environment. Each stack runs as a systemd service, with unified observability (logs and metrics) and automated lifecycle management.

Design Principles

  1. Systemd-native lifecycle — Each stack is a systemd service, enabling boot ordering, dependency management, and standard systemctl commands
  2. Centralized observability — All logs flow to Graylog; all metrics flow to Prometheus; Grafana provides unified dashboards
  3. Opt-in automation — Watchtower updates only labeled containers on a controlled schedule
  4. Explicit resource limits — Every container declares memory/CPU caps to prevent runaway usage
  5. Health-first orchestration — Services use healthchecks to gate dependent startups
  6. Programmatic validation — Python-based tooling for structured parsing, validation, and reporting

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              Host                                       │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                    Application Containers                        │   │
│  │  graylog   pihole   unifi   homeassistant   openclaw             │   │
│  └──────┬──────────────────────────┬────────────────────────────────┘   │
│         │ stdout/stderr            │ metrics                            │
│         ▼                          ▼                                    │
│  ┌────────────┐             ┌───────────┐                               │
│  │ Fluent Bit │             │  cAdvisor │                               │
│  └─────┬──────┘             └─────┬─────┘                               │
│        │ GELF                     │ scrape                              │
│        ▼                          ▼                                     │
│  ┌──────────┐              ┌────────────┐                               │
│  │ Graylog  │              │ Prometheus │                               │
│  └─────┬────┘              └──────┬─────┘                               │
│        │                          │                                     │
│        └──────────────┬───────────┘                                     │
│                       ▼                                                 │
│                ┌───────────┐                                            │
│                │  Grafana  │  ← dashboards + alerting                   │
│                └───────────┘                                            │
│                                                                         │
│  Lifecycle: Systemd (boot) + Watchtower (image updates)                 │
│  Validation: Python scripts → JSON reports                              │
└─────────────────────────────────────────────────────────────────────────┘

Stacks

| Stack | Purpose | Ports |
|---|---|---|
| graylog | Log aggregation (MongoDB + OpenSearch + Graylog) | 9000, 514, 1514, 12201 |
| monitoring | Metrics pipeline (Prometheus + cAdvisor + Pushgateway + Grafana) | 3000, 9090, 9091 |
| fluentbit | Log shipper: tails Docker logs and ships them to Graylog | |
| watchtower | Automated container image updates | |
| homeassistant | Home automation platform | 8123 (host) |
| pihole | DNS sinkhole and ad blocker | 53, 8053 (host) |
| unifi | UniFi network controller | 8443, 8080, 3478 |
| openclaw | AI agent gateway | 18789 |

Directory Layout

/opt/docker/                    # Production deployment path
├── <stack>/
│   ├── docker-compose.yml      # Stack definition
│   ├── .env                    # Secrets (not in git)
│   ├── .env.example            # Template for .env
│   ├── README.md               # Stack-specific docs
│   └── data/                   # Persistent volumes
└── scripts/                    # Python validation & management tools
    ├── validate.py             # Stack validation
    ├── audit.py                # Full infrastructure audit
    ├── healthcheck.py          # Container health checks
    ├── backup.py               # Backup management
    ├── setup.py                # Prerequisites and installation
    ├── host.py                 # Host system information
    ├── lib/                    # Core library modules
    └── templates/              # Systemd & cron templates

This repository mirrors the structure at /opt/docker/ on the target host.

Standards

Compose files are validated against a set of standards: a healthcheck, a restart policy, and resource limits are required (a violation is an ERROR), while container_name and the Watchtower label are expected (a violation is a WARNING). For the full list and severities, see docs/STANDARDS.md. Run ./scripts/validate.py to check stacks.
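As a rough illustration of what these rules mean for a single service, here is a hypothetical Python sketch, not validate.py itself. It assumes the compose file has already been parsed into a dict (for example with PyYAML), that limits live under deploy.resources.limits, and that the Watchtower label is com.centurylinklabs.watchtower.enable (Watchtower's standard opt-in label); the project's actual keys, rule names, and messages may differ.

```python
# Hypothetical sketch of the per-service compose checks; not validate.py itself.
# Assumes the compose file is already parsed into a dict (e.g. via PyYAML).

WATCHTOWER_LABEL = "com.centurylinklabs.watchtower.enable"  # assumed label key


def normalize_labels(labels) -> dict:
    """Compose accepts labels as a mapping or as a list of "key=value" strings."""
    if isinstance(labels, list):
        return dict(item.partition("=")[::2] for item in labels)
    return dict(labels or {})


def check_service(name: str, svc: dict) -> list[tuple[str, str]]:
    """Return (severity, message) findings for one parsed compose service."""
    findings = []
    # ERROR-level rules: healthcheck, restart policy, resource limits
    if "healthcheck" not in svc:
        findings.append(("ERROR", f"{name}: missing healthcheck"))
    if "restart" not in svc:
        findings.append(("ERROR", f"{name}: missing restart policy"))
    limits = svc.get("deploy", {}).get("resources", {}).get("limits", {})
    if "memory" not in limits or "cpus" not in limits:
        findings.append(("ERROR", f"{name}: missing memory/CPU limits"))
    # WARNING-level rules: container_name and the Watchtower opt-in label
    if "container_name" not in svc:
        findings.append(("WARNING", f"{name}: missing container_name"))
    if WATCHTOWER_LABEL not in normalize_labels(svc.get("labels")):
        findings.append(("WARNING", f"{name}: missing Watchtower label"))
    return findings


service = {
    "image": "example/app:1.0",
    "restart": "unless-stopped",
    "healthcheck": {"test": ["CMD", "true"]},
    "deploy": {"resources": {"limits": {"memory": "256M", "cpus": "0.5"}}},
    "labels": ["com.centurylinklabs.watchtower.enable=true"],
}
for severity, message in check_service("myapp", service):
    print(severity, message)  # → WARNING myapp: missing container_name
```

The sketch only checks that the label is present, not that its value is "true"; a stricter rule would compare the value as well.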

Quick Reference

Validation & Audit

# Validate all stacks (JSON output)
./scripts/validate.py

# Human-readable output
./scripts/validate.py --human

# Validate specific stack
./scripts/validate.py graylog

# Full infrastructure audit
./scripts/audit.py --summary

# Port conflict check
./scripts/audit.py --ports

# Image version audit
./scripts/audit.py --images --human
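To make the port conflict check concrete, here is a hypothetical sketch: given the host ports each stack publishes (taken here from the Stacks table, plus an invented myapp stack; in practice they would be read from the compose files), it reports every port claimed by more than one stack. The function name and input shape are assumptions, not audit.py's interface.

```python
from collections import defaultdict


def find_port_conflicts(stack_ports: dict[str, list[int]]) -> dict[int, list[str]]:
    """Map each host port used by two or more stacks to those stacks (sorted)."""
    owners: dict[int, set[str]] = defaultdict(set)
    for stack, ports in stack_ports.items():
        for port in ports:
            owners[port].add(stack)  # a set, so a stack listing a port twice is not a conflict
    return {port: sorted(stacks) for port, stacks in sorted(owners.items()) if len(stacks) > 1}


# Ports from the Stacks table, plus a hypothetical new stack that reuses 8080
stack_ports = {
    "graylog": [9000, 514, 1514, 12201],
    "monitoring": [3000, 9090, 9091],
    "homeassistant": [8123],
    "pihole": [53, 8053],
    "unifi": [8443, 8080, 3478],
    "openclaw": [18789],
    "myapp": [8080],
}
print(find_port_conflicts(stack_ports))  # → {8080: ['myapp', 'unifi']}
```

A fuller check would key on (port, protocol) pairs, since a TCP and a UDP listener can share the same number without conflicting.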

Service Management

# Start/stop/restart
sudo systemctl start docker-compose@<stack>
sudo systemctl stop docker-compose@<stack>
sudo systemctl restart docker-compose@<stack>

# Enable at boot
sudo systemctl enable docker-compose@<stack>

# View logs
sudo journalctl -u docker-compose@<stack> -f

Container Health

# Check all containers (JSON)
./scripts/healthcheck.py

# Human-readable with failures only
./scripts/healthcheck.py --human --quiet

# Also push metrics and send a log entry
./scripts/healthcheck.py --push-metrics --send-log
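One way to derive container health without a Docker SDK is to parse docker ps output. The sketch below assumes lines produced by docker ps --all --format '{{json .}}', where the Status field reads like "Up 2 hours (healthy)", and flags containers that are unhealthy or not running. The function name is hypothetical, and healthcheck.py may gather status differently (for example through docker inspect).

```python
import json


def failing_containers(ps_lines: list[str]) -> list[dict]:
    """Return containers that are unhealthy or not running.

    Each line is one JSON object from `docker ps --all --format '{{json .}}'`.
    Real input could come from:
        subprocess.run(["docker", "ps", "--all", "--format", "{{json .}}"],
                       capture_output=True, text=True, check=True).stdout.splitlines()
    """
    failures = []
    for line in ps_lines:
        if not line.strip():
            continue
        info = json.loads(line)
        status = info.get("Status", "")
        if not status.startswith("Up"):  # Exited, Restarting, Created, ...
            failures.append({"name": info.get("Names"), "problem": "not running", "status": status})
        elif "(unhealthy)" in status:  # "(health: starting)" is not treated as a failure
            failures.append({"name": info.get("Names"), "problem": "unhealthy", "status": status})
    return failures


sample = [
    '{"Names": "graylog", "Status": "Up 3 hours (healthy)"}',
    '{"Names": "pihole", "Status": "Up 5 minutes (unhealthy)"}',
    '{"Names": "unifi", "Status": "Exited (1) 2 minutes ago"}',
]
for item in failing_containers(sample):
    print(f"{item['name']}: {item['problem']} ({item['status']})")
```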

Backup

# Back up configurations
./scripts/backup.py

# Back up with data included
./scripts/backup.py --data

# List existing backups
./scripts/backup.py --list --human
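For this layout, a configuration backup can be as simple as one compressed tarball of every stack directory that skips data/ unless data is requested. This is a minimal sketch assuming the Directory Layout above; the function name and archive naming are assumptions rather than backup.py's behavior. Because each stack directory also contains .env, the archive holds secrets and needs the same protection.

```python
import tarfile
import time
from pathlib import Path


def backup_stacks(root: Path, dest: Path, include_data: bool = False) -> Path:
    """Write a .tar.gz of every stack under root; skip <stack>/data unless include_data."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"docker-config-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"

    def exclude_data(info: tarfile.TarInfo):
        # Archive names look like "<stack>/data/..."; returning None drops the
        # entry, and for a directory tarfile then does not descend into it.
        parts = Path(info.name).parts
        if not include_data and len(parts) >= 2 and parts[1] == "data":
            return None
        return info

    with tarfile.open(archive, "w:gz") as tar:
        # A directory counts as a stack only if it has a docker-compose.yml (skips scripts/)
        for stack in sorted(p for p in root.iterdir() if (p / "docker-compose.yml").is_file()):
            tar.add(stack, arcname=stack.name, filter=exclude_data)
    return archive


# backup_stacks(Path("/opt/docker"), Path("/var/backups/docker"))
```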

Host Information

# Full system report (JSON)
./scripts/host.py

# Human-readable
./scripts/host.py --human

# Specific sections
./scripts/host.py --hardware    # CPU, memory, disk
./scripts/host.py --docker      # Docker daemon info
./scripts/host.py --services    # Systemd compose services
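For a sense of what a --hardware style report involves, here is a minimal standard-library sketch that gathers the CPU count, memory (read from /proc/meminfo, so Linux only), and disk usage for /opt/docker, falling back to / if that path does not exist. The function and field names are hypothetical; host.py's JSON schema may differ.

```python
import json
import os
import shutil
from pathlib import Path


def read_meminfo_kb(path: str = "/proc/meminfo") -> dict[str, int]:
    """Parse lines like 'MemTotal:  16318480 kB' into {"MemTotal": 16318480}."""
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                key, _, rest = line.partition(":")
                fields = rest.split()
                if fields and fields[0].isdigit():
                    values[key.strip()] = int(fields[0])
    except FileNotFoundError:  # no /proc/meminfo outside Linux
        pass
    return values


def hardware_report(data_path: str = "/opt/docker") -> dict:
    """Collect CPU, memory and disk facts; disk is measured where data_path lives."""
    target = data_path if Path(data_path).exists() else "/"
    disk = shutil.disk_usage(target)
    mem = read_meminfo_kb()
    return {
        "cpu_count": os.cpu_count(),
        "memory_total_kb": mem.get("MemTotal"),
        "memory_available_kb": mem.get("MemAvailable"),
        "disk": {"path": target, "total_bytes": disk.total, "free_bytes": disk.free},
    }


print(json.dumps(hardware_report(), indent=2))
```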

Setup & Prerequisites

# Check prerequisites
./scripts/setup.py

# Install/fix issues
./scripts/setup.py --install
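A prerequisite check largely comes down to confirming that the required commands and groups exist. This hypothetical sketch looks for the docker and systemctl binaries on PATH and for the docker and docker-services groups described under User and Group Access. The exact list setup.py checks is an assumption here, and the grp module is Unix-only.

```python
import grp
import shutil

REQUIRED_BINARIES = ["docker", "systemctl"]       # assumed list
REQUIRED_GROUPS = ["docker", "docker-services"]   # from User and Group Access


def group_exists(name: str) -> bool:
    """True if the group is defined on this host (e.g. in /etc/group)."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False


def check_prerequisites() -> dict[str, bool]:
    """Return {check_name: passed} for each required binary and group."""
    results = {f"binary:{b}": shutil.which(b) is not None for b in REQUIRED_BINARIES}
    results.update({f"group:{g}": group_exists(g) for g in REQUIRED_GROUPS})
    return results


for check, ok in check_prerequisites().items():
    print(f"{'OK  ' if ok else 'FAIL'} {check}")
```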

Observability

Reference: docs/OBSERVABILITY.md.

| What | Where |
|---|---|
| Logs | Graylog UI (:9000) or docker logs <container> |
| Metrics | Grafana (:3000) or Prometheus (:9090) |
| Script metrics | Pushgateway (:9091) |
| Container stats | docker stats |

External Ingress

Send logs and metrics from scripts, external apps, or ad-hoc debugging sessions.

Logs → Graylog

# Using Python library
from scripts.lib.observability import log_info
log_info("Operation completed", facility="myapp", duration_ms=150)

# Direct curl to the GELF HTTP input
curl -X POST -H "Content-Type: application/json" \
  -d '{"version":"1.1","host":"myhost","short_message":"Hello"}' \
  http://localhost:12201/gelf
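When the project's library is not importable (for example from another host), the same GELF HTTP input can be reached with only the standard library. This sketch follows the GELF 1.1 payload format, in which custom fields must be prefixed with an underscore and _id is reserved. The helper names are hypothetical, and the URL assumes the http://localhost:12201/gelf endpoint from the curl example.

```python
import json
import socket
import urllib.request

GELF_URL = "http://localhost:12201/gelf"  # GELF HTTP input from the curl example


def build_gelf(message: str, level: int = 6, **fields) -> dict:
    """Build a GELF 1.1 payload; level is a syslog severity (6 = informational)."""
    if "id" in fields:
        raise ValueError('"_id" is reserved in GELF')
    payload = {
        "version": "1.1",
        "host": socket.gethostname(),
        "short_message": message,
        "level": level,
    }
    # GELF requires additional (custom) fields to be prefixed with "_"
    payload.update({f"_{key}": value for key, value in fields.items()})
    return payload


def send_gelf(payload: dict, url: str = GELF_URL, timeout: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# send_gelf(build_gelf("Operation completed", facility="myapp", duration_ms=150))
```

Sending facility as the additional field _facility matches GELF 1.1, where the top-level facility field is deprecated.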

Metrics → Pushgateway

# Using Python library
from scripts.lib.observability import metric_gauge
metric_gauge("myapp_items", 42, labels={"env": "prod"})

# Direct curl
echo 'myapp_items 42' | curl --data-binary @- http://localhost:9091/metrics/job/myapp
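The curl example sends the Prometheus text exposition format: one name{labels} value line per sample, ending with a newline. Below is a standard-library sketch of the same push with labels, assuming the Pushgateway address above. The helper names are hypothetical. POST replaces only metrics with the same name within the job's group, whereas PUT would replace the whole group.

```python
import urllib.request
from typing import Optional

PUSHGATEWAY = "http://localhost:9091"  # from the curl example


def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_gauge(name: str, value: float, labels: Optional[dict] = None) -> str:
    """Render one gauge sample, with its TYPE line, in the text exposition format."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{escape_label(str(v))}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return f"# TYPE {name} gauge\n{name}{label_str} {value}\n"


def push(body: str, job: str, url: str = PUSHGATEWAY, timeout: float = 5.0) -> int:
    """POST metrics to /metrics/job/<job> and return the HTTP status code."""
    request = urllib.request.Request(
        f"{url}/metrics/job/{job}", data=body.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


print(format_gauge("myapp_items", 42, {"env": "prod"}), end="")
# push(format_gauge("myapp_items", 42, {"env": "prod"}), job="myapp")
```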

Initial Setup

# 1. Check prerequisites
./scripts/setup.py

# 2. Install/configure (run fixes)
./scripts/setup.py --install

# 3. Create shared network
docker network create monitoring_net

# 4. Deploy stacks in dependency order
sudo systemctl enable --now docker-compose@graylog
sudo systemctl enable --now docker-compose@fluentbit
sudo systemctl enable --now docker-compose@monitoring
sudo systemctl enable --now docker-compose@watchtower
# ... then application stacks

# 5. Validate
./scripts/validate.py --human

See each stack's README for specific setup instructions.
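The ordering in step 4 can be derived rather than memorized. This sketch uses the standard library's graphlib.TopologicalSorter (Python 3.9+) over a hypothetical dependency map whose edges simply mirror the order shown in step 4 (application stacks last), then prints the systemctl commands in a valid order. The edges are assumptions, not read from the stacks.

```python
from graphlib import TopologicalSorter

# Hypothetical edges: stack -> stacks that should be started before it
DEPENDS_ON = {
    "graylog": set(),
    "fluentbit": {"graylog"},
    "monitoring": {"fluentbit"},
    "watchtower": {"monitoring"},
    "homeassistant": {"watchtower"},
    "pihole": {"watchtower"},
    "unifi": {"watchtower"},
    "openclaw": {"watchtower"},
}


def deploy_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return stacks with every dependency before its dependents (CycleError on cycles)."""
    return list(TopologicalSorter(depends_on).static_order())


for stack in deploy_order(DEPENDS_ON):
    print(f"sudo systemctl enable --now docker-compose@{stack}")
```

Stacks with no ordering constraint between them (the four application stacks here) may come out in any order, which is fine because each one only waits on watchtower.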

Security

  • .env files contain secrets — never commit them (see .gitignore)
  • .env permissions should be 600
  • Containers that need the Docker socket (/var/run/docker.sock) are explicitly documented
  • Resource limits keep a runaway container from exhausting host memory or CPU
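The .env permission rule is easy to audit mechanically. Here is a minimal sketch, assuming the /opt/docker/<stack>/.env layout above: it lists .env files whose permission bits are not exactly 600 and can optionally tighten them. The function name is hypothetical.

```python
import os
import stat
from pathlib import Path


def insecure_env_files(root: Path, fix: bool = False) -> list[tuple[Path, str]]:
    """Return (path, mode) for each <stack>/.env whose mode is not 0600."""
    findings = []
    for env_file in sorted(root.glob("*/.env")):
        mode = stat.S_IMODE(env_file.stat().st_mode)
        if mode != 0o600:
            findings.append((env_file, oct(mode)))
            if fix:
                os.chmod(env_file, 0o600)  # owner read/write only
    return findings


for path, mode in insecure_env_files(Path("/opt/docker")):
    print(f"{path}: {mode} (expected 0o600)")
```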

User and Group Access

The project uses a dedicated service account for file ownership and a group-based access model for operators.

| User/Group | Purpose |
|---|---|
| docker-services | Service account that owns /opt/docker. System user (no login shell). |
| docker | Docker daemon group. Required to run docker commands. |

Operator access: Add your user to both groups to manage the project without sudo:

# Add user to required groups
sudo usermod -aG docker-services $USER
sudo usermod -aG docker $USER

# Start a shell with docker-services active (log out and back in to apply both groups)
newgrp docker-services

Directory permissions: /opt/docker must have group write and setgid:

| Permission | Purpose |
|---|---|
| g+w | Group members can create and modify files |
| g+s (setgid) | New files inherit the docker-services group |

Fix permissions if needed:

sudo chmod -R g+w /opt/docker
sudo find /opt/docker -type d -exec chmod g+s {} \;

Or use setup.py:

sudo ./scripts/setup.py --fix

Verify access:

# Should show docker-services and docker in groups
id $USER

# Should be able to create files without sudo
touch /opt/docker/test && rm /opt/docker/test
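The same checks can be scripted. This hypothetical sketch reports whether a directory is group-writable with setgid set, and which of the docker-services and docker groups the current process lacks. os.getgroups() reflects the current session, so a newly added group only appears after logging in again.

```python
import grp
import os
import stat


def dir_has_shared_perms(path: str) -> bool:
    """True if path is a directory with group write (g+w) and setgid (g+s)."""
    st = os.stat(path)
    return (
        stat.S_ISDIR(st.st_mode)
        and bool(st.st_mode & stat.S_IWGRP)
        and bool(st.st_mode & stat.S_ISGID)
    )


def session_groups() -> set[str]:
    """Names of the groups active in this process (what `id` lists as groups)."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:  # gid with no group entry
            continue
    return names


missing = {"docker-services", "docker"} - session_groups()
print("missing groups:", sorted(missing) or "none")
if os.path.isdir("/opt/docker"):
    print("/opt/docker has g+w and g+s:", dir_has_shared_perms("/opt/docker"))
```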

Adding a New Stack

  1. Create stack directory with required files:

    mkdir -p myapp
    touch myapp/docker-compose.yml myapp/.env.example myapp/README.md
  2. Edit docker-compose.yml to meet the required standards (healthcheck, restart policy, resource limits, container_name, Watchtower label)

  3. Create .env from .env.example

  4. Create data directories:

    sudo mkdir -p /opt/docker/myapp/data
    sudo chown -R docker-services:docker-services /opt/docker/myapp
  5. Enable and start:

    sudo systemctl enable --now docker-compose@myapp
  6. Validate:

    ./scripts/validate.py myapp
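Steps 1, 3 and 4 above are mechanical and can be scripted. This is a hypothetical helper, not part of scripts/: it creates the stack directory with its data/ subdirectory, touches the required files, and copies .env.example to .env with mode 600 per the Security rules. Editing docker-compose.yml (step 2) and the chown to docker-services are left manual.

```python
import os
import shutil
from pathlib import Path

REQUIRED_FILES = ["docker-compose.yml", ".env.example", "README.md"]


def scaffold_stack(root: Path, name: str) -> Path:
    """Create <root>/<name> with the required files, data/ and a private .env."""
    stack = root / name
    (stack / "data").mkdir(parents=True, exist_ok=True)
    for filename in REQUIRED_FILES:
        (stack / filename).touch(exist_ok=True)
    env = stack / ".env"
    if not env.exists():  # never overwrite an existing .env holding real secrets
        shutil.copyfile(stack / ".env.example", env)
    os.chmod(env, 0o600)
    return stack


# scaffold_stack(Path("/opt/docker"), "myapp")
```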

Cron Jobs

Scheduled maintenance is installed via ./scripts/setup.py --install. Full schedule and commands: docs/CRON.md.

Related Documentation

Evergreen reference: docs/*.md (one doc per topic).