An enterprise-grade, HIPAA-compliant observability platform that simulates the central nervous system of a hospital. Built with zero-trust architecture, automated SSH remediation, and a 100-node Ansible deployment mesh.
| Component | What It Does |
|---|---|
| 🧠 The Brain | Centralized Docker cluster running Prometheus, Grafana, Loki, MinIO, and Tempo. |
| 🫀 The Organs | 100+ bare-metal nodes monitored via Ansible-deployed Exporters & Promtail agents. |
| 🤖 Auto-Healer | Asynchronous Node.js agent executing SSH scripts instantly when severity="fatal" fires. |
| 📧 Email Relay | Intercepts raw JSON webhooks and translates them into styled HTML alerts via SMTP. |
| 🌐 Synthetics | Automated Ping/SSL probes that verify public-facing patient portals stay reachable, in support of a 99.99% uptime target. |
| 📊 Status Page | Public React/Express UI showcasing SLA metrics directly fed from the Prometheus database. |
Hospitals need a HIPAA-compliant command center that tracks operational data securely without sending PHI to the cloud. A single DevOps engineer cannot realistically watch 100 servers at once. Fortress automates the entire pipeline, from log ingestion to anomaly detection to automated self-healing.
%%{init: {'theme': 'dark', 'themeVariables': {'fontSize': '16px'}}}%%
graph LR
ORG["🫀 The Organs\n(Ansible Fleet)"] -->|Streams Metrics + Logs| PROM["🧠 Prometheus & Loki\n(Central Database)"]
PROM -->|Evaluates PromQL/LogQL| ALERT["🚨 Alertmanager\n(Routing Engine)"]
ALERT -->|Webhooks JSON| HEALER["🤖 Auto-Healer\n(Node.js Remediation)"]
ALERT -->|Webhooks JSON| EMAIL["📧 Email Relay\n(SMTP Notification)"]
ALERT -.->|Optional| SMS["📱 SMS / Slack Integration"]
HEALER -->|Executes SSH Scripts| ORG
PROM -->|Visualizes Data| GRAF["📊 Grafana\n(Master Dashboard)"]
| Component | Stack | Purpose |
|---|---|---|
| Metrics DB | Prometheus v2.51.0 | Pulls time-series data from all endpoints; evaluates mathematical anomaly rules. |
| Log DB | Loki & MinIO (S3) | Aggregates application logs with a strict 90-day HIPAA retention compactor. |
| Master UI | Grafana v10.4.2 | Central visual command center for tracing bottlenecks. |
| Auto-Healer | Node.js + SSH2 | The robotic system administrator that fixes issues before humans wake up. |
| Organs (Agents) | Node Exporter / PM2 / Promtail | Lightweight trackers installed on all target machines via Ansible. |
Self-Healing loop: When PM2 detects an application crash, Prometheus fires a fatal alert and Alertmanager forwards it as a JSON webhook. The Auto-Healer receives the alert, parses the IP, SSHs into the machine, and restarts the Node.js process, recovering the hospital system in under 1.5 seconds.
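The "parses the IP" step of this loop can be sketched as a small pure function. This is an illustrative sketch, not the Auto-Healer's actual code: the function names are hypothetical, and it assumes each alert carries its scrape target in the `instance` label as `host:port`, which is the usual Prometheus convention. The `alerts`, `status`, and `labels` fields follow the standard Alertmanager webhook payload.

```javascript
// Sketch: collect the hosts that need remediation from an Alertmanager webhook body.
// Assumption: labels.instance looks like "10.0.1.1:9100".

function hostFromInstance(instance) {
  if (typeof instance !== "string") return null;
  // Strip a trailing ":port", then accept only dotted IPv4 addresses so that
  // arbitrary label text never reaches the SSH layer.
  const host = instance.replace(/:\d+$/, "");
  const parts = host.split(".");
  const isIpv4 =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  return isIpv4 ? host : null;
}

function extractFatalTargets(payload) {
  const alerts = Array.isArray(payload && payload.alerts) ? payload.alerts : [];
  const hosts = new Set(); // de-duplicate hosts that appear in several alerts
  for (const alert of alerts) {
    const labels = alert.labels || {};
    // Resolved notifications and lower severities need no repair.
    if (alert.status !== "firing" || labels.severity !== "fatal") continue;
    const host = hostFromInstance(labels.instance);
    if (host) hosts.add(host);
  }
  return [...hosts];
}

// Example payload, trimmed to the fields the sketch reads.
const examplePayload = {
  status: "firing",
  alerts: [
    { status: "firing", labels: { severity: "fatal", instance: "10.0.1.1:9100" } },
    { status: "resolved", labels: { severity: "fatal", instance: "10.0.1.2:9100" } },
    { status: "firing", labels: { severity: "warning", instance: "10.0.1.3:9100" } },
  ],
};
console.log(extractFatalTargets(examplePayload)); // [ '10.0.1.1' ]
```

Validating the host before use is a deliberate choice here: label values come from monitored machines, so treating them as untrusted input keeps a malformed label from steering the SSH step.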
This section outlines the exact steps required to transform the dummy configurations in this repository into a live, production-ready observability platform tailored to your specific environment.
The "Brain" is the central command server that runs the entire monitoring stack via Docker Compose.
1. Clone the Repository:
git clone https://github.com/YourRepo/Observational-Dashboard /opt/fortress
cd /opt/fortress

2. Generate and Configure the .env File:
The .env file holds all your secure passwords and configuration variables. Never commit this file to GitHub.
cp .env.example .env
chmod 600 .env

Open the .env file using nano .env and replace the following dummy values with your real values:
- GF_SECURITY_ADMIN_PASSWORD: Replace dummy-password with a highly secure string. This is your master Grafana login.
- SMTP_HOST & SMTP_PORT: Replace with your actual email relay (e.g., smtp.office365.com or your internal hospital SMTP).
- SMTP_USER & SMTP_PASS: Replace with the credentials of the email account that will send the alerts.
- EMAIL_TO: Replace devops@hospital.internal with the real email address of your IT or DevOps team.
- CORS_ORIGINS: Replace with the actual domain where your Status Page will be hosted.
3. Launch the Stack:
docker compose -f docker-compose.yml -f docker-compose.local.yml up -d --build

Note: On first startup, the MinIO container will run a brief job to initialize the S3 buckets for Loki's 90-day retention policy.
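Before launching the stack, it can help to confirm that no placeholder survived the .env edit in step 2. The following is a minimal sketch of such a check, not part of the repository: the function names are hypothetical, the variable list is taken from the step above, and the only placeholders it knows are the two dummy values named there.

```javascript
// Sketch: flag .env variables that are missing or still hold a known dummy value.
const REQUIRED_VARS = [
  "GF_SECURITY_ADMIN_PASSWORD",
  "SMTP_HOST",
  "SMTP_PORT",
  "SMTP_USER",
  "SMTP_PASS",
  "EMAIL_TO",
  "CORS_ORIGINS",
];
const KNOWN_DUMMIES = new Set(["dummy-password", "devops@hospital.internal"]);

// Parse simple KEY=VALUE lines; blank lines and "#" comments are ignored.
function parseEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", skip the line
    const key = line.slice(0, eq).trim();
    // Drop one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    vars[key] = value;
  }
  return vars;
}

function findEnvProblems(text) {
  const vars = parseEnv(text);
  const problems = [];
  for (const name of REQUIRED_VARS) {
    const value = vars[name];
    if (value === undefined || value === "") {
      problems.push(`${name} is missing or empty`);
    } else if (KNOWN_DUMMIES.has(value)) {
      problems.push(`${name} still has the placeholder value`);
    }
  }
  return problems;
}
```

To use it against the real file, read the contents with `fs.readFileSync(".env", "utf8")`, pass them to `findEnvProblems`, and abort the launch if the returned list is not empty.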
You do NOT manually install tracking software on your application servers. You use Ansible to automatically deploy the agents (Node Exporter, PM2 Exporter, Promtail) to hundreds of servers simultaneously.
1. Replace Dummy IPs with Real Target IPs: Open the Ansible inventory file:
nano fortress-ansible/inventory/hosts.ini

Under the [appservers] block, you will see dummy lines like app-001 ansible_host=10.0.1.1. Delete these dummy lines and replace them with the actual IP addresses of your real production servers:
[appservers]
database-server ansible_host=192.168.10.50
web-portal ansible_host=192.168.10.51

2. Configure Passwordless SSH: This setup has Ansible authenticate with SSH keys rather than passwords. Generate an SSH key pair on your Brain server and distribute the public key to each of your target nodes.
ssh-keygen -t rsa -b 4096 -f ~/.ssh/fortress_deploy
ssh-copy-id -i ~/.ssh/fortress_deploy deploy@192.168.10.50

3. Execute the Playbook:
cd fortress-ansible
ansible-playbook -i inventory/hosts.ini site.yml

The Auto-Healer relies on SSH keys to access your target servers and execute recovery commands.
- Ensure the Brain Server has a valid private SSH key that can access the target servers.
- Open your .env file and verify that SSH_KEY_PATH points to the exact absolute path of your private key (e.g., /etc/ssh_keys/id_rsa).
- You must bind-mount this key into the auto-healer container within your docker-compose.yml so the Node.js script has access to it.
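As an illustration of how the key path might be consumed inside the container, the sketch below builds the options object that the ssh2 client's `connect()` call accepts (`host`, `port`, `username`, `privateKey`, `readyTimeout`). It is an assumption-laden sketch rather than the Auto-Healer's real code: `buildSshConfig` and the `SSH_USER` variable are hypothetical, and the `deploy` default simply mirrors the user from the `ssh-copy-id` example.

```javascript
// Sketch: turn environment settings into connection options for ssh2.
// `readFile` is injected so the function can be exercised without touching disk;
// inside the container it would typically be (p) => fs.readFileSync(p).
function buildSshConfig(env, host, readFile) {
  const keyPath = env.SSH_KEY_PATH;
  // The container runs Linux, so an absolute path starts with "/".
  if (typeof keyPath !== "string" || !keyPath.startsWith("/")) {
    throw new Error("SSH_KEY_PATH must be an absolute path");
  }
  return {
    host,
    port: 22,
    username: env.SSH_USER || "deploy", // SSH_USER is a hypothetical variable
    privateKey: readFile(keyPath), // fails loudly if the bind mount is missing
    readyTimeout: 5000, // milliseconds; give up quickly on unreachable nodes
  };
}
```

Reading the key at connection time means a missing bind mount surfaces as an immediate error on the first alert instead of a silent authentication failure.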
By default, Fortress routes all critical alerts to the Email Relay. However, Alertmanager is highly extensible and can route alerts to SMS providers, Slack, Microsoft Teams, or PagerDuty.
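To make the Email Relay's job concrete, here is a minimal sketch of translating an Alertmanager webhook body into a subject line and an HTML body. It is not the relay's actual template: the function names are hypothetical, and it assumes alerts carry the conventional `alertname`, `severity`, and `instance` labels plus a `summary` annotation.

```javascript
// Sketch: render an Alertmanager webhook payload as a simple HTML email.
const HTML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

// Label and annotation text is untrusted, so escape it before embedding in HTML.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

function renderAlertEmail(payload) {
  const alerts = Array.isArray(payload.alerts) ? payload.alerts : [];
  const rows = alerts.map((alert) => {
    const labels = alert.labels || {};
    const annotations = alert.annotations || {};
    // Red for firing alerts, green for resolved ones.
    const color = alert.status === "firing" ? "#b00020" : "#1b7f3b";
    return (
      `<tr style="color:${color}">` +
      `<td>${escapeHtml(labels.alertname || "unknown")}</td>` +
      `<td>${escapeHtml(labels.severity || "n/a")}</td>` +
      `<td>${escapeHtml(labels.instance || "n/a")}</td>` +
      `<td>${escapeHtml(annotations.summary || "")}</td>` +
      `</tr>`
    );
  });
  const status = String(payload.status || "unknown").toUpperCase();
  return {
    subject: `[${status}] ${alerts.length} alert(s)`,
    html:
      `<table><tr><th>Alert</th><th>Severity</th><th>Instance</th><th>Summary</th></tr>` +
      `${rows.join("")}</table>`,
  };
}
```

The returned `subject` and `html` fields can then be handed to whatever SMTP client the relay uses, together with the `EMAIL_TO` address from .env.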
To add a new alerting channel, you need to modify the alertmanager/alertmanager.yml file.
If your hospital requires text messages for fatal alerts, you can point Alertmanager to an SMS API webhook.
Open alertmanager/alertmanager.yml and add a new receiver:
receivers:
  - name: 'sms-api'
    webhook_configs:
      - url: 'https://api.your-sms-provider.com/send?token=YOUR_API_TOKEN'
        send_resolved: true

Then, update the route block at the top of the file to send alerts to the SMS receiver:
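route:
  # Root receiver: alerts that match no child route also land here.
  receiver: 'sms-api'
  routes:
    # Child route: fatal alerts go to the SMS receiver.
    - matchers:
        - severity="fatal"
      receiver: 'sms-api'

To send alerts directly to a DevOps Slack channel: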
receivers:
  - name: 'slack-devops'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
        channel: '#server-alerts'
        title: '{{ template "slack.default.title" . }}'
        text: '{{ template "slack.default.text" . }}'

Whenever you modify alertmanager/alertmanager.yml, you must reload Alertmanager for the changes to take effect:
curl -X POST http://localhost:9093/-/reload

Fortress includes a Blackbox Exporter that acts as a robotic user to verify that your public-facing websites are online and that their SSL certificates haven't expired.
Replacing Dummy URLs:
- Open the file prometheus/prometheus.local.yml.
- Scroll down to the blackbox_ping and blackbox_ssl jobs.
- You will see dummy URLs like https://www.example.com. Delete these and replace them with the real, public URLs of your hospital patient portals or APIs:
    - targets:
        - https://patient-portal.yourhospital.com
        - https://api.yourhospital.com/v1/health
- Reload Prometheus to begin tracking the new URLs (the reload endpoint is only available when Prometheus is started with --web.enable-lifecycle):
curl -X POST http://localhost:9090/-/reload
The public Status Page reads these new targets from Prometheus and displays their live SLA metrics to your users once the first probes have been scraped.
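One way the Status Page could derive an SLA figure is by averaging the Blackbox Exporter's `probe_success` metric over a time window through the Prometheus HTTP API. The sketch below shows that idea under stated assumptions: the function names and the 30-day window are illustrative, and it assumes the usual Blackbox relabeling in which the `instance` label holds the probed URL. It does not claim to be the query the Status Page actually runs.

```javascript
// Sketch: build an instant query for average probe success and parse the reply.

// avg_over_time(probe_success[...]) yields the fraction of successful probes.
function buildSlaQueryUrl(baseUrl, job, window) {
  const query = `avg_over_time(probe_success{job="${job}"}[${window}])`;
  return `${baseUrl}/api/v1/query?query=${encodeURIComponent(query)}`;
}

// Convert an instant-vector response into { target: uptimePercent }.
function parseSlaResponse(body) {
  if (!body || body.status !== "success" || !body.data || body.data.resultType !== "vector") {
    throw new Error("unexpected Prometheus response");
  }
  const sla = {};
  for (const series of body.data.result) {
    // Each sample is [unixTimestamp, "valueAsString"].
    const ratio = Number(series.value[1]);
    // Express as a percentage rounded to three decimals.
    sla[series.metric.instance] = Math.round(ratio * 100000) / 1000;
  }
  return sla;
}

// Example response shape, as returned by GET /api/v1/query.
const exampleResponse = {
  status: "success",
  data: {
    resultType: "vector",
    result: [
      { metric: { instance: "https://patient-portal.yourhospital.com" }, value: [1700000000, "1"] },
      { metric: { instance: "https://api.yourhospital.com/v1/health" }, value: [1700000000, "0.9995"] },
    ],
  },
};
console.log(parseSlaResponse(exampleResponse));
```

With Node.js 18 or later, the URL from `buildSlaQueryUrl("http://localhost:9090", "blackbox_ping", "30d")` can be fetched with the built-in `fetch` and the parsed JSON passed to `parseSlaResponse`.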
