
antonym/openstack-canary


OpenStack Canary Monitoring System

⚠️ PROOF OF CONCEPT - NOT PRODUCTION READY

This is a proof-of-concept system generated by Claude AI and is UNTESTED. Do not deploy this in production environments without thorough testing, security review, and validation. Use at your own risk.

A comprehensive early warning system for OpenStack cloud infrastructure that monitors dataplane health across multiple datacenters and availability zones.

Overview

The OpenStack Canary system provides proactive monitoring of your OpenStack infrastructure by:

  • Multi-AZ Deployment: Distributes canary instances across availability zones
  • Synthetic Traffic Generation: Creates realistic workload patterns between instances
  • Comprehensive Metrics: Tracks latency, throughput, system health, and application performance
  • Early Warning Detection: Identifies dataplane issues before they impact production workloads
  • Datadog Integration: Real-time monitoring, alerting, and dashboards

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                    OpenStack Cloud Infrastructure                │
├─────────────────┬─────────────────┬──────────────────────────────┤
│   Datacenter 1  │   Datacenter 2  │         Datacenter N         │
├─────────────────┼─────────────────┼──────────────────────────────┤
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┬─────────────┐│
│ │    AZ-A     │ │ │    AZ-A     │ │ │    AZ-A     │    AZ-B     ││
│ │ ┌─────────┐ │ │ │ ┌─────────┐ │ │ │ ┌─────────┐ │ ┌─────────┐ ││
│ │ │ Canary  │ │ │ │ │ Canary  │ │ │ │ │ Canary  │ │ │ Canary  │ ││
│ │ │Instance │ │ │ │ │Instance │ │ │ │ │Instance │ │ │Instance │ ││
│ │ └─────────┘ │ │ │ └─────────┘ │ │ │ └─────────┘ │ └─────────┘ ││
│ └─────────────┘ │ └─────────────┘ │ └─────────────┴─────────────┘│
└─────────────────┴─────────────────┴──────────────────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │     Datadog     │
                    │   Monitoring    │
                    │   & Alerting    │
                    └─────────────────┘

Components

  1. Canary Application (app.py): Core web service with health endpoints
  2. Traffic Generator (traffic_generator.py): Synthetic workload creation
  3. System Monitor (system_monitor.py): OS-level monitoring and alerts
  4. Datadog Integration (datadog_config.py): Metrics, dashboards, and alerting
  5. Deployment Automation: Heat templates and Docker deployment scripts
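The traffic generator's internals are not shown in this README. As a minimal sketch of one plausible shape, assuming each synthetic request is a callable that returns True on success (a hypothetical interface, not traffic_generator.py's actual API), one generation cycle and the loop around it could compute the value reported as canary.traffic_gen.success_rate:

```python
import time
from typing import Callable, Iterable


def run_traffic_cycle(peers: Iterable[str], send_request: Callable[[str], bool]) -> float:
    """Send one synthetic request to each peer and return the success rate in percent."""
    peer_list = list(peers)
    if not peer_list:
        # No peers configured: report 0% so a misconfigured canary stands out
        return 0.0
    successes = 0
    for peer in peer_list:
        try:
            if send_request(peer):
                successes += 1
        except Exception:
            # Any error (timeout, refused connection, bad response) counts as a failure
            pass
    return 100.0 * successes / len(peer_list)


def run_forever(peers: Iterable[str], send_request: Callable[[str], bool],
                interval_s: float = 10.0, report: Callable[[str], None] = print) -> None:
    """Repeat traffic cycles every interval_s seconds (TRAFFIC_INTERVAL defaults to 10)."""
    peer_list = list(peers)
    while True:
        started = time.monotonic()
        rate = run_traffic_cycle(peer_list, send_request)
        report(f"canary.traffic_gen.success_rate={rate:.1f}")
        # Sleep only for what is left of the interval so cycles stay roughly periodic
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

Injecting send_request keeps the success-rate logic testable without a network; in a real deployment it would wrap an HTTP call to a peer's endpoint.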

Quick Start

Docker Deployment (Recommended)

  1. Clone and configure:

    git clone <repository>
    cd openstack-canary
  2. Set up environment:

    cp .env.example .env
    # Edit .env with your configuration
  3. Deploy:

    ./docker-deploy.sh start --datacenter dc1 --az zone-a
  4. Verify health:

    curl http://localhost:8080/health

OpenStack Heat Deployment

  1. Configure deployment:

    cp deploy-config.env.example deploy-config.env
    # Edit with your OpenStack configuration
  2. Deploy to OpenStack:

    ./deploy.sh deploy --datacenter dc1 --azs "nova,zone-a,zone-b"
  3. Set up monitoring:

    ./deploy.sh setup-datadog --dd-api-key YOUR_KEY --dd-app-key YOUR_APP_KEY

Configuration

Environment Variables

Variable              Description                             Default
CANARY_ID             Unique identifier for canary instance   canary-{hostname}
DATACENTER            Datacenter name for tagging             unknown
AVAILABILITY_ZONE     AZ name for tagging                     unknown
DD_API_KEY            Datadog API key                         -
DD_APP_KEY            Datadog application key                 -
PEER_ENDPOINTS        Comma-separated peer endpoints          -
TRAFFIC_INTERVAL      Traffic generation interval (seconds)   10
MONITORING_INTERVAL   System monitoring interval (seconds)    30
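A minimal sketch of loading these variables with the documented defaults. The CanaryConfig and load_config names are hypothetical, and the exact format of PEER_ENDPOINTS entries (URLs vs. host:port) is an assumption; the sketch only splits on commas and trims whitespace:

```python
import os
import socket
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class CanaryConfig:
    canary_id: str
    datacenter: str
    availability_zone: str
    dd_api_key: Optional[str]
    dd_app_key: Optional[str]
    peer_endpoints: List[str] = field(default_factory=list)
    traffic_interval: int = 10
    monitoring_interval: int = 30


def load_config(env: Mapping[str, str] = os.environ) -> CanaryConfig:
    """Build a CanaryConfig from environment variables using the documented defaults."""
    # Split the comma-separated list and drop empty entries (e.g. a trailing comma)
    peers = [p.strip() for p in env.get("PEER_ENDPOINTS", "").split(",") if p.strip()]
    return CanaryConfig(
        canary_id=env.get("CANARY_ID") or f"canary-{socket.gethostname()}",
        datacenter=env.get("DATACENTER", "unknown"),
        availability_zone=env.get("AVAILABILITY_ZONE", "unknown"),
        dd_api_key=env.get("DD_API_KEY"),
        dd_app_key=env.get("DD_APP_KEY"),
        peer_endpoints=peers,
        traffic_interval=int(env.get("TRAFFIC_INTERVAL", "10")),
        monitoring_interval=int(env.get("MONITORING_INTERVAL", "30")),
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check in unit tests.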

Alert Thresholds

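Metric              Warning   Critical
CPU Usage           70%       90%
Memory Usage        80%       95%
Disk Usage          85%       95%
Load Average        5.0       10.0
Error Rate          5%        10%
Peer Connectivity   70%       50%

CPU, memory, disk, load average and error rate alert when the value rises above the threshold. Peer connectivity is a success percentage, so it alerts when the value drops below the threshold.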
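A minimal sketch of classifying a reading against this table. The metric keys are hypothetical, and strict comparisons (">" and "<") are an assumption chosen to match the "Error rate > 10%" wording in the alert examples:

```python
from typing import Dict, Tuple

# metric -> (warning, critical, higher_is_worse), values from the Alert Thresholds table
THRESHOLDS: Dict[str, Tuple[float, float, bool]] = {
    "cpu_percent": (70.0, 90.0, True),
    "memory_percent": (80.0, 95.0, True),
    "disk_percent": (85.0, 95.0, True),
    "load_average": (5.0, 10.0, True),
    "error_rate_percent": (5.0, 10.0, True),
    # Peer connectivity is a success percentage, so lower values are worse
    "peer_connectivity_percent": (70.0, 50.0, False),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning' or 'critical' for a single metric reading."""
    warning, critical, higher_is_worse = THRESHOLDS[metric]
    if higher_is_worse:
        if value > critical:
            return "critical"
        if value > warning:
            return "warning"
    else:
        if value < critical:
            return "critical"
        if value < warning:
            return "warning"
    return "ok"
```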

API Endpoints

Health Checks

  • GET /health - Basic health status
  • GET /health/detailed - Detailed system metrics

Connectivity Tests

  • GET /connectivity - Test connectivity to peer instances
  • GET /load-test - Generate synthetic load
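How /connectivity measures peers is not documented here. As a minimal sketch, assuming a TCP connect to each peer is an acceptable reachability test (the real endpoint may use HTTP requests instead), per-peer latencies can be reduced to a connectivity percentage and latency figures; probe, summarize and check_peers are hypothetical names:

```python
import socket
import statistics
import time
from typing import Dict, List, Optional, Tuple


def probe(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return TCP connect latency in milliseconds, or None if the peer is unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        # Refused connections, DNS failures and timeouts are all OSError subclasses
        return None
    return (time.perf_counter() - start) * 1000.0


def summarize(results: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Turn per-peer latencies (None = failed) into connectivity % and latency stats."""
    latencies = [ms for ms in results.values() if ms is not None]
    total = len(results)
    return {
        "peer_connectivity_percent": 100.0 * len(latencies) / total if total else 0.0,
        "peer_latency_avg_ms": statistics.mean(latencies) if latencies else None,
        "peer_latency_max_ms": max(latencies) if latencies else None,
    }


def check_peers(peers: List[Tuple[str, int]]) -> Dict[str, Optional[float]]:
    """Probe every (host, port) pair and summarize the results."""
    return summarize({f"{host}:{port}": probe(host, port) for host, port in peers})
```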

Metrics

  • GET /metrics - Prometheus-style metrics
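The exact metric names served by /metrics are not listed here (the Resource Optimization section filters them with grep canary_). A minimal sketch of rendering gauges in the Prometheus text exposition format, with hypothetical metric and label names:

```python
from typing import Mapping


def _escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline in label values
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_prometheus(metrics: Mapping[str, float], labels: Mapping[str, str]) -> str:
    """Render each metric as a gauge in the Prometheus text exposition format."""
    label_str = ",".join(
        f'{name}="{_escape_label_value(value)}"' for name, value in sorted(labels.items())
    )
    selector = f"{{{label_str}}}" if label_str else ""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{selector} {value}")
    # The exposition format expects the output to end with a newline
    return "\n".join(lines) + "\n"
```

For example, render_prometheus({"canary_cpu_percent": 15.2}, {"datacenter": "dc1"}) yields a TYPE line followed by canary_cpu_percent{datacenter="dc1"} 15.2.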

Example Response

{
  "status": "healthy",
  "canary_id": "canary-dc1-zone-a-001",
  "datacenter": "dc1",
  "availability_zone": "zone-a",
  "timestamp": "2024-01-15T10:30:00Z",
  "uptime_seconds": 3600,
  "system_metrics": {
    "cpu_percent": 15.2,
    "memory": {
      "percent": 45.8,
      "available": 2147483648
    },
    "disk": {
      "percent": 25.3,
      "free": 8589934592
    }
  }
}

Monitoring & Alerting

Datadog Integration

The system automatically creates:

  • Dashboard: Comprehensive overview of all canary instances
  • Alerts: Proactive notifications for infrastructure issues
  • SLOs: Service level objectives for availability tracking
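The SLO definitions themselves live in the Datadog setup, not in this README. As a worked sketch of what availability tracking computes, assuming availability is measured as successful health checks over total checks and using a hypothetical 99.9% target:

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    availability_percent: float
    error_budget_remaining_percent: float  # negative once the budget is overspent


def slo_report(good: int, total: int, target_percent: float = 99.9) -> SloReport:
    """Compute availability and remaining error budget from health-check counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < target_percent < 100.0:
        raise ValueError("target_percent must be between 0 and 100 (exclusive)")
    availability = 100.0 * good / total
    # Failures the target allows over this window, e.g. 0.1% of checks for 99.9%
    allowed_failures = total * (100.0 - target_percent) / 100.0
    actual_failures = total - good
    remaining = 100.0 * (1.0 - actual_failures / allowed_failures)
    return SloReport(availability, remaining)
```

With 10,000 checks and 5 failures, availability is 99.95% and half of the 99.9% target's budget (10 allowed failures) is still left.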

Key Metrics

  • canary.health_check - Health check frequency
  • canary.peer_latency - Inter-instance latency
  • canary.traffic_gen.success_rate - Traffic generation success rate
  • canary.system.cpu_percent - CPU utilization
  • canary.system.memory_percent - Memory utilization
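A minimal sketch of mapping a response shaped like the Example Response above onto the canary.system.* gauges, with the identity fields turned into Datadog-style key:value tags. The function name and the disk metric name are hypothetical; submitting the values (for example through DogStatsD) is left out:

```python
import json
from typing import Dict, List, Tuple


def health_body_to_metrics(body: str) -> Tuple[Dict[str, float], List[str]]:
    """Map a health response body onto canary.system.* gauges plus Datadog-style tags."""
    payload = json.loads(body)
    system = payload["system_metrics"]
    metrics = {
        "canary.system.cpu_percent": float(system["cpu_percent"]),
        "canary.system.memory_percent": float(system["memory"]["percent"]),
        # Hypothetical metric name: disk usage is not among the documented key metrics
        "canary.system.disk_percent": float(system["disk"]["percent"]),
    }
    # Datadog tags use the "key:value" convention
    tags = [
        f"canary_id:{payload['canary_id']}",
        f"datacenter:{payload['datacenter']}",
        f"availability_zone:{payload['availability_zone']}",
    ]
    return metrics, tags
```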

Alert Examples

  • High Error Rate: Error rate > 10% for 5 minutes
  • Instance Down: No health checks for 10 minutes
  • High Latency: Inter-DC latency > 1000ms for 10 minutes
  • Resource Exhaustion: CPU > 90% or Memory > 95% for 15 minutes
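In Datadog these conditions would normally be evaluated by the monitors themselves over a time window. To illustrate the "for N minutes" and "no health checks" semantics locally, here is a minimal sketch; SustainedCondition and HeartbeatWatch are hypothetical helpers, and timestamps are plain seconds:

```python
from typing import Optional


class SustainedCondition:
    """Fires only after a condition has held continuously for duration_s seconds."""

    def __init__(self, duration_s: float):
        self.duration_s = duration_s
        self._since: Optional[float] = None  # when the condition last became true

    def update(self, now: float, condition: bool) -> bool:
        if not condition:
            # Any healthy sample resets the clock
            self._since = None
            return False
        if self._since is None:
            self._since = now
        return now - self._since >= self.duration_s


class HeartbeatWatch:
    """Flags an instance as down when no health check arrived within timeout_s."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen: Optional[float] = None

    def record(self, now: float) -> None:
        self.last_seen = now

    def is_down(self, now: float) -> bool:
        return self.last_seen is None or now - self.last_seen > self.timeout_s
```

For the High Error Rate example, SustainedCondition(300) would be fed each sample as update(t, error_rate > 10); for Instance Down, HeartbeatWatch(600) is updated on every health check.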

Deployment Options

1. Docker Deployment

Pros: Easy setup, consistent environment, quick development
Cons: Limited OS-level monitoring, single-host deployment

# Start all services
./docker-deploy.sh start

# View logs
./docker-deploy.sh logs canary-app

# Scale services
./docker-deploy.sh scale canary-app=3

# Health check
./docker-deploy.sh health

2. OpenStack Heat Deployment

Pros: Multi-AZ deployment, native OpenStack integration, scalable
Cons: Requires OpenStack environment, more complex setup

# Deploy across multiple AZs
./deploy.sh deploy -d dc1 -a "nova,zone-a,zone-b" -c 2

# Update deployment
./deploy.sh update -n canary-prod

# Check status
./deploy.sh status -n canary-prod

# View logs
./deploy.sh logs -n canary-prod

3. Manual Deployment

For custom environments:

# Install dependencies
pip install -r requirements.txt

# Start canary application
gunicorn --bind 0.0.0.0:8080 app:app

# Start traffic generator (separate terminal)
python traffic_generator.py

# Start system monitor (separate terminal)
python system_monitor.py

Development

Running Tests

# Unit tests
python -m pytest tests/

# Integration tests
python -m pytest tests/integration/

# Load tests
python -m pytest tests/load/

Building Custom Images

# Build Docker image
docker build -t canary:latest .

# Build with custom tag
docker build -t canary:v1.2.3 .

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Troubleshooting

Common Issues

Service Not Starting

# Check logs
./docker-deploy.sh logs canary-app

# Check container status
docker ps -a | grep canary

# Restart services
./docker-deploy.sh restart

High Memory Usage

# Check system resources
curl localhost:8080/health/detailed

# View container stats
docker stats canary-app

Connectivity Issues

# Test connectivity
curl localhost:8080/connectivity

# Check network configuration
docker network ls
docker network inspect openstack-canary_canary-network

Datadog Metrics Missing

# Verify the API key is set (prints only its first 8 characters)
echo "$DD_API_KEY" | cut -c1-8

# Check agent status
docker exec datadog-agent agent status

# Restart agent
docker restart datadog-agent
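Printing the key prefix only shows that the variable is set. To check the key itself, Datadog's API offers a validation endpoint (GET /api/v1/validate with a DD-API-KEY header). A minimal standard-library sketch; the site parameter is an assumption and must match the Datadog site your account uses (for example datadoghq.eu):

```python
import json
import urllib.error
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build a GET request for Datadog's API key validation endpoint."""
    return urllib.request.Request(
        f"https://api.{site}/api/v1/validate",
        headers={"DD-API-KEY": api_key},
        method="GET",
    )


def api_key_is_valid(api_key: str, site: str = "datadoghq.com", timeout: float = 10.0) -> bool:
    """Return True if Datadog reports the key as valid."""
    try:
        with urllib.request.urlopen(build_validate_request(api_key, site), timeout=timeout) as resp:
            return json.load(resp).get("valid") is True
    except urllib.error.HTTPError:
        # Invalid keys are rejected with an HTTP error status
        return False
```

Usage: api_key_is_valid(os.environ["DD_API_KEY"]) from a host with outbound HTTPS access.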

Log Locations

  • Docker: docker logs <container_name>
  • OpenStack: /var/log/canary/
  • System: /var/log/syslog

Performance Tuning

High Load Scenarios

# Increase worker processes
export GUNICORN_WORKERS=4

# Adjust traffic intervals
export TRAFFIC_INTERVAL=30

# Scale horizontally
./deploy.sh scale 5
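This README does not show how GUNICORN_WORKERS reaches Gunicorn. One way to wire it up when you launch Gunicorn yourself, as in Manual Deployment, is a Gunicorn config file that reads the variable explicitly. This is a sketch under that assumption, not the project's shipped configuration:

```python
# gunicorn.conf.py -- start with: gunicorn -c gunicorn.conf.py app:app
import multiprocessing
import os

# Same address the manual deployment passes via --bind
bind = "0.0.0.0:8080"

# Read the worker count from GUNICORN_WORKERS (e.g. 4 for high load);
# fall back to the common (2 x CPU cores) + 1 starting point
workers = int(os.environ.get("GUNICORN_WORKERS", multiprocessing.cpu_count() * 2 + 1))

# Workers silent for longer than this many seconds are killed and restarted
timeout = 30
```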

Resource Optimization

# Monitor resource usage
curl localhost:8080/metrics | grep canary_

# Adjust monitoring intervals
export MONITORING_INTERVAL=60

Security Considerations

  • Network Security: Use security groups to restrict access
  • API Keys: Store Datadog keys securely and supply them through environment variables rather than committing them to the repository
  • Container Security: Run containers as non-root user
  • TLS: Enable HTTPS for production deployments
  • Monitoring: Monitor for unusual traffic patterns

Maintenance

Regular Tasks

  • Weekly: Review error rates and performance metrics
  • Monthly: Update Docker images and dependencies
  • Quarterly: Review and update alert thresholds

Backup & Recovery

# Backup configuration and data
./docker-deploy.sh backup

# Restore from backup
./docker-deploy.sh restore /path/to/backup

Updates

# Update Docker deployment
./docker-deploy.sh update

# Update OpenStack deployment
./deploy.sh update -n canary-prod

Support

For issues and questions:

  1. Check the Troubleshooting section
  2. Review logs for error messages
  3. Check Datadog dashboard for system health
  4. Contact the infrastructure team

Disclaimer

⚠️ This is a proof-of-concept generated by Claude AI and is UNTESTED. See DISCLAIMER.md for full details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

v1.0.0 (2024-01-15)

  • Initial release
  • Multi-AZ deployment support
  • Datadog integration
  • Docker containerization
  • OpenStack Heat templates
  • Comprehensive monitoring and alerting
