AI Incident Commander is a simple on‑call assistant that turns raw system alerts into a clear, human‑readable incident story. It opens incidents, summarizes what’s happening, suggests what to do next, and captures a post‑incident summary.
- Watches for problems (errors, latency spikes, crashes).
- Creates a new incident automatically when something breaks.
- Summarizes the symptoms and highlights likely causes.
- Suggests next steps using runbooks.
- Drafts an incident summary for postmortems.
- Saves time during incidents by reducing noise.
- Gives non-experts a clear, readable incident overview.
- Provides a repeatable demo of “AI + ops” that’s easy to show in a portfolio.
- A demo service fails (simulated fault).
- The backend records the incident and timeline events.
- An analysis pipeline creates a summary and stores it in S3.
- The React dashboard shows the incident list and timeline.
- A live incidents list with severity, service, and status.
- A timeline of events and analysis updates.
- A “Request Analysis” button that triggers the workflow.
- Runbook recommendations filtered by service/tag.
- Frontend: React dashboard hosted on S3 + CloudFront.
- Mobile: Flutter app for alerts and incident summaries.
- Backend: Node.js Lambdas behind API Gateway for incidents/runbooks/comments.
- AI/Analysis: Python Lambda (stub) invoked via Step Functions, stores summaries in S3.
- Workflow: EventBridge for incident events, Step Functions for analysis pipeline.
- Data: DynamoDB for incident state, S3 for logs/summaries, optional OpenSearch for search.
- Infra: CDK stacks, EKS demo services with OpenTelemetry and chaos testing.
apps/web: React dashboard.apps/mobile: Flutter alerts app.services/api: Node.js API Lambdas.services/analysis: Python analysis Lambdas.infra: AWS CDK infrastructure.docs: Architecture, data model, API, and demo flow.
docs/ARCHITECTURE.mddocs/DATA_MODEL.mddocs/API.mddocs/DEMO.md
From the repo root:
./scripts/deploy.ps1
Optional:
./scripts/deploy.ps1 -ApiBaseUrl https://your-api-id.execute-api.region.amazonaws.com
./scripts/deploy.ps1 -SkipWebBuild
./scripts/deploy.ps1 -SkipWebDeploy
The script deploys CDK infra, builds the web app, and syncs apps/web/dist to the frontend S3 bucket (with a CloudFront invalidation).
Core API + dashboard + analysis stub are working. Next steps: replace analysis stub with Bedrock, expand runbook management, and add real observability signal ingestion.