Hands-on labs and code to help you learn, measure, and build using architectural best practices.
Updated Jan 14, 2026 - Python
Run a cloud exit assessment on your infrastructure to gain insights into the challenges and constraints of a potential cloud exit.
Chaos engineering systems invented at KTH Royal Institute of Technology. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-320638
🧘‍♀️ Lightweight fault-tolerant primitives for your modern asyncio Python microservices
A collection of Restate examples for AI use cases: agents, A2A, MCP, ...
🔨 📶 WiFi-Jammer/DoS toolset
A Cross-Domain Data Hub with Electricity Market, Coronavirus Case, Mobility, and Satellite Data in the U.S.
An autonomous SRE agent that monitors cloud logs across multiple platforms, leveraging AI models from various providers to detect anomalies, perform root cause analysis, and automate remediation by creating GitHub Pull Requests.
Policy-driven failure handling for Python services.
Unified resilience patterns for Python — retry, circuit breaker, timeout, fallback, bulkhead, rate limiter, and cache in one decorator. Python's Resilience4j.
Send and receive webhooks on AWS: Innovate with event notifications.
GoldenEye is a functional simulator with fault injection capabilities for common and emerging numerical formats, implemented for the PyTorch deep learning framework.
This repository contains a Python implementation of a lightweight ZMODEM-like file transfer protocol that is tightly integrated with the MeshCore mesh networking client.
Graph-based Python library for computing resilience metrics for power distribution systems.
This Guidance helps customers design a resilient batch process application using AWS services.
An interpretable early-warning engine that detects academic instability before grades collapse. Instead of predicting performance, it models pressure accumulation, buffer strength, and transition risk using attendance, engagement, and study load to explain fragility and identify high-leverage interventions.
An early-warning system that models disasters as instability transitions rather than isolated events. It combines force-based instability modeling with an interpretable ML escalation-risk layer to detect when hazards become disasters due to exposure growth, response delays, and buffer collapse.
Zero-risk infrastructure chaos simulation — 5 engines, 2000+ scenarios, 3-Layer availability proof. No production fault injection.
Resilient City Toolkit