A curated list of Site Reliability and Production Engineering resources.
Updated Jun 10, 2024
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
A checklist for anyone practicing Site Reliability Engineering
Hands-on labs and code to help you learn, measure, and build using architectural best practices.
Chaos Engineering Toolkit & Orchestration for Developers
A curated list of Site Reliability and Production Engineering Tools
This repository provides a design methodology and approach to building highly-reliable applications on Microsoft Azure for mission-critical workloads.
Reliability engineering toolkit for Python - https://reliability.readthedocs.io/en/latest/
Serverless chaos monkey for AWS (runs on AWS Lambda) ☁️ 💥
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, Code Ready Containers, Podman, Buildah, and Kubernetes.
Probabilistic Risk Analysis Tool (fault tree analysis, event tree analysis, etc.)
A curated list of awesome Site Reliability and Production Engineering resources.
The k6 documentation website.
GOV.UK PaaS - Cloud Foundry
The Chaos Toolkit core library
A collection of SRE tools
An opinionated list of attributes and policies that need to be met in order to establish a stable software system.
A Terraform provider for Concourse
A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can also fit distributions with 'offsets' with ease, for example the three-parameter Weibull distribution.
A collection of templates ported from the SRE Workbook