A curated list of Site Reliability and Production Engineering resources.
Updated Jun 10, 2024
A collection of postmortem templates
A role-playing game for incident management training
A party card game for engineers who care about reliability. Based on Cards Against Humanity.
A curated list of awesome Site Reliability and Production Engineering resources.
Calculate how much downtime should be permitted in your Service Level Agreement or Objective
A collection of templates ported from the SRE Workbook
A list of common Disaster Recovery (DR) scenarios for software companies
An ongoing, curated collection of awesome SRE software and tools, libraries and frameworks, engineering books and blogs, philosophical principles, technical guidelines, and practical tools for the field of Site Reliability Engineering (SRE)
🔖 Daily-updated reading list for designing High Scalability 🍒, High Availability 🔥, High Stability 🗻 back-end systems - Pull requests are greatly welcome 👬 I hope you will find this project helpful 🍀 Please help me share it to more and more people ❤️ Thank you - 谢谢 - धन्यवाद - ধন্যবাদ - Спасибо - شكرا - Merci - Gracias - Danke - Cảm ơn! 🙇
Overall map of topics to cover for my “Engineering for Site Reliability” blog series.
A .NET Standard library for working with the Uptime Robot API.
Gerd by Onyx is a lightweight chaos monkey implementation for Kubernetes (k8s)
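
One entry above describes calculating how much downtime a Service Level Agreement or Objective permits. The arithmetic behind that idea can be sketched as follows. This is a minimal illustration, not the code of the listed project; the function name `allowed_downtime` and its parameters are assumptions made for this example.

```python
from datetime import timedelta


def allowed_downtime(slo_percent: float, window: timedelta) -> timedelta:
    """Return the downtime budget implied by an availability target.

    slo_percent is the availability target as a percentage (e.g. 99.9).
    window is the measurement period (e.g. 30 days).
    Both names are hypothetical, chosen only for this sketch.
    """
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    # The permitted downtime is the unavailable fraction of the window.
    return window * (1 - slo_percent / 100)


if __name__ == "__main__":
    # Compare a few common targets over a 30-day window.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime(target, timedelta(days=30))
        print(f"{target}% over 30 days allows {budget} of downtime")
```

Under these assumptions, a 99.9% target over a 30-day window leaves a budget of roughly 43 minutes, and each additional "nine" divides that budget by ten.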