From 400833d76593424b5b90ae001589f7fb86a39289 Mon Sep 17 00:00:00 2001
From: yolgun
Date: Wed, 6 Jul 2016 17:33:15 +0200
Subject: [PATCH 1/3] Update/Create 2016-07-06-Site-Reliability-Engineering---Notes.md

---
 _posts/2016-07-06-Site-Reliability-Engineering---Notes.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2016-07-06-Site-Reliability-Engineering---Notes.md b/_posts/2016-07-06-Site-Reliability-Engineering---Notes.md
index b92254e..3d62292 100644
--- a/_posts/2016-07-06-Site-Reliability-Engineering---Notes.md
+++ b/_posts/2016-07-06-Site-Reliability-Engineering---Notes.md
@@ -6,5 +6,5 @@
 permalink: /site-reliability-engineering---notes/
 source-id: 1pxEcqinpJ7afV4pKyH5gD4vrS29js7J0OQh9-lRN3co
 published: true
 ---
-Foo Bar
+Foo Bar BAR BAR

From f89d7a64a58124f5a66a666b9e1ba96c8cca3c66 Mon Sep 17 00:00:00 2001
From: yolgun
Date: Mon, 11 Jul 2016 12:55:00 +0200
Subject: [PATCH 2/3] Update/Create 2016-07-06-Site-Reliability-Engineering---Notes.md

---
 ...06-Site-Reliability-Engineering---Notes.md | 28 ++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/_posts/2016-07-06-Site-Reliability-Engineering---Notes.md b/_posts/2016-07-06-Site-Reliability-Engineering---Notes.md
index 3d62292..e64c4c7 100644
--- a/_posts/2016-07-06-Site-Reliability-Engineering---Notes.md
+++ b/_posts/2016-07-06-Site-Reliability-Engineering---Notes.md
@@ -6,5 +6,31 @@
 permalink: /site-reliability-engineering---notes/
 source-id: 1pxEcqinpJ7afV4pKyH5gD4vrS29js7J0OQh9-lRN3co
 published: true
 ---
-Foo Bar BAR BAR
+1. **Introduction**
+
+* Hope is not a strategy.
+
+* The traditional sysadmin approach does not scale with traffic.
+
+* Sysadmins want stability; developers want features. This may cause strife between teams.
+
+* Knowledge of UNIX internals and networking (Layer 1 to Layer 3) is a plus for SRE work.
+
+* SREs have a 50% cap on "ops" work; the rest of their time goes to development.
+
+* SRE responsibilities: Availability, Latency, Performance, Efficiency, Change, Monitoring, Emergency, Capacity.
+
+* A 100% reliability target is hard to achieve and almost always unnecessary.
+
+* The gap between 100% and the SLO (e.g. 99.9% availability) is the **error budget**. Spend it on new features.
+
+* Software should do the monitoring; humans should only be alerted when they need to take action.
+
+* **Monitoring** output: alerts (immediate action), tickets (relaxed action), logging (read only when asked to look).
+
+* Disaster playbooks are very helpful to reduce MTTR (mean time to repair) and improve **emergency response**.
+
+* **Change**: Progressive rollouts -> Detect problems -> Roll back in case of problems.
+
+* **Capacity Planning**: Organic and inorganic demand forecasting, regular load testing.

From bcec1639497a953f680d1b981a6993c0d459a8b1 Mon Sep 17 00:00:00 2001
From: Yunus Admin
Date: Mon, 18 Jul 2016 17:47:58 +0200
Subject: [PATCH 3/3] message 1

---
 trying.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 trying.txt

diff --git a/trying.txt b/trying.txt
new file mode 100644
index 0000000..1050001
--- /dev/null
+++ b/trying.txt
@@ -0,0 +1 @@
+asd
\ No newline at end of file
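
The error-budget note in PATCH 2/3 comes down to simple arithmetic, sketched below. This is only an illustration and not part of the patch series: the function names `error_budget_minutes` and `budget_remaining` and the 30-day window are assumptions made for the example, and the budget is measured as downtime minutes.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime allowed by an availability SLO over a window, in minutes.

    `slo` is a fraction such as 0.999 for 99.9% availability.
    The 30-day default window is an assumption for this sketch.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    total_minutes = window_days * 24 * 60
    # The budget is whatever the SLO leaves over from 100%.
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Budget left after the downtime already spent in the window."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


if __name__ == "__main__":
    # 99.9% over 30 days: 0.1% of 43,200 minutes.
    print(round(error_budget_minutes(0.999), 1))    # 43.2
    print(round(budget_remaining(0.999, 10.0), 1))  # 33.2
```

A positive remainder means there is still room to take risks on releases; once it reaches zero, the notes' "spend it on new features" no longer applies until the window rolls over.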