Shoutout: This work was made possible by Circonus -- the monitoring system with full histogram support.
Gathering all kinds of telemetry data is key to operating reliable distributed systems at scale. Once you have set up your monitoring systems and recorded all relevant data, the challenge is to make sense of it and extract valuable information, such as:
- Are we fulfilling our SLA?
- How did our query response times change with the last update?
Statistics is the art of extracting information from data. In this tutorial, we cover the basic statistical knowledge that helps you in your daily work as a system operator. We will cover probabilistic models and how to summarize distributions with mean values, quantiles, and histograms, including how these summaries relate to each other. Advanced topics like time series forecasting and scalability analysis will also be touched on.
The tutorial focuses on practical aspects and will give you hands-on knowledge of how to handle, import, analyze, and visualize telemetry data with UNIX command line tools, gnuplot, and the IPython toolkit.
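As a rough illustration of the summaries mentioned above, the following minimal sketch computes a mean, a median, a 99th percentile, and a fixed-width histogram using only the Python standard library. The `summarize` helper, the 10 ms bin width, and the synthetic log-normal samples are assumptions made for this example; they are not taken from the workshop notebooks or from the datasets in this repository.

```python
import random
import statistics
from collections import Counter


def summarize(latencies_ms, bin_width_ms=10.0):
    """Return mean, median, p99, and a fixed-width histogram of the samples.

    Hypothetical helper for illustration; not part of the workshop material.
    """
    # statistics.quantiles with n=100 returns the 99 percentile cut points.
    cuts = statistics.quantiles(latencies_ms, n=100)
    # Map each sample to the lower edge of its bin and count samples per bin.
    histogram = Counter(
        int(x // bin_width_ms) * bin_width_ms for x in latencies_ms
    )
    return {
        "mean": statistics.fmean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p99": cuts[98],
        "histogram": dict(sorted(histogram.items())),
    }


# Synthetic, right-skewed latencies (log-normal); NOT data from this repository.
rng = random.Random(42)
samples = [rng.lognormvariate(3.0, 0.8) for _ in range(10_000)]

summary = summarize(samples)
print("mean   [ms]:", round(summary["mean"], 2))
print("median [ms]:", round(summary["median"], 2))
print("p99    [ms]:", round(summary["p99"], 2))
for lower_edge, count in list(summary["histogram"].items())[:5]:
    print(f"{lower_edge:6.0f} ms and up: {count}")
```

For skewed data like this, the mean lies above the median and far below the 99th percentile, which is one reason a single average is rarely enough to describe latency data.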
- Introduction
- Visualizing Data
- Histograms
- Summary Statistics
- Quantiles and Outliers
- Forecasting
- Queuing Theory
If you have access to a machine with Docker installed, you can bootstrap an interactive working environment with a single command:
$ ./docker.sh
[...]
#
# Data Science 4 Effective Operations
#
# starting jupyter notebook&lab ...
done
#
# Notebook:
# * local url: http://0.0.0.0:9999/?token=F2AlHtJBvHIqoLFEVfbMnUVFkcpFlJuZ
# * public url: http://11.22.33.192:9999/?token=F2AlHtJBvHIqoLFEVfbMnUVFkcpFlJuZ
#
# Lab:
# * local url: http://0.0.0.0:9998/?token=F2AlHtJBvHIqoLFEVfbMnUVFkcpFlJuZ
# * public url: http://11.22.33.192:9998/?token=F2AlHtJBvHIqoLFEVfbMnUVFkcpFlJuZ
Sign up for the mailing list to get notified about upcoming Statistics for Engineers events.
This workshop has been held at a number of events in slightly different forms.
- 2019-10-02 SRECon19, Dublin, Ireland
- 2018-08-29 SRECon18, Düsseldorf, Germany
- 2016-06-12 SRECon16, Dublin, Ireland
- 2015-10-28 Velocity, Amsterdam, Netherlands
- 2015-07-29 StatsCraft, Tel-Aviv, Israel
- 2015-05-14 SRECon15, Dublin, Ireland
See the corresponding subfolders for the presented content.
If you want to be informed about upcoming events, watch for the following hashtag on Twitter: #StatsForEngineers
- Event: 2016-06-29 Monitorama, Portland, USA
- Video: https://vimeo.com/173610069
A writeup of the material was published in print by the CACM and the ACM Queue magazine.
- ACM Queue 14/1: https://queue.acm.org/detail.cfm?id=2903468
- CACM Vol. 59 No. 7 (paywalled): https://cacm.acm.org/magazines/2016/7/204029-statistics-for-engineers/abstract
- Statistics
  - Rice - Mathematical Statistics and Data Analysis (advanced)
- Tools
  - McKinney - Python for Data Analysis
  - Janert - Data Analysis with Open Source Tools
  - Janssens - Data Science at the Command Line (O'Reilly 2015)
- Queuing Theory
- dc1cpu.csv - CPU utilization of a machine cluster
- LogDB.out - DB log file
- LogDB.csv - Parsed into CSV
- API_latencies.csv - API latency for individual requests
- ReqMultiNode.csv - Request rates for a cluster of nodes
- WebLatency.csv - Ping latencies for a server measured from different locations