This README is just a quick overview of the Skream project.
Skream is a high-performance time-series library with memory-footprint guarantees. The name is short for "online SKetching and stREAMing," and is pronounced like "scream" in English. For scalability, Skream is written in Clojure and has zero side-effects. The library includes a simple RESTful web-service for clients.
Skream views a time-series as a simple sequence of numbers. These numbers are read sequentially and in order, with minimal state maintained. These sequences can be sensor readings from an Internet-of-Things project, or stock prices in a high-frequency trading (HFT) strategy.
What queries does a Skream currently support?
- Basic summary statistics (count, minimum, maximum, sum, mean / average)
- Variance & standard deviation
- Skewness & kurtosis
- Arbitrary higher-order moments (standardized & unstandardized)
- Range counts
- Gaussian range counts (e.g. count of elements within 0.42 standard deviations)
- Histograms (evenly-spaced & Gaussian bins)
- Exponential moving average
- Simple moving average
- Approximate membership (via Bloom filters)
- Approximate individual element counts (via Count-Min sketches)
- Distinct element count (via HyperLogLog sketches)
- Approximate median (via P2 algorithm)
- Approximate arbitrary quantiles (e.g. the 25% quantile, or first quartile)
- Approximate mutual information between two Skreams (via histograms)
All of these queries are supported with a fixed memory footprint, with new numbers added sequentially, in an online fashion. The exception is simple moving average queries, which require a window of recent numbers to be maintained as state.
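To make the fixed-footprint idea concrete, here is a minimal sketch of an online mean and variance using Welford's algorithm. It is not taken from Skream's source: the names `empty-stats`, `add-num`, and `variance` and the keys `:n`, `:mean`, and `:m2` are assumptions made for illustration. The point is that the state stays at three numbers no matter how long the series grows.

```clojure
;; Hypothetical sketch, not Skream's actual API.
;; Welford's online algorithm: state is a count, a running mean, and the
;; running sum of squared deviations from the mean (:m2).
(def empty-stats {:n 0 :mean 0.0 :m2 0.0})

(defn add-num
  "Returns a new stats map that accounts for x. Pure: the input map is unchanged."
  [{:keys [n mean m2]} x]
  (let [n'    (inc n)
        delta (- x mean)
        mean' (+ mean (/ delta n'))
        m2'   (+ m2 (* delta (- x mean')))]
    {:n n' :mean mean' :m2 m2'}))

(defn variance
  "Population variance of the numbers seen so far (0.0 for an empty series)."
  [{:keys [n m2]}]
  (if (zero? n) 0.0 (/ m2 n)))

;; Feed the numbers in one at a time, as a stream would deliver them.
(def stats (reduce add-num empty-stats [2 4 4 4 5 5 7 9]))
```

For this input, `(:mean stats)` comes out at roughly 5 and `(variance stats)` at roughly 4, up to floating-point rounding.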
Skream is released under the Eclipse Public License, so you can easily incorporate the library into your commercial or non-commercial projects.
0.1 Alpha
Skream is a Leiningen Clojure project with decent automated test coverage. The main data structures are simple Clojure maps, with sequential updates handled by functions stored in the map metadata.
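The map-plus-metadata design can be sketched roughly as follows. This only illustrates the general technique: the `:update-fns` metadata key, the `tracker` map, and the `apply-updates` function are hypothetical names, and Skream's real layout may differ.

```clojure
;; Hypothetical sketch, not Skream's actual API.
;; The statistics live in a plain map; the functions that update them
;; live in the map's metadata, keyed by the statistic they maintain.
(def tracker
  (with-meta
    {:count 0 :sum 0 :max nil}
    {:update-fns {:count (fn [old _] (inc old))
                  :sum   (fn [old x] (+ old x))
                  :max   (fn [old x] (if (nil? old) x (max old x)))}}))

(defn apply-updates
  "Returns a new tracker with every statistic updated for the number x."
  [tracker x]
  (let [fns (:update-fns (meta tracker))]
    ;; assoc preserves metadata, so the result still carries its update functions.
    (reduce-kv (fn [acc k f] (assoc acc k (f (get acc k) x)))
               tracker
               fns)))

(def updated (reduce apply-updates tracker [3 1 4 1 5]))
;; updated => {:count 5, :sum 14, :max 5}
```

Because metadata does not take part in equality, a tracker compares equal to a plain map holding the same statistics, which keeps the data easy to inspect and test.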
Everything is done without side effects, in memory, and with only the minimal amount of state. This provides fundamental scalability across large time-series. Because updates are free of side effects, they can be done in parallel, using every CPU core on the server.
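As a rough illustration of why pure functions parallelize well, the sketch below computes several independent statistics over the same numbers with Clojure's built-in `pmap`. It does not show how Skream schedules its own updates, and the `reducers` map and `summarize` function are hypothetical names.

```clojure
;; Hypothetical sketch, not Skream's actual API.
;; Each statistic is a pure function of the same input, so the functions
;; can run on separate cores without locks or coordination.
(def reducers
  {:count (fn [xs] (count xs))
   :sum   (fn [xs] (reduce + 0 xs))
   :max   (fn [xs] (reduce max xs))})

(defn summarize
  "Computes every statistic in parallel and returns the results as one map."
  [xs]
  (let [ks (keys reducers)]
    ;; pmap evaluates the function calls on a thread pool; zipmap realizes them.
    (zipmap ks (pmap (fn [k] ((get reducers k) xs)) ks))))

(def summary (summarize (vec (range 1 1001))))
;; summary => {:count 1000, :sum 500500, :max 1000}
```

Note that `pmap` runs on Clojure's future thread pool, so a standalone script should call `(shutdown-agents)` before exiting to avoid a delay at shutdown.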