
Repeatable pseudo-random sampling and CDFs for the most well-known probability distributions.


nin93/alea


Alea


Alea is a collection of utilities for working with the most well-known probability distributions, written in pure Crystal.

Note: This project is under development. Many distributions are still missing, as are many cumulative distribution functions, so keep in mind that breaking changes may occur frequently.

Why Crystal?

Crystal compiles to fast native code without sacrificing the conveniences of modern programming languages, and it allows a clean, readable interface.


Features

Currently Available

  • PRNG implementations
  • Random sampling (single/double precision)
  • Cumulative Distribution Functions (single/double precision)

Supported Distributions

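| Distribution | Sampling 32-bit | Sampling 64-bit | CDF 32-bit | CDF 64-bit |
|--------------|-----------------|-----------------|------------|------------|
| Beta         | Y               | Y               | N          | N          |
| Chi-Square   | Y               | Y               | Y          | Y          |
| Exponential  | Y               | Y               | Y          | Y          |
| F-Snedecor   | Y               | Y               | N          | N          |
| Gamma        | Y               | Y               | Y          | Y          |
| Laplace      | Y               | Y               | Y          | Y          |
| Log-Normal   | Y               | Y               | Y          | Y          |
| Normal       | Y               | Y               | Y          | Y          |
| Poisson      | N               | Y               | N          | Y          |
| T-Student    | Y               | Y               | N          | N          |
| Uniform      | Y               | Y               | Y          | Y          |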

Planned

  • Statistical properties of distributions and empirical data
  • Quantile Functions

Installation

  1. Add the dependency to your shard.yml:

dependencies:
  alea:
    github: nin93/alea

  2. Run shards install

  3. Import the library:

require "alea"

Usage

Sampling

Random is the interface provided to perform sampling:

random = Alea::Random(Alea::XSR128).new
random.normal # => -0.36790519967553736 : Float64

# Append '32' to call the single-precision version
random.normal32 # => 0.19756398 : Float32

It also accepts an initial seed to reproduce the same seemingly random events across runs:

seed = 9377
random = Alea::Random(Alea::XSR128).new(seed)
random.exp # => 0.10203669577353723 : Float64
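The seeding idea can be illustrated without alea itself. This sketch assumes only Crystal's standard library: two Random::PCG32 generators built from the same seed (9377, as above) stand in for alea engines, and a hypothetical box_muller helper turns their uniforms into normal draws. Box-Muller is a simpler, swapped-in technique, not necessarily the method alea's #normal uses.

```crystal
# Hypothetical helper (not alea API): one standard-normal draw via the
# Box-Muller transform, built from two uniforms of the given generator.
def box_muller(rng : Random) : Float64
  u1 = 1.0 - rng.rand # map [0, 1) to (0, 1] so Math.log never receives 0
  u2 = rng.rand
  Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

seed = 9377_u64
a = Random::PCG32.new(seed) # stdlib engine standing in for an alea PRNG
b = Random::PCG32.new(seed)

draws_a = Array.new(5) { box_muller(a) }
draws_b = Array.new(5) { box_muller(b) }

puts draws_a == draws_b # same seed, same sequence of draws
```

Because each generator's state depends only on the seed, rerunning the program reproduces the same five values.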

Unsafe Methods

Plain sampling methods (such as #normal and #gamma32) check the arguments passed to them to prevent bad data generation or internal exceptions. To skip these checks (which may be slow when generating large amounts of data), use the unsafe versions, obtained by prepending next_ to the method name:

random = Alea::Random(Alea::XSR128).new
random.normal(loc: 0, sigma: 0)      # raises Alea::UndefinedError: sigma is 0 or negative.
random.next_normal(loc: 0, sigma: 0) # these might raise internal exceptions.

Timings are comparable, though: see the benchmarks for a direct comparison between these methods.
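The checked/unchecked split can be sketched as a thin validating wrapper around an unchecked sampler. Everything here is hypothetical (ToySampler, ToyUndefinedError, and a Box-Muller body backed by Crystal's Random::PCG32); it only mirrors the #normal / #next_normal naming convention and is not alea's code.

```crystal
# Hypothetical error type standing in for Alea::UndefinedError.
class ToyUndefinedError < Exception
end

# Hypothetical sampler mirroring the checked (#normal) / unchecked
# (#next_normal) naming convention; not alea's implementation.
class ToySampler
  def initialize(seed : UInt64)
    @rng = Random::PCG32.new(seed)
  end

  # Checked: validate the arguments, then delegate to the unchecked method.
  def normal(loc : Float64 = 0.0, sigma : Float64 = 1.0) : Float64
    raise ToyUndefinedError.new("loc is not finite.") unless loc.finite?
    # `sigma > 0.0` is also false for NaN, so NaN is rejected here too.
    raise ToyUndefinedError.new("sigma is 0 or negative.") unless sigma > 0.0
    next_normal(loc, sigma)
  end

  # Unchecked: no validation, so invalid arguments silently give bad data.
  def next_normal(loc : Float64 = 0.0, sigma : Float64 = 1.0) : Float64
    u1 = 1.0 - @rng.rand # (0, 1], keeps Math.log finite
    u2 = @rng.rand
    loc + sigma * Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  end
end
```

Here ToySampler.new(1_u64).normal(loc: 0.0, sigma: 0.0) raises, while next_normal with the same arguments just returns loc: a degenerate value that the check would have flagged.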

PRNGs

Random is a wrapper over an underlying pseudo-random number generator (the engine). All integers and floats are ultimately produced by two engine methods: #next_u32, returning a random UInt32, and #next_u64, returning a random UInt64. Floats are obtained by applying ldexp (load exponent) to the generated unsigned integers; signed integers are obtained by a raw cast of the same bits.
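A minimal sketch of these conversions, assuming the common layout of keeping the top 53 (or 24) bits, which matches the Float64 (or Float32) mantissa width; alea's exact bit choices may differ. It relies only on Crystal's Math.ldexp and the wrapping conversion to_i64!, and the helper names are hypothetical.

```crystal
# Hypothetical helpers sketching the conversions described above.

# Float64 has a 53-bit mantissa: keep the top 53 bits and scale by 2^-53,
# which yields a value in [0, 1).
def u64_to_f64(u : UInt64) : Float64
  Math.ldexp((u >> 11).to_f64, -53)
end

# Float32 has a 24-bit mantissa: keep the top 24 bits and scale by 2^-24.
# The Float64 intermediate is exact, so converting to Float32 loses nothing.
def u32_to_f32(u : UInt32) : Float32
  Math.ldexp((u >> 8).to_f64, -24).to_f32
end

# Raw cast: reinterpret the same 64 bits as a signed integer (wrapping conversion).
def u64_to_i64(u : UInt64) : Int64
  u.to_i64!
end

puts u64_to_f64(0_u64)             # 0.0
puts u64_to_f64(UInt64::MAX) < 1.0 # true
puts u64_to_i64(UInt64::MAX)       # -1
```

Scaling an integer by a power of two through ldexp is exact, so every output is a multiple of 2^-53 (or 2^-24) and 1.0 itself is never produced.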

Currently implemented engines:

  • XSR128 backed by xoroshiro128++ (32/64 bit)
  • XSR256 backed by xoshiro256++ (32/64 bit)
  • MT19937 backed by Mersenne Twister (32/64 bit)

The digits in the class name give the period of the PRNG as a power of 2: the period is (2^N) - 1, where N is that number.
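Applying this rule to the engines listed above gives the following periods:

```latex
\begin{aligned}
\text{XSR128:}\quad  & 2^{128} - 1 \approx 3.4 \times 10^{38} \\
\text{XSR256:}\quad  & 2^{256} - 1 \approx 1.2 \times 10^{77} \\
\text{MT19937:}\quad & 2^{19937} - 1 \approx 4.3 \times 10^{6001}
\end{aligned}
```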

The XSR256 and XSR128 engines come from the xoshiro (XOR/shift/rotate) family designed by Sebastiano Vigna and David Blackman: very fast generators that also have excellent statistical properties.
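To make the XOR/shift/rotate idea concrete, here is an illustrative Crystal sketch of the xoroshiro128++ step, using the rotation and shift constants (17, 49, 21, 28) of the reference C code by Blackman and Vigna. It is not alea's XSR128 source: the class and helper names are hypothetical, and seeding through SplitMix64 is an assumption, chosen because the xoshiro authors recommend it for expanding a 64-bit seed.

```crystal
# Hypothetical SplitMix64 step used only to expand a 64-bit seed into two
# state words (assumption: alea's own seeding may differ).
def splitmix64_step(state : UInt64) : Tuple(UInt64, UInt64)
  state = state &+ 0x9e3779b97f4a7c15_u64
  z = state
  z = (z ^ (z >> 30)) &* 0xbf58476d1ce4e5b9_u64
  z = (z ^ (z >> 27)) &* 0x94d049bb133111eb_u64
  {state, z ^ (z >> 31)}
end

# Hypothetical class: the xoroshiro128++ step, following the reference C code.
class Xoroshiro128PlusPlusSketch
  def initialize(@s0 : UInt64, @s1 : UInt64)
  end

  def self.from_seed(seed : UInt64)
    state, a = splitmix64_step(seed)
    _, b = splitmix64_step(state)
    # SplitMix64 maps distinct states to distinct outputs, so a and b
    # cannot both be 0 (an all-zero state would get stuck).
    new(a, b)
  end

  def next_u64 : UInt64
    s0 = @s0
    s1 = @s1
    result = rotl(s0 &+ s1, 17) &+ s0 # the "++" output scrambler
    s1 ^= s0
    @s0 = rotl(s0, 49) ^ s1 ^ (s1 << 21)
    @s1 = rotl(s1, 28)
    result
  end

  # Uniform Float64 in [0, 1) from the top 53 bits of the next output.
  def next_f64 : Float64
    u = next_u64
    Math.ldexp((u >> 11).to_f64, -53)
  end

  private def rotl(x : UInt64, k : Int32) : UInt64
    (x << k) | (x >> (64 - k))
  end
end
```

Starting from the state {1, 2}, the first output is rotl(3, 17) + 1 = 393217, which makes a handy sanity check.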

The MT19937 engine implements the well-known Mersenne Twister, developed by Makoto Matsumoto and Takuji Nishimura: one of the most widely used PRNGs, passing most of the stringent statistical tests.
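For comparison, the following is a compact sketch of the 32-bit MT19937 recurrence (initialization multiplier 1812433253, twist matrix 0x9908b0df and the standard tempering masks), written from the published algorithm. The class is hypothetical, is not alea's MT19937 engine, and omits the 64-bit variant.

```crystal
# Hypothetical 32-bit Mersenne Twister sketch (not alea's engine).
class MT19937Sketch
  N          = 624
  M          = 397
  MATRIX_A   = 0x9908b0df_u32
  UPPER_MASK = 0x80000000_u32
  LOWER_MASK = 0x7fffffff_u32

  @mt : Array(UInt32)
  @index : Int32

  def initialize(seed : UInt32)
    @mt = Array(UInt32).new(N, 0_u32)
    @mt[0] = seed
    # Standard initialization: each word derives from the previous one.
    (1...N).each do |i|
      prev = @mt[i - 1]
      @mt[i] = 1812433253_u32 &* (prev ^ (prev >> 30)) &+ i.to_u32
    end
    @index = N # forces a twist before the first output
  end

  def next_u32 : UInt32
    twist if @index >= N
    y = @mt[@index]
    @index += 1
    # Tempering improves the equidistribution of the output bits.
    y ^= y >> 11
    y ^= (y << 7) & 0x9d2c5680_u32
    y ^= (y << 15) & 0xefc60000_u32
    y ^= y >> 18
    y
  end

  # Regenerates all N state words in place.
  private def twist
    N.times do |i|
      y = (@mt[i] & UPPER_MASK) | (@mt[(i + 1) % N] & LOWER_MASK)
      v = @mt[(i + M) % N] ^ (y >> 1)
      v ^= MATRIX_A if y.odd?
      @mt[i] = v
    end
    @index = 0
  end
end
```

With the reference seed 5489, the first output of 32-bit MT19937 is 3499211612, which is a convenient check against other implementations such as C++'s std::mt19937.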

Custom PRNG

All PRNGs in this library include a common module, PRNG. You can build your own custom PRNG by including this module and defining the methods Alea::Random needs to ensure proper repeatability and sampling, as described in this example.

Note that in these implementations #next_u32 and #next_u64 advance separate states and are therefore independent of each other; the same holds for #next_f32 and #next_f64, and for #next_i32 and #next_i64. It is still fine for a custom PRNG to have #next_u32 and #next_u64 share a single state. I chose not to, as sharing a state makes its advancement unpredictable.
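The independence described above can be sketched with a toy engine that keeps one state per output width. ToyPRNG and TwoStateLCG are hypothetical names; the abstract methods only mirror the #next_u32/#next_u64 pair mentioned in this section, not the full set of methods alea's PRNG module requires. The LCG (Knuth's MMIX constants) is chosen for brevity and has far weaker statistical quality than the engines above.

```crystal
# Hypothetical minimal interface (an assumption, not alea's PRNG module).
module ToyPRNG
  abstract def next_u32 : UInt32
  abstract def next_u64 : UInt64
end

# Hypothetical engine with two independent LCG states: one feeds next_u32,
# the other feeds next_u64, so calls to one never shift the other.
class TwoStateLCG
  include ToyPRNG

  MUL = 6364136223846793005_u64 # Knuth's MMIX LCG constants
  INC = 1442695040888963407_u64

  @state32 : UInt64
  @state64 : UInt64

  def initialize(seed : UInt64)
    @state32 = seed
    @state64 = ~seed # a different starting point for the 64-bit stream (arbitrary choice)
  end

  def next_u32 : UInt32
    @state32 = @state32 &* MUL &+ INC
    (@state32 >> 32).to_u32 # high bits of an LCG are the better ones
  end

  def next_u64 : UInt64
    @state64 = @state64 &* MUL &+ INC
    @state64 ^ (@state64 >> 29) # cheap scramble of the weak low bits
  end
end
```

Because each method advances only its own state, interleaving #next_u64 calls leaves the #next_u32 sequence unchanged.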

Cumulative Distribution Functions

CDF is the interface used to compute cumulative distribution functions. Given a real-valued random variable X ~ D and a fixed quantile x, the CDF associates x with the probability that X takes a value less than or equal to x.
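Written out, the definition reads as follows, together with the standard closed forms for two distributions used in the examples. Reading μ, σ and k as the loc, sigma and df arguments is an assumption based on their names; P(a, x) denotes the regularized lower incomplete gamma function.

```latex
\begin{aligned}
F_D(x) &= \Pr(X \le x), \qquad X \sim D \\
F_{\mathcal{N}}(x;\,\mu,\sigma) &= \tfrac{1}{2}\,\operatorname{erfc}\!\left(-\frac{x-\mu}{\sigma\sqrt{2}}\right) \\
F_{\chi^2}(x;\,k) &= P\!\left(\tfrac{k}{2},\,\tfrac{x}{2}\right)
  = \frac{\gamma\!\left(\tfrac{k}{2},\,\tfrac{x}{2}\right)}{\Gamma\!\left(\tfrac{k}{2}\right)}, \qquad x \ge 0
\end{aligned}
```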

Arguments passed to CDF methods to shape the distributions are analogous to those used for sampling:

Alea::CDF.normal(0.0)                       # => 0.5 : Float64
Alea::CDF.normal(2.0, loc: 1.0, sigma: 0.5) # => 0.9772498680518208 : Float64
Alea::CDF.chisq(5.279, df: 5.0)             # => 0.6172121213841358 : Float64
Alea::CDF.chisq32(5.279, df: 5.0)           # => 0.61721206 : Float32
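As a cross-check of the values above, here is a self-contained sketch using only Crystal's Math module (erfc, lgamma, exp, log). It is not alea's implementation: alea credits IncGammaBeta.jl for its incomplete gamma functions, whereas this uses a plain power series that is accurate for moderate x but converges slowly for large x. The function names are hypothetical.

```crystal
# Hypothetical normal CDF via the complementary error function.
def normal_cdf(x : Float64, loc : Float64 = 0.0, sigma : Float64 = 1.0) : Float64
  0.5 * Math.erfc(-(x - loc) / (sigma * Math.sqrt(2.0)))
end

# Hypothetical regularized lower incomplete gamma P(a, x) via its power series:
# P(a, x) = x^a e^(-x) / Gamma(a) * sum_{n>=0} x^n / (a (a+1) ... (a+n)).
def lower_reg_gamma(a : Float64, x : Float64) : Float64
  return 0.0 if x <= 0.0
  term = 1.0 / a
  sum = term
  n = 1
  while n < 10_000
    term *= x / (a + n)
    sum += term
    break if term < sum * 1e-16 # remaining terms no longer change the sum
    n += 1
  end
  sum * Math.exp(a * Math.log(x) - x - Math.lgamma(a))
end

# Hypothetical chi-square CDF with df degrees of freedom: P(df / 2, x / 2).
def chisq_cdf(x : Float64, df : Float64) : Float64
  lower_reg_gamma(df / 2.0, x / 2.0)
end

puts normal_cdf(0.0) # 0.5
puts normal_cdf(2.0, loc: 1.0, sigma: 0.5)
puts chisq_cdf(5.279, df: 5.0)
```

Both functions should agree with the Alea::CDF results shown above to many decimal places.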

Documentation

Documentation is hosted on GitHub Pages.

Into The Wild

Here is a list of the projects including alea:

Acknowledgments

Fully listed in LICENSE.md:

  • Crystal Random module for uniform sampling
  • NumPy random module for pseudo-random sampling methods
  • NumPy mt19937 prng implementation
  • JuliaLang random module for ziggurat methods
  • IncGammaBeta.jl for incomplete gamma functions

Contributing

  1. Fork it (https://github.com/nin93/alea/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors