Project Status | Documentation | Build Status |
---|---|---|
This package contains Julia versions of selected code snippets
and mcmc models
contained in the R package "rethinking" associated with the book Statistical Rethinking by Richard McElreath.
As stated many times by the author in his online lectures, this package is not intended to take away the hands-on component of the course. The clips are just meant to get you going but learning means experimenting, in this case using Julia and Stan.
Over the next 2 months (probably until Apr '20) I'm planning to update StatisticalRethinking.jl v2.0.0 to reflect the changes in the 2nd edition of the book. At the same time (but this will likely take longer) I'll also expand coverage of chapters 5 and beyond.
This version follows the ongoing changes in the packages in the StanJulia Github organization, particularly the changes in StanSample.jl. This version breaks the approach chosen in v1.x with respect to the return values of stan_sample().
Another major change is that not all dependencies for the scripts are included in StatisticalRethinking.jl. Unfortunately that setup, as in version 1.0.0, leads to very long compile times. Please see the install_packages.jl
script in scripts/00
for other packages needed in the various scipts.
Initial release of StatisticalRethinking.jl.
This package is part of the broader StatisticalRethinkingJulia Github organization.
To install the package (from the REPL):
] add StatisticalRethinking
or, easier in some cases to use from within an editor:
] dev StatisticalRethinking
All scripts contain in fact examples. A good initial introduction to running a Stan language program is in scripts/03/intro-stan/intro-part-1.jl
.
In the book and associated R package rethinking
, statistical models are defined as illustrated below:
flist <- alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + b*weight ,
a ~ dnorm( 156 , 100 ) ,
b ~ dnorm( 0 , 10 ) ,
sigma ~ dunif( 0 , 50 )
)
Posterior values can be approximated by
# Simulate quadratic approximation (for simpler models)
m4.31 <- quad(flist, data=d2)
or generated using Stan by:
# Generate a Stan model and run a simulation
m4.32 <- ulam(flist, data=d2)
The author of the book states: "If that (the statistical model) doesn't make much sense, good. ... you're holding the right textbook, since this book teaches you how to read and write these mathematical descriptions" (page 77).
StatisticalRethinking.jl is intended to allow experimenting with this learning process using StanJulia.
There are a few important differences between rethinking
and StatisticalRethinking.jl
:
- In v2.x of StatisticalRethinking.jl, ulam() has been replaced by StanSample.jl.
This means that much earlier on than in the book, StatisticalRethinking.jl introduces the reader to the Stan language.
To help out with this, in the subdirectory scripts/03/intro-stan
the Stan language is introduced and the execution of Stan language programs illustrated. Chapter 9 of the book contains a nice introduction to translating the alist
R models to the Stan language (just before section 9.5).
Chapter 9 of the book contains a nice introduction to translating the alist
R models to the Stan language (just before section 9.5).
- The equivalent of the R function
quap()
in StatisticalRethinking.jl v2.0 uses the MAP density of the Stan samples as the mean of the Normal distribution and reports the approximation as a NamedTuple. e.g. fromscripts/04-part-1/clip-31.jl
:
if success(rc)
df = read_samples(sm; output_format=:dataframe)
q = quap(df)
q |> display
end
returns:
(mu = 178.0 ± 0.1, sigma = 24.5 ± 0.94)
To obtain the mu quap:
q.mu
Examples and comparisons of different ways of computing a quap approximation can be found in scripts/03/intro-stan/intro-part-4.jl
.
-
In
scripts/04
an additional section has been added,intro-logpdf
which introduces an alternative way to compute the MAP (quap) using Optim.jl. This kind of builds on the logpdf formulation introduced inscripts/03/intro-stan/intro-part-4.jl
-
In
scripts/09
an additional intro section has been included,scripts/09/intro-dhmc
. It is envisage that a future version of StatisticalRethinking.jl will be based on DynamicHMC.jl. No time line has been set for this work.
Instead of having all snippets in a single file, the snippets are organized by chapter and grouped in clips by related snippets. E.g. chapter 0 of the R package has snippets 0.1 to 0.5. Those have been combined into 2 clips:
clip-01-03.jl
- contains snippets 0.1 through 0.3clip-04-05.jl
- contains snippets 0.4 and 0.5.
A single snippet clip will be referred to as 03/clip-02.jl
.
As mentioned above, a few chapters contain additional scripts intended as introductions for specific topics.
If you want to use this package as an easy way to access the dataset samples, the package offers the function rel_path
to work with paths inside the StatisticalRethinking package:
using StatisticalRethinking, CSV
# for example, grabbing the `Howell1` dataset used in Chapter 4
datapath = rel_path("..", "data/","Howell1.csv")
CSV.read(datapath)
Implementations of the models using Stan, DynamicHMC and Turing can be found in StanModels, DynamicHMCModels and TuringModels.
In the meantime time, Chris Fisher has made tremendous progress with MCMCBenchmarks.jl, which compares three NUTS mcmc options.
- STABLE — documentation of the most recently tagged version.
- DEVEL — documentation of the in-development version.
Of course, without this excellent textbook by Richard McElreath, this package would not have been possible. The author has also been supportive of this work and gave permission to use the datasets.
Richard Torkar has taken the lead in developing the Turing versions of the models contained in TuringModels.
Tamas Papp has been very helpful during the development of the DynamicHMC versions of the models.
The TuringLang team and #turing contributors on Slack have been extremely helpful! The Turing examples by Cameron Pfiffer are followed closely in several example scripts.
The increasing use of Particles to represent quap approximations is possible thanks to the package MonteCarloMeasurements.jl. Package Soss.jl and related write-ups introduced me to that option.
Question and contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems or have a question.
Developing rethinking
must have been an on-going process over several years, StatisticalRethinking.jl
will likely follow a similar path.
-
The first version (v1.x) of
StatisticalRethinking
attempts to capture the models and to show ways of setting up those models, execute the models and post-process the results using Julia. -
Many R functions such as precis(), shade(), etc. are either not in v1 or replaced by Julia equivalents, e.g. the Particles approach is used instead of precis(). Expect significant refactoring of those in future versions of StatisticalRethinking.jl.
-
Several other interesting approaches to mcmc modeling are being explored in Julia, e.g. Soss.jl and Omega.jl. These are tracked as candidates for use in a future version of StatisticalRethinking.jl.