
Commit a7ba0b5

Benchmarking (#248)
Changes to DPPL can often have quite significant effects on compilation time and performance, both of DPPL itself and of downstream packages. These performance regressions can also be difficult to discover: e.g. in #221 we made a small simplification to the compiler, and it took quite a while to figure out what was going wrong; we had to test several models to identify the issue. So, this is a WIP PR for including a small set of models which we can `weave` into a document where we can look at the changes. It's unclear to me whether this should go in DPPL itself or in a separate package. I found it useful myself and figured I'd put it here so we can maybe start building up some "standard" benchmarks to run for testing purposes. IMO we don't need many of them, as we will add more as we go along.

For each model, the following will be included in the document:

- Benchmarked evaluation of the model on untyped and typed `VarInfo`.
- Timing of the compilation of the model with the typed `VarInfo`.
- Lowered code for the model.
- If `:prefix` is provided to `weave`, the string representation of `code_typed` for the evaluation of the model will be saved to a file `$(prefix)_(model.name)`. Furthermore, if `:prefix_old` is provided, pointing to the `:prefix` used for a previous run (likely using a different version of DPPL), we will `diff` the `code_typed` for the two models by loading the saved files.
1 parent 892b971 · commit a7ba0b5

5 files changed: +349 −0 lines changed

benchmarks/Project.toml

Lines changed: 13 additions & 0 deletions
name = "DynamicPPLBenchmarks"
uuid = "d94a1522-c11e-44a7-981a-42bf5dc1a001"
version = "0.1.0"

[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
DiffUtils = "8294860b-85a6-42f8-8c35-d911f667b5f6"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
DynamicPPL = "366bfd00-2699-11ea-058f-f148b4cae6d8"
LibGit2 = "76f85450-5226-5b5a-8eaa-529ad045b433"
Markdown = "d6f4376e-aef5-505a-96c1-9c027394607a"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
Weave = "44d3d7a6-8a23-5bf8-98c5-b353f8df5ec9"

benchmarks/README.md

Lines changed: 30 additions & 0 deletions
To run the benchmarks, simply do:

```sh
julia --project -e 'using DynamicPPLBenchmarks; weave_benchmarks();'
```

```julia
help?> weave_benchmarks
search: weave_benchmarks

  weave_benchmarks(input="benchmarks.jmd"; kwargs...)

  Weave benchmarks present in benchmarks.jmd into a single file.

  Keyword arguments
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  • benchmarkbody: JMD-file to be rendered for each model.

  • include_commit_id=false: specify whether to include the commit id in the default name.

  • name: the name of the directory in results/ to use as the output directory.

  • name_old=nothing: if specified, comparisons of the current run vs. the run pointed to
    by name_old will be included in the generated document.

  • include_typed_code=false: if true, the output of code_typed for the evaluator of the
    model will be included in the weaved document.

  • The rest of the passed kwargs will be passed on to Weave.weave.
```
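
For example, to benchmark the current checkout against an earlier run (a hypothetical invocation; the run names are placeholders):

```julia
using DynamicPPLBenchmarks

# First run, e.g. on `master`; store `code_typed` output so later runs can diff against it.
weave_benchmarks(; name="master", include_typed_code=true)

# After switching to another DynamicPPL version: benchmark again and diff against "master".
weave_benchmarks(; name="my-feature-branch", name_old="master", include_typed_code=true)
```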

benchmarks/benchmark_body.jmd

Lines changed: 39 additions & 0 deletions
```julia
@time model_def(data)();
```

```julia
m = time_model_def(model_def, data);
```

```julia
suite = make_suite(m);
results = run(suite)
results
```

```julia; echo=false; results="hidden";
BenchmarkTools.save(joinpath("results", WEAVE_ARGS[:name], "$(m.name)_benchmarks.json"), results)
```

```julia; wrap=false
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end
```

```julia; echo=false; results="hidden"
if WEAVE_ARGS[:include_typed_code]
    # Serialize the output of `typed_code` so we can compare later.
    haskey(WEAVE_ARGS, :name) && serialize(joinpath("results", WEAVE_ARGS[:name], "$(m.name).jls"), string(typed));
end
```

```julia; wrap=false; echo=false;
if haskey(WEAVE_ARGS, :name_old)
    # We want to compare the generated code to the previous version.
    import DiffUtils
    typed_old = deserialize(joinpath("results", WEAVE_ARGS[:name_old], "$(m.name).jls"));
    DiffUtils.diff(typed_old, string(typed), width=130)
end
```

benchmarks/benchmarks.jmd

Lines changed: 96 additions & 0 deletions
# Benchmarks

## Setup

```julia
using BenchmarkTools, DynamicPPL, Distributions, Serialization
```

```julia
import DynamicPPLBenchmarks: time_model_def, make_suite, typed_code, weave_child
```

## Models

### `demo1`

```julia
@model function demo1(x)
    m ~ Normal()
    x ~ Normal(m, 1)

    return (m = m, x = x)
end

model_def = demo1;
data = 1.0;
```

```julia; results="markup"; echo=false
weave_child(WEAVE_ARGS[:benchmarkbody], mod = @__MODULE__, args = WEAVE_ARGS)
```

### `demo2`

```julia
@model function demo2(y)
    # Our prior belief about the probability of heads in a coin.
    p ~ Beta(1, 1)

    # The number of observations.
    N = length(y)
    for n in 1:N
        # Heads or tails of a coin are drawn from a Bernoulli distribution.
        y[n] ~ Bernoulli(p)
    end
end

model_def = demo2;
data = rand(0:1, 10);
```

```julia; results="markup"; echo=false
weave_child(WEAVE_ARGS[:benchmarkbody], mod = @__MODULE__, args = WEAVE_ARGS)
```

### `demo3`

```julia
@model function demo3(x)
    D, N = size(x)

    # Draw the parameters for cluster 1.
    μ1 ~ Normal()

    # Draw the parameters for cluster 2.
    μ2 ~ Normal()

    μ = [μ1, μ2]

    # Comment out this line if you instead want to draw the weights.
    w = [0.5, 0.5]

    # Draw assignments for each datum and generate it from a multivariate normal.
    k = Vector{Int}(undef, N)
    for i in 1:N
        k[i] ~ Categorical(w)
        x[:,i] ~ MvNormal([μ[k[i]], μ[k[i]]], 1.)
    end
    return k
end

model_def = demo3

# Construct 30 data points for each cluster.
N = 30

# Parameters for each cluster; we assume that each cluster is Gaussian distributed in this example.
μs = [-3.5, 0.0]

# Construct the data points.
data = mapreduce(c -> rand(MvNormal([μs[c], μs[c]], 1.), N), hcat, 1:2);
```

```julia; echo=false
weave_child(WEAVE_ARGS[:benchmarkbody], mod = @__MODULE__, args = WEAVE_ARGS)
```
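
Adding a model to the suite follows the same pattern: a `### demoN` section defines the model with `@model`, binds `model_def` and `data`, and is followed by the same `weave_child` chunk as above. A minimal sketch (`demo4` is hypothetical and not part of this commit):

```julia
@model function demo4(x)
    σ ~ truncated(Normal(), 0, Inf)
    x ~ Normal(0, σ)
end

model_def = demo4;
data = 1.0;
```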

benchmarks/src/DynamicPPLBenchmarks.jl

Lines changed: 171 additions & 0 deletions
module DynamicPPLBenchmarks

using DynamicPPL
using BenchmarkTools

using Weave: Weave
using Markdown: Markdown

using LibGit2: LibGit2
using Pkg: Pkg

export weave_benchmarks

# Construct the model from its definition while timing the construction.
function time_model_def(model_def, args...)
    return @time model_def(args...)
end

function benchmark_untyped_varinfo!(suite, m)
    vi = VarInfo()
    # Populate.
    m(vi)
    # Evaluate.
    suite["evaluation_untyped"] = @benchmarkable $m($vi, $(DefaultContext()))
    return suite
end

function benchmark_typed_varinfo!(suite, m)
    # Populate.
    vi = VarInfo(m)
    # Evaluate.
    suite["evaluation_typed"] = @benchmarkable $m($vi, $(DefaultContext()))
    return suite
end

# Return the result of `code_typed` for the model evaluator, i.e. the typed IR
# generated for evaluating `m` with `vi` under a sampling context.
function typed_code(m, vi=VarInfo(m))
    rng = DynamicPPL.Random.MersenneTwister(42)
    spl = DynamicPPL.SampleFromPrior()
    ctx = DynamicPPL.SamplingContext(rng, spl, DynamicPPL.DefaultContext())

    results = code_typed(m.f, Base.typesof(m, vi, ctx, m.args...))
    return first(results)
end

"""
    make_suite(model)

Create the default benchmark suite for `model`.
"""
function make_suite(model)
    suite = BenchmarkGroup()
    benchmark_untyped_varinfo!(suite, model)
    benchmark_typed_varinfo!(suite, model)

    return suite
end
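
# Hypothetical usage sketch (not part of the weaved document): given a model
# instance such as `m = demo1(1.0)` from `benchmarks.jmd`, the suite can be
# built and run with BenchmarkTools:
#
#     suite = make_suite(m)
#     results = run(suite)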

"""
    weave_child(indoc; mod, args, kwargs...)

Weave `indoc` within the scope of `mod` into markdown.

Useful for weaving within weaving, e.g.
```julia
weave_child(child_jmd_path, mod = @__MODULE__, args = WEAVE_ARGS)
```
together with `results="markup"` and `echo=false` will simply insert
the weaved version of `indoc`.

# Notes
- Currently only supports `doctype == "github"`. Other outputs are "supported"
  in the sense that they work, but you might lose niceties such as syntax highlighting.
"""
function weave_child(indoc; mod, args, kwargs...)
    # FIXME: Make this work for output formats other than `github`.
    doc = Weave.WeaveDoc(indoc, nothing)
    doc = Weave.run_doc(doc; doctype="github", mod=mod, args=args, kwargs...)
    rendered = Weave.render_doc(doc)
    return display(Markdown.parse(rendered))
end

"""
    pkgversion(m::Module)

Return the version of module `m` as listed in its Project.toml.
"""
function pkgversion(m::Module)
    projecttoml_path = joinpath(dirname(pathof(m)), "..", "Project.toml")
    return Pkg.TOML.parsefile(projecttoml_path)["version"]
end

"""
    default_name(; include_commit_id=false)

Construct a name from either repo information or the package version
of `DynamicPPL`.

If the path of `DynamicPPL` is a git repo, return the name of the current branch,
joined with the commit id if `include_commit_id` is `true`.

If the path of `DynamicPPL` is _not_ a git repo, it is assumed to be a release,
resulting in a name of the form `release-VERSION`.
"""
function default_name(; include_commit_id=false)
    dppl_path = abspath(joinpath(dirname(pathof(DynamicPPL)), ".."))

    # Extract the branch name and commit id.
    local name
    try
        githead = LibGit2.head(LibGit2.GitRepo(dppl_path))
        branchname = LibGit2.shortname(githead)

        name = replace(branchname, "/" => "_")
        if include_commit_id
            gitcommit = LibGit2.peel(LibGit2.GitCommit, githead)
            commitid = string(LibGit2.GitHash(gitcommit))
            name *= "-$(commitid)"
        end
    catch e
        if e isa LibGit2.GitError
            @info "No git repo found for $(dppl_path); extracting name from package version."
            name = "release-$(pkgversion(DynamicPPL))"
        else
            rethrow(e)
        end
    end

    return name
end
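
# Illustrative return values (hypothetical; actual names depend on the local checkout):
#
#     default_name()                          # e.g. "my-feature-branch"
#     default_name(; include_commit_id=true)  # e.g. "my-feature-branch-0123abc..."
#     default_name()                          # e.g. "release-0.1.0" (outside a git repo)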

"""
    weave_benchmarks(input="benchmarks.jmd"; kwargs...)

Weave benchmarks present in `benchmarks.jmd` into a single file.

# Keyword arguments
- `benchmarkbody`: JMD-file to be rendered for each model.
- `include_commit_id=false`: specify whether to include the commit id in the default name.
- `name`: the name of the directory in `results/` to use as the output directory.
- `name_old=nothing`: if specified, comparisons of the current run vs. the run pointed to
  by `name_old` will be included in the generated document.
- `include_typed_code=false`: if `true`, the output of `code_typed` for the evaluator
  of the model will be included in the weaved document.
- The rest of the passed `kwargs` will be passed on to `Weave.weave`.
"""
function weave_benchmarks(
    input=joinpath(dirname(pathof(DynamicPPLBenchmarks)), "..", "benchmarks.jmd");
    benchmarkbody=joinpath(
        dirname(pathof(DynamicPPLBenchmarks)), "..", "benchmark_body.jmd"
    ),
    include_commit_id=false,
    name=default_name(; include_commit_id=include_commit_id),
    name_old=nothing,
    include_typed_code=false,
    doctype="github",
    outpath="results/$(name)/",
    kwargs...,
)
    args = Dict(
        :benchmarkbody => benchmarkbody,
        :name => name,
        :include_typed_code => include_typed_code,
    )
    if !isnothing(name_old)
        args[:name_old] = name_old
    end
    @info "Storing output in $(outpath)"
    mkpath(outpath)
    return Weave.weave(input, doctype; out_path=outpath, args=args, kwargs...)
end

end # module
