
Conversation

@devmotion (Member)

This is an updated version of the ESS implementation in #991. I noticed that the bias visible in the plots in #991 is real: the mean of the MCMC chain does not converge to the true posterior mean. The problem was that I computed the joint probabilities, although ESS requires the likelihood. I tried to fix this by changing the assume function. Moreover, I added some normality checks and restricted the sampler to a single parameter.

The example in the original issue and two other examples included in the tests seem to converge now, but the gdemo example still fails. I'm not sure why this is the case and assume there is some problem with how ESS is integrated in the Gibbs sampler - I guess I forgot to implement something related to Turing's internals or implemented it incorrectly.

For comparison, you can check the MATLAB implementation by Ian Murray, one of the original authors: https://homepages.inf.ed.ac.uk/imurray2/pub/10ess/elliptical_slice.m
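For readers unfamiliar with the algorithm, the ESS transition from that reference can be sketched as follows (an illustrative Python translation with names of my choosing, not the Turing implementation):

```python
import numpy as np

def elliptical_slice(f, sample_prior, log_lik, rng):
    """One transition of elliptical slice sampling (Murray, Adams & MacKay, 2010).

    f            -- current state, assumed to have a zero-mean Gaussian prior
    sample_prior -- callable returning a fresh draw nu ~ N(0, Sigma)
    log_lik      -- callable returning the log-likelihood of a state
    rng          -- a numpy random Generator
    """
    nu = sample_prior()
    log_y = log_lik(f) + np.log(rng.uniform())         # slice threshold
    theta = rng.uniform(0.0, 2.0 * np.pi)              # initial proposal angle
    theta_min, theta_max = theta - 2.0 * np.pi, theta  # full bracket
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)  # point on the ellipse
        if log_lik(f_new) > log_y:
            return f_new                                # accept
        # shrink the bracket towards theta = 0 and try again
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)
```

Note that the transition uses only the log-likelihood; the Gaussian prior enters solely through the auxiliary draw `nu`, which is why evaluating the log joint instead biases the sampler.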

@devmotion (Member, Author)

I updated the PR and in particular changed the implementation of assume and observe (basically, they are not doing anything special anymore apart from not evaluating the logpdf of the prior distribution, since it is never needed). MCMC now converges for both models

```julia
@model demo(x) = begin
    m ~ Normal(0, 1)
    # σ² is a fixed observation variance defined outside the model
    x ~ MvNormal(fill(m, length(x)), sqrt(σ²))
end
```

and

```julia
@model gdemo(x, y) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, sqrt(s))
    y ~ Normal(m, sqrt(s))
    return s, m
end
```

successfully. I created plots of the trajectories and of the true (dashed blue) and approximated (orange) posteriors (code):

- ESS (demo): [demo_ESS plot]
- NUTS (demo): [demo_NUTS plot]
- ESS (gdemo): [gdemo_ESS plot]
- HMC (gdemo): [gdemo_HMC plot]

@trappmartin (Member) left a comment


Thanks for the work, I left a few remarks.

@mohamed82008 (Contributor)

I would suggest keeping lateral changes for another PR. Also if you are trying to compute the log likelihood, I recommend overloading tilde and dot_tilde for the ESS sampler with ctx::DefaultContext and call the tilde or dot_tilde functions using the spl::SampleFromPrior and ctx::LikelihoodContext. This will significantly simplify this PR.

@devmotion (Member, Author)

> Also if you are trying to compute the log likelihood, I recommend overloading tilde and dot_tilde for the ESS sampler with ctx::DefaultContext and call the tilde or dot_tilde functions using the spl::SampleFromPrior and ctx::LikelihoodContext. This will significantly simplify this PR.

If I understand you correctly, then you suggest using

```julia
function tilde(ctx::DefaultContext, sampler::Sampler{<:ESS}, right, left, vi)
    if left isa VarName && left in getspace(sampler)
        return tilde(LikelihoodContext(), SampleFromPrior(), right, left, vi)
    else
        return tilde(ctx, SampleFromPrior(), right, left, vi)
    end
end

function dot_tilde(ctx::DefaultContext, sampler::Sampler{<:ESS}, right, left, vn::VarName, vi)
    if vn in getspace(sampler)
        return dot_tilde(LikelihoodContext(), SampleFromPrior(), right, left, vn, vi)
    else
        return dot_tilde(ctx, SampleFromPrior(), right, left, vn, vi)
    end
end
```

instead of the assume and observe implementation. Everything else would be unaffected, it seems, so I'm not sure if this is actually the significant simplification you're referring to. Still, I think it would be nicer to use, and it seems to work as well.

However, in general the approach of sampling from the prior and computing the proposals in the step! function seems problematic, since it can't cope with arrays of distributions. I'm not sure what the best way to handle this would be; ideally one would use runmodel! also for sampling from the prior - in that case, however, one can't forward to SampleFromPrior, since that sampler does not overwrite existing variables. One solution might be to always sample the variables of interest in the DefaultContext and use ctx::LikelihoodContext instead of ctx::DefaultContext in the examples above. But even in that case I would have to retrieve the mean of all different distributions to correct for them...
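As an aside, the standard way to handle a Gaussian prior with nonzero mean in ESS (used in the reference implementation linked above) is to center the state before rotating on the ellipse and shift back afterwards; a minimal Python sketch (function name is hypothetical):

```python
import numpy as np

def centered_ess_proposal(f, nu, mu, theta):
    """ESS proposal for a Gaussian prior with mean mu: center the current
    state f and the auxiliary draw nu ~ N(mu, Sigma), rotate by theta on the
    ellipse, then shift the mean back in."""
    return mu + (f - mu) * np.cos(theta) + (nu - mu) * np.sin(theta)
```

At theta = 0 this recovers the current state and at theta = pi/2 the auxiliary draw, exactly as in the zero-mean case.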

@cpfiffer (Member)

Perhaps it would be nice if we exposed something similar to vi[spl] that returned a vector of distributions -- would this help at all?

@mohamed82008 (Contributor)

I meant something like this (only valid after #997 goes in).

```julia
# assume
function tilde(ctx::DefaultContext, sampler::Sampler{<:ESS}, right, vn::VarName, inds, vi)
    return tilde(LikelihoodContext(), SampleFromPrior(), right, vn, inds, vi)
end
# observe
function tilde(ctx::DefaultContext, sampler::Sampler{<:ESS}, right, left, vi)
    return tilde(LikelihoodContext(), SampleFromPrior(), right, left, vi)
end

# dot_assume
function dot_tilde(ctx::DefaultContext, sampler::Sampler{<:ESS}, right, left, vn::VarName, inds, vi)
    return dot_tilde(LikelihoodContext(), SampleFromPrior(), right, left, vn, inds, vi)
end
# dot_observe
function dot_tilde(ctx::DefaultContext, sampler::Sampler{<:ESS}, right, left, vi)
    return dot_tilde(LikelihoodContext(), SampleFromPrior(), right, left, vi)
end
```

This takes care of the assume, observe, dot_assume and dot_observe for you. It's also clearer than having to define those functions ourselves.

@mohamed82008 (Contributor)

> Perhaps it would be nice if we exposed something similar to vi[spl] that returned a vector of distributions -- would this help at all?

If it's helpful, we can have a wrapper around vi that gives us the distributions when we use getindex. For example, dists(vi) isa DistsWrapper and dists(vi)[spl] gives us what we want. dists(vi)[vn] also does the obvious.
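A minimal Python sketch of such a wrapper, assuming a dict-like varinfo that maps variable names to (value, distribution) pairs (all names here are hypothetical, not Turing's actual API):

```python
# Hypothetical sketch of the proposed dists(vi) wrapper: indexing it returns
# the distribution attached to a variable instead of the variable's value.
class DistsWrapper:
    def __init__(self, varinfo):
        # varinfo is assumed to map variable names to (value, distribution)
        self._varinfo = varinfo

    def __getitem__(self, key):
        return self._varinfo[key][1]  # return the distribution, not the value

def dists(varinfo):
    return DistsWrapper(varinfo)
```

The same indexing pattern would extend naturally to sampler-based lookups like `dists(vi)[spl]` by collecting the distributions of all variables owned by the sampler.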

@devmotion (Member, Author)

> I meant something like this

Ah OK, but then it's basically what I outlined above. Your example wouldn't work, however, since the log likelihood should be computed based only on the variable of interest. But it seems #997 would provide a simpler way to achieve what I sketched above, by specifying the variables of interest vns in LikelihoodContext(vns)?

@yebai (Member) commented Dec 26, 2019

@mohamed82008 This PR contains some modifications to RandomVariables. It might be useful to sync these changes with DynamicPPL.

@mohamed82008 (Contributor)

@yebai sure.

@mohamed82008 (Contributor) commented Dec 26, 2019

@devmotion in your implementation, the log prior will be counted in for the random variables that are not under the ESS sampler. I assume this is your intention. If so, then this PR looks good to me but I can't speak on the correctness of the logic.

@devmotion (Member, Author)

> @devmotion in your implementation, the log prior will be counted in for the random variables that are not under the ESS sampler. I assume this is your intention.

Yes, that's exactly my intention. Basically, I'm interested in the log joint minus the log prior of the random variables that are sampled by the ESS sampler. A motivating example is the hierarchical model

```julia
@model demo(x) = begin
    m ~ Normal()
    k ~ Normal(m, 0.5)
    x ~ Normal(k, 0.5)
end
```

in which we might want to sample from the posterior distribution of m and k using a Gibbs sampler with ESS for both variables. In this model, if k is given, the log likelihood of x (as evaluated by Turing's LikelihoodContext, without the log priors) is independent of the choice of m. Clearly, that's not useful for deciding which sample on the elliptical slice to accept; instead we are interested in how the likelihood of k changes with the choice of m. Hence we have to evaluate the log priors of all variables that are not under the ESS sampler, since they might be affected by a change in m.
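To make the target concrete: for the model above, the quantity evaluated when sampling m (the log joint minus the log prior of m) could be written as follows (an illustrative Python/SciPy sketch, not the Turing code; the helper name is mine):

```python
from scipy.stats import norm

def ess_target_for_m(m, k, x):
    """Log joint minus the log prior of m: the prior logpdf of k (which
    depends on m, so it must be kept) plus the likelihood of x."""
    return norm.logpdf(k, loc=m, scale=0.5) + norm.logpdf(x, loc=k, scale=0.5)
```

Dropping the `logpdf` of k here (as a pure LikelihoodContext evaluation would) makes the target constant in m, which is exactly the failure mode described above.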

@yebai merged commit 20e81cc into TuringLang:master on Dec 27, 2019
@yebai (Member) commented Dec 27, 2019

Great work, many thanks @devmotion!

@devmotion deleted the ess branch on December 27, 2019
