Allow sampling from the prior #1243

devmotion · 2020-04-29T16:23:14Z

This PR is an alternative to #1225. I tried to reduce the existing amount of type piracy and to clean the existing implementation by moving HMC-specific stuff to the implementation of the AbstractMCMC interface for HMC samplers. The bundle_samples implementations for SampleFromPrior are still type piracy, since neither the function nor the input types are owned by Turing. Also to avoid type piracy and too Turing-specific implementations of the interface for SampleFromPrior (Turing sets the default chain_type to MCMCChains.Chains which one can't do in a generic implementation in DynamicPPL), I added a Prior algorithm type that is owned by Turing and just samples using a SampleFromPrior. However, I'm not completely sure if that's a good idea.

The PR needs the implementation of step! for SampleFromPrior in the master branch of DynamicPPL.

devmotion · 2020-04-29T19:17:56Z

As I said, I'm not fully satisfied with the fact that we just introduce a Prior struct to be able to define some default chain_type and progress keyword arguments. So I thought a bit more about the current situation and these are my two cents.

One idea would be to move the bundle_samples for certain output types to their respective packages, e.g., for MCMCChains.Chains to MCMCChains where it would be implemented for all samplers, models, and samples as generically as possible by just converting the samples to an axis array (e.g., by calling MCMCChains.samples2array(model, sampler, transitions)), getting the name map (by, e.g., calling MCMCChains.getnamemap(model, sampler, transitions)), extracting the logevidence (by calling MCMCChains.getlogevidence(model, sampler, transitions), and the info (by calling MCMCChains.getinfo(model, sampler, transitions)), and then using the results to build the Chains object. Then Turing (and other packages) could implement these methods for their combination of samplers, models, and transitions. However, that would not solve the type piracy problem for SampleFromPrior (as long as DynamicPPL does not depend on MCMCChains) and create a bunch of new methods that one would have to implement...

Another idea would be to just drop chain_type and to always return the actual samples (if one does not reimplement/modify the default sample logic in AbstractMCMC). If one would have an abstract interface in AbstractMCMC for accessing parameter values and additional information of samples, then users would still get some basic functionality and could retrieve, e.g., all samples of parameter :p or the log probabilities in an easy way. For some rudimentary use cases that might be enough, but additionally packages such as MCMCChains should then provide an implementation of how to construct a Chains object from any set of samples (and possibly the model and sampler) that implements this interface. Then it would still be fairly easy for users to construct a Chains object that gives them access to all the diagnostic tools in MCMCChains. And of course one could implement the same interface also for VarInfo, and hence even if SampleFromPrior returns a vector of VarInfo, users could extract samples etc. AbstractMCMC could even implement default methods for how to convert samples that implement this interface to types in Julia base such as NamedTuple or Vector. Not performing any transformations of the samples during or after the sampling procedure and just returning the "raw" samples would maybe also allow more advanced (and possibly more efficient?) use cases without hooking into the internals (too much).

yebai

thanks @devmotion, looks good overall. I left some minor comments below.

src/inference/Inference.jl

yebai · 2020-04-29T21:45:38Z

tests are failing with the following error:

  MethodError: no method matching step!(::MersenneTwister, ::DynamicPPL.Model{getfield(Main, Symbol("###evaluator#7567")),(),Tuple{},(),DynamicPPL.ModelGen{getfield(Main, Symbol("###generator#7568")),(),(),Tuple{}}}, ::SampleFromPrior, ::Int64, ::Nothing; iteration=1)

https://travis-ci.com/github/TuringLang/Turing.jl/jobs/325101686#L623

devmotion · 2020-04-30T00:07:05Z

A new DynamicPPL release should fix the test errors, at least it worked for me with the master branch.

devmotion · 2020-05-03T21:52:00Z

I'll update this PR and address the comments when #1249 is merged.

Co-authored-by: Hong Ge <hg344@cam.ac.uk>

codecov · 2020-05-04T08:31:04Z

Codecov Report

Merging #1243 into master will increase coverage by 0.87%.
The diff coverage is 91.83%.

@@            Coverage Diff             @@
##           master    #1243      +/-   ##
==========================================
+ Coverage   66.25%   67.12%   +0.87%     
==========================================
  Files          25       25              
  Lines        1286     1305      +19     
==========================================
+ Hits          852      876      +24     
+ Misses        434      429       -5

Impacted Files	Coverage Δ
src/Turing.jl	`100.00% <ø> (ø)`
src/inference/AdvancedSMC.jl	`97.87% <0.00%> (-0.70%)`	⬇️
src/inference/gibbs.jl	`91.66% <0.00%> (-1.96%)`	⬇️
src/inference/is.jl	`100.00% <ø> (ø)`
src/inference/mh.jl	`75.00% <ø> (ø)`
src/inference/hmc.jl	`88.28% <90.00%> (+0.14%)`	⬆️
src/inference/Inference.jl	`88.46% <95.65%> (+8.66%)`	⬆️
src/core/container.jl	`93.18% <100.00%> (+0.15%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e18ac2...9f4fcee. Read the comment docs.

yebai · 2020-05-04T09:17:18Z

The failing tests are related to the prior sampler tests. Seems the error bound for the prior sampler tests is too tight.

devmotion · 2020-05-04T09:39:22Z

I set a seed and increased the number of samples. Locally tests pass now, hopefully this fixes CI tests as well 🤞

yebai · 2020-05-04T10:30:17Z

Looks like the tests are passing now - thanks, @devmotion!

* Allow sampling from the prior * Apply suggestions from code review Co-authored-by: Hong Ge <hg344@cam.ac.uk> * Use `getlogp` instead of `getlp` * Add seed and increase number of samples Co-authored-by: Hong Ge <hg344@cam.ac.uk>

Allow sampling from the prior

cc8df1e

yebai requested changes Apr 29, 2020

View reviewed changes

devmotion and others added 2 commits May 4, 2020 08:30

Apply suggestions from code review

13ad964

Co-authored-by: Hong Ge <hg344@cam.ac.uk>

Use getlogp instead of getlp

b2eff85

Add seed and increase number of samples

9f4fcee

yebai merged commit c85d8c5 into TuringLang:master May 4, 2020

yebai mentioned this pull request May 4, 2020

Inference algorithm for sampling from the prior #1225

Closed

devmotion deleted the prior branch May 4, 2020 10:45

Allow sampling from the prior #1243

Allow sampling from the prior #1243

Uh oh!

Conversation

devmotion commented Apr 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

devmotion commented Apr 29, 2020

Uh oh!

yebai left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yebai commented Apr 29, 2020

Uh oh!

devmotion commented Apr 30, 2020

Uh oh!

devmotion commented May 3, 2020

Uh oh!

codecov bot commented May 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

yebai commented May 4, 2020

Uh oh!

devmotion commented May 4, 2020

Uh oh!

yebai commented May 4, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

devmotion commented Apr 29, 2020 •

edited

Loading

codecov bot commented May 4, 2020 •

edited

Loading