
Safety for poisson random calls #2301

Closed
SamuelBrand1 opened this issue Aug 12, 2024 · 9 comments

@SamuelBrand1

Hi everyone,

Base Julia can't convert very large floats to an Int type with floor, round, etc., because they overflow Int64. This creates a problem for sampling from Poisson distributions with a large mean, since the fast polyalgorithm for Poisson sampling relies on exactly such a conversion.

e.g.

using Distributions
log_μ = 48.0 #Plausible value in a log-scale random walk with a Poisson link
d = Poisson(exp(log_μ))
rand(d)

ERROR: InexactError: trunc(Int64, 7.016735912097631e20)
Stacktrace:
[1] trunc
@ ./float.jl:905 [inlined]
[2] floor
@ ./float.jl:383 [inlined]
[3] PoissonADSampler
@ ~/.julia/packages/Distributions/oGEEM/src/samplers/poisson.jl:54 [inlined]
[4] PoissonADSampler
@ ~/.julia/packages/Distributions/oGEEM/src/samplers/poisson.jl:49 [inlined]
[5] sampler
@ ~/.julia/packages/Distributions/oGEEM/src/univariate/discrete/poisson.jl:144 [inlined]
[6] rand(rng::Random.TaskLocalRNG, d::Poisson{Float64})
@ Distributions ~/.julia/packages/Distributions/oGEEM/src/univariate/discrete/poisson.jl:148
[7] rand(::Poisson{Float64})
@ Distributions ~/.julia/packages/Distributions/oGEEM/src/genericrand.jl:22
[8] top-level scope
@ REPL[18]:1

Now, this would not be a problem in itself, because the logpdf calls are unaffected, but for some reason a rand call gets made when using Turing at inference time (I'm not sure why...).
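To illustrate the logpdf point, a quick sketch: density evaluation never needs to draw an integer, so it works even when the mean is astronomically large.

```julia
using Distributions

d = Poisson(exp(48.0))   # mean ≈ 7e20, far beyond typemax(Int64)
lp = logpdf(d, 10)       # finite (very negative) Float64, no error
# rand(d)                # this is the call that throws InexactError
```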

Is there any chance of a safer version of the Poisson/negative binomial samplers that can detect when round(BigInt, ... should be used?
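As a sketch of what such a safer version might look like: safe_rand_poisson below is a hypothetical helper (not part of Distributions), and the maxintfloat cutoff is just a conservative choice well inside the Int64-safe range. For very large λ, Poisson(λ) is well approximated by Normal(λ, √λ).

```julia
using Distributions

# Hypothetical helper, not part of Distributions: fall back to a normal
# approximation with BigInt rounding when λ is too large for the
# Int64-based polyalgorithm.
function safe_rand_poisson(λ::Real)
    if λ <= maxintfloat(Float64)      # ≈ 9e15, safely within Int64 range
        return rand(Poisson(λ))
    else
        # Normal approximation; round to an exact BigInt count.
        return round(BigInt, λ + sqrt(λ) * randn())
    end
end

safe_rand_poisson(exp(48.0))   # returns a BigInt instead of throwing
```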

@penelopeysm
Member

Hi @SamuelBrand1, thanks for the issue!

Could you provide a MWE that shows what you're trying to do with Turing that causes this crash? My initial impression is that this seems like an upstream issue with Distributions.

@SamuelBrand1
Author

Hey @penelopeysm

It's actually quite hard to construct an MWE here from a clean script. For example,

using Turing, Distributions

@model function rw_model(rw₀, n)
    ϵ_t ~ filldist(Normal(), n)
    rw = rw₀ .+ cumsum(ϵ_t)
    return rw
end

@model function data_model(ys, rw)
    for i in eachindex(rw)
        ys[i] ~ Poisson(exp(rw[i]))
    end
    return ys
end

@model function pois_rw(ys, rw₀)
    n = length(ys)
    @submodel rw = rw_model(rw₀, n)
    @submodel gen_ys = data_model(ys, rw)
    return rw, gen_ys
end

generative_mdl = pois_rw(fill(missing,10), 3.)
ys = generative_mdl()[2] .|> Int
inference_mdl = pois_rw(ys, 48.) #deliberately bad choice for rw_0

chn = sample(inference_mdl, NUTS(), 1000)

this works OK with package versions:

[31c24e10] Distributions v0.25.110
[fce5fe82] Turing v0.33.3

And this is roughly analogous to what @seabbs and myself are doing atm.

@SamuelBrand1
Author

This seems to suggest that I should raise this upstream with Distributions, as well as investigate why the large-mean Poisson problem is hitting our modelling but not this minimal script.

@seabbs

seabbs commented Aug 15, 2024

Agreed, this does seem to be a Distributions issue, so flagging it there makes sense. F2F, @SamuelBrand1 and I have discussed that the SciML ecosystem has had to develop its own faster Poisson sampler (presumably due to the difficulty of changing it in Distributions). As this seems like an issue that would be common for Turing users, it might be useful for them to weigh in on the Distributions issue: the rand call being buried in the inference call is quite confusing and hard to resolve, so reducing the chance of edge-case issues with rand seems like a good thing.

@devmotion
Member

F2F @SamuelBrand1 and I have discussed that the SciMl ecosystem has had to develop their own faster Poisson (presumably due to issues changing it in Distributions).

That's basically a myth, and I'm not really sure where it comes from. The samplers in SciML and Distributions are actually basically identical, and quite a few years ago I already noticed that SciML's is not faster: SciML/PoissonRandom.jl#6

@devmotion
Member

Issue in Distributions for the problem here: JuliaStats/Distributions.jl#821

We should hopefully be able to fix it when overhauling rand etc. more generally. But my gut feeling is that it's somewhat suspicious if such large values show up in your model anyway - I guess it's an indication that you might want to rescale your quantities?
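A minimal sketch of the kind of rescaling meant here, with a purely illustrative cap (exp(20) is arbitrary; capped_poisson is not a real API):

```julia
using Distributions

# Illustrative only: cap the log-mean before exponentiating, so the
# Poisson mean stays well inside the range the Int64 sampler can handle.
capped_poisson(log_μ; log_cap = 20.0) = Poisson(exp(min(log_μ, log_cap)))

d = capped_poisson(48.0)   # mean capped at exp(20) ≈ 4.85e8
rand(d)                    # draws successfully instead of throwing
```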

@penelopeysm
Member

Closing in favour of upstream issue

@SamuelBrand1
Author

SamuelBrand1 commented Oct 10, 2024

Issue in Distributions for the problem here: JuliaStats/Distributions.jl#821

We should hopefully be able to fix it when overhauling rand etc. more generally. But my gut feeling is that it's somewhat suspicious if such large values show up in your model anyway - I guess it's an indication that you might want to rescale your quantities?

Thanks for the link.

The problem isn't so much "a model". @seabbs and I are working on a meta-modelling package for users who want to build epi models.

There is definitely a big role for educating users on Bayesian methods, e.g. if sampling is bad this usually indicates issues with the model and can be addressed via reparameterisation etc.

However, if your model throws exceptions because of prior choices, that's pretty problematic, because the target user here isn't a single power user with a background in probabilistic modelling who is comfortable reading exception messages. It's easier to get "something" back, then look at the output and adjust priors/reparameterise. That kind of robustness is a real feature of using Stan, for example.

@seabbs

seabbs commented Oct 10, 2024

Thanks for this: JuliaStats/Distributions.jl#821. The suggested overhaul there looks great.

To expand on what @SamuelBrand1 is saying here: the problem we have is trying to ship a model-building tool to users who may not have much experience with coding/modelling/the Bayesian workflow and who want things to "just run". Often we are also pushing against a perception that Julia is not stable/robust, which obviously isn't well founded but is still a problem we keep coming across.
