Skip to content

Simple Poissonian log-likelihood (llh) test statistic is inaccurate (undefined) for low (zero) observed counts #854

@thehrh

Description

@thehrh

... because of the use of (an approximation of) Stirling's approximation.

Ever since 442a7ef#diff-f7283bae353d06044b8ff9fe43c9fcc6fd92165586395923983197fc9871293e, utils.stats.llh has for some reason been approximating $\ln(n!) = n\text{ }\ln(n) - n$, which is undefined for $n=0$ and wildly inaccurate for low observed counts.

Here are some deviations between the left-hand side and our approximation on the right for various $n$:

  • $n=0$: 0 vs. nan
  • $n=1$: 0 vs. -1
  • $n=5$: 4.79 vs. 3.05
  • $n=10$: 15.10 vs. 13.03
  • $n=50$: 148.5 vs. 145.6

Before that change, we had been using $\ln(\Gamma(n+1))$, which agrees with $\ln(n!)$ by definition for integer $n$.

Whether this leads to a noticeable deviation of the assumed log-likelihood expression from a Poissonian one or even gives rise to pseudoexperiments with nan llh values depends on how fluctuations are produced and how large these are with respect to the expectations of course.

Any ideas why we stopped using gammaln (https://en.wikipedia.org/wiki/Poisson_distribution#Evaluating_the_Poisson_distribution)?

In any case, I think we should enable the use of an (actually Poissonian) log-likelihood metric, if need be with a new name so that existing analyses that have been using llh as it stands today can still be reproduced.

Metadata

Metadata

Assignees

No one assigned

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions