Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return marginal likelihood from inside pass #249

Closed
nspope opened this issue Jan 27, 2023 · 2 comments
Closed

Return marginal likelihood from inside pass #249

nspope opened this issue Jan 27, 2023 · 2 comments

Comments

@nspope
Copy link
Contributor

nspope commented Jan 27, 2023

It'd be useful to return the (approximate) marginal likelihood from the inside pass (that is, the likelihood of the mutations in the tree sequence, integrating over possible node ages). The reason is that with a constant mutation rate, this marginal likelihood should be quite sensitive to the choice of effective population size and time grid. For example, I simulated ten tree sequences from the standard coalescent with population size 26000; and calculated marginal likelihood with tsdate's algorithm across choices of Ne for the prior:

In this figure, relative absolute error is calculated as abs(estimated age - true age) / true age excluding nodes that are less than 100 generations old; because at mu=1e-8 there's not much information with which to estimate these precisely (& they'll dominate the relative error otherwise). The vertical lines are maxima/minima for logLik/RAE for each sim. For this particular case it seems like choosing Ne on the basis of marginal likelihood also does a pretty good job at minimizing the relative absolute error.

I ran a separate set of simulations where I varied population size between 5000 and 50000, to see how the "inferred" Ne (in terms of maximizing log marginal likelihood) performs as a surrogate for the "optimal" Ne (in terms of minimizing relative absolute error):

There's a downward bias in "optimal" Ne vs the truth regardless of criterion (I think because the current prior discretization scheme biases the expectation of the prior). However, it looks like choosing the prior Ne based on marginal likelihood is a good proxy for minimizing RAE.

So, this may provide a useful heuristic for "calibrating" the prior in an iterative fashion ala #237, especially when combined with #248 which gives more flexibility for modifying the prior timescale. In particular, if we change the underflow protection scheme so that:

  • Poisson likelihoods aren't standardized over timepoints at each iteration, and
  • "Standardization" at each step is done by dividing by the sum, not the max (e.g. use logsumexp when working in logspace)

then the (approximate) marginal likelihood can be calculated directly from the denominator member of the inside-outside algorithm class, because geometric scaling appropriately divvies up the contributions of shared descendants across roots.

@hyanwong
Copy link
Member

Can we close this now @nspope, or is there useful discussion here re choosing the prior Ne? I suspect this has been superseded by #349 ?

@nspope
Copy link
Contributor Author

nspope commented Dec 11, 2023

Yes! Closing.

@nspope nspope closed this as completed Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants