Skip to content

Support Exporting Codon Analysis to MrBayes Block Files #266

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 8, 2024

Conversation

IntegerLimit
Copy link
Collaborator

@IntegerLimit IntegerLimit commented Jul 8, 2024

Introduction

This PR implements exporting the final sequence type to MrBayes: Codon Models. This therefore finishes the implementation of #195, excluding any changes required further from suggestions and bug fixes after testing has been conducted.

Codon Models in MrBayes: Introduction

The basic structure for codon models in MrBayes is quite similar to mechanistic codon models in IQTree, following the same formulation of the model by Goldman & Yang 1994 and Muse & Gaut 1994. However, the settings for MrBayes are under different names, and most inputs cannot be ported directly.

Instead of using named models, such as MG or GY, MrBayes uses five main parameters:

  • Nucleotide Substitution Model (From JC, HKY and GTR), set through lset nst (nst = 1 for JC, nst = 2 for HKY, nst = 6 for GTR)
  • The Transition/Transversion (ts/tv) Rate Ratio (set through prset tratiopr)
  • The Non-Synonymous/Synonymous (dn/ds) Rate Ratio (set through prset omegapr)
  • The State Frequencies of each non-stop codon (set through prset statefreqpr)
  • The Omega Variation Type and Corresponding Parameters (set through lset omegavar)

(Source: MrBayes Manual (Chapter 6.1.3 & Appendix A), MrBayes help lset and help prset commands, IQTree Documentation on Substitution Models (Section on Codon Models))

Mechanistic Model Output

Nucleotide Substitution Model

For retrieving the nucleotide substitution model that should be used as input into MrBayes, the implemented code does the following:

  • If fix_kappa is true, then the model will be set to nst = 1 (JC)
  • If fix_kappa is false, then the model will be set to nst = 2 (HKY)

This implementation means that GTR is not used, appropriate considering the inputs for Mechanistic Codon Models in IQTree (ds/dt ratio + ts/tv ratio).

Note that fix_kappa is only set to true under the Codon Models MGK and GY0K (which are the only models without a ts/tv input ratio). This can be shown through the initCodon function, which only calls initMG94 or initGY94 with fix_kappa as true for those two models. That input is then read into the fix_kappa field here for MG Models and here for GY Models.

TS/TV Rate Ratio

In IQTree, ModelCodon stores two values for the kappa, or the ts/tv rate ratio:

  • kappa, which represents the ts/tv rate ratio for 'normal' MG/GY models
  • kappa2, which represents the ts/tv rate ratio for '2-Kappa' MG/GY modles

However, for the tratiopr input in MrBayes, there are two types of input available:

  • fixed, which takes one input (ts/tv rate ratio), and fixes it
  • beta, which takes two inputs (transition rate & transversion rate), and allows for variation

Since it is preferable to have variable inputs in Bayesian Inference, the preference would be to use the beta input. In general, the beta input should be in the form beta(ax, x), where a is the ts/tv ratio, and x represents how close to the ratio the data is. (Source 1)

Source 1: (From the MrBayes help prset page)

If you think it is likely that the transition/transversion rate ratio is 2.0, you can use a prior of the type beta(2x,x), where x determines how strongly the prior is concentrated on tratio values near 2.0.

Therefore, the ratio is figured out as below:

  • beta(kappa, 1) when codon_kappa_style is CK_ONE_KAPPA (kappa here represents the ts/tv rate ratio)
  • beta(kappa, 1) when codon_kappa_style is CK_ONE_KAPPA_TS (kappa here represents the transition rate)
  • beta(1, kappa) when codon_kappa_style is CK_ONE_KAPPA_TV (kappa here represents the transversion rate)
  • beta(kappa, kappa2) when codon_kappa_style is CK_TWO_KAPPA (kappa here represents the transition rate, and kappa2 represents the transversion rate)

(Source: Usage of Kappa and Kappa2 in Four Functions)

DN/DS Rate Ratio

ModelCodon stores the dn/ds rate ratio in omega. Similar to ts/tv, we do not have the non-synonymous rate and the synonymous rate individually.

MrBayes, again similar to ts/tv, has two possible inputs for omegapr.

  • fixed, takes one input (the dn/ds rate ratio), fixes it
  • dirichlet, takes two inputs (non-synonymous rate and the synonymous rate), allows for change.

However, as we don't know the rates themselves, the output would simply be the first case in the ts/tv ratio calculation: dirichlet(omega, 1).

State Frequencies

This is simply retrieved from state_freq. However, the implementation also skips indices of state_freq where phylo_tree->aln->isStopCodon(i) returns true.

Omega Variation

Since IQTree has no values for Omega Variation (which variates the ds/dt ratio across codons, based on how they affect fitness), this has been set to equal. The user can still change this value themselves if they wish to do so.

This could be improved in the future, so that it and its other priors, are sourced from the gamma distribution.

Rate Heterogeneity Modifiers

MrBayes does not support +G or +I (or +R, although it doesn't support that in any sequence type) for Codon Models. This has been excluded, and a warning printed to the file and log.

Rate Matrix

This has also been discarded, as there is no input in MrBayes for the rate matrix.

Empirical Model Output

MrBayes does not support Empirical Codon Models, so when such a model is being used (or a mixture of Empirical + Mechanistic), a warning is printed to the log and file. However, a model is still outputted, with nst = 1, no kappa/omega values, and the provided state frequencies.

Codon Codes

Whilst IQTree uses Number IDs for its Codon Codes (CODON1, CODON2, etc.), MrBayes uses Text IDs. (vertmt, invermt, etc.) There is no clear documentation or description for most of the models, but below shows the final table to transfer from IQTree Codon Codes to MrBayes Codon Codes.

An XXX in the MrBayes column represents a code that MrBayes does not support.

IQTree MrBayes
CODON1 universal
CODON2 vertmt
CODON3 yeast
CODON4 mycoplasma
CODON5 invermt
CODON6 ciliate
CODON9 echinoderm
CODON10 euplotid
CODON11 universal
CODON12 XXX
CODON13 XXX
CODON14 XXX
CODON16 XXX
CODON21 XXX
CODON22 XXX
CODON23 XXX
CODON24 XXX
CODON25 XXX

If a code is used that MrBayes does not support, it defaults to the universal code, and prints an error to the log and file.

Other Changes

The main changes were: (compared with Previous Two PRs)

  • Changing the indentation of warnings in the output file to match the indentation of the model parameters
  • Changing the printMrBayesModelText in ModelSubst and child classes to not include the RateHeterogeneity parameter, instead having that accessed via phylo_tree->getRate()
  • Improving the initial warning in the output files

Testing

The base cases, and some edge cases, have been tested across all data types and some models. The test case simply was to check the presence of the warnings, and that the output files import without errors into MrBayes.

However, further tests may need to be conducted on whether the input values into MrBayes produce an acceptable and reliable result.

@IntegerLimit IntegerLimit changed the base branch from master to MrBayesSupport July 8, 2024 07:29
@thomaskf thomaskf merged commit d36486c into iqtree:MrBayesSupport Jul 8, 2024
IntegerLimit added a commit to IntegerLimit/iqtree2 that referenced this pull request Jul 8, 2024
* Codon Model

* Fix Compiler Error due to Merge Conflicts

* Fix Codon Model `NucModel` Parameter

* Fix Empirical Warning + Indentation of Warnings

* Improve Start-Of-File Warnings

* Fix Indentation in alignment.cpp
IntegerLimit added a commit to IntegerLimit/iqtree2 that referenced this pull request Apr 23, 2025
* Codon Model

* Fix Compiler Error due to Merge Conflicts

* Fix Codon Model `NucModel` Parameter

* Fix Empirical Warning + Indentation of Warnings

* Improve Start-Of-File Warnings

* Fix Indentation in alignment.cpp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants