Support Exporting Codon Analysis to MrBayes Block Files #266
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Introduction
This PR implements exporting the final sequence type to MrBayes: Codon Models. This therefore finishes the implementation of #195, excluding any changes required further from suggestions and bug fixes after testing has been conducted.
Codon Models in MrBayes: Introduction
The basic structure for codon models in MrBayes is quite similar to mechanistic codon models in IQTree, following the same formulation of the model by Goldman & Yang 1994 and Muse & Gaut 1994. However, the settings for MrBayes are under different names, and most inputs cannot be ported directly.
Instead of using named models, such as
MG
orGY
, MrBayes uses five main parameters:JC
,HKY
andGTR
), set throughlset nst
(nst = 1
forJC
,nst = 2
forHKY
,nst = 6
forGTR
)prset tratiopr
)prset omegapr
)prset statefreqpr
)lset omegavar
)(Source: MrBayes Manual (Chapter 6.1.3 & Appendix A), MrBayes
help lset
andhelp prset
commands, IQTree Documentation on Substitution Models (Section on Codon Models))Mechanistic Model Output
Nucleotide Substitution Model
For retrieving the nucleotide substitution model that should be used as input into MrBayes, the implemented code does the following:
fix_kappa
is true, then the model will be set tonst = 1
(JC)fix_kappa
is false, then the model will be set tonst = 2
(HKY)This implementation means that GTR is not used, appropriate considering the inputs for Mechanistic Codon Models in IQTree (ds/dt ratio + ts/tv ratio).
Note that
fix_kappa
is only set to true under the Codon ModelsMGK
andGY0K
(which are the only models without ats/tv
input ratio). This can be shown through theinitCodon
function, which only callsinitMG94
orinitGY94
withfix_kappa
as true for those two models. That input is then read into thefix_kappa
field here for MG Models and here for GY Models.TS/TV Rate Ratio
In IQTree,
ModelCodon
stores two values for thekappa
, or the ts/tv rate ratio:kappa
, which represents the ts/tv rate ratio for 'normal' MG/GY modelskappa2
, which represents the ts/tv rate ratio for '2-Kappa' MG/GY modlesHowever, for the
tratiopr
input in MrBayes, there are two types of input available:fixed
, which takes one input (ts/tv rate ratio), and fixes itbeta
, which takes two inputs (transition rate & transversion rate), and allows for variationSince it is preferable to have variable inputs in Bayesian Inference, the preference would be to use the
beta
input. In general, the beta input should be in the formbeta(ax, x)
, wherea
is the ts/tv ratio, andx
represents how close to the ratio the data is. (Source 1)Source 1: (From the MrBayes
help prset
page)Therefore, the ratio is figured out as below:
beta(kappa, 1)
whencodon_kappa_style
isCK_ONE_KAPPA
(kappa here represents the ts/tv rate ratio)beta(kappa, 1)
whencodon_kappa_style
isCK_ONE_KAPPA_TS
(kappa here represents the transition rate)beta(1, kappa)
whencodon_kappa_style
isCK_ONE_KAPPA_TV
(kappa here represents the transversion rate)beta(kappa, kappa2)
whencodon_kappa_style
isCK_TWO_KAPPA
(kappa here represents the transition rate, and kappa2 represents the transversion rate)(Source: Usage of Kappa and Kappa2 in Four Functions)
DN/DS Rate Ratio
ModelCodon
stores the dn/ds rate ratio inomega
. Similar to ts/tv, we do not have the non-synonymous rate and the synonymous rate individually.MrBayes, again similar to ts/tv, has two possible inputs for
omegapr
.fixed
, takes one input (the dn/ds rate ratio), fixes itdirichlet
, takes two inputs (non-synonymous rate and the synonymous rate), allows for change.However, as we don't know the rates themselves, the output would simply be the first case in the ts/tv ratio calculation:
dirichlet(omega, 1)
.State Frequencies
This is simply retrieved from
state_freq
. However, the implementation also skips indices ofstate_freq
wherephylo_tree->aln->isStopCodon(i)
returns true.Omega Variation
Since IQTree has no values for Omega Variation (which variates the ds/dt ratio across codons, based on how they affect fitness), this has been set to
equal
. The user can still change this value themselves if they wish to do so.This could be improved in the future, so that it and its other priors, are sourced from the gamma distribution.
Rate Heterogeneity Modifiers
MrBayes does not support
+G
or+I
(or+R
, although it doesn't support that in any sequence type) for Codon Models. This has been excluded, and a warning printed to the file and log.Rate Matrix
This has also been discarded, as there is no input in MrBayes for the rate matrix.
Empirical Model Output
MrBayes does not support Empirical Codon Models, so when such a model is being used (or a mixture of Empirical + Mechanistic), a warning is printed to the log and file. However, a model is still outputted, with
nst = 1
, no kappa/omega values, and the provided state frequencies.Codon Codes
Whilst IQTree uses Number IDs for its Codon Codes (CODON1, CODON2, etc.), MrBayes uses Text IDs. (
vertmt
,invermt
, etc.) There is no clear documentation or description for most of the models, but below shows the final table to transfer from IQTree Codon Codes to MrBayes Codon Codes.An
XXX
in the MrBayes column represents a code that MrBayes does not support.If a code is used that MrBayes does not support, it defaults to the universal code, and prints an error to the log and file.
Other Changes
The main changes were: (compared with Previous Two PRs)
printMrBayesModelText
inModelSubst
and child classes to not include theRateHeterogeneity
parameter, instead having that accessed viaphylo_tree->getRate()
Testing
The base cases, and some edge cases, have been tested across all data types and some models. The test case simply was to check the presence of the warnings, and that the output files import without errors into MrBayes.
However, further tests may need to be conducted on whether the input values into MrBayes produce an acceptable and reliable result.