Support Exporting Model and Partition Selections to MrBayes #29

Open. Wants to merge 11 commits into base: master

@IntegerLimit IntegerLimit commented May 10, 2025

Introduction

This PR ports iqtree/iqtree2#267 to IQTree 3 and implements all review requests previously made on that PR.

Similarly, iqtree/iqtree2#195 has been implemented for all sequence types that MrBayes supports (DNA/RNA, Protein, Binary, Morphological and Codon). This PR is complete, apart from any further changes arising from suggestions and bug fixes once testing has been conducted.

General Implementation Details

  • MrBayes files are written when the user passes -mset mrbayes or adds the -mrbayes flag
  • One MrBayes file is exported: .mr_bayes.nex
  • MrBayes files are written for both partitioned and non-partitioned runs
  • For unsupported sequence types and unsupported models (Mixture and Non-Reversible Models), a default of GTR+G+I (DNA) is used, with warnings printed to the log and the file.
  • +R is mapped to +G+I (with warnings printed to the log and the file)
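
The +R remapping can be sketched as below. This is a minimal illustration in Python, not the C++ code in this PR; the function name and its boolean arguments are hypothetical. The returned strings are the values MrBayes accepts for lset rates= (equal, gamma, propinv, invgamma).

```python
def mrbayes_rates(has_gamma, has_invar, has_free_rate):
    """Return (value for 'lset rates=', list of warnings).

    Hypothetical helper: the arguments describe the rate heterogeneity
    part of the IQTree model (+G, +I, +R).
    """
    warnings = []
    if has_free_rate:
        # MrBayes has no free-rate model, so +R is treated as +G+I.
        warnings.append("+R is not supported by MrBayes; using +G+I instead")
        has_gamma, has_invar = True, True
    if has_gamma and has_invar:
        return "invgamma", warnings
    if has_gamma:
        return "gamma", warnings
    if has_invar:
        return "propinv", warnings
    return "equal", warnings


if __name__ == "__main__":
    rates, notes = mrbayes_rates(False, False, True)
    print(rates, notes)
```

Returning the warnings alongside the value keeps the decision and its log message in one place, so the caller can print the same text to both the log and the exported file.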

DNA Fallbacks

For DNA, MrBayes supports three models: JC/F81 (nst=1), HKY (nst=2) and GTR (nst=6) (excluding their fixed frequency counterparts).

Therefore, when IQTree uses a model that MrBayes does not support, the export defaults to GTR, because the extra parameters have a lower impact under Bayesian Inference.
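
A rough sketch of this fallback, written in Python for brevity rather than taken from the PR. The lookup table and function name are hypothetical, and the model string is assumed to be an IQTree name such as HKY+F or TIM2+F+I+G4, where the base model is the part before the first "+".

```python
# Hypothetical lookup covering only the models named above.
NST_BY_MODEL = {"JC": 1, "F81": 1, "HKY": 2, "GTR": 6}


def dna_nst(iqtree_model):
    """Return (nst, warning or None) for an IQTree DNA model name."""
    base = iqtree_model.split("+")[0].upper()
    if base in NST_BY_MODEL:
        return NST_BY_MODEL[base], None
    # Anything MrBayes cannot express falls back to GTR.
    warning = f"{base} is not supported by MrBayes; falling back to GTR (nst=6)"
    return 6, warning


if __name__ == "__main__":
    for model in ("HKY+F", "TIM2+F+I+G4"):
        print(model, dna_nst(model))
```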

Protein Fallbacks

For Protein, when IQTree uses a model that MrBayes does not support, the export defaults to GTR and includes the rate matrix and state frequencies of the model. The rate matrix is set to fixed, unless the model used by IQTree was GTR20, in which case dirichlet is used. The rate matrix appears to be a mandatory parameter for MrBayes GTR models.
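
To illustrate the fixed versus dirichlet choice, here is a hedged Python sketch that builds a prset line. The function name, number formatting and the decision to fix the state frequencies are assumptions for illustration only; aamodelpr, aarevmatpr and statefreqpr are the MrBayes prset parameters assumed to carry these values, and the exact line the PR writes may differ.

```python
def protein_gtr_prset(rates, freqs, iqtree_model, applyto="all"):
    """Build a prset line for a protein model exported as MrBayes GTR.

    rates: 190 exchangeabilities (upper triangle of a 20x20 matrix).
    freqs: 20 amino acid state frequencies.
    """
    if len(rates) != 190 or len(freqs) != 20:
        raise ValueError("expected 190 rates and 20 state frequencies")
    # Fixed rate matrix, unless IQTree itself estimated a GTR20 model.
    is_gtr20 = iqtree_model.split("+")[0].upper() == "GTR20"
    rate_prior = "dirichlet" if is_gtr20 else "fixed"

    def fmt(values):
        return ",".join(f"{v:.6f}" for v in values)

    return (
        f"prset applyto=({applyto}) aamodelpr=fixed(gtr) "
        f"aarevmatpr={rate_prior}({fmt(rates)}) "
        f"statefreqpr=fixed({fmt(freqs)});"
    )


if __name__ == "__main__":
    line = protein_gtr_prset([1.0] * 190, [0.05] * 20, "Q.pfam+G4")
    print(line[:80] + "...")
```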

Binary, Morphological and Codon Data Exclusions

  • MrBayes Binary Data does not support +I; it is ignored, with a warning printed to the log and file
  • Morphological Data:
    • MrBayes does not support states {A-Z}; a warning is printed to the log and file
    • MrBayes does not support +I; it is ignored, with a warning printed to the log and file
  • MrBayes Codon Data does not support any Heterogeneity Modifier (+G, +I or +R). They are excluded, and a warning is printed to the log and file when a model is used with any modifiers.
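
The exclusions above amount to a small filter over the rate modifiers. The following Python sketch is illustrative only; the function name and the data type strings are hypothetical, and it assumes +R has already been expanded to +G+I as described under General Implementation Details.

```python
def filter_modifiers(data_type, has_gamma, has_invar):
    """Return (has_gamma, has_invar, warnings) after dropping
    modifiers that MrBayes does not accept for this data type."""
    warnings = []
    if data_type == "codon":
        if has_gamma or has_invar:
            warnings.append(
                "MrBayes codon data does not support +G, +I or +R; ignoring them"
            )
        return False, False, warnings
    if data_type in ("binary", "morphological") and has_invar:
        warnings.append(f"MrBayes {data_type} data does not support +I; ignoring it")
        has_invar = False
    return has_gamma, has_invar, warnings


if __name__ == "__main__":
    print(filter_modifiers("binary", True, True))
    print(filter_modifiers("codon", True, False))
```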

Codon Implementation Details

Codon Models in MrBayes: Introduction

The basic structure for codon models in MrBayes is quite similar to mechanistic codon models in IQTree, following the same formulation of the model by Goldman & Yang 1994 and Muse & Gaut 1994. However, the settings for MrBayes are under different names, and most inputs cannot be ported directly, making it the most difficult model to port to MrBayes format.

Instead of using named models, such as MG or GY, MrBayes uses one main parameter, which acts similarly to DNA model selection: the Nucleotide Substitution Model (JC, HKY or GTR), set through lset nst (nst = 1 for JC, nst = 2 for HKY, nst = 6 for GTR).

(Source: MrBayes Manual (Chapter 6.1.3 & Appendix A), MrBayes help lset and help prset commands, IQTree Documentation on Substitution Models (Section on Codon Models))

Mechanistic Model Output

Nucleotide Substitution Model

For retrieving the nucleotide substitution model that should be used as input into MrBayes, the implemented code does the following:

  • If fix_kappa is true, then the model will be set to nst = 1 (JC)
  • If fix_kappa is false, then the model will be set to nst = 2 (HKY)

This implementation means that GTR is never used, which is appropriate given the inputs of Mechanistic Codon Models in IQTree (dN/dS ratio + ts/tv ratio).

Note that fix_kappa is only set to true under the Codon Models MG and GY0K (which are the only models without a ts/tv input ratio). This can be shown through the initCodon function, which only calls initMG94 or initGY94 with fix_kappa as true for those two models. That input is then read into the fix_kappa field here for MG Models and here for GY Models.

Empirical Model Output

MrBayes does not support Empirical Codon Models, so when such a model is being used (or a mixture of Empirical + Mechanistic), a warning is printed to the log and file. However, a model is still outputted, with nst = 6 (GTR-like model).
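
Taken together, the mechanistic and empirical cases reduce to a three-way choice of nst. A minimal Python sketch follows; the function name and the is_empirical argument are hypothetical, while fix_kappa mirrors the field described above.

```python
def codon_nst(is_empirical, fix_kappa):
    """Return (nst, warning or None) for a codon model.

    is_empirical covers empirical models and empirical + mechanistic
    mixtures, neither of which MrBayes supports.
    """
    if is_empirical:
        warning = "MrBayes does not support empirical codon models; using nst=6"
        return 6, warning
    # Mechanistic models: no ts/tv ratio -> JC, otherwise HKY.
    return (1 if fix_kappa else 2), None


if __name__ == "__main__":
    print(codon_nst(False, True))
    print(codon_nst(False, False))
    print(codon_nst(True, False))
```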

Codon Codes

Whilst IQTree uses numeric IDs for its Codon Codes (CODON1, CODON2, etc.), MrBayes uses text IDs (vertmt, invermt, etc.). Most of the codes have no clear documentation or description, but the table below gives the final mapping from IQTree Codon Codes to MrBayes Codon Codes.

An XXX in the MrBayes column represents a code that MrBayes does not support.

IQTree     MrBayes
CODON1     universal
CODON2     vertmt
CODON3     yeast
CODON4     mycoplasma
CODON5     invermt
CODON6     ciliate
CODON9     echinoderm
CODON10    euplotid
CODON11    universal
CODON12    XXX
CODON13    XXX
CODON14    XXX
CODON16    XXX
CODON21    XXX
CODON22    XXX
CODON23    XXX
CODON24    XXX
CODON25    XXX

If a code is used that MrBayes does not support, it defaults to the universal code, and prints a warning to the log and file.
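
The table and its fallback can be expressed as a lookup. This Python sketch copies the supported rows from the table above; the dictionary and function names are hypothetical and not taken from the PR.

```python
# Supported rows of the table above, keyed by the IQTree code number.
MRBAYES_CODES = {
    1: "universal",
    2: "vertmt",
    3: "yeast",
    4: "mycoplasma",
    5: "invermt",
    6: "ciliate",
    9: "echinoderm",
    10: "euplotid",
    11: "universal",
}


def mrbayes_genetic_code(iqtree_code):
    """Return (MrBayes code name, warning or None) for e.g. 'CODON2'."""
    name = iqtree_code.upper()
    if not name.startswith("CODON"):
        raise ValueError(f"not an IQTree codon code: {iqtree_code!r}")
    number = int(name[len("CODON"):])
    if number in MRBAYES_CODES:
        return MRBAYES_CODES[number], None
    # Codes marked XXX in the table fall back to the universal code.
    return "universal", f"{name} is not supported by MrBayes; using universal"


if __name__ == "__main__":
    print(mrbayes_genetic_code("CODON5"))
    print(mrbayes_genetic_code("CODON12"))
```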

Example Output Files

Without Partitions

#nexus

[This MrBayes Block Declaration provides the basic models from the IQTree Run.]
[Note that MrBayes does not support a large collection of models, so defaults of 'nst=6' for DNA and 'wag' for Protein will be used if a model that does not exist in MrBayes is used.]
[Furthermore, the Model Parameter '+R' will be replaced by '+G+I'.]
[This should be used as a Template Only.]

begin mrbayes;
  [IQTree inferred model TIM2+F+I+G4, using MrBayes model GTR+G+I]
  lset applyto=(all) nucmodel=4by4 nst=6 rates=invgamma;

end;

With Partitions

#nexus

[This MrBayes Block Declaration provides the basic partition structure and models from the IQTree Run.]
[Note that MrBayes does not support a large collection of models, so defaults of 'nst=6' for DNA and 'wag' for Protein will be used if a model that does not exist in MrBayes is used.]
[Furthermore, the Model Parameter '+R' will be replaced by '+G+I'.]
[This should be used as a Template Only.]

begin mrbayes;
  charset part1 = 1-999\3 2-999\3;
  charset part2 = 3-999\3;
  charset part3 = 1000-1998;

  partition iqtree = 3: part1, part2, part3;
  set partition = iqtree;

  [Subset #1: IQTree inferred model HKY+F, using MrBayes model HKY]
  lset applyto=(1) nucmodel=4by4 nst=2 rates=equal;

  [Subset #2: IQTree inferred model GTR+F+G4, using MrBayes model GTR+G]
  lset applyto=(2) nucmodel=4by4 nst=6 rates=gamma;

  [Subset #3: IQTree inferred model GTR+F+G4, using MrBayes model GTR+G]
  lset applyto=(3) nucmodel=4by4 nst=6 rates=gamma;

unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all) tratio=(all);
prset applyto=(all) ratepr=variable;
end;
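
For readers who want to see how such a partitioned block is put together, here is a simplified Python sketch. It is not the PR's C++ code: the function name and the (name, sites, lset options) tuples are hypothetical, and it omits the header comments, per-subset comments and the unlink/prset lines shown above.

```python
def mrbayes_partition_block(subsets):
    """Build a MrBayes block from (name, sites, lset_options) tuples."""
    lines = ["begin mrbayes;"]
    # One charset per subset, in the order the subsets are given.
    for name, sites, _ in subsets:
        lines.append(f"  charset {name} = {sites};")
    lines.append("")
    names = ", ".join(name for name, _, _ in subsets)
    lines.append(f"  partition iqtree = {len(subsets)}: {names};")
    lines.append("  set partition = iqtree;")
    lines.append("")
    # applyto indices are 1-based and follow the partition order.
    for index, (_, _, options) in enumerate(subsets, start=1):
        lines.append(f"  lset applyto=({index}) {options};")
    lines.append("end;")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        ("part1", "1-999\\3 2-999\\3", "nucmodel=4by4 nst=2 rates=equal"),
        ("part2", "3-999\\3", "nucmodel=4by4 nst=6 rates=gamma"),
        ("part3", "1000-1998", "nucmodel=4by4 nst=6 rates=gamma"),
    ]
    print(mrbayes_partition_block(example))
```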

* Support Exporting DNA/RNA Analysis to MrBayes Block Files

* Fix Formatting Issues

* Move Functions to Supplementary, Fix +R Remapping
…#264)

* Protein Model

* Morphological Models Support

* Cleanup

Move Model Specific Functions to each model class
Move other functions from phylotree and phylosupertree to phyloanalysis

* Cleanup Imports

* Binary Model Support

* Misc Cleanup

Misc Cleanup

* Output Files Readability, Default Warning & Help Message

* Fix Edge Case: Importing Values < 0.01 into MrBayes

* Fix Edge Case: Extra Characters in Charset

* Fix +G+I or +R Inputs

* Fix Issues with Binary Model

* Fix Issues with Morphology Model

* Codon Model

* Fix Compiler Error due to Merge Conflicts

* Fix Codon Model `NucModel` Parameter

* Fix Empirical Warning + Indentation of Warnings

* Improve Start-Of-File Warnings

* Fix Indentation in alignment.cpp