[WIP: Implementation of a New Algorithm] Extension of MixtureFinder to morphological data #35
Draft: HS6986 wants to merge 57 commits into iqtree:master from HS6986:feature/HS6986/MorphMixtureFinder
+195
−32
…on works for non-reversible models
…allow both options: MIX+MF and MF+MIX
Allow to fix the parameters for RHAS when using mixture finder. Allow all these options: MIX+MF, MIX+MFP, MF+MIX, MFP+MIX -- to run the mixture finder
…ing mixture finder. Another option: -optfromgiven The RHAS model will still be optimized according to the initial values same as the input parameters.
Fixed the issue happened when user specifies the RHAS model for mixture finder
…if the number of states <= 6 && the number of the patterns in the alignment/partition >= 100
…; Temporarily comment out `free(init_state_freq_set);`, which HuaiyanRen added, as they cause an error
…_update # Conflicts: # main/phylotesting.cpp
…ture model format.
…th single partition
…dels for morphological data, which has not yet been validated
…in partition model format.
6c63cde to 418a483
d44e931 to af06267
accfa6c to b924171
… that occurs when users try to apply MixtureFinder to amino acid data; Create the --force-aa-mix-finder option to force IQ-TREE to run MixtureFinder for amino acid data
b924171 to 005eead
Dear All,
This is a draft pull request that extends a pull request in progress (#11) and implements a new MixtureFinder-like algorithm for selecting the best-fitting mixture models for morphological data, which has not yet been validated in publications. See below for details.
We're moving forward with the plan in #11 to extend MixtureFinder (Ren et al., 2025), which currently only works on DNA data, to codon, binary, and non-morphological multistate data. Morphological data are not considered in #11 because they are fundamentally incompatible with the existing MixtureFinder framework: they have a number of properties that differentiate them from other major data types:
These properties have led empiricists to use distinctive analytical conditions for morphological data in probabilistic phylogenetic methods. Generally:
However, I've recently learned that some Bayesian phylogenetic software (MrBayes and RevBayes) implements methods that model the heterogeneity of state frequencies among characters in morphological data using mixture models (Wright et al., 2016; https://revbayes.github.io/tutorials/morph_tree/). As far as I know, these methods are not widely used, but they may open up new avenues in morphological phylogenetics.
My idea is that modeling the heterogeneity of state frequencies (and perhaps also replacement rates) in morphological data using mixture models could be extended to maximum likelihood frameworks. In addition, an automatic model selection feature in IQ-TREE, similar to MixtureFinder, could improve model fit for morphological data. I think this would be valuable, given that the aforementioned software does not implement such a feature.
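To make the general idea concrete, here is a minimal Python sketch of how a mixture model combines per-class likelihoods for each character, and how an information criterion could be used to compare models with different numbers of classes. This is not the code in this PR and not necessarily how MixtureFinder is implemented: the function names, the input layout, and the choice of BIC as the criterion are assumptions made only for illustration.

```python
import math


def mixture_log_likelihood(class_site_likelihoods, weights):
    """Log-likelihood of a dataset under a K-class mixture model.

    class_site_likelihoods[k][i]: likelihood of character i under class k
    (hypothetical input; in practice this comes from the tree likelihood).
    weights[k]: mixture weight of class k; the weights must sum to 1.
    """
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("mixture weights must sum to 1")
    n_sites = len(class_site_likelihoods[0])
    total = 0.0
    for i in range(n_sites):
        # Each character's likelihood is the weighted sum over classes.
        site_lik = sum(w * lik[i] for w, lik in zip(weights, class_site_likelihoods))
        total += math.log(site_lik)
    return total


def bic(log_likelihood, n_free_params, n_sites):
    """Bayesian information criterion; lower values indicate better fit."""
    return -2.0 * log_likelihood + n_free_params * math.log(n_sites)


# Two classes, two characters, equal weights (made-up numbers).
lnl = mixture_log_likelihood([[0.2, 0.4], [0.6, 0.8]], [0.5, 0.5])
print(lnl, bic(lnl, n_free_params=1, n_sites=2))
```

A model selection procedure could then compare the criterion value of a K-class model against a (K+1)-class model and keep the extra class only if the value decreases.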
I've implemented my MixtureFinder-like algorithm for selecting the best-fitting models for morphological data in this PR, although it has several limitations that call for further development: it currently cannot explicitly account for state space heterogeneity among characters (users probably need to test models per partition), and ascertainment bias correction (`+ASC` in IQ-TREE) cannot be applied, because `+ASC` in mixture models is not yet implemented (#12).
Of course, this new algorithm must be theoretically well explained and empirically validated in a peer-reviewed paper (or at least in a preprint) before it could be merged into the master branch and described in the documentation. For now, I'm creating this PR to get some feedback.
I apologize for the current untidy state of the code.
I'll post details of the algorithm, the usage details, and some test runs with empirical datasets later. I'm sorry, but it might take a few days or more.
If I have misunderstood something, or if this algorithm is fundamentally not justified in the first place, I apologize.