[WIP: Implementation of a New Algorithm] Extension of MixtureFinder to morphological data #35

Draft: wants to merge 57 commits into base: master


@HS6986 HS6986 commented May 13, 2025

Dear All,

This draft pull request builds on a pull request in progress (#11) and implements a new MixtureFinder-like algorithm for selecting the best-fitting mixture models for morphological data. The algorithm has not yet been validated in any publication. See below for details.

In #11, we're moving forward with the plan to extend MixtureFinder (Ren et al., 2025), which currently works only on DNA data, to codon, binary, and non-morphological multistate data. Morphological data are not considered in that PR because they are fundamentally incompatible with the existing MixtureFinder framework. They have several properties that set them apart from the other major data types:

  • the number of states differs among characters
  • the state labels are arbitrary and differ among characters
  • the characters are artificially sampled, so invariant (and sometimes parsimony-uninformative) characters are completely or almost completely absent
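
As a minimal illustration of these three properties, the sketch below summarizes a toy morphological matrix character by character. The function name `summarize_characters`, the toy matrix, and the treatment of `?` and `-` as missing data are assumptions made for this example only; none of this is code from IQ-TREE or from this PR.

```python
from collections import Counter

# Symbols treated as missing/inapplicable in this toy example (an assumption).
MISSING = {"?", "-"}

def summarize_characters(matrix):
    """Summarize each character (column) of a taxon -> state-string matrix."""
    n_chars = len(next(iter(matrix.values())))
    summary = []
    for i in range(n_chars):
        # Count the observed states in column i, ignoring missing entries.
        counts = Counter(row[i] for row in matrix.values() if row[i] not in MISSING)
        n_states = len(counts)
        # Parsimony-informative: at least two states, each seen in >= 2 taxa.
        informative = sum(1 for c in counts.values() if c >= 2) >= 2
        summary.append({
            "index": i,
            "n_states": n_states,
            "invariant": n_states <= 1,
            "parsimony_informative": informative,
        })
    return summary

# Hypothetical 4-taxon, 4-character matrix.
toy = {
    "taxon_A": "0102",
    "taxon_B": "0111",
    "taxon_C": "1120",
    "taxon_D": "11?0",
}

for entry in summarize_characters(toy):
    print(entry)
```

In this toy matrix the observed state count varies from one to three across characters, which is the kind of state-space heterogeneity that a single fixed-size substitution model does not handle directly.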

These properties have led empiricists to use distinctive analytical conditions for morphological data in probabilistic phylogenetic methods. Generally:

However, I've recently learned that some Bayesian phylogenetic software (MrBayes and RevBayes) implements methods that use mixture models to capture the heterogeneity of state frequencies among characters in morphological data (Wright et al., 2016; https://revbayes.github.io/tutorials/morph_tree/). As far as I know, these methods are not widely used, but they may open up new avenues in morphological phylogenetics.

My idea is that modeling the heterogeneity of state frequencies (and perhaps also replacement rates) in morphological data with mixture models could be extended to maximum likelihood frameworks. In addition, a feature for automatic model selection in IQ-TREE, similar to MixtureFinder, could improve model fit for morphological data. I think this would be valuable, given that the aforementioned software does not implement such a feature.
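
To make the mixture idea concrete: under a mixture model, the likelihood of each character is a weighted sum of its likelihoods under the individual classes, and the total log-likelihood is the sum of the logs of those per-character values. The sketch below assumes the per-class likelihoods have already been computed on a fixed tree; the function name and the numbers are hypothetical and do not come from this PR.

```python
import math

def mixture_log_likelihood(site_class_likelihoods, weights):
    """Total log-likelihood of a mixture model.

    site_class_likelihoods[i][k] is the likelihood of character i under
    mixture class k; weights[k] is the weight of class k (weights sum to 1).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("class weights must sum to 1")
    total = 0.0
    for per_class in site_class_likelihoods:
        if len(per_class) != len(weights):
            raise ValueError("one likelihood per class is required")
        # Likelihood of this character: weighted sum over the classes.
        site_likelihood = sum(w * l for w, l in zip(weights, per_class))
        total += math.log(site_likelihood)
    return total

# Hypothetical per-class likelihoods for two characters and two classes,
# e.g. classes with different state frequencies.
likelihoods = [
    [0.2, 0.4],
    [0.1, 0.1],
]
print(mixture_log_likelihood(likelihoods, [0.5, 0.5]))
```

In a maximum likelihood setting, the class weights and the parameters of each class would be optimized jointly rather than fixed as they are here.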

In this PR, I've implemented a MixtureFinder-like algorithm that I devised for selecting the best-fitting models for morphological data. It has several limitations that call for further development: it currently cannot explicitly account for state-space heterogeneity among characters (users probably need to test models per partition), and ascertainment bias correction (+ASC in IQ-TREE) cannot be applied because +ASC is not currently implemented for mixture models (#12).
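
The details of the algorithm have not been posted yet, so the following is only a generic sketch of what a stepwise, MixtureFinder-like selection loop might look like: classes are added one at a time and the search stops when BIC no longer improves. The use of BIC, the stopping rule, the function names, and the toy `fit` callback are all assumptions for illustration, not the procedure implemented in this PR.

```python
import math

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)

def select_num_classes(fit, n_sites, max_classes=10):
    """Add mixture classes one at a time until BIC stops improving.

    fit(k) is a caller-supplied (hypothetical) function returning
    (log_likelihood, n_free_parameters) for the best k-class model.
    """
    best_k, best_bic = None, math.inf
    for k in range(1, max_classes + 1):
        log_likelihood, n_params = fit(k)
        score = bic(log_likelihood, n_params, n_sites)
        if score >= best_bic:
            break  # the extra class is not supported; keep the previous model
        best_k, best_bic = k, score
    return best_k, best_bic

# Toy stand-in for model fitting: made-up log-likelihoods, 3k - 1 parameters.
def toy_fit(k):
    toy_logl = {1: -1000.0, 2: -950.0, 3: -948.0}
    return toy_logl.get(k, -948.0), 3 * k - 1

best_k, best_score = select_num_classes(toy_fit, n_sites=200)
print(best_k, round(best_score, 2))
```

With the made-up numbers above, the third class improves the log-likelihood by too little to offset its extra parameters, so the loop settles on two classes.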

Of course, this new algorithm must be theoretically well explained and empirically validated in a peer-reviewed paper (or at least a preprint) before it can be merged into the master branch and described in the documentation. For now, I'm creating this PR to get some feedback.

I apologize for the current messiness of the code.

I'll post the details of the algorithm, usage instructions, and some test runs with empirical datasets later. I'm sorry, but this may take a few days or more.

If I have misunderstood something, or if this algorithm is fundamentally not justified in the first place, I apologize.

HuaiyanRen and others added 30 commits March 4, 2025 16:21
Allow to fix the parameters for RHAS when using mixture finder.
Allow all these options: MIX+MF, MIX+MFP, MF+MIX, MFP+MIX -- to run the mixture finder
…ing mixture finder.

Another option: -optfromgiven
The RHAS model will still be optimized according to the initial values same as the input parameters.
Fixed the issue happened when user specifies the RHAS model for mixture finder
…if the number of states <= 6 && the number of the patterns in the alignment/partition >= 100
…; Temporarily comment out `free(init_state_freq_set);`, which HuaiyanRen added, as they cause an error
@HS6986 HS6986 force-pushed the feature/HS6986/MorphMixtureFinder branch from d44e931 to af06267 Compare May 18, 2025 12:24
@HS6986 HS6986 changed the title [WIP: Implementation of a New Algorithm] The extension of MixtureFinder to morphological data [WIP: Implementation of a New Algorithm] Extension of MixtureFinder to morphological data May 18, 2025
@HS6986 HS6986 force-pushed the feature/HS6986/MorphMixtureFinder branch from accfa6c to b924171 Compare May 21, 2025 17:30
… that occurs when users try to apply MixtureFinder to amino acid data; Create the --force-aa-mix-finder option to force IQ-TREE to run MixtureFinder for amino acid data
@HS6986 HS6986 force-pushed the feature/HS6986/MorphMixtureFinder branch from b924171 to 005eead Compare May 21, 2025 18:19
3 participants