[WIP: Implementation of a New Algorithm] Extension of MixtureFinder to morphological data #35

HS6986 · 2025-05-13T23:02:22Z

Dear All,

This is a draft pull request that extends a pull request in progress (#11) and implements a new MixtureFinder-like algorithm, not yet validated in any publication, for selecting the best-fitting mixture models for morphological data. See below for details.

We're moving forward with the plan to extend MixtureFinder (Ren et al., 2025), which currently works only on DNA data, to codon, binary, and non-morphological multistate data in #11. Morphological data are not considered in that PR because they are fundamentally incompatible with the existing MixtureFinder framework, as they have several properties that set them apart from other major data types:

the number of states differs from character to character
the state labels are arbitrary and differ from character to character
they are artificially sampled, so they completely or almost completely lack invariant (and sometimes parsimony-uninformative) characters
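To make the properties listed above concrete, here is a minimal Python sketch (hypothetical helpers, not IQ-TREE code) that counts the observed states of each character and classifies it as invariant, parsimony-uninformative, or parsimony-informative. It assumes single-character state labels, treats `-` and `?` as inapplicable/missing, and uses the usual rule that a character is parsimony-informative when at least two states each occur in at least two taxa.

```python
from collections import Counter

MISSING = set("-?")  # assumption: '-' = inapplicable, '?' = missing


def observed_counts(column):
    """Count how often each state occurs in one character, ignoring missing data."""
    return Counter(s for s in column if s not in MISSING)


def classify_character(column):
    """Return 'invariant', 'uninformative', or 'informative' (parsimony sense)."""
    counts = observed_counts(column)
    if len(counts) <= 1:
        return "invariant"
    # Parsimony-informative: at least two states each seen in at least two taxa.
    if sum(1 for n in counts.values() if n >= 2) >= 2:
        return "informative"
    return "uninformative"


def summarize(matrix):
    """matrix: equal-length strings, one per taxon.

    Returns (number of observed states, class) for each character.
    """
    return [(len(observed_counts(col)), classify_character(col)) for col in zip(*matrix)]


# Hypothetical 5-taxon, 5-character matrix.
TOY = ["00100", "01101", "11112", "1?1-2", "00100"]

for i, (k, kind) in enumerate(summarize(TOY), start=1):
    print(f"character {i}: {k} observed state(s), {kind}")
```

Even this toy matrix mixes characters with one, two, and three observed states, which is why a single state space for the whole alignment fits morphological data poorly.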

These properties have led empiricists to use distinctive analytical conditions for morphological data in probabilistic phylogenetic methods. Generally:

they partition data by the number of states (see Černý & Simonoff, 2023; https://davidcerny.github.io/post/teaching_revbayes/; https://github.com/davidcerny/GEOS26100-Fall2022; Huang, 2025 preprint; https://github.com/ej91016/MorphoParse)
they apply the Mk model (Lewis, 2001), which assumes equal state frequencies and equal replacement rates, thus avoiding a priori assumptions about them
they apply ascertainment bias corrections (e.g., Lewis, 2001) to account for the absence or paucity of invariant (or parsimony-uninformative) characters, after removing any such characters that are present
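The Mk model and the Lewis (2001) correction for sampling only variable characters (often called Mkv) can be illustrated on a three-taxon star tree. This is a minimal Python sketch under stated assumptions, not IQ-TREE's implementation: the branch lengths are hypothetical, the rate matrix is scaled so that branch lengths are expected changes per character, and the correction conditions only on variability (conditioning on parsimony informativeness would additionally subtract the uninformative patterns).

```python
import itertools
import math


def mk_probs(k, t):
    """Mk transition probabilities for a branch of length t.

    The rate matrix is scaled so that t is the expected number of changes
    per character. Returns (p_same, p_diff): P(i -> i) and P(i -> j), j != i.
    """
    decay = math.exp(-k * t / (k - 1))
    p_same = 1.0 / k + (k - 1.0) / k * decay
    p_diff = (1.0 - decay) / k
    return p_same, p_diff


def pattern_likelihood(pattern, k, branch_lengths):
    """Likelihood of one character pattern on a star tree under Mk.

    Sums over the unobserved root state, whose frequency is 1/k.
    """
    total = 0.0
    for root in range(k):
        prob = 1.0 / k
        for state, t in zip(pattern, branch_lengths):
            p_same, p_diff = mk_probs(k, t)
            prob *= p_same if state == root else p_diff
        total += prob
    return total


def mkv_likelihood(pattern, k, branch_lengths):
    """Lewis (2001)-style correction: condition on the character being variable."""
    n = len(branch_lengths)
    p_constant = sum(pattern_likelihood((s,) * n, k, branch_lengths) for s in range(k))
    return pattern_likelihood(pattern, k, branch_lengths) / (1.0 - p_constant)


def variable_patterns(k, n):
    """All patterns of n taxa over k states that are not constant."""
    for pattern in itertools.product(range(k), repeat=n):
        if len(set(pattern)) > 1:
            yield pattern


BRANCHES = (0.1, 0.2, 0.3)  # hypothetical branch lengths of a 3-taxon star tree
```

For example, `mkv_likelihood((0, 1, 1), 3, BRANCHES)` is larger than the uncorrected `pattern_likelihood((0, 1, 1), 3, BRANCHES)`, because the probability of the unobservable constant patterns is redistributed over the variable ones; the corrected likelihoods sum to one over all variable patterns.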

However, I've recently learned that some Bayesian phylogenetic software (MrBayes and RevBayes) implements methods that model among-character heterogeneity of state frequencies in morphological data using mixture models (Wright et al., 2016; https://revbayes.github.io/tutorials/morph_tree/). As far as I know, these methods are not widely used, but they may open up new avenues in morphological phylogenetics.

What I'm thinking is that modeling the heterogeneity of state frequencies (and perhaps also replacement rates) in morphological data with mixture models could be extended to maximum likelihood frameworks. In addition, an automatic model selection feature in IQ-TREE similar to MixtureFinder could improve model fit for morphological data. I think this would be valuable, given that MrBayes and RevBayes do not implement such a feature.
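The idea of a mixture over state frequencies can be sketched in maximum likelihood terms: each class has its own stationary frequencies (an F81-style model, which reduces to Mk when the frequencies are equal), and the likelihood of a character is the weighted sum of the per-class likelihoods. The two mirror-image binary classes below are hypothetical and only loosely in the spirit of the discretized frequency distributions of Wright et al. (2016); this is not how MrBayes, RevBayes, or this PR implements it, and no ascertainment bias correction is applied.

```python
import itertools
import math


def f81_matrix(freqs, t):
    """F81-style transition matrix with stationary frequencies `freqs`.

    beta is chosen so that t is the expected number of changes per character;
    with equal frequencies this reduces to the Mk model.
    """
    beta = 1.0 / (1.0 - sum(p * p for p in freqs))
    decay = math.exp(-beta * t)
    k = len(freqs)
    return [[freqs[j] + ((1.0 if i == j else 0.0) - freqs[j]) * decay
             for j in range(k)]
            for i in range(k)]


def class_likelihood(pattern, freqs, branch_lengths):
    """Pattern likelihood on a star tree for one frequency class."""
    matrices = [f81_matrix(freqs, t) for t in branch_lengths]
    total = 0.0
    for root, pi_root in enumerate(freqs):
        prob = pi_root
        for state, P in zip(pattern, matrices):
            prob *= P[root][state]
        total += prob
    return total


def mixture_likelihood(pattern, classes, branch_lengths):
    """Weighted sum over classes; `classes` is a list of (weight, freqs)."""
    return sum(w * class_likelihood(pattern, freqs, branch_lengths)
               for w, freqs in classes)


def all_patterns(k, n):
    """Every possible pattern of n taxa over k states."""
    return itertools.product(range(k), repeat=n)


# Hypothetical binary example: two mirror-image frequency classes.
CLASSES = [(0.5, (0.2, 0.8)), (0.5, (0.8, 0.2))]
BRANCHES = (0.1, 0.2, 0.3)
```

Because the two classes mirror each other, the mixture treats the arbitrary labels 0 and 1 symmetrically while still letting individual characters be strongly biased toward one state, which a single Mk model cannot express.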

I've implemented the MixtureFinder-like algorithm I devised for selecting the best-fitting models for morphological data in this PR, although it has several limitations that call for further development: it currently cannot explicitly account for state-space heterogeneity among characters (users will probably need to test models per partition), and ascertainment bias corrections (+ASC in IQ-TREE) cannot be applied, because +ASC is not yet implemented for mixture models (#12).
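Until state-space heterogeneity is handled explicitly, one workaround is to split characters by their number of states and test models per partition. A minimal Python sketch (hypothetical helper names) that counts the observed states of each character and writes a NEXUS `sets` block, of the kind IQ-TREE accepts as a partition file, with one `charset` per state count. Whether to count observed states or each character's declared state space is a modeling choice this sketch does not settle.

```python
from collections import defaultdict

MISSING = set("-?")  # assumption: '-' = inapplicable, '?' = missing


def partition_by_state_count(matrix):
    """Map number of observed states -> list of 1-based character indices."""
    groups = defaultdict(list)
    for idx, column in enumerate(zip(*matrix), start=1):
        n_states = len({s for s in column if s not in MISSING})
        groups[n_states].append(idx)
    return dict(groups)


def to_nexus_sets(groups):
    """Render the groups as a NEXUS 'sets' block, one charset per state count."""
    lines = ["#nexus", "begin sets;"]
    for n_states in sorted(groups):
        sites = " ".join(str(i) for i in groups[n_states])
        lines.append(f"    charset states{n_states} = {sites};")
    lines.append("end;")
    return "\n".join(lines)


# Hypothetical 5-taxon, 5-character matrix.
TOY = ["00100", "01101", "11112", "1?1-2", "00100"]
print(to_nexus_sets(partition_by_state_count(TOY)))
```

Invariant characters end up in their own `states1` charset; under the analytical conditions described above, they would normally be removed before analysis rather than modeled.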

Of course, this new algorithm must be theoretically justified and empirically validated in a peer-reviewed paper (or at least a preprint) before it could be merged into the master branch and described in the documentation. For now, I am opening this PR to get some feedback.

I apologize for the current messiness of the code.

I'll post details of the algorithm, usage instructions, and some test runs on empirical datasets later. I'm sorry, but it might take a few days or more.

If I have misunderstood something, or if this algorithm is fundamentally unjustified, I apologize.

HS6986 · 2025-05-28T03:00:19Z

I'm sorry, but I realized that this algorithm does not seem to be justified, at least in its current implementation. I apologize for any confusion I may have caused. I am closing this PR for now.

HuaiyanRen and others added 30 commits March 4, 2025 16:21

allow mAIC calculation for 0 or 1 "overlapping" sequences.

f2c802b

allow mAIC calculation for 2 "overlapping" sequences.

96d6b53

method: add one gappy sequence for "2-overlapping" cases.

7357f83

fix bugs for old method of "2-overlapping" cases. Although we don't use is now.

aa60951

Fix protein ambiguous preblem for mAIC. Only compute mAIC for DNA and protein.

219fd3f

Allow mAIC calculation for binary and codon data. Make mAIC calculation works for non-reversible models

687002e

extend MixtureFinder to codon, binary, multistate, and amino acid data

6df0555

fix frequency types

d3e3e59

allow FU in mixture models

3a4de2b

Allow to fix the parameters for RHAS when using mixture finder. Also allow both options: MIX+MF and MF+MIX

13b6892

Merge pull request iqtree#19 from thomaskf/mixFinder_update

81f4e7a

Allow to fix the parameters for RHAS when using mixture finder. Allow all these options: MIX+MF, MIX+MFP, MF+MIX, MFP+MIX -- to run the mixture finder

Fixed the issue happened when user specifies the RHAS model when running mixture finder. Another option: -optfromgiven The RHAS model will still be optimized according to the initial values same as the input parameters.

4710559

Merge pull request iqtree#20 from thomaskf/mixFinder_update

dd0c411

Fixed the issue happened when user specifies the RHAS model for mixture finder

delete the class ModelMultistate

39c8a32

Merge branch 'master' into feature/HS6986/extend-MixtureFinder

4af564f

Changed the wording of the GTRX warnings; disabled the GTRX warnings if the number of states <= 6 && the number of the patterns in the alignment/partition >= 100

477b8b0

Fix indentation issues

5e95d0e

Merge branch 'master' into mixFinder_update

f262dbd

Merge branch 'master' into huaiyan

53ec25a

Only generateNestNetwork for DNA models.

06fffe7

Allow the option -m MIX+MF, MIX+MFP, MF+MIX, MFP+MIX to run MixtureFinder

988531d

create a function to initialise MixtureFinder frequencies

16fc417

Allow mixtureFinder on an alignment with a single partition

61342be

Fixed the double commas inside the best_model.nex file

016a6b8

Fix conflicts

6bac548

Restore a comment I accidentally deleted

0b417c2

Refine the code according to the advice by StefanFlaumberg and bqminh; Temporarily comment out `free(init_state_freq_set);`, which HuaiyanRen added, as they cause an error

ce2b9a9

Delete free(init_state_freq_set)

727a934

Relocate the misplaced name = "GTRX";

623874a

Restrict isRateTypeNested() only to DNA data

3f01fd9

HuaiyanRen and others added 10 commits May 12, 2025 17:04

report the mixture model in only in single-partition alignment in mixture model format.

d29ff88

Copy the resulting final tree to the first partition for mixfinder with single partition

64423b1

Implement a new MixtureFinder-like algorithm for selecting mixture models for morphological data, which has not yet been validated

0c58dc4

update the checkpoint file to keep the model parameters of the best mixture model

1e073ec

Merge branch 'huaiyan' into mixFinder_update

0a9f5a5

report the mixture model in only in single-partition alignment STILL in partition model format.

c6b25ab

introduce an option -mfopt to further optimise nucl/aa frequencies in mixture model

66c5278

Add an option --writefreqchkpt to force writing nucl/aa freq to checkpoint file

6fc4e70

update the version number

fdf0476

Fix algorithm

418a483

HS6986 mentioned this pull request May 16, 2025

Extend MixtureFinder to codon, binary, multistate, (and amino acid) data #11

Open

HS6986 force-pushed the feature/HS6986/MorphMixtureFinder branch from 6c63cde to 418a483 May 18, 2025 04:45

HS6986 added 5 commits May 18, 2025 13:51

Merge master of thomaskf/iqtree3 and fix conflicts

83e6826

Fix algorithm

786bcb5

minor fix

d3b7290

Fix

89dac41

Fix

af06267

HS6986 force-pushed the feature/HS6986/MorphMixtureFinder branch from d44e931 to af06267 May 18, 2025 12:24

HS6986 added 2 commits May 18, 2025 23:15

Delete MIX{MK+FQ,MK+FQ}

1388f73

Revert "Merge master of thomaskf/iqtree3 and fix conflicts"

d2389ed

This reverts commit 83e6826, reversing changes made to 418a483.

HS6986 changed the title ~~[WIP: Implementation of a New Algorithm] The extension of MixtureFinder to morphological data~~ [WIP: Implementation of a New Algorithm] Extension of MixtureFinder to morphological data May 18, 2025

Merge branch 'master' into feature/HS6986/MorphMixtureFinder

aca3c94

HS6986 force-pushed the feature/HS6986/MorphMixtureFinder branch from accfa6c to b924171 May 21, 2025 17:30

Set warnings according to the advice by StefanFlaumberg; Set an error that occurs when users try to apply MixtureFinder to amino acid data; Create the --force-aa-mix-finder option to force IQ-TREE to run MixtureFinder for amino acid data

005eead

HS6986 force-pushed the feature/HS6986/MorphMixtureFinder branch from b924171 to 005eead May 21, 2025 18:19

HS6986 added 3 commits May 22, 2025 03:22

Change the wording of an error

077f45e

Change the wording of an error

4bc5ef9

Merge remote-tracking branch 'upstream/master' into feature/HS6986/MorphMixtureFinder

b1a757d

HS6986 closed this May 28, 2025
