Allow for single-state alignments and remove misleading warnings #73
The current version does not allow an alignment or a partition to have only one state: it raises a hardcoded error. The motivation for this restriction is probably the assumption that states not present at the tree leaves are excluded from the substitution process. Under that assumption, a single-state alignment would leave only one state for modelling the substitution process, which would make the observed alignment a certain event given any tree topology and branch lengths (LL = 0).
Here I want to draw your attention to the fallacy of assuming that unrepresented states get excluded from the substitution process. They do not! Below is a simple example as proof:
Let's assume a dummy alignment, `dummy.phy`, and use it for tree reconstruction after disabling the error-raising expression in the `Alignment::checkAbsentStates` function:

    iqtree3 -seed 123 -nt 1 -s dummy.phy --seqtype AA --keep-ident -m "LG" -pre test_dummy_allstates
    iqtree3 -seed 123 -nt 1 -s dummy.phy --seqtype AA --keep-ident -m "LG+F" -pre test_dummy_onestate

Both runs finish successfully. The run using the `LG` model results in `LL = -2.537`, while the run using the `LG+F` model results in `LL = 0.0`. All the inferred tree branches have the minimum allowed length (1e-6) in both runs.

The explanation is straightforward. The built-in state frequencies of the LG matrix used in the first run make it possible for all states to occur in the substitution process, so alignments other than the given one can also evolve on any candidate tree, hence the non-zero LL of the tree estimated for the given alignment. In contrast, the observed `+F` state frequencies `freq(A, other) = (1.0, 0.0)` used in the second run allow only the A state in the substitution process, making the given alignment the only possible outcome on any tree.

If unrepresented states were really excluded from the substitution process, we would observe the second-run result in both runs.
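The difference between the two runs can be checked by hand. The following is a minimal sketch, not IQ-TREE code. It assumes that, as branch lengths approach zero, the likelihood of a column in which every sequence carries state A approaches the equilibrium frequency of A, so the log-likelihood is roughly the number of such columns times ln(freq(A)). The function name `constantSiteLogLik` is hypothetical, and the value 0.079066 used below for the LG frequency of A is quoted from memory and should be checked against the LG matrix file.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of IQ-TREE): log-likelihood of `numSites`
// alignment columns that all show the same state, in the limit of
// zero-length branches. In that limit no substitution can occur, so each
// column has probability equal to the root frequency of the observed state.
double constantSiteLogLik(double stateFreq, std::size_t numSites) {
    return static_cast<double>(numSites) * std::log(stateFreq);
}
```

With `stateFreq = 1.0` (the `+F` case) the result is exactly 0 for any number of sites. With `stateFreq = 0.079066` and a single column it is about -2.537, which would be consistent with the `LG` run if `dummy.phy` contained a single site pattern. That last point is an assumption, because the alignment itself is not reproduced here.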
The conclusion is:
This pull request allows for single-state alignments/partitions and modifies the warnings in accordance with the conclusion.
The `Alignment::checkAbsentStates` and `SuperAlignment::checkAbsentStates` functions now return `void` because 1) all the relevant information is already printed inside `Alignment::checkAbsentStates`, and printing the sum of the numbers of unobserved states in `SuperAlignment::checkAbsentStates` (which can easily exceed `num_states`) is somewhat misleading, and 2) the number of unobserved states is not used anywhere else.