Some notes from my e-mail correspondence with @davidchampredon:
Here's the outline of the manuscript.
Introduction
- Genetic sequences are usually treated as if they were known with certainty
- But uncertainty is introduced at nearly every sequencing step (both laboratory and software)
- So we should quantify this uncertainty and propagate it into any genetic analysis
- Review of the few studies that have dealt with this (not satisfactory)
Methods
- Theoretical framework of probabilistic sequences
- Very brief description of sung (full description in another paper)
Example 1: phylogenetic trees applied to
a) summary statistics on ancestry (tree topology)? or
b) clustering? or
c) source attribution?
Example 2: ??? (something different enough from Example 1, i.e. not based on a phylogenetic tree)
Results
- results for example 1
- results for example 2
In hindsight, I think my previous angle was too focused on phylogeny reconstruction.
My initial experiment was to:
i) generate a known tree (known sequences and evolution)
ii) simulate uncertainty (observation errors) N times according to the theoretical model using "probabilistic sequences"
iii) infer a tree for each of the N simulations
iv) see how all those trees differ using the TN93 distance

I don't think I should use TN93 at step iv), because it is too "abstract". I should probably push the analysis further, toward results that are more intuitive for a broad audience. The examples that come to mind are points a), b) and c) of Example 1 above, but I would like to pick your brain on this.
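As a rough illustration of steps ii) and iv), here is a minimal sketch. It assumes a probabilistic sequence is stored as one probability vector over A, C, G, T per site, and that the observed base is kept with probability 1 - error_rate while the remaining mass is spread evenly over the other three nucleotides. That error model and every function name here (`to_probabilistic`, `sample_realization`, `simulate_distances`, `summarize`) are hypothetical and are not the framework's actual definitions. The sketch draws N realizations of each sequence and records the pairwise p-distance in each draw, which is a simpler stand-in for TN93. Step iii) (tree inference) is left out; it would take each set of realizations as input.

```python
import random
import statistics

NUCLEOTIDES = "ACGT"


def to_probabilistic(seq, error_rate):
    """Hypothetical encoding: one probability vector over ACGT per site.
    Assumed error model: the true base keeps 1 - error_rate and the rest
    is split evenly among the other three nucleotides."""
    return [
        [1.0 - error_rate if n == base else error_rate / 3 for n in NUCLEOTIDES]
        for base in seq
    ]


def sample_realization(prob_seq, rng):
    """Draw one plausible observed sequence, sampling sites independently."""
    return "".join(rng.choices(NUCLEOTIDES, weights=site)[0] for site in prob_seq)


def p_distance(a, b):
    """Proportion of differing sites (a simple stand-in for TN93)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def simulate_distances(true_seqs, error_rate, n_sim, seed=0):
    """Repeat step ii) n_sim times and record pairwise distances per replicate."""
    rng = random.Random(seed)
    prob_seqs = {name: to_probabilistic(s, error_rate) for name, s in true_seqs.items()}
    names = sorted(true_seqs)
    pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
    draws = {pair: [] for pair in pairs}
    for _ in range(n_sim):
        observed = {name: sample_realization(ps, rng) for name, ps in prob_seqs.items()}
        # Step iii) (inferring a tree from `observed`) would go here; omitted.
        for x, y in pairs:
            draws[(x, y)].append(p_distance(observed[x], observed[y]))
    return draws


def summarize(draws):
    """Mean and population standard deviation of each pairwise distance."""
    return {pair: (statistics.mean(d), statistics.pstdev(d)) for pair, d in draws.items()}


if __name__ == "__main__":
    true_seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "TCGAACGTTC"}
    for pair, (mean, sd) in summarize(simulate_distances(true_seqs, 0.05, 200)).items():
        print(pair, round(mean, 3), round(sd, 3))
```

With `error_rate=0.0` every draw reproduces the true distances (0.1, 0.3 and 0.2 for the three pairs above), so any spread reported by `summarize` at a positive error rate reflects only the simulated observation uncertainty.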
Also, I think there should be a second example that is different from Example 1 (so, not based on a phylogeny) to show the broad applicability of the method. However, I am short of ideas for this one; again, you will probably think of something relevant here.