The implications of incongruence between gene tree and species tree topologies for divergence time estimation
The scripts presented here are relevant to the article: Carruthers et al. The implications of incongruence between gene tree and species tree topologies for divergence time estimation. Syst Biol.
To generate simulated data run simlpe_four_taxon_simulation.R.
n_gene_trees: number of gene trees to simulate.
locus_size: length in base pairs of each simulated locus.
probs: the probability that gene tree topologies are incongruent with the species tree topology in each simulated dataset.
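As a rough illustration of how probs relates to the simulated gene trees, the Python sketch below labels each gene tree as congruent or incongruent with a fixed probability. The function name assign_topologies and the labels are hypothetical, and independent sampling per gene tree is an assumption; the R simulation script may, for example, fix the proportion exactly rather than sample it.

```python
import random


def assign_topologies(n_gene_trees, prob_incongruent, seed=None):
    """Label each simulated gene tree as congruent or incongruent.

    Assumption: each gene tree is incongruent independently with
    probability prob_incongruent (one value taken from `probs`).
    """
    if not 0.0 <= prob_incongruent <= 1.0:
        raise ValueError("prob_incongruent must be between 0 and 1")
    rng = random.Random(seed)
    return [
        "incongruent" if rng.random() < prob_incongruent else "congruent"
        for _ in range(n_gene_trees)
    ]


labels = assign_topologies(n_gene_trees=1000, prob_incongruent=0.3, seed=1)
# The observed proportion should be close to the requested probability.
print(labels.count("incongruent") / len(labels))
```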
To analyse simulated data run overall.Rev in RevBayes.
This script will perform analyses of:
- The entire dataset (concatenated) - using simple_clock_script.Rev to estimate t, more_relaxed_script.Rev and less_relaxed_script.Rev to estimate r, and simple_branch_length_estimation.Rev to estimate n.
- Concatenated loci with gene trees that are topologically congruent with the species tree - using the same scripts as for the entire dataset
- Individual loci, such that parameters can be estimated from congruent branches in gene trees - using simple_clock_scripts_gene_trees.Rev
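The concatenated analyses listed above operate on a single alignment built by joining loci end to end. A minimal Python sketch of that step follows; it is not taken from the repository, the function name concatenate_loci is hypothetical, and it assumes that every locus contains the same taxa.

```python
def concatenate_loci(alignments):
    """Concatenate per-locus alignments into one sequence per taxon.

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    Assumption: every locus contains the same taxa and, within a locus,
    all sequences have the same length.
    """
    if not alignments:
        return {}
    taxa = sorted(alignments[0])
    for locus in alignments:
        if sorted(locus) != taxa:
            raise ValueError("all loci must contain the same taxa")
        if len({len(seq) for seq in locus.values()}) != 1:
            raise ValueError("sequences within a locus must be aligned")
    # Join the loci in input order, taxon by taxon.
    return {taxon: "".join(locus[taxon] for locus in alignments) for taxon in taxa}


loci = [
    {"A": "ACGT", "B": "ACGA"},
    {"A": "GG", "B": "GC"},
]
print(concatenate_loci(loci))  # {'A': 'ACGTGG', 'B': 'ACGAGC'}
```

To restrict the concatenation to congruent loci, pass only the alignments whose gene trees match the species tree.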
To generate simulated data run simple_sixteen_taxon_simulation.R
The simulation needs to be repeated for the balanced and unbalanced trees, and for different levels of topological incongruence.
congruence_level: name of the output file (should indicate whether the tree is balanced, and the level of incongruence)
tree: species tree topology. Either balanced_sixteen_taxon_cong.tre or unbalanced_sixteen_taxon_cong.tre
incong_tree: incongruent gene tree topology. One of balanced_sixteen_one_incong.tre, balanced_sixteen_two_incong.tre, balanced_sixteen_three_incong.tre, balanced_sixteen_four_incong.tre, or unbalanced_sixteen_incong.tre
n_gene_trees: number of gene trees to simulate
locus_size: number of base pairs per locus
To analyse simulated data run overall.Rev in RevBayes.
This script will perform analyses of the entire dataset (concatenated) - using simple_clock_script.Rev to estimate t; simple_relaxed_script.Rev and simple_less_relaxed_script.Rev to estimate r; and simple_branch_length_estimation.Rev to estimate n.
To generate simulated data use multispecies_coalescent_sixteen_taxon_simulation.R
n_gene_trees: number of gene trees to simulate
locus_size: length in base pairs of each simulated locus
n_reps: number of times to repeat entire simulation
effective_population_size_approx: approximate effective population size (Ne)
n_tips: number of taxa in species tree
species_tree: species tree, either entire_tree_unbalanced.tre, or entire_tree_balanced.tre
type: name of output file
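To give a feel for how the effective population size setting influences incongruence, the Python sketch below evaluates the standard three-taxon multispecies coalescent result: for a rooted species tree ((A,B),C) with an internal branch of T coalescent units, a gene tree differs in topology from the species tree with probability (2/3)exp(-T). It assumes diploid organisms, so that T equals the branch duration in generations divided by 2Ne. The function name is hypothetical, and this is a simplified illustration, not the sixteen-taxon simulation itself.

```python
import math


def prob_incongruent_three_taxon(internal_branch_generations, effective_population_size):
    """Probability that a gene tree is incongruent with a three-taxon species tree.

    Assumptions: rooted species tree ((A,B),C), diploid population of constant
    size, one sampled lineage per species, branch duration in generations.
    """
    if effective_population_size <= 0:
        raise ValueError("effective_population_size must be positive")
    if internal_branch_generations < 0:
        raise ValueError("internal_branch_generations must be non-negative")
    # Convert the internal branch to coalescent units (2 * Ne generations each).
    coalescent_units = internal_branch_generations / (2.0 * effective_population_size)
    return (2.0 / 3.0) * math.exp(-coalescent_units)


# An internal branch of 2000 generations with Ne = 1000 is one coalescent unit.
print(round(prob_incongruent_three_taxon(2000, 1000), 3))  # 0.245
```

Larger values of Ne shorten branches in coalescent units and so increase the expected proportion of incongruent gene trees.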
To generate concatenated alignment of simulated data based only on loci with gene trees that are topologically congruent with the species tree use get_congruent_subsets.R
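Selecting congruent loci amounts to comparing each gene tree topology with the species tree topology. The Python sketch below shows one way this could be done for rooted trees, by comparing the sets of clades read from Newick strings. It is an illustration under stated assumptions, not the logic of get_congruent_subsets.R: the function names are hypothetical, trees are assumed to be rooted with the same tips, and tip labels are assumed to contain no parentheses, commas, colons, or whitespace.

```python
import re


def clades(newick):
    """Return the set of clades (frozensets of tip labels) in a rooted Newick string."""
    # Drop the trailing semicolon and all branch lengths.
    text = re.sub(r":[^,();]+", "", newick.strip().rstrip(";"))
    stack, found = [], set()
    prev = None
    for token in re.findall(r"[(),]|[^(),\s]+", text):
        if token == "(":
            stack.append([])
        elif token == ")":
            tips = stack.pop()
            found.add(frozenset(tips))
            if stack:
                stack[-1].extend(tips)
        elif token != "," and prev != ")":
            stack[-1].append(token)  # a tip label; labels after ")" are internal
        prev = token
    return found


def congruent_loci(gene_trees, species_tree):
    """Return the indices of gene trees whose rooted topology matches the species tree."""
    target = clades(species_tree)
    return [i for i, tree in enumerate(gene_trees) if clades(tree) == target]


species = "((A,B),(C,D));"
genes = [
    "((A:0.1,B:0.2):0.1,(C:0.1,D:0.1):0.3);",
    "((A,C),(B,D));",
    "((D,C),(B,A));",
]
print(congruent_loci(genes, species))  # [0, 2]
```

The returned indices can then be used to pick the alignments that go into the congruent-only concatenation.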
To analyse simulated data run rev_shell.Rev
This script will estimate t in the balanced species tree with entire_script_balanced.Rev, and t in the unbalanced species tree with entire_script_unbalanced.Rev.
Also use rev_shell_balanced_by_gene.Rev to estimate t in individual gene trees.
To generate simulated data use simulation.R
n_gene_trees: number of gene trees to simulate
locus_size: length in base pairs of each simulated locus
n_reps: number of times to repeat entire simulation
effective_population_size_approx: approximate effective population size (Ne)
n_tips: number of taxa in species tree
To generate concatenated alignment of simulated data based only on loci with gene trees that are topologically congruent with the species tree use get_congruent_subsets.R
To generate start trees from the initial four-taxon simulation for analysis in a multispecies coalescent framework use generate_start_trees_for_simple_analysed_as_coales.R
To analyse simulated data use overall.rev
Scripts for empirical example presented in the study.
- data (gene trees, alignments, and species tree)
- scripts (for running astral, raxml_ng, and rooting)
- branch_length_tree.tre: species tree with molecular branch lengths derived from the branch-wise method
- for_analysis_cong.tre: species tree with molecular branch lengths estimated from the concatenated alignment of congruent loci
- for_analysis_all.tre: species tree with molecular branch lengths estimated from the concatenated alignment of all loci
- .CPPR8S files: input files for treePL. There are two for each of the three input trees listed above: one with minimal assumptions, and one with full assumptions
- output folders: contain the output tree for each of the 6 different analyses in treePL