pebgroup/tree_incongruence_divergence_times

The implications of incongruence between gene tree and species tree topologies for divergence time estimation

The scripts presented here are relevant to the article: Carruthers et al. The implications of incongruence between gene tree and species tree topologies for divergence time estimation. Syst Biol.

Simple four taxon

To generate simulated data run simlpe_four_taxon_simulation.R.

Arguments:

n_gene_trees: number of gene trees to simulate.
locus_size: length in base pairs of each simulated locus.
probs: the probability that gene tree topologies are incongruent with the species tree topology in each simulated dataset.
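As a rough illustration of how these arguments interact, the sketch below labels each simulated locus as congruent or incongruent, treating `probs` as a per-locus probability of incongruence. It is written in Python rather than R, the function name `assign_topologies` is hypothetical, and the independent per-locus draws are an assumption; the simulation script itself may assign topologies differently (for example, as a fixed proportion of loci).

```python
import random

def assign_topologies(n_gene_trees, prob_incongruent, seed=None):
    """Label each simulated locus as 'congruent' or 'incongruent'.

    Assumption: each locus is drawn independently with probability
    prob_incongruent of having an incongruent gene tree topology.
    """
    if not 0.0 <= prob_incongruent <= 1.0:
        raise ValueError("prob_incongruent must be between 0 and 1")
    rng = random.Random(seed)
    return ["incongruent" if rng.random() < prob_incongruent else "congruent"
            for _ in range(n_gene_trees)]

n_gene_trees = 100   # number of gene trees to simulate
locus_size = 1000    # base pairs per locus
labels = assign_topologies(n_gene_trees, prob_incongruent=0.3, seed=1)

# Length of the concatenated alignment across all loci.
concatenated_length = n_gene_trees * locus_size
print(labels.count("incongruent"), "of", len(labels), "loci are incongruent")
print("concatenated alignment length:", concatenated_length, "bp")
```

Fixing the seed makes a given draw repeatable, which helps when comparing analyses run on the same simulated dataset.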

To analyse simulated data run overall.Rev in RevBayes.

This script will perform analyses of:

  1. The entire dataset (concatenated) - using simple_clock_script.Rev to estimate t, more_relaxed_script.Rev and less_relaxed_script.Rev to estimate r, and simple_branch_length_estimation.Rev to estimate n.
  2. Concatenated loci with gene trees that are topologically congruent with the species tree - using the same scripts as for 1.
  3. Individual loci, such that parameters can be estimated from congruent branches in gene trees - using simple_clock_scripts_gene_trees.Rev

Simple sixteen taxon

To generate simulated data run simple_sixteen_taxon_simulation.R

The simulation needs to be repeated for the balanced and unbalanced trees, and for different levels of topological incongruence.

Arguments:

congruence_level: name of the output file (should indicate whether the tree is balanced and the level of incongruence)
tree: species tree topology. Either balanced_sixteen_taxon_cong.tre or unbalanced_sixteen_taxon_cong.tre
incong_tree: incongruent gene tree topology. Either balanced_sixteen_one_incong.tre, balanced_sixteen_two_incong.tre, balanced_sixteen_three_incong.tre, balanced_sixteen_four_incong.tre, or unbalanced_sixteen_incong.tre
n_gene_trees: number of gene trees to simulate
locus_size: number of base pairs per locus

To analyse simulated data run overall.Rev in RevBayes.

This script will perform analyses of the entire dataset (concatenated) - using simple_clock_script.Rev to estimate t; simple_relaxed_script.Rev and simple_less_relaxed_script.Rev to estimate r; and simple_branch_length_estimation.Rev to estimate n.

Multi-species coalescent sixteen taxon

Arguments:

n_gene_trees: number of gene trees to simulate
locus_size: length in base pairs of each simulated locus
n_reps: number of times to repeat entire simulation
effective_population_size_approx: approximate effective population size (Ne)
n_tips: number of taxa in species tree
species_tree: species tree, either entire_tree_unbalanced.tre, or entire_tree_balanced.tre
type: name of output file

To generate a concatenated alignment of simulated data based only on loci with gene trees that are topologically congruent with the species tree, use get_congruent_subsets.R
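To make "topologically congruent" concrete, the sketch below compares rooted trees by their sets of clades and keeps only the loci whose gene tree matches the species tree. It is an illustration in Python, not the logic of get_congruent_subsets.R: the nested-tuple tree representation, the function names, and the example loci are all hypothetical, and it assumes every gene tree is rooted in the same way as the species tree.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip names) in a rooted tree.

    A tree is either a tip name (str) or a tuple of subtrees.
    """
    found = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        # Tips below this node are the union of the tips below each child.
        tips = frozenset().union(*(collect(child) for child in node))
        found.add(tips)
        return tips

    collect(tree)
    return found

def is_congruent(gene_tree, species_tree):
    """Two rooted trees are congruent if they contain the same clades."""
    return clades(gene_tree) == clades(species_tree)

species_tree = ((("A", "B"), "C"), "D")
gene_trees = {
    "locus1": ("D", ("C", ("B", "A"))),   # same topology, children rotated
    "locus2": ((("A", "C"), "B"), "D"),   # A groups with C: incongruent
}
congruent_loci = [name for name, tree in gene_trees.items()
                  if is_congruent(tree, species_tree)]
print(congruent_loci)
```

Comparing clade sets rather than tree strings means that the order in which children are written does not affect the result.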

To analyse simulated data run rev_shell.Rev

This script will estimate t in the balanced species tree with entire_script_balanced.Rev, and t in the unbalanced species tree with entire_script_unbalanced.Rev.

Also use rev_shell_balanced_by_gene.Rev to estimate t in individual gene trees.

Multi-species coalescent four taxon

To generate simulated data use simulation.R

Arguments:

n_gene_trees: number of gene trees to simulate
locus_size: length in base pairs of each simulated locus
n_reps: number of times to repeat entire simulation
effective_population_size_approx: approximate effective population size (Ne)
n_tips: number of taxa in species tree
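The effective population size matters because, under the multi-species coalescent, the chance that a gene tree disagrees with the species tree depends on internal branch lengths measured in coalescent units. For a rooted three-taxon species tree the standard result is P(incongruent) = (2/3) exp(-T), with T = t / (2 Ne) for a diploid population and t the internal branch length in generations. The Python sketch below evaluates this expression; the function name is hypothetical, it covers only the rooted-triple case, and it is not taken from simulation.R.

```python
import math

def prob_incongruent_triple(branch_generations, effective_population_size):
    """Probability that a gene tree is incongruent with a rooted
    three-taxon species tree under the multi-species coalescent.

    Assumes a diploid population, so the internal branch length in
    coalescent units is t / (2 * Ne).
    """
    if effective_population_size <= 0:
        raise ValueError("effective_population_size must be positive")
    if branch_generations < 0:
        raise ValueError("branch_generations must be non-negative")
    t_coal = branch_generations / (2.0 * effective_population_size)
    return (2.0 / 3.0) * math.exp(-t_coal)

# Larger Ne (or a shorter internal branch) gives more incongruence.
for ne in (1_000, 10_000, 100_000):
    p = prob_incongruent_triple(branch_generations=20_000,
                                effective_population_size=ne)
    print(f"Ne = {ne}: P(incongruent) = {p:.4f}")
```

The probability approaches its maximum of 2/3 as the internal branch shrinks towards zero, at which point the three possible topologies are equally likely.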

To generate a concatenated alignment of simulated data based only on loci with gene trees that are topologically congruent with the species tree, use get_congruent_subsets.R

To generate start trees from the initial four-taxon simulation for analysis in a multi-species coalescent framework, use generate_start_trees_for_simple_analysed_as_coales.R

To analyse simulated data use overall.rev

Empirical Example

Scripts for empirical example presented in the study.

phylogenetic_inference contains:

  • data (gene trees, alignments, and species tree)
  • scripts (for running astral, raxml_ng, and rooting)

divergence_times contains:

  • branch_length_tree.tre: species tree with molecular branch lengths derived from the branch-wise method
  • for_analysis_cong.tre: species tree with molecular branch lengths estimated from the concatenated alignment of congruent loci
  • for_analysis_all.tre: species tree with molecular branch lengths estimated from the concatenated alignment of all loci
  • .CPPR8S files: input files for treePL. There are two for each type of input tree (listed above): one with minimal assumptions and one with full assumptions
  • output folders: contain the output tree for each of the 6 different analyses in treePL
