This repository hosts materials I developed as a teaching assistant for the course "Phylogenetics and the Fossil Record" (GEOS 26100), taught by Graham Slater at the University of Chicago in Fall 2022.
For my reflections on the experience and some commentary on the files described below, see:
https://davidcerny.github.io/post/teaching_revbayes
This repository includes the following files:
- PDF handouts for four labs covering the basics of model-based phylogenetic inference from discrete morphological data:
  - Lab 5 focuses on maximum likelihood as implemented in IQ-TREE 2 (Minh et al. 2020);
  - Labs 6, 7, and 8 focus on Bayesian analysis as implemented in RevBayes (Höhna et al. 2016).

  Each subdirectory also contains the corresponding TeX source and graphics to make it easier to create derivative works, should anyone feel so inclined.
- Two example data files:
  - A Nexus file with the Tedford et al. (2009) canid matrix (Tedford_2009-1.nex), modified from the version available from Graeme T. Lloyd's phylogenetic dataset repository. (Specifically, the file was stripped of the ASSUMPTIONS block to make it work with RevBayes.)
  - A tab-separated file with the fossil ages of the corresponding taxa (Tedford_ages.tsv).
- An R script intended to preprocess Nexus character matrices for IQ-TREE analyses (partition.R) by splitting them into separate Phylip (.phy) files – first by character type (ordered vs. unordered), and then by the number of observed character states.
- Three RevBayes (.Rev) scripts:
  - archery.Rev goes with Lab 7;
  - Tedford_phylo.Rev goes with Lab 7;
  - Tedford_FBD_strictclock.Rev goes with Lab 8.
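The two-level split described above for partition.R (first by character type, then by the number of observed states) can be illustrated with a short Python sketch. This is not the author's R script: the function names, the toy matrix, and the assumption that every state is a single symbol with "?" and "-" marking missing data and gaps are all hypothetical, and polymorphic codings are not handled.

```python
from collections import defaultdict

# Assumed symbols for missing data and gaps; not counted as observed states.
MISSING = {"?", "-"}


def partition_characters(matrix, ordered=frozenset()):
    """Group 0-based character indices by (type, number of observed states).

    matrix  -- dict mapping taxon name to a string with one symbol per character
    ordered -- set of indices of characters to be treated as ordered
    """
    n_chars = len(next(iter(matrix.values())))
    groups = defaultdict(list)
    for i in range(n_chars):
        # Collect the states actually observed for character i across all taxa.
        observed = {row[i] for row in matrix.values()} - MISSING
        kind = "ordered" if i in ordered else "unordered"
        groups[(kind, len(observed))].append(i)
    return dict(groups)


def extract_columns(matrix, columns):
    """Return a sub-matrix containing only the given character columns."""
    return {taxon: "".join(row[i] for i in columns) for taxon, row in matrix.items()}


# Hypothetical toy matrix: four taxa, four characters, the last one ordered.
toy = {
    "taxon_A": "0102",
    "taxon_B": "1?10",
    "taxon_C": "0011",
    "taxon_D": "1-01",
}
groups = partition_characters(toy, ordered={3})
for key, columns in sorted(groups.items()):
    print(key, columns, extract_columns(toy, columns))
```

Each resulting group would then be written to its own Phylip file; that step is omitted here.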
- In the handout for Lab 5, I say that maximum-likelihood phylogenetic inference was originally developed for DNA sequences. This is not correct; it was actually first developed for continuous characters representing blood-group allele frequencies (Edwards & Cavalli-Sforza 1964).
- Also in the handout for Lab 5, my well-intentioned advice to use the standard/slow (-b) rather than ultrafast (-B) bootstrap sometimes caused students to run into the following error with IQ-TREE v2.2.0:

      ERROR: phylokernelnew.h:3332: double PhyloTree::computeLikelihoodFromBufferGenericSIMD() [VectorClass = Vec4d, FMA = true, SITE_MODEL = false]: Assertion `std::isfinite(tree_lh) && "Numerical underflow for lh-from-buffer"' failed.
      ERROR:
      ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED

  I haven't verified whether this error still occurs in the most recent version (v2.3.6 as of 2024-10-27), but I would still recommend using the more numerically stable ultrafast bootstrap just in case.
- Changing Tedford_phylo.Rev as suggested in Exercise 6 of Lab 7 will trigger RevBayes issue #308 and result in the following error:

      Error: Ambiguous call to function 'sum' with arguments ( Probability[] )
      Potentially matching functions are:
      sum (Real[]<any> x)
      sum (RealPos[]<any> x)
      sum (Integer[]<any> x)
      sum (Natural[]<any> x)
      Error: Problem processing line 26 in file "Tedford_phylo.Rev"

  Unfortunately, this issue is still unsolved, so I'd recommend modifying the exercise by asking the students to try out a different prior – e.g., dnExponential(5).
- In the script for Lab 8, when the students try to plug their own numbers into the calculation of the origin age prior, it's preferable for type safety reasons to write abs(upper - lower)/qexp(0.95) instead of just (upper - lower)/qexp(0.95).
- Occasionally, students might see negative branch lengths when they try to summarize their time trees from Lab 8 using mapTree() or mccTree(). These can be avoided by specifying positiveBranchLengths=TRUE.
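The origin age prior calculation mentioned above for Lab 8 can be checked numerically. Assuming that qexp(0.95) denotes the 0.95 quantile of a rate-1 exponential distribution (as it does in R), dividing the width of the age interval by it gives the mean of an exponential distribution that places 95% of its probability mass within that width. The Python sketch below is not part of the course materials; the function name and the bounds of 30 and 40 are hypothetical. The stated reason for abs() is type safety in Rev; the sketch only shows the numerical side effect that the result stays positive whichever order the bounds are given in.

```python
import math


def exponential_mean(lower, upper, prob=0.95):
    """Mean of an exponential whose `prob` quantile equals |upper - lower|."""
    # Quantile function of a rate-1 exponential: -ln(1 - p).
    unit_quantile = -math.log(1.0 - prob)
    return abs(upper - lower) / unit_quantile


# Hypothetical bounds (e.g. in millions of years), chosen only for illustration.
lower, upper = 30.0, 40.0
mean = exponential_mean(lower, upper)

# Probability mass that an exponential with this mean places below the width.
mass_within_width = 1.0 - math.exp(-(upper - lower) / mean)
print(mean, mass_within_width)

# Swapping the bounds leaves the result unchanged because of abs().
print(exponential_mean(upper, lower) == mean)
```

The printed mass should come out at 0.95 up to floating-point error, which confirms that the expression encodes a 95% quantile rather than a hard upper bound.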
What about Labs 1–4? What were they about and why are they not here?
The first four labs of the course were dedicated to finding a pre-existing character matrix in the literature (Lab 1); constructing one's own toy matrix for different types of pasta (Lab 2); parsimony analysis in PAUP* (Lab 3); and time-scaling parsimony trees using R, RStudio, and paleotree (Lab 4). Unlike the handouts for Labs 5–8, which I wrote pretty much from scratch, the handouts for the first four labs were heavily based on earlier materials prepared by Anna Wisniewski and Graham Slater, so it didn't feel appropriate to upload them to my personal GitHub.
As described in the accompanying blog post, Labs 6–8 were developed following the advice of a number of RevBayes developers, some of whom kindly shared with me their own tutorials and workshop slides. When publicly available, these are linked to from the corresponding handouts and credited to their authors, with all such citations highlighted in blue. They include:
- Jeremy Brown's slides from the Workshop on Molecular Evolution (link) (Lab 6)
- Tracy Heath's tutorial from the Taming the BEAST workshop (link) (Lab 8)
In addition, I made use of a number of official tutorials hosted directly on the RevBayes website:
- Introduction to MCMC using RevBayes, written by Wade Dismukes, Tracy Heath, and June Walker (Lab 7)
- Discrete morphology - Tree Inference, written by April Wright, Michael Landis, and Sebastian Höhna (Lab 7)
- Nucleotide substitution models, written by Sebastian Höhna, Michael Landis, Brian Moore, and Tracy Heath (Lab 7)
Valuable feedback and inspiration were further provided by Jiansi Gao, Bruno Petrucci, Orlando Schwery, Carrie Tribble, Rachel Warnock, and April Wright.
The markdown file used to apply the CC BY-SA license to the contents of this repo was taken from this repository helpfully maintained by Santiago Soler and colleagues.
- Edwards AWF, Cavalli-Sforza LL. 1964. Reconstruction of evolutionary trees. Pp. 67–76 in Heywood VH, McNeill J, eds. Phenetic and Phylogenetic Classification. London, UK: Systematics Association Publ. No. 6
- Höhna S, Landis MJ, Heath TA, Boussau B, Lartillot N, Moore BR, Huelsenbeck JP, Ronquist F. 2016. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Syst. Biol. 65(4): 726–736
- Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. 2020. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37(5): 1530–1534. Corrigendum: 37(8): 2461
- Tedford RH, Wang X-M, Taylor BE. 2009. Phylogenetic systematics of the North American fossil Caninae (Carnivora: Canidae). Bull. Am. Mus. Nat. Hist. 325: 1–218
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.