-
Notifications
You must be signed in to change notification settings - Fork 3
/
README.Rmd
104 lines (78 loc) · 4.92 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
---
title: "README.Rmd"
bibliography: bibliography.bib
---
![logo](inst/images/treesiftR.png)
[![Travis build status](https://travis-ci.org/wrightaprilm/treesiftr.svg?branch=master)](https://travis-ci.org/wrightaprilm/treesiftr)
[![Coverage status](https://codecov.io/gh/wrightaprilm/treesiftr/branch/master/graph/badge.svg)](https://codecov.io/github/wrightaprilm/treesiftr?branch=master)
## Introduction
Estimating phylogenetic trees is crucial in many areas of evolutionary biology.
However, visualizing the relationship between data and trees is not intutive. To assist with visualizing this relationship, I have created [treesiftR](https://wrightaprilm.github.io/treesiftr/),
an a [Shiny application](https://wrightaprilm.shinyapps.io/treesiftr_app/) [@shiny] that can be run locally or used via the web that takes subsets of data from a phylogenetic matrix, generates a tree under parsimony, and scores that tree under both the likelihood and parsimony criteria. The output of the package is a visualization or set of visualizations of a tree and characters. treesiftR can also be used as an [@R] package to provide a visual demonstration of the relationship between
trees and data, while enforcing concepts in R programming.
If you are interested in treesiftr, take a look at my [instructor's
guide](https://wrightaprilm.github.io/treesiftr/articles/00-Instuctor-Guide.html)
for more information (PDF also available [here](https://github.com/wrightaprilm/treesiftr/blob/master/vignettes/00-Instuctor-Guide.pdf)) about adopting this module.
### Target Audience
treesiftr has been used in the [Analytical Paleobiology Workshop](http://www.analytical.palaeobiology.de/), in which the audience was graduate students and postdocs, many of whom had no prior knowledge of phylogeny. It is also
used in the Genetics course at Southeastern Louisiana University, where the audience
is undergraduates who have no prior knowledge of phylogeny. It is meant to be
accompanied by lecture material on phylogenetics. A glossary is provided with each
worksheet, and a sample slide deck is included in the `inst/slides` directory.
## Installation
Currently, treesiftr can be installed via the devtools ```install_github```
function [@devtools].
```{r eval=FALSE}
devtools::install_github("wrightaprilm/treesiftr")
```
## Required Packages
```{r setup}
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE,
include = FALSE
)
library(ape)
library(treesiftr)
library(phangorn)
library(alignfigR)
library(ggtree)
library(ggplot2)
data(bears)
```
Note: [phangorn](https://cran.r-project.org/web/packages/phangorn/index.html), [alignfigr](https://cran.r-project.org/web/packages/alignfigR/index.html), and [ggplot2](https://ggplot2.tidyverse.org/) are all available via [CRAN](https://cran.r-project.org/).
[ggtree](https://bioconductor.org/packages/release/bioc/html/ggtree.html) is available via [bioconductor](https://www.bioconductor.org/). For information on installing
bioconductor packages, see [here](https://www.bioconductor.org/install/).
## Operation
The first step to making a treesiftr visualization is to select the subset of
the phylogenetic matrix that we would like to visualize. This is performed via a function called ```generate_sliding```. The below command will subset the
```{r message=FALSE, warning=FALSE}
# Locate package data
fdir <- system.file("extdata", package = "treesiftr")
aln_path <- file.path(fdir, "bears_fasta.fa")
bears <- read_alignment(aln_path)
tree <- read.tree(file.path(fdir, "starting_tree.tre"))
# Generate our list of dataframe subsets. For simplicity, we will look at one
# set of characters
sample_df <- generate_sliding(bears, start_char = 1, stop_char = 1, steps = 1)
```
The result of this is a dataframe, shown below:
```{r}
sample_df
```
This dataframe dispays the start character (the first character that will be visualized) and stop character (the final character that will be visualized).
We can then build a tree from our data subset:
```{r message=FALSE, warning=FALSE}
output_tree <- generate_tree_vis(sample_df = sample_df, alignment = aln_path, tree = tree, phy_mat = bears)
output_tree
```
`Phangorn` [@Schliep2011, Schliep2017] requires a starting tree to estimate a parsimony tree. We specify the tree we read in earlier for this purpose. The trees, which were generated with `ggtree` [@ggtree], a ggplot2 [@ggplot2] library for phylogenies, have been saved to a vector, which can be displayed in its entirety, or subsetted to look at specific trees.
## Random Trees
You can also map the characters to a random tree, and count the parsimony steps
or score the characters under likelihood on this tree.
```{r message=FALSE, warning=FALSE}
random_tree <- generate_tree_vis(sample_df = sample_df, alignment = aln_path, tree = tree, phy_mat = bears,
random_tree = TRUE, pscore = TRUE)
random_tree
```
## References