Skip to content

Commit

Permalink
added prediction strength
Browse files Browse the repository at this point in the history
  • Loading branch information
Dan Knightsaaf committed Apr 16, 2016
1 parent 45d1241 commit f39119a
Show file tree
Hide file tree
Showing 7 changed files with 278 additions and 0 deletions.
165 changes: 165 additions & 0 deletions data/globalgut-66-adults/prediction.strength.html

Large diffs are not rendered by default.

Binary file added data/globalgut-66-adults/prediction.strength.pdf
Binary file not shown.
105 changes: 105 additions & 0 deletions data/globalgut-66-adults/prediction.strength.rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
---
title: "Prediction strength"
output:
html_document:
theme: united
fig_width: 6
fig_height: 6
pdf_document:
fig_width: 6
fig_height: 6
---
Back to [Table of Contents](../../doc/index.html)

**All of the code in this page is meant to be run in ```R``` unless otherwise specified.**

## Loading a genus table and the metadata into R
Before loading data into R, this QIIME command must be run on the command line to collapse OTU counts into genus (L6) and phylum (L2) count tables:
```{r eval=FALSE, engine='bash'}
# (run on command line)
summarize_taxa.py -i otu_table.biom -L 6
# convert to JSON BIOM format to load into R using R biom package:
biom convert -i otu_table_L6.biom -o otu_table_L6_json.biom --to-json
```


Inside `R`, Install biom, vegan, and cluster packages **if not installed**.
```{r eval=FALSE}
install.packages(c('biom','vegan','cluster'),repo='http://cran.wustl.edu')
```

Load packages
```{r, eval=TRUE, echo=FALSE}
library('biom')
library('vegan')
library('cluster')
```

Load packages
```{r, eval=FALSE, echo=TRUE}
library('biom')
library('vegan')
library('cluster')
```

Load data
```{r, eval=TRUE}
# load biom file
genus.biom <- read_biom('otu_table_L6_json.biom')
# Extract data matrix (genus counts) from biom table
genus <- as.matrix(biom_data(genus.biom))
# transpose so that rows are samples and columns are genera
genus <- t(genus)
# load mapping file
map <- read.table('map.txt', sep='\t', comment='', head=T, row.names=1)
```


It is extremely important to ensure that your genus table and metadata table sample IDs are lined up correctly.
```{r, eval=TRUE}
# find the overlapping samples
common.ids <- intersect(rownames(map), rownames(genus))
# get just the overlapping samples
genus <- genus[common.ids,]
map <- map[common.ids,]
```


## Calculate prediction strength

```{r, eval=TRUE}
# Source the prediction strength code.
source('../../src/prediction.strength.r')
# Calculate Bray-Curtis distances
bc <- vegdist(genus)
# Run prediction strength analysis on bray-curtis table
ps <- prediction.strength(bc)
# plot prediction strength
plot(2:10,ps,type='b',ylim=c(0,1),xlab='Number of clusters', ylab='Prediction strength')
```

Make a PCoA plot colored by cluster, but showing COUNTRY by shape of the points. Note that USA is clustered almost perfectly.
```{r, eval=TRUE}
# Run partitioning around medoids (PAM) clustering with 2 clusters
p <- pam(bc,2)
# PCoA coordinates of Bray-Curtis distances
pc <- cmdscale(bc,2)
# plot PCoA colored by cluster, with countries shown by shape.
plot(pc[,1], pc[,2], col=c('blue','red')[p$clustering], pch=c(16,17,18)[map$COUNTRY])
legend('topleft',legend=c('USA','Malawi','Venezuela'),pch=c(17,16,18))
# Plot the cluster labels at the centroids
cluster.centroids.x <- sapply(split(pc[,1],p$clustering),mean)
cluster.centroids.y <- sapply(split(pc[,2],p$clustering),mean)
text(cluster.centroids.x, cluster.centroids.y, c('Cluster 1', 'Cluster 2'),col=c('blue','red'))
```
4 changes: 4 additions & 0 deletions doc/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,10 @@ <h3>Ordination (Guerrero Negro)</h3>
<h3>Constrained ordination (Guerrero Negro)</h3>
<p><a href="../data/guerreronegro/constrained-ordination.html">HTML</a> <a href="../data/guerreronegro/constrained-ordination.pdf">PDF</a></p>
</div>
<div id="evaluating-clusters-with-prediction-strength-global-gut" class="section level3">
<h3>Evaluating clusters with prediction strength (Global Gut)</h3>
<p><a href="../data/globalgut-66-adults/prediction.strength.html">HTML</a> <a href="../data/globalgut-66-adults/prediction.strength.pdf">PDF</a></p>
</div>
<div id="sourcetracker" class="section level3">
<h3>SourceTracker</h3>
<p><a href="sourcetracker.html">HTML</a> <a href="sourcetracker.pdf">PDF</a></p>
Expand Down
3 changes: 3 additions & 0 deletions doc/index.rmd
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,9 @@ The Github repo for this code is https://github.com/danknights/mice8992-2016
###Constrained ordination (Guerrero Negro)
[HTML](../data/guerreronegro/constrained-ordination.html) [PDF](../data/guerreronegro/constrained-ordination.pdf)

### Evaluating clusters with prediction strength (Global Gut)
[HTML](../data/globalgut-66-adults/prediction.strength.html) [PDF](../data/globalgut-66-adults/prediction.strength.pdf)

###SourceTracker
[HTML](sourcetracker.html) [PDF](sourcetracker.pdf)

Expand Down
File renamed without changes.
1 change: 1 addition & 0 deletions src/prediction.strength.r
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
# Returns the average for M repeats
# Returns vector of prediction strengths for all values of K
"prediction.strength" <- function(dmat, K=2:10, M=10){
require('cluster')
dmat <- as.matrix(dmat)
N <- nrow(dmat)

Expand Down

0 comments on commit f39119a

Please sign in to comment.