added prediction strength

danknights · Apr 16, 2016 · f39119a · f39119a
1 parent 45d1241
commit f39119a
Show file tree

Hide file tree

Showing 7 changed files with 278 additions and 0 deletions.
diff --git a/data/globalgut-66-adults/prediction.strength.html b/data/globalgut-66-adults/prediction.strength.html
diff --git a/data/globalgut-66-adults/prediction.strength.pdf b/data/globalgut-66-adults/prediction.strength.pdf
diff --git a/data/globalgut-66-adults/prediction.strength.rmd b/data/globalgut-66-adults/prediction.strength.rmd
@@ -0,0 +1,105 @@
+---
+title: "Prediction strength"
+output:
+  html_document:
+    theme: united
+    fig_width: 6
+    fig_height: 6
+  pdf_document:
+    fig_width: 6
+    fig_height: 6
+---
+Back to [Table of Contents](../../doc/index.html)  
+
+**All of the code in this page is meant to be run in ```R``` unless otherwise specified.**
+
+## Loading a genus table and the metadata into R
+Before loading data into R, this QIIME command must be run on the command line to collapse OTU counts into genus (L6) and phylum (L2) count tables:
+```{r eval=FALSE, engine='bash'}
+# (run on command line)
+summarize_taxa.py -i otu_table.biom -L 6 
+
+# convert to JSON BIOM format to load into R using R biom package:
+biom convert -i otu_table_L6.biom -o otu_table_L6_json.biom --to-json
+```
+
+
+Inside `R`, Install biom, vegan, and cluster packages **if not installed**.
+```{r eval=FALSE}
+install.packages(c('biom','vegan','cluster'),repo='http://cran.wustl.edu')
+```
+
+Load packages
+```{r, eval=TRUE, echo=FALSE}
+library('biom')
+library('vegan')
+library('cluster')
+```
+
+Load packages
+```{r, eval=FALSE, echo=TRUE}
+library('biom')
+library('vegan')
+library('cluster')
+```
+
+Load data
+```{r, eval=TRUE}
+# load biom file
+genus.biom <- read_biom('otu_table_L6_json.biom')
+
+# Extract data matrix (genus counts) from biom table
+genus <- as.matrix(biom_data(genus.biom))
+
+# transpose so that rows are samples and columns are genera
+genus <- t(genus)
+
+# load mapping file
+map <- read.table('map.txt', sep='\t', comment='', head=T, row.names=1)
+```
+
+
+It is extremely important to ensure that your genus table and metadata table sample IDs are lined up correctly.
+```{r, eval=TRUE}
+# find the overlapping samples
+common.ids <- intersect(rownames(map), rownames(genus))
+
+# get just the overlapping samples
+genus <- genus[common.ids,]
+map <- map[common.ids,]
+```
+
+
+## Calculate prediction strength
+
+```{r, eval=TRUE}
+# Source the prediction strength code.
+source('../../src/prediction.strength.r')
+
+# Calculate Bray-Curtis distances
+bc <- vegdist(genus)
+
+# Run prediction strength analysis on bray-curtis table
+ps <- prediction.strength(bc)
+
+# plot prediction strength
+plot(2:10,ps,type='b',ylim=c(0,1),xlab='Number of clusters', ylab='Prediction strength')
+```
+
+Make a PCoA plot colored by cluster, but showing COUNTRY by shape of the points. Note that USA is clustered almost perfectly.
+```{r, eval=TRUE}
+# Run partitioning around medoids (PAM) clustering with 2 clusters
+p <- pam(bc,2)
+
+# PCoA coordinates of Bray-Curtis distances
+pc <- cmdscale(bc,2)
+
+# plot PCoA colored by cluster, with countries shown by shape.
+plot(pc[,1], pc[,2], col=c('blue','red')[p$clustering], pch=c(16,17,18)[map$COUNTRY])
+legend('topleft',legend=c('USA','Malawi','Venezuela'),pch=c(17,16,18))
+
+# Plot the cluster labels at the centroids
+cluster.centroids.x <- sapply(split(pc[,1],p$clustering),mean)
+cluster.centroids.y <- sapply(split(pc[,2],p$clustering),mean)
+text(cluster.centroids.x, cluster.centroids.y, c('Cluster 1', 'Cluster 2'),col=c('blue','red'))
+```
diff --git a/doc/index.html b/doc/index.html
@@ -105,6 +105,10 @@ <h3>Ordination (Guerrero Negro)</h3>
 <h3>Constrained ordination (Guerrero Negro)</h3>
 <p><a href="../data/guerreronegro/constrained-ordination.html">HTML</a> <a href="../data/guerreronegro/constrained-ordination.pdf">PDF</a></p>
 </div>
+<div id="evaluating-clusters-with-prediction-strength-global-gut" class="section level3">
+<h3>Evaluating clusters with prediction strength (Global Gut)</h3>
+<p><a href="../data/globalgut-66-adults/prediction.strength.html">HTML</a> <a href="../data/globalgut-66-adults/prediction.strength.pdf">PDF</a></p>
+</div>
 <div id="sourcetracker" class="section level3">
 <h3>SourceTracker</h3>
 <p><a href="sourcetracker.html">HTML</a> <a href="sourcetracker.pdf">PDF</a></p>

diff --git a/doc/index.rmd b/doc/index.rmd
@@ -40,6 +40,9 @@ The Github repo for this code is https://github.com/danknights/mice8992-2016
 ###Constrained ordination (Guerrero Negro)
 [HTML](../data/guerreronegro/constrained-ordination.html)   [PDF](../data/guerreronegro/constrained-ordination.pdf)  
 
+### Evaluating clusters with prediction strength  (Global Gut)
+[HTML](../data/globalgut-66-adults/prediction.strength.html)   [PDF](../data/globalgut-66-adults/prediction.strength.pdf)  
+
 ###SourceTracker
 [HTML](sourcetracker.html)   [PDF](sourcetracker.pdf)  
 

diff --git a/doc/make.r → src/make.r b/doc/make.r → src/make.r
diff --git a/src/prediction.strength.r b/src/prediction.strength.r
@@ -8,6 +8,7 @@
 # Returns the average for M repeats
 # Returns vector of prediction strengths for all values of K
 "prediction.strength" <- function(dmat, K=2:10, M=10){
+    require('cluster')
     dmat <- as.matrix(dmat)
     N <- nrow(dmat)