DE Visualization
#3-iii. CummeRbund R Analysis
The output of cuffdiff can be loaded directly into the R/Bioconductor package cummeRbund to produce a sophisticated set of analysis results and visualizations.
Navigate to the correct directory and then launch R:
cd $RNA_HOME/de/tophat_cufflinks/ref_only/
R
A separate R tutorial file has been provided in the GitHub repo for part 2 of the tutorial: Tutorial_Module4_Part2_CummeRbund.R. Run the R commands detailed in that script. All results are written to PDF files, which can be viewed in your browser at the following URLs. Note that you must replace YOUR_IP_ADDRESS with your own Amazon instance IP (e.g., 101.0.1.101).
- http://YOUR_IP_ADDRESS/workspace/rnaseq/de/tophat_cufflinks/ref_only/Tutorial_Part2_cummeRbund_output.pdf
- http://YOUR_IP_ADDRESS/workspace/rnaseq/de/tophat_cufflinks/ref_only/Tutorial_Part2_cummeRbund_output_extras.pdf
##SUPPLEMENTARY R ANALYSIS
Occasionally you may wish to reformat and work with cuffdiff output in R manually. We therefore provide an optional/advanced tutorial on how to format your results for R and perform "old school" (non-cummeRbund) analysis on your data.
In this tutorial you will:
- Learn basic R usage and commands (common plots, and data manipulation tasks)
- Examine the expression estimates
- Create an MDS plot to visualize the differences between/among replicates, library prep methods and UHR versus HBR
- Examine the differential expression estimates
- Visualize the expression estimates and highlight those genes that appear to be differentially expressed
- Generate a list of the top differentially expressed genes
- Ask how reproducible technical replicates are.
Expression and differential expression files will be read into R. The R analysis will make use of the transcript-level expression and differential expression files from cuffdiff. Navigate to the correct directory and then launch R:
cd $RNA_HOME/de/tophat_cufflinks/ref_only/
R
A separate R file has been provided in the GitHub repo for part 3 of the tutorial: Tutorial_Module4_Part3_Supplementary_R.R. Run the R commands detailed in that script.
The output file can be viewed in your browser at the following URL. Note that you must replace YOUR_IP_ADDRESS with your own Amazon instance IP (e.g., 101.0.1.101).
- http://YOUR_IP_ADDRESS/workspace/rnaseq/de/tophat_cufflinks/ref_only/Tutorial_Part3_Supplementary_R_output.pdf
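Before loading results into R, it can help to take a quick look at a cuffdiff table from the shell. The following is a minimal sketch, not part of the tutorial scripts. It assumes a tab-separated cuffdiff `.diff` file (such as gene_exp.diff) whose header includes columns named `gene`, `log2(fold_change)`, `q_value` and `significant`. The file `sample.diff` and every row in it are made up for illustration.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Made-up rows in the cuffdiff .diff column layout; values are illustrative only.
{
  printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n'
  printf 'XLOC_000001\tENSG00000000001\tGENE_A\tchr22:100-200\tUHR\tHBR\tOK\t10.5\t0.5\t-4.39\t-5.1\t0.00005\t0.001\tyes\n'
  printf 'XLOC_000002\tENSG00000000002\tGENE_B\tchr22:300-400\tUHR\tHBR\tOK\t5.0\t5.2\t0.06\t0.1\t0.8\t0.9\tno\n'
  printf 'XLOC_000003\tENSG00000000003\tGENE_C\tchr22:500-600\tUHR\tHBR\tOK\t1.0\t8.0\t3.00\t4.2\t0.0002\t0.002\tyes\n'
} > sample.diff

# Look up columns by header name, then keep only rows flagged as significant.
awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  $(col["significant"]) == "yes" {
    print $(col["gene"]) "\t" $(col["log2(fold_change)"]) "\t" $(col["q_value"])
  }
' sample.diff > significant_genes.txt

sig_count=$(wc -l < significant_genes.txt | tr -d ' ')
echo "significant genes: $sig_count"
cat significant_genes.txt
```

Looking columns up by header name rather than by position keeps the filter readable and avoids counting fields by hand. To try it on real output, point the `awk` command at your own `.diff` file instead of `sample.diff`.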
##ERCC DE Analysis
This section demonstrates differential expression analysis of the ERCC spike-in controls:
cd $RNA_HOME/de/tophat_cufflinks/ref_only
wget https://raw.githubusercontent.com/griffithlab/rnaseq_tutorial/master/scripts/Tutorial_Module4_ERCC_DE.R
chmod +x Tutorial_Module4_ERCC_DE.R
./Tutorial_Module4_ERCC_DE.R $RNA_HOME/expression/tophat_counts/ERCC_Controls_Analysis.txt $RNA_HOME/de/tophat_cufflinks/ref_only/gene_exp.diff
View the results here:
- http://YOUR_IP_ADDRESS/workspace/rnaseq/de/tophat_cufflinks/ref_only/Tutorial_Module4_ERCC_DE.pdf
##edgeR Analysis
In this tutorial you will:
- Make use of the raw counts you generated earlier using htseq-count
- Use edgeR, a Bioconductor package designed specifically for differential expression analysis of count-based RNA-seq data
- Apply edgeR as an alternative to cufflinks/cuffmerge/cuffdiff for finding differentially expressed genes
First, create a mapping file to go from ENSG IDs (which htseq-count outputs) to gene symbols:
cd $RNA_HOME/refs/hg19/genes
perl -ne 'if ($_=~/gene_id\s\"(ENSG\S+)\"\;\sgene_name\s\"(\S+)\"\;/){print "$1\t$2\n";} elsif ($_=~/gene_id\s\"(ERCC\S+)\"/){print "$1\t$1\n";}' genes_chr22_ERCC92.gtf | sort | uniq > ENSG_ID2Name.txt
Then, create a directory for results and launch R:
cd $RNA_HOME/
mkdir -p de/tophat_counts
cd de/tophat_counts
R
A separate R tutorial file has been provided in the github repo for part 4 of the tutorial: Tutorial_Module4_Part4_edgeR.R. Run the R commands in this file.
Once you have run the edgeR tutorial, compare the significantly differentially expressed genes to those saved earlier from cuffdiff:
cat $RNA_HOME/de/tophat_cufflinks/ref_only/DE_genes.txt
cat $RNA_HOME/de/tophat_counts/DE_genes.txt
Pull out the gene symbols:
cd $RNA_HOME/de/
cut -f 1 $RNA_HOME/de/tophat_cufflinks/ref_only/DE_genes.txt > tophat_cufflinks_cuffdiff_DE_gene_symbols.txt
cut -f 2 $RNA_HOME/de/tophat_counts/DE_genes.txt > tophat_counts_edgeR_DE_gene_symbols.txt
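Once the two symbol lists exist, their overlap can also be counted directly on the command line with `sort` and `comm`. The sketch below is self-contained and uses two small made-up lists (`cuffdiff_symbols.txt` and `edgeR_symbols.txt`, both hypothetical names) in place of the files created above.

```shell
set -eu
export LC_ALL=C   # sort and comm must agree on collation order
workdir=$(mktemp -d)
cd "$workdir"

# Made-up symbol lists standing in for the cuffdiff and edgeR results.
printf 'GENE_A\nGENE_B\nGENE_C\n' > cuffdiff_symbols.txt
printf 'GENE_B\nGENE_C\nGENE_D\nGENE_B\n' > edgeR_symbols.txt

# comm needs sorted input; sort -u also removes duplicate symbols.
sort -u cuffdiff_symbols.txt > cuffdiff.sorted.txt
sort -u edgeR_symbols.txt > edgeR.sorted.txt

comm -12 cuffdiff.sorted.txt edgeR.sorted.txt > shared.txt         # in both lists
comm -23 cuffdiff.sorted.txt edgeR.sorted.txt > cuffdiff_only.txt  # only in the first
comm -13 cuffdiff.sorted.txt edgeR.sorted.txt > edgeR_only.txt     # only in the second

shared_n=$(wc -l < shared.txt | tr -d ' ')
cuffdiff_only_n=$(wc -l < cuffdiff_only.txt | tr -d ' ')
edgeR_only_n=$(wc -l < edgeR_only.txt | tr -d ' ')
echo "shared: $shared_n, cuffdiff only: $cuffdiff_only_n, edgeR only: $edgeR_only_n"
```

The three counts correspond to the three regions of a two-set Venn diagram. To apply this to the tutorial output, substitute the two `*_DE_gene_symbols.txt` files for the sample lists.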
Visualize the overlap with a Venn diagram. This can be done with simple web tools like:
| Previous Section | This Section | Next Section |
|:---------------------------------------------------:|:------------------------------------:|:-------------------------------------------------------------------:|
| Differential Expression | DE Visualization | Ref Guided |
##Note: The current version of this tutorial is now at www.rnaseq.wiki