Skip to content

A vectorized solution for GO / GOslim enrichment analysis

Notifications You must be signed in to change notification settings

ZW-xjtlu/golite

Repository files navigation

golite

Installation

Install the package with:

library(devtools)
install_github("ZhenWei10/golite")

Run GO enrichment analysis

The necessary input of goea should be the gene set gene ID, background gene ID, and the orgDb object.

library(golite)
library(magrittr)
library(org.Hs.eg.db)

Use randomly sampled gene IDs for gene set and background.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
all_eids_hg19 <- names(genes(txdb))

set.seed(1)
eids_bg <- sample(all_eids_hg19, 3500)
eids_set <- sample(eids_bg,300)

Run GO enrichment analysis given gene set and background.

goea(gene_set = eids_set,
     back_ground = eids_bg,
     orgDb = org.Hs.eg.db,
     interpret_term = T) %>% head(.,10) %>% knitr::kable(.,"markdown")
## Loading required package: GO.db
term definition freq_gs freq_bg p adj_BH OR
GO:0009967 positive regulation of signal transduction 29 191 0.0012078 0.5996853 1.77
GO:0010647 positive regulation of cell communication 32 225 0.0020695 0.5996853 1.66
GO:0023056 positive regulation of signaling 32 225 0.0020695 0.5996853 1.66
GO:0006672 ceramide metabolic process 3 4 0.0023288 0.5996853 8.75
GO:2001235 positive regulation of apoptotic signaling pathway 7 24 0.0030061 0.5996853 3.40
GO:0045859 regulation of protein kinase activity 17 101 0.0045006 0.5996853 1.96
GO:0071900 regulation of protein serine/threonine kinase activity 12 61 0.0045852 0.5996853 2.30
GO:0043408 regulation of MAPK cascade 15 85 0.0047575 0.5996853 2.06
GO:1902531 regulation of intracellular signal transduction 34 256 0.0047870 0.5996853 1.55
GO:0060538 skeletal muscle organ development 6 20 0.0051416 0.5996853 3.50

The function can be vectorized, i.e. the input can be a list of multiple gene sets.

eids_sets <- lapply(1:10,function(x) sample(eids_bg,300)) 

goea(gene_set = eids_sets,
     back_ground = eids_bg,
     orgDb = org.Hs.eg.db) %>% summary
##       Length Class      Mode
##  [1,] 6      data.frame list
##  [2,] 6      data.frame list
##  [3,] 6      data.frame list
##  [4,] 6      data.frame list
##  [5,] 6      data.frame list
##  [6,] 6      data.frame list
##  [7,] 6      data.frame list
##  [8,] 6      data.frame list
##  [9,] 6      data.frame list
## [10,] 6      data.frame list

Run GO slim enrichment analysis

GO slim is a subset of GO terms that can be defined at here.

goea(gene_set = eids_set,
     back_ground = eids_bg,
     orgDb = org.Hs.eg.db,
     interpret_term = T,
     GO_Slim = T)  %>% head(.,10) %>% knitr::kable(.,"markdown")
term definition freq_gs freq_bg p adj_BH OR
GO:0006629 lipid metabolic process 25 187 0.0141012 0.5162342 1.56
GO:0008219 cell death 39 327 0.0156435 0.5162342 1.39
GO:0048856 anatomical structure development 83 848 0.0683518 0.9268498 1.14
GO:0040007 growth 17 140 0.0852517 0.9268498 1.42
GO:0007165 signal transduction 77 795 0.0988005 0.9268498 1.13
GO:0021700 developmental maturation 6 40 0.1221631 0.9268498 1.75
GO:0006790 sulfur compound metabolic process 7 51 0.1415087 0.9268498 1.60
GO:0043473 pigmentation 2 8 0.1453356 0.9268498 2.92
GO:0071554 cell wall organization or biogenesis 1 2 0.1641360 0.9268498 5.83
GO:0007049 cell cycle 26 250 0.1651823 0.9268498 1.21

you could set EASE_score = TRUE to get a more conservative p value.

For more information of EASE, please see here.

goea(gene_set = eids_sets,
     back_ground = eids_bg, 
     orgDb = org.Hs.eg.db,
     GO_Slim = T,
     EASE_Score = F) %>% lapply(.,function(x)x$p) %>% unlist %>% hist(main = "normal hypergeometric")

goea(gene_set = eids_sets,
         back_ground = eids_bg,
     orgDb = org.Hs.eg.db,
     GO_Slim = T,
     EASE_Score = T) %>% lapply(.,function(x)x$p) %>% unlist %>% hist(main = "EASE score")

with any questions, please contact zhen.wei@xjtlu.edu.cn.

sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] GO.db_3.5.0                            
##  [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
##  [3] GenomicFeatures_1.30.3                 
##  [4] GenomicRanges_1.30.3                   
##  [5] GenomeInfoDb_1.14.0                    
##  [6] org.Hs.eg.db_3.5.0                     
##  [7] AnnotationDbi_1.40.0                   
##  [8] IRanges_2.12.0                         
##  [9] S4Vectors_0.16.0                       
## [10] Biobase_2.38.0                         
## [11] BiocGenerics_0.24.0                    
## [12] magrittr_1.5                           
## [13] golite_1.0                             
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16               highr_0.6                 
##  [3] compiler_3.4.2             XVector_0.18.0            
##  [5] prettyunits_1.0.2          bitops_1.0-6              
##  [7] tools_3.4.2                zlibbioc_1.24.0           
##  [9] progress_1.1.2             biomaRt_2.34.2            
## [11] digest_0.6.15              bit_1.1-12                
## [13] lattice_0.20-35            RSQLite_2.0               
## [15] evaluate_0.10.1            memoise_1.1.0             
## [17] pkgconfig_2.0.1            Matrix_1.2-12             
## [19] DelayedArray_0.4.1         DBI_0.8                   
## [21] yaml_2.1.18                GenomeInfoDbData_1.0.0    
## [23] rtracklayer_1.38.3         httr_1.3.1                
## [25] stringr_1.3.0              knitr_1.20                
## [27] Biostrings_2.46.0          grid_3.4.2                
## [29] rprojroot_1.3-2            bit64_0.9-7               
## [31] R6_2.2.2                   BiocParallel_1.12.0       
## [33] XML_3.98-1.10              RMySQL_0.10.14            
## [35] rmarkdown_1.9              blob_1.1.1                
## [37] matrixStats_0.53.1         GenomicAlignments_1.14.2  
## [39] Rsamtools_1.30.0           backports_1.1.2           
## [41] htmltools_0.3.6            SummarizedExperiment_1.8.1
## [43] assertthat_0.2.0           stringi_1.1.7             
## [45] RCurl_1.95-4.10

About

A vectorized solution for GO / GOslim enrichment analysis

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages