Install the package with:
library(devtools)
install_github("ZhenWei10/golite")
The necessary input of goea should be the gene set gene ID, background gene ID, and the orgDb object.
library(golite)
library(magrittr)
library(org.Hs.eg.db)
Use randomly sampled gene IDs for gene set and background.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
all_eids_hg19 <- names(genes(txdb))
set.seed(1)
eids_bg <- sample(all_eids_hg19, 3500)
eids_set <- sample(eids_bg,300)
Run GO enrichment analysis given gene set and background.
goea(gene_set = eids_set,
back_ground = eids_bg,
orgDb = org.Hs.eg.db,
interpret_term = T) %>% head(.,10) %>% knitr::kable(.,"markdown")
## Loading required package: GO.db
term | definition | freq_gs | freq_bg | p | adj_BH | OR |
---|---|---|---|---|---|---|
GO:0009967 | positive regulation of signal transduction | 29 | 191 | 0.0012078 | 0.5996853 | 1.77 |
GO:0010647 | positive regulation of cell communication | 32 | 225 | 0.0020695 | 0.5996853 | 1.66 |
GO:0023056 | positive regulation of signaling | 32 | 225 | 0.0020695 | 0.5996853 | 1.66 |
GO:0006672 | ceramide metabolic process | 3 | 4 | 0.0023288 | 0.5996853 | 8.75 |
GO:2001235 | positive regulation of apoptotic signaling pathway | 7 | 24 | 0.0030061 | 0.5996853 | 3.40 |
GO:0045859 | regulation of protein kinase activity | 17 | 101 | 0.0045006 | 0.5996853 | 1.96 |
GO:0071900 | regulation of protein serine/threonine kinase activity | 12 | 61 | 0.0045852 | 0.5996853 | 2.30 |
GO:0043408 | regulation of MAPK cascade | 15 | 85 | 0.0047575 | 0.5996853 | 2.06 |
GO:1902531 | regulation of intracellular signal transduction | 34 | 256 | 0.0047870 | 0.5996853 | 1.55 |
GO:0060538 | skeletal muscle organ development | 6 | 20 | 0.0051416 | 0.5996853 | 3.50 |
The function can be vectorized, i.e. the input can be a list
of multiple gene sets.
eids_sets <- lapply(1:10,function(x) sample(eids_bg,300))
goea(gene_set = eids_sets,
back_ground = eids_bg,
orgDb = org.Hs.eg.db) %>% summary
## Length Class Mode
## [1,] 6 data.frame list
## [2,] 6 data.frame list
## [3,] 6 data.frame list
## [4,] 6 data.frame list
## [5,] 6 data.frame list
## [6,] 6 data.frame list
## [7,] 6 data.frame list
## [8,] 6 data.frame list
## [9,] 6 data.frame list
## [10,] 6 data.frame list
GO slim is a subset of GO terms that can be defined at here.
goea(gene_set = eids_set,
back_ground = eids_bg,
orgDb = org.Hs.eg.db,
interpret_term = T,
GO_Slim = T) %>% head(.,10) %>% knitr::kable(.,"markdown")
term | definition | freq_gs | freq_bg | p | adj_BH | OR |
---|---|---|---|---|---|---|
GO:0006629 | lipid metabolic process | 25 | 187 | 0.0141012 | 0.5162342 | 1.56 |
GO:0008219 | cell death | 39 | 327 | 0.0156435 | 0.5162342 | 1.39 |
GO:0048856 | anatomical structure development | 83 | 848 | 0.0683518 | 0.9268498 | 1.14 |
GO:0040007 | growth | 17 | 140 | 0.0852517 | 0.9268498 | 1.42 |
GO:0007165 | signal transduction | 77 | 795 | 0.0988005 | 0.9268498 | 1.13 |
GO:0021700 | developmental maturation | 6 | 40 | 0.1221631 | 0.9268498 | 1.75 |
GO:0006790 | sulfur compound metabolic process | 7 | 51 | 0.1415087 | 0.9268498 | 1.60 |
GO:0043473 | pigmentation | 2 | 8 | 0.1453356 | 0.9268498 | 2.92 |
GO:0071554 | cell wall organization or biogenesis | 1 | 2 | 0.1641360 | 0.9268498 | 5.83 |
GO:0007049 | cell cycle | 26 | 250 | 0.1651823 | 0.9268498 | 1.21 |
you could set EASE_score = TRUE
to get a more conservative p value.
For more information of EASE, please see here.
goea(gene_set = eids_sets,
back_ground = eids_bg,
orgDb = org.Hs.eg.db,
GO_Slim = T,
EASE_Score = F) %>% lapply(.,function(x)x$p) %>% unlist %>% hist(main = "normal hypergeometric")
goea(gene_set = eids_sets,
back_ground = eids_bg,
orgDb = org.Hs.eg.db,
GO_Slim = T,
EASE_Score = T) %>% lapply(.,function(x)x$p) %>% unlist %>% hist(main = "EASE score")
with any questions, please contact zhen.wei@xjtlu.edu.cn.
sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] GO.db_3.5.0
## [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [3] GenomicFeatures_1.30.3
## [4] GenomicRanges_1.30.3
## [5] GenomeInfoDb_1.14.0
## [6] org.Hs.eg.db_3.5.0
## [7] AnnotationDbi_1.40.0
## [8] IRanges_2.12.0
## [9] S4Vectors_0.16.0
## [10] Biobase_2.38.0
## [11] BiocGenerics_0.24.0
## [12] magrittr_1.5
## [13] golite_1.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.16 highr_0.6
## [3] compiler_3.4.2 XVector_0.18.0
## [5] prettyunits_1.0.2 bitops_1.0-6
## [7] tools_3.4.2 zlibbioc_1.24.0
## [9] progress_1.1.2 biomaRt_2.34.2
## [11] digest_0.6.15 bit_1.1-12
## [13] lattice_0.20-35 RSQLite_2.0
## [15] evaluate_0.10.1 memoise_1.1.0
## [17] pkgconfig_2.0.1 Matrix_1.2-12
## [19] DelayedArray_0.4.1 DBI_0.8
## [21] yaml_2.1.18 GenomeInfoDbData_1.0.0
## [23] rtracklayer_1.38.3 httr_1.3.1
## [25] stringr_1.3.0 knitr_1.20
## [27] Biostrings_2.46.0 grid_3.4.2
## [29] rprojroot_1.3-2 bit64_0.9-7
## [31] R6_2.2.2 BiocParallel_1.12.0
## [33] XML_3.98-1.10 RMySQL_0.10.14
## [35] rmarkdown_1.9 blob_1.1.1
## [37] matrixStats_0.53.1 GenomicAlignments_1.14.2
## [39] Rsamtools_1.30.0 backports_1.1.2
## [41] htmltools_0.3.6 SummarizedExperiment_1.8.1
## [43] assertthat_0.2.0 stringi_1.1.7
## [45] RCurl_1.95-4.10