Install the package with:
goea requires three inputs: the gene IDs of the gene set, the gene IDs of the background, and an orgDb object.
The example below uses randomly sampled Entrez gene IDs for both the gene set and the background.
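library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
all_eids_hg19 <- names(genes(txdb))       # all gene IDs in the hg19 annotation
eids_bg <- sample(all_eids_hg19, 3500)    # background: 3500 random genes
eids_set <- sample(eids_bg, 300)          # gene set: 300 genes drawn from the background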
Run the GO enrichment analysis on the gene set against the background.
goea(gene_set = eids_set,
     back_ground = eids_bg,
     orgDb =,
     interpret_term = TRUE) %>% head(., 10) %>% knitr::kable(., "markdown")
## Loading required package: GO.db
| term | definition | freq_gs | freq_bg | p | adj_BH | OR |
|:-----|:-----------|--------:|--------:|--:|-------:|---:|
| GO:0009967 | positive regulation of signal transduction | 29 | 191 | 0.0012078 | 0.5996853 | 1.77 |
| GO:0010647 | positive regulation of cell communication | 32 | 225 | 0.0020695 | 0.5996853 | 1.66 |
| GO:0023056 | positive regulation of signaling | 32 | 225 | 0.0020695 | 0.5996853 | 1.66 |
| GO:0006672 | ceramide metabolic process | 3 | 4 | 0.0023288 | 0.5996853 | 8.75 |
| GO:2001235 | positive regulation of apoptotic signaling pathway | 7 | 24 | 0.0030061 | 0.5996853 | 3.40 |
| GO:0045859 | regulation of protein kinase activity | 17 | 101 | 0.0045006 | 0.5996853 | 1.96 |
| GO:0071900 | regulation of protein serine/threonine kinase activity | 12 | 61 | 0.0045852 | 0.5996853 | 2.30 |
| GO:0043408 | regulation of MAPK cascade | 15 | 85 | 0.0047575 | 0.5996853 | 2.06 |
| GO:1902531 | regulation of intracellular signal transduction | 34 | 256 | 0.0047870 | 0.5996853 | 1.55 |
| GO:0060538 | skeletal muscle organ development | 6 | 20 | 0.0051416 | 0.5996853 | 3.50 |
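The relationship between the freq_gs, freq_bg and OR columns can be checked with a few lines of base R. The sketch below is only an illustration, assuming the gene set size of 300 and background size of 3500 used in the sampling step above. It reproduces the OR reported for GO:0006672 as a ratio of proportions and shows an upper-tail hypergeometric test followed by a Benjamini-Hochberg adjustment. That goea computes p and adj_BH exactly this way is an assumption, so the values need not match the table to the last digit.

```r
# Counts for GO:0006672, taken from the table above.
freq_gs <- 3      # gene-set genes annotated to the term
freq_bg <- 4      # background genes annotated to the term
n_gs    <- 300    # gene set size (from the sampling step)
n_bg    <- 3500   # background size (from the sampling step)

# Ratio of the two annotation proportions; this matches the OR column (8.75).
or <- (freq_gs / n_gs) / (freq_bg / n_bg)

# Upper-tail hypergeometric test, P(X >= freq_gs).
# Assumption: goea may differ in detail from this plain formulation.
p <- phyper(freq_gs - 1, freq_bg, n_bg - freq_bg, n_gs, lower.tail = FALSE)

# Benjamini-Hochberg adjustment over a vector of p-values.
# Only the first five p-values of the table are used here, so the result
# will not equal the adj_BH column, which is adjusted over all tested terms.
p_top5 <- c(0.0012078, 0.0020695, 0.0020695, 0.0023288, 0.0030061)
adj    <- p.adjust(p_top5, method = "BH")

print(or)
print(p)
print(adj)
```

With thousands of terms tested on a randomly sampled gene set, a large adj_BH for every term, as in the table, is the expected outcome.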
The function is vectorized over gene sets: gene_set can also be a list of
multiple gene sets, in which case one result data frame is returned per set.
eids_sets <- lapply(1:10, function(x) sample(eids_bg, 300))
goea(gene_set = eids_sets,
     back_ground = eids_bg,
     orgDb = ) %>% summary
## Length Class Mode
## [1,] 6 data.frame list
## [2,] 6 data.frame list
## [3,] 6 data.frame list
## [4,] 6 data.frame list
## [5,] 6 data.frame list
## [6,] 6 data.frame list
## [7,] 6 data.frame list
## [8,] 6 data.frame list
## [9,] 6 data.frame list
## [10,] 6 data.frame list
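Because the result for a list input is a list of data frames, it can be handled with ordinary list tools. The following sketch shows one way to stack the per-set results and pull out the top term of each set. The object res_list is a hypothetical stand-in for the goea output (only the term and p columns are mimicked), and the set_1, set_2 names are introduced purely for illustration.

```r
# Hypothetical stand-ins for two elements of the list returned by goea().
res_list <- list(
  data.frame(term = c("GO:0009967", "GO:0010647"), p = c(0.0012, 0.0021),
             stringsAsFactors = FALSE),
  data.frame(term = c("GO:0006629", "GO:0008219"), p = c(0.0141, 0.0156),
             stringsAsFactors = FALSE)
)
names(res_list) <- paste0("set_", seq_along(res_list))

# Tag each result with the name of its gene set, then stack into one data frame.
tagged   <- Map(function(df, id) cbind(set = id, df), res_list, names(res_list))
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL

# Term with the smallest p-value in each gene set.
top_terms <- vapply(res_list, function(df) df$term[which.min(df$p)], character(1))

print(combined)
print(top_terms)
```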
GO slim is a condensed subset of GO terms; its definition can be found here. Set GO_Slim = TRUE to restrict the analysis to GO slim terms.
goea(gene_set = eids_set,
     back_ground = eids_bg,
     orgDb =,
     interpret_term = TRUE,
     GO_Slim = TRUE) %>% head(., 10) %>% knitr::kable(., "markdown")
| term | definition | freq_gs | freq_bg | p | adj_BH | OR |
|:-----|:-----------|--------:|--------:|--:|-------:|---:|
| GO:0006629 | lipid metabolic process | 25 | 187 | 0.0141012 | 0.5162342 | 1.56 |
| GO:0008219 | cell death | 39 | 327 | 0.0156435 | 0.5162342 | 1.39 |
| GO:0048856 | anatomical structure development | 83 | 848 | 0.0683518 | 0.9268498 | 1.14 |
| GO:0040007 | growth | 17 | 140 | 0.0852517 | 0.9268498 | 1.42 |
| GO:0007165 | signal transduction | 77 | 795 | 0.0988005 | 0.9268498 | 1.13 |
| GO:0021700 | developmental maturation | 6 | 40 | 0.1221631 | 0.9268498 | 1.75 |
| GO:0006790 | sulfur compound metabolic process | 7 | 51 | 0.1415087 | 0.9268498 | 1.60 |
| GO:0043473 | pigmentation | 2 | 8 | 0.1453356 | 0.9268498 | 2.92 |
| GO:0071554 | cell wall organization or biogenesis | 1 | 2 | 0.1641360 | 0.9268498 | 5.83 |
| GO:0007049 | cell cycle | 26 | 250 | 0.1651823 | 0.9268498 | 1.21 |
Set EASE_Score = TRUE to obtain a more conservative p-value.
For more information on the EASE score, please see here.
The two calls below compare the p-value distributions with and without the EASE score.
goea(gene_set = eids_sets,
     back_ground = eids_bg,
     orgDb =,
     GO_Slim = TRUE,
     EASE_Score = FALSE) %>% lapply(., function(x) x$p) %>% unlist %>% hist(main = "normal hypergeometric")
goea(gene_set = eids_sets,
     back_ground = eids_bg,
     orgDb =,
     GO_Slim = TRUE,
     EASE_Score = TRUE) %>% lapply(., function(x) x$p) %>% unlist %>% hist(main = "EASE score")
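To see why the EASE score is more conservative, it helps to compare the two p-values for a single term. The EASE score is usually described as a modified Fisher exact test in which one gene is removed from the gene-set hits before testing. The sketch below is a simplified base R illustration of that idea using the counts of the GO:2001235 row shown earlier; the helper names hyper_p and ease_p are hypothetical, and keeping the margins fixed is an assumption, so the exact construction inside goea may differ.

```r
# Upper-tail hypergeometric p-value, P(X >= hits_gs).
hyper_p <- function(hits_gs, hits_bg, n_gs, n_bg) {
  phyper(hits_gs - 1, hits_bg, n_bg - hits_bg, n_gs, lower.tail = FALSE)
}

# EASE-style variant: penalize the gene set by one hit before testing.
# Assumption: margins are kept fixed; this is a simplification.
ease_p <- function(hits_gs, hits_bg, n_gs, n_bg) {
  hyper_p(max(hits_gs - 1, 0), hits_bg, n_gs, n_bg)
}

# 7 of 300 gene-set genes and 24 of 3500 background genes carry the term.
p_std  <- hyper_p(7, 24, 300, 3500)
p_ease <- ease_p(7, 24, 300, 3500)

print(c(standard = p_std, ease = p_ease))
```

Removing one hit always moves the p-value upward, and the effect is strongest for terms supported by only a handful of genes, which is where chance enrichment is most likely.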
With any questions, please contact
