This is an R wrapper for Pavlidis Lab’s ermineJ, a tool for gene set enrichment analysis with multifunctionality correction.
ermineR requires a 64-bit version of Java to function. If you are a Mac user, make sure you have the Java SDK.
After Java is installed, you can install ermineR by running
devtools::install_github('PavlidisLab/ermineR')
If ermineR cannot find your Java home by itself, either install rJava or use
Sys.setenv(JAVA_HOME=javaHome)
to point ermineR to the right path.
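As a minimal sketch of the second option, the snippet below sets JAVA_HOME only when it is not already defined. The path stored in javaHome is a hypothetical example and has to be replaced with the location of your own Java installation.

```r
# Hypothetical JDK location; replace with the path on your machine.
javaHome <- "/usr/lib/jvm/java-11-openjdk-amd64"

# Only set JAVA_HOME when the environment does not already define it.
if (!nzchar(Sys.getenv("JAVA_HOME"))) {
  Sys.setenv(JAVA_HOME = javaHome)
}

# Report the value in use and whether the directory actually exists.
message("JAVA_HOME is ", Sys.getenv("JAVA_HOME"))
message("Directory exists: ", dir.exists(Sys.getenv("JAVA_HOME")))
```

Setting the variable this way only affects the current R session, so it has to run before ermineR is used.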
Some users report that the ermineJ executable loses its execution privilege after installation. If this happens, you will get an error like
"Error in (function (annotation = NULL, aspects = c("Molecular Function", :
Something went wrong. Blame the dev
sh: [library installation path]/ermineR/ermineJ-3.1.2/bin/ermineJ.sh: Permission denied "
To fix this, run
chmod +x [library installation path]/ermineR/ermineJ-3.1.2/bin/ermineJ.sh
You may need sudo depending on where you install your packages.
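The same fix can be attempted from within R. This is a sketch that assumes the script sits at ermineJ-3.1.2/bin/ermineJ.sh inside the installed package, as in the error message above. If the library is owned by another user, it will fail just as chmod without sudo would.

```r
# Locate the script inside the installed package; system.file() returns ""
# when the package or the file cannot be found.
script <- system.file("ermineJ-3.1.2", "bin", "ermineJ.sh", package = "ermineR")

if (nzchar(script)) {
  # Make the script readable and executable for everyone (rwxr-xr-x).
  Sys.chmod(script, mode = "0755")
  # file.access() returns 0 when the requested permission (1 = execute) is granted.
  message("Executable: ", file.access(script, mode = 1) == 0)
} else {
  message("ermineJ.sh not found; is ermineR installed?")
}
```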
See the documentation for ora, roc, gsr, precRecall and corr to see how to use them.
The documentation of each function explains what the method does. We recommend users start with precRecall (for gene ranking-based enrichment analysis) or ora (for hit-list over-representation analysis).
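The two recommended methods take different inputs: a ranking-based method works from a score for every gene, while over-representation analysis works from a list of selected genes. The sketch below uses made-up gene names and p-values, and the 0.05 cutoff is an arbitrary assumption, to show how a hit list can be derived from a score table.

```r
# Toy score table: one p-value per gene, gene identifiers as row names.
scores <- data.frame(
  pvalue = c(0.001, 0.2, 0.03, 0.8),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Ranking-based methods such as precRecall would use the whole table.
# For over-representation analysis, keep only the genes passing a cutoff.
hitlist <- rownames(scores)[scores$pvalue < 0.05]
hitlist
```

Choosing a cutoff discards the ranking among the remaining genes, which is one reason to try a ranking-based method when scores are available for all genes.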
GO terms are updated frequently, so results can differ between versions. By default, all ermineR functions fetch the latest GO version; however, this means you may get different results when you repeat the experiment later. If you want to use a specific version of GO, ermineR provides functions to deal with that.
goToday: Downloads the latest version of GO to a path you provide
getGoDates: Lists all dates where a GO version is available, from the most recent to the oldest
goAtDate: Given a valid date, downloads the GO version from that date to a file path you provide
To use a specific version of GO, make sure to set the geneSetDescription argument of all ermineR functions to the file path where you saved the GO terms.
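One way to pin a GO version is to pick the most recent release that existed when the analysis was first run. The sketch below does this with a stand-in vector in place of the getGoDates() output. The dates, their format, the file name and the argument names in the commented-out ermineR calls are all assumptions, so check ?goAtDate before relying on them.

```r
# Stand-in for the output of getGoDates(); the real dates and their
# exact format are assumptions here.
availableDates <- c("2023-03-01", "2023-02-01", "2023-01-01")

# Date on which the original analysis was run (hypothetical).
analysisDate <- as.Date("2023-02-15")

# Pick the most recent GO release on or before the analysis date.
releaseDates <- as.Date(availableDates)
chosenDate <- format(max(releaseDates[releaseDates <= analysisDate]))
chosenDate

# With ermineR installed, the pinned file could then be fetched and reused.
# The argument names for goAtDate are assumptions.
# goAtDate(path = "go_pinned.xml", date = chosenDate)
# ora(annotation = "Generic_human", hitlist = hitlist,
#     geneSetDescription = "go_pinned.xml")
```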
ermineR requires annotation files to work. These files include gene identifiers and their GO annotations, along with some optional information. By default, ermineR supports annotation files generated by Gemma and will automatically download them if you provide a valid annotation name. You can get a list of valid annotation names using listGemmaAnnotations(). As a general rule, if your platform has an identifier in GEO, the identifier that starts with “GPL” is used as the Gemma identifier as well. There are also generic annotation files available that contain all genes from a species. These are typically named something like “Generic_human”.
You can manually download these annotation files from https://gemma.msl.ubc.ca/annots/ or by using the gemma.R::get_platform_annotations function. ermineR typically uses “noParents” versions of these files since parent terms are derived using the ontology file acquired from GO.
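The naming rule above can be used to narrow down the list of annotation names. The sketch below works on a small stand-in vector. It assumes that listGemmaAnnotations() returns a character vector of names, and the entries shown are examples only.

```r
# Stand-in for listGemmaAnnotations(); assumed to be a character vector.
annots <- c("GPL96", "GPL570", "Generic_human", "Generic_mouse")

# Platform-specific annotations follow the GEO "GPL" naming.
platformAnnots <- grep("^GPL", annots, value = TRUE)

# Generic annotations cover all genes of a species.
genericAnnots <- grep("^Generic_", annots, value = TRUE)

platformAnnots
genericAnnots
```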
Here we will use a mock scores file located in our tests directory. The score file is specifically created to be enriched in genes with the term GO:0051082.
library(ermineR)
library(dplyr) # provides the %>% pipe used in the examples
scores = read.table("tests/testthat/testFiles/pValues", header = TRUE, row.names = 1)
head(scores)
## pvalue
## 206190_at 0.3163401
## 208385_at 0.5186824
## 65086_at 0.6620389
## 202281_at 0.4068895
## 211622_s_at 0.9128846
## 219257_s_at 0.2936740
This scores file only includes scores for 118 genes. The file was generated using GPL96’s probesets, so that is the annotation we’ll be using. Any gene that is not represented in the score file will be ignored.
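Because the scores are p-values, smaller values indicate stronger evidence, which is why the analysis that follows sets bigIsBetter = FALSE and logTrans = TRUE. The sketch below assumes that the log transformation is a negative log10, as is usual for p-values, and uses the first three scores from the table above. It shows that the transformation leaves the ranking of genes unchanged.

```r
# First three p-values from the score table above.
pvals <- c("206190_at" = 0.3163401,
           "208385_at" = 0.5186824,
           "65086_at"  = 0.6620389)

# Negative log10 transform (assumed to be what logTrans applies).
negLog <- -log10(pvals)

# Ranking by ascending p-value matches ranking by descending -log10(p).
sameOrder <- identical(names(sort(pvals)),
                       names(sort(negLog, decreasing = TRUE)))
sameOrder
```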
gsrOut = gsr(annotation = 'GPL96',
             scores = scores,
             scoreColumn = 1,
             iterations = 10000,
             bigIsBetter = FALSE,
             logTrans = TRUE)
## Attempting to download annotation file
head(gsrOut$results) %>% knitr::kable()
Name | ID | NumProbes | NumGenes | RawScore | Pval | CorrectedPvalue | MFPvalue | CorrectedMFPvalue | Multifunctionality | Same as | GeneMembers |
---|---|---|---|---|---|---|---|---|---|---|---|
protein folding | GO:0006457 | 40 | 24 | 3.198073 | 0.0000000 | 0.0000000 | 0.000000 | 0.0000000 | 0.145 | NA | AIP|CALR|CCT5|CCT6A|CCT8L2|CDC37L1|CLGN|CLPX|DNAJB1|DNAJB4|GAK|HSP90AA1|HSPA1A|HSPA9|HSPB6|HSPD1|NUDC|PFDN6|PTGES3|RUVBL2|ST13|TCP1|TOR1A|UGGT1| |
unfolded protein binding | GO:0051082 | 52 | 29 | 3.299625 | 0.0000000 | 0.0000000 | 0.000000 | 0.0000000 | 0.174 | NA | AIP|CALR|CCT5|CCT6A|CCT8L2|CDC37L1|CHAF1A|CLGN|CLPX|DNAJB1|DNAJB4|HSP90AA1|HSPA1A|HSPA9|HSPB6|HSPD1|HTRA2|NUDC|PFDN6|PTGES3|RUVBL2|SHQ1|SRSF10|TAPBP|TCP1|TOMM20|TOR1A|TUBB4B|UGGT1| |
cytosol | GO:0005829 | 74 | 41 | 2.118055 | 0.0005727 | 0.0150806 | 0.000400 | 0.0105333 | 0.493 | NA | AAMP|AIP|ARC|BHMT2|CALR|CCT5|CCT6A|CDC37L1|CLPX|CRABP1|DNAJB1|DNAJB4|EPHB2|FRS2|GAK|GEMIN2|HCK|HSP90AA1|HSPA1A|HSPB6|HSPD1|HTRA2|NELFA|NUDC|OGG1|PASK|PEX5|PIKFYVE|POLR3K|PRKCI|PTGES3|RUVBL2|SHQ1|SPHK1|SRSF10|ST13|TCP1|TOR1A|TUBB4B|UNC13B|USP33| |
cellular component organization | GO:0016043 | 67 | 34 | 2.146421 | 0.0024970 | 0.0493218 | 0.004267 | 0.0842792 | 0.829 | NA | ARC|CALR|CHAF1A|CLGN|DDX46|EPHB2|GAK|GEMIN2|HCK|HSP90AA1|HSPA1A|HSPA9|HSPD1|HTRA2|NUDC|PEX5|PFDN6|PIKFYVE|PRKCI|PTGES3|RUVBL2|SEMA3B|SHQ1|SRSF10|SULF1|TAPBP|TCP1|TOMM20|TOR1A|TUBB4B|UNC13B|USP33|VPS8|ZNF207| |
cellular component organization or biogenesis | GO:0071840 | 67 | 34 | 2.146421 | 0.0024970 | 0.0394574 | 0.004267 | 0.0674234 | 0.828 | NA | ARC|CALR|CHAF1A|CLGN|DDX46|EPHB2|GAK|GEMIN2|HCK|HSP90AA1|HSPA1A|HSPA9|HSPD1|HTRA2|NUDC|PEX5|PFDN6|PIKFYVE|PRKCI|PTGES3|RUVBL2|SEMA3B|SHQ1|SRSF10|SULF1|TAPBP|TCP1|TOMM20|TOR1A|TUBB4B|UNC13B|USP33|VPS8|ZNF207| |
cellular component assembly | GO:0022607 | 45 | 22 | 2.386422 | 0.0032000 | 0.0421333 | 0.004500 | 0.0592500 | 0.556 | NA | ARC|CALR|CHAF1A|CLGN|DDX46|EPHB2|GAK|GEMIN2|HSP90AA1|HSPA1A|HSPA9|HSPD1|PFDN6|PIKFYVE|PTGES3|RUVBL2|SHQ1|SRSF10|TAPBP|TCP1|UNC13B|ZNF207| |
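The table reports corrected p-values both with and without multifunctionality correction. As a small sketch of how the two can be compared, the snippet below rebuilds the relevant columns from the six rows shown above and applies a 0.05 cutoff, which is an assumption made for illustration, to each column.

```r
# Columns copied from the six rows shown above.
res <- data.frame(
  ID = c("GO:0006457", "GO:0051082", "GO:0005829",
         "GO:0016043", "GO:0071840", "GO:0022607"),
  CorrectedPvalue   = c(0, 0, 0.0150806, 0.0493218, 0.0394574, 0.0421333),
  CorrectedMFPvalue = c(0, 0, 0.0105333, 0.0842792, 0.0674234, 0.0592500),
  stringsAsFactors = FALSE
)

# Gene sets passing the cutoff before and after multifunctionality correction.
sigRaw <- res$ID[res$CorrectedPvalue < 0.05]
sigMF  <- res$ID[res$CorrectedMFPvalue < 0.05]

# Gene sets that pass only without the correction.
lostAfterMF <- setdiff(sigRaw, sigMF)
lostAfterMF
```

In these six rows, the three cellular component organization and assembly terms pass the cutoff only before the correction.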
We will use the same scores file from the example above.
precRecallOut = precRecall(annotation = 'GPL96',
                           scores = scores,
                           scoreColumn = 1,
                           iterations = 10000,
                           bigIsBetter = FALSE,
                           logTrans = TRUE)
## Attempting to download annotation file
head(precRecallOut$results) %>% knitr::kable()
Name | ID | NumProbes | NumGenes | RawScore | Pval | CorrectedPvalue | MFPvalue | CorrectedMFPvalue | Multifunctionality | Same as | GeneMembers |
---|---|---|---|---|---|---|---|---|---|---|---|
binding | GO:0005488 | 143 | 81 | 0.9992282 | 0.0000 | 0.0000000 | 0.0000 | 0.0000000 | 0.479 | NA | AAMP|AIP|ARC|ARF3|BHMT2|C5AR2|CACNA1F|CALR|CCNG1|CCT5|CCT6A|CCT8L2|CDC37L1|CHAF1A|CLGN|CLPX|CPT1A|CRABP1|DDX46|DMBT1|DNAJB1|DNAJB4|DZIP3|EPHB2|FBLN2|FOXB1|FPR3|FRS2|GAK|GEMIN2|GPR17|HCK|HMGCR|HSP90AA1|HSPA1A|HSPA9|HSPB6|HSPD1|HTRA2|ITIH2|KCNJ1|LIPF|MAN1B1|NELFA|NR2E3|NUDC|OGG1|PASK|PEX5|PFDN6|PIKFYVE|PLCH1|POLR3K|PPARA|PRKCI|PRPSAP1|PTGES3|RUVBL2|SEMA3B|SHOX2|SHQ1|SLC22A14|SLC24A1|SPHK1|SRSF10|ST13|SULF1|TAPBP|TBKBP1|TCP1|TNFRSF12A|TOMM20|TOR1A|TUBB4B|UGGT1|UNC13B|USP33|VPS8|YIPF2|ZCCHC8|ZNF207| |
protein folding | GO:0006457 | 40 | 24 | 0.7176581 | 0.0000 | 0.0000000 | 0.0000 | 0.0000000 | 0.145 | NA | AIP|CALR|CCT5|CCT6A|CCT8L2|CDC37L1|CLGN|CLPX|DNAJB1|DNAJB4|GAK|HSP90AA1|HSPA1A|HSPA9|HSPB6|HSPD1|NUDC|PFDN6|PTGES3|RUVBL2|ST13|TCP1|TOR1A|UGGT1| |
unfolded protein binding | GO:0051082 | 52 | 29 | 0.8507590 | 0.0000 | 0.0000000 | 0.0000 | 0.0000000 | 0.174 | NA | AIP|CALR|CCT5|CCT6A|CCT8L2|CDC37L1|CHAF1A|CLGN|CLPX|DNAJB1|DNAJB4|HSP90AA1|HSPA1A|HSPA9|HSPB6|HSPD1|HTRA2|NUDC|PFDN6|PTGES3|RUVBL2|SHQ1|SRSF10|TAPBP|TCP1|TOMM20|TOR1A|TUBB4B|UGGT1| |
protein binding | GO:0005515 | 127 | 73 | 0.9569410 | 0.0002 | 0.0039500 | 0.0001 | 0.0019750 | 0.267 | NA | AAMP|AIP|ARC|ARF3|C5AR2|CALR|CCNG1|CCT5|CCT6A|CCT8L2|CDC37L1|CHAF1A|CLGN|CLPX|CPT1A|CRABP1|DDX46|DMBT1|DNAJB1|DNAJB4|DZIP3|EPHB2|FBLN2|FOXB1|FPR3|FRS2|GAK|GEMIN2|GPR17|HCK|HMGCR|HSP90AA1|HSPA1A|HSPA9|HSPB6|HSPD1|HTRA2|ITIH2|NELFA|NR2E3|NUDC|OGG1|PASK|PEX5|PFDN6|PIKFYVE|POLR3K|PPARA|PRKCI|PRPSAP1|PTGES3|RUVBL2|SEMA3B|SHQ1|SLC22A14|SLC24A1|SPHK1|SRSF10|ST13|TAPBP|TBKBP1|TCP1|TNFRSF12A|TOMM20|TOR1A|TUBB4B|UGGT1|UNC13B|USP33|VPS8|YIPF2|ZCCHC8|ZNF207| |
cytosol | GO:0005829 | 74 | 41 | 0.7108333 | 0.0013 | 0.0205400 | 0.0011 | 0.0144833 | 0.493 | NA | AAMP|AIP|ARC|BHMT2|CALR|CCT5|CCT6A|CDC37L1|CLPX|CRABP1|DNAJB1|DNAJB4|EPHB2|FRS2|GAK|GEMIN2|HCK|HSP90AA1|HSPA1A|HSPB6|HSPD1|HTRA2|NELFA|NUDC|OGG1|PASK|PEX5|PIKFYVE|POLR3K|PRKCI|PTGES3|RUVBL2|SHQ1|SPHK1|SRSF10|ST13|TCP1|TOR1A|TUBB4B|UNC13B|USP33| |
cellular anatomical entity | GO:0110165 | 142 | 80 | 0.9860952 | 0.0020 | 0.0263333 | 0.0011 | 0.0173800 | 0.366 | NA | AAMP|AIP|ARC|ARF3|BHMT2|C5AR2|CACNA1F|CALR|CCNG1|CCT5|CCT6A|CDC37L1|CHAF1A|CLGN|CLPX|CPT1A|CRABP1|DDX46|DMBT1|DNAJB1|DNAJB4|DZIP3|EPHB2|FBLN2|FOXB1|FPR3|FRS2|GAK|GEMIN2|GPR17|HCK|HMGCR|HSP90AA1|HSPA1A|HSPA9|HSPB6|HSPD1|HTRA2|ITIH2|KCNJ1|LIPF|MAN1B1|NELFA|NR2E3|NUDC|OGG1|PASK|PEX5|PFDN6|PIKFYVE|PLCH1|POLR3K|PPARA|PRKCI|PRPSAP1|PTGES3|RUVBL2|SEMA3B|SHOX2|SHQ1|SLC22A14|SLC24A1|SPHK1|SRSF10|ST13|SULF1|TAPBP|TBKBP1|TCP1|TNFRSF12A|TOMM20|TOR1A|TUBB4B|UGGT1|UNC13B|USP33|VPS8|YIPF2|ZCCHC8|ZNF207| |
library(dplyr)
# genes for GO:0051082
hitlist = c("AAMP", "AFG3L2", "AHSP", "AIP", "AIPL1", "APCS", "BBS12",
            "CALR", "CALR3", "CANX", "CCDC115", "CCT2", "CCT3", "CCT4", "CCT5",
            "CCT6A", "CCT6B", "CCT7", "CCT8", "CCT8L1P", "CCT8L2", "CDC37",
            "CDC37L1", "CHAF1A", "CHAF1B", "CLGN", "CLN3", "CLPX", "CRYAA",
            "CRYAB", "DNAJA1", "DNAJA2", "DNAJA3", "DNAJA4", "DNAJB1", "DNAJB11",
            "DNAJB13", "DNAJB2", "DNAJB4", "DNAJB5", "DNAJB6", "DNAJB8",
            "DNAJC4", "DZIP3", "ERLEC1", "ERO1B", "FYCO1", "GRPEL1", "GRPEL2",
            "GRXCR2", "HEATR3", "HSP90AA1", "HSP90AA2P", "HSP90AA4P", "HSP90AA5P",
            "HSP90AB1", "HSP90AB2P", "HSP90AB3P", "HSP90AB4P", "HSP90B1",
            "HSP90B2P", "HSPA1A", "HSPA1B", "HSPA1L", "HSPA2", "HSPA5", "HSPA6",
            "HSPA8", "HSPA9", "HSPB6", "HSPD1", "HSPE1", "HTRA2", "LMAN1",
            "MDN1", "MKKS", "NAP1L4", "NDUFAF1", "NPM1", "NUDC", "NUDCD2",
            "NUDCD3", "PDRG1", "PET100", "PFDN1", "PFDN2", "PFDN4", "PFDN5",
            "PFDN6", "PIKFYVE", "PPIA", "PPIB", "PTGES3", "RP2", "RUVBL2",
            "SCAP", "SCG5", "SERPINH1", "SHQ1", "SIL1", "SPG7", "SRSF10",
            "SRSF12", "ST13", "SYVN1", "TAPBP", "TCP1", "TMEM67", "TOMM20",
            "TOR1A", "TRAP1", "TTC1", "TUBB4B", "UGGT1", "ZFYVE21")
oraOut = ora(annotation = 'Generic_human',
             hitlist = hitlist)
head(oraOut$results) %>% knitr::kable()
Name | ID | NumProbes | NumGenes | RawScore | Pval | CorrectedPvalue | MFPvalue | CorrectedMFPvalue | Multifunctionality | Same as | GeneMembers |
---|---|---|---|---|---|---|---|---|---|---|---|
unfolded protein binding | GO:0051082 | 116 | 116 | 99 | 0 | 0 | 0 | 0 | 0.726 | NA | AFG3L2|AHSP|AIP|AIPL1|APCS|CALR|CALR3|CANX|CCAR2|CCDC115|CCT2|CCT3|CCT4|CCT5|CCT6A|CCT6B|CCT7|CCT8|CCT8L2|CDC37|CDC37L1|CHAF1A|CHAF1B|CLGN|CLPX|CLU|CRYAA|CRYAB|DNAJA1|DNAJA2|DNAJA3|DNAJA4|DNAJB1|DNAJB11|DNAJB13|DNAJB2|DNAJB3|DNAJB4|DNAJB5|DNAJB6|DNAJB7|DNAJB8|DNAJC4|ERLEC1|ERN1|ERN2|ERO1B|GRPEL1|GRPEL2|HEATR3|HSP90AA1|HSP90AB1|HSP90AB4P|HSP90B1|HSP90B2P|HSPA1A|HSPA1B|HSPA1L|HSPA2|HSPA5|HSPA6|HSPA8|HSPA9|HSPB1|HSPB2|HSPB6|HSPD1|HSPE1|HTRA2|HYOU1|LMAN1|MKKS|NACA|NACA2|NACA4P|NACAD|NAP1L4|NDUFAF1|NPM1|NUDC|NUDCD2|NUDCD3|PDRG1|PET100|PFDN1|PFDN2|PFDN4|PFDN5|PFDN6|PPIA|PPIB|PTGES3|RP2|RUVBL2|SCAP|SCG5|SERPINH1|SHQ1|SIL1|SPG7|SRSF10|SRSF12|SSUH2|SYVN1|TAPBP|TCP1|TIMM10B|TMEM67|TOMM20|TOR1A|TRAP1|TTC1|TUBB4B|UGGT1|UGGT2|VBP1| |
protein folding chaperone | GO:0044183 | 60 | 60 | 37 | 0 | 0 | 0 | 0 | 0.638 | NA | ANP32E|APLF|CALR|CALR3|CCDC47|CCT2|CCT3|CCT4|CCT5|CCT6A|CCT6B|CCT7|CCT8|CCT8L2|CD74|CLGN|CLPX|DFFA|DNAJB1|DNAJB6|DNAJB7|DNAJB8|FKBP8|HSP90AA1|HSP90AB1|HSP90AB4P|HSP90B1|HSP90B2P|HSPA13|HSPA14|HSPA1A|HSPA1B|HSPA1L|HSPA2|HSPA4|HSPA4L|HSPA5|HSPA6|HSPA7|HSPA8|HSPA9|HSPB1|HSPB6|HSPD1|HSPE1|HSPH1|HYOU1|HYPK|KHSRP|LYRM7|PDCL3|PFDN1|PFDN2|RIC3|TCP1|TOR1A|TRAP1|WDR83OS|WIPF1|ZMYND10| |
chaperone-mediated protein folding | GO:0061077 | 74 | 74 | 35 | 0 | 0 | 0 | 0 | 0.588 | NA | BAG1|CCT2|CCT3|CCT4|CCT5|CCT6A|CCT7|CCT8|CD74|CHORDC1|CLU|CRTAP|CSNK2A1|DFFA|DNAJB1|DNAJB12|DNAJB13|DNAJB14|DNAJB2|DNAJB3|DNAJB4|DNAJB5|DNAJB6|DNAJB7|DNAJB8|DNAJC18|DNAJC24|DNAJC5|DNAJC7|ERO1A|FKBP11|FKBP2|FKBP4|FKBP5|GAK|HSPA13|HSPA14|HSPA1A|HSPA1B|HSPA1L|HSPA2|HSPA5|HSPA6|HSPA7|HSPA8|HSPA9|HSPB1|HSPB6|HSPE1|HSPH1|P3H1|PDCL3|PDIA4|PEX19|PFDN1|PFDN2|PFDN4|PFDN5|PFDN6|PPIB|PPID|PTGES3|SDF2|SDF2L1|ST13|TCP1|TOR1A|TOR1B|TOR2A|TRAP1|UMOD|UNC45A|UNC45B|VBP1| |
ATP-dependent protein folding chaperone | GO:0140662 | 34 | 34 | 27 | 0 | 0 | 0 | 0 | 0.662 | NA | CCT2|CCT3|CCT4|CCT5|CCT6A|CCT6B|CCT7|CCT8|CCT8L2|CLPX|HSP90AA1|HSP90AB1|HSP90AB4P|HSP90B1|HSP90B2P|HSPA13|HSPA14|HSPA1A|HSPA1B|HSPA1L|HSPA2|HSPA4|HSPA4L|HSPA5|HSPA6|HSPA7|HSPA8|HSPA9|HSPD1|HSPH1|HYOU1|TCP1|TOR1A|TRAP1| |
protein-folding chaperone binding | GO:0051087 | 133 | 133 | 27 | 0 | 0 | 0 | 0 | 0.928 | NA | AHSA1|AHSA2P|ALB|AMFR|ATP1A1|ATP1A2|ATP1A3|ATP7A|BAG1|BAG2|BAG3|BAG4|BAG5|BAG6|BAK1|BAX|BIN1|BIRC2|BIRC5|CALR|CDC25A|CDC37|CDC37L1|CDK1|CDKN1B|CFTR|CLU|CP|CTSC|CYP1A1|CYP2E1|DNAAF6|DNAJA1|DNAJA2|DNAJA3|DNAJA4|DNAJB1|DNAJB12|DNAJB13|DNAJB14|DNAJB2|DNAJB3|DNAJB4|DNAJB5|DNAJB6|DNAJB7|DNAJB8|DNAJB9|DNAJC1|DNAJC10|DNAJC18|DNAJC2|DNAJC3|DNAJC8|DNAJC9|DNLZ|ERN1|ERP29|FGB|FGF1|FICD|FNIP1|FNIP2|GAK|GET4|GNB5|GPR37|GRN|GRPEL1|GRPEL2|HDAC8|HES1|HIKESHI|HLA-B|HSCB|HSPA2|HSPA5|HSPA8|HSPB6|HSPD1|HSPE1|IQCG|LRP2|MAPT|METTL21A|MVD|NOD2|NUP62|PACRG|PDPN|PFDN4|PFDN6|PGLYRP1|PLG|PPEF2|PPID|PRKN|PRNP|PTGES3|PTGES3L|RNF207|RPS3|SACS|SCARB2|SDF2L1|SGTB|SLC12A2|SLC25A17|SNCA|SOD1|SPN|ST13|STAU2|STUB1|SUGT1|SYVN1|TBCA|TBCC|TBCD|TBCE|TERT|TFRC|TIMM10|TIMM44|TIMM9|TP53|TSACC|TSC1|TTC4|UBL4A|USP13|VWF|WRAP53| |
protein refolding | GO:0042026 | 25 | 25 | 17 | 0 | 0 | 0 | 0 | 0.591 | NA | B2M|CRYAA|CRYAB|DNAJA1|DNAJA2|DNAJA4|DNAJB2|FKBP1A|FKBP1B|HSP90AA1|HSPA13|HSPA14|HSPA1A|HSPA1B|HSPA1L|HSPA2|HSPA5|HSPA6|HSPA7|HSPA8|HSPA9|HSPB1|HSPB2|HSPB6|HSPD1| |
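For readers new to over-representation analysis, the idea can be illustrated with a textbook hypergeometric test. This is a generic sketch with made-up numbers. It is not necessarily the exact computation ermineJ performs, and it does not include the multifunctionality correction.

```r
# Toy numbers, all hypothetical.
universeSize <- 1000  # genes in the annotation
setSize      <- 50    # genes annotated with the gene set
hitlistSize  <- 40    # genes in the hit list
overlap      <- 10    # hit-list genes that belong to the gene set

# Probability of seeing at least `overlap` set members in a hit list of
# this size when drawing genes at random without replacement.
p <- phyper(overlap - 1, setSize, universeSize - setSize, hitlistSize,
            lower.tail = FALSE)

# The same tail probability, summed term by term as a cross-check.
tailSum <- sum(dhyper(overlap:hitlistSize, setSize,
                      universeSize - setSize, hitlistSize))
p
```

With these numbers the expected overlap by chance is 2 genes, so an overlap of 10 yields a small p-value.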
If you want to use your own GO annotations instead of getting files provided by Pavlidis Lab, you can use makeAnnotation after turning your annotations into a list. See the example below.
library('org.Hs.eg.db') # get go terms from bioconductor
goAnnots = as.list(org.Hs.egGO)
goAnnots = goAnnots %>% lapply(names)
goAnnots %>% head
## $`1`
## [1] "GO:0008150" "GO:0005576" "GO:0005576" "GO:0005576" "GO:0005615"
## [6] "GO:0005886" "GO:0031093" "GO:0034774" "GO:0062023" "GO:0070062"
## [11] "GO:0072562" "GO:1904813" "GO:0003674"
##
## $`2`
## [1] "GO:0001553" "GO:0001869" "GO:0002438" "GO:0006953" "GO:0007584"
## [6] "GO:0010037" "GO:0034695" "GO:0048863" "GO:0051384" "GO:1990402"
## [11] "GO:0005576" "GO:0005576" "GO:0005615" "GO:0031093" "GO:0062023"
## [16] "GO:0070062" "GO:0072562" "GO:0002020" "GO:0002020" "GO:0004866"
## [21] "GO:0004866" "GO:0004867" "GO:0005102" "GO:0005515" "GO:0019838"
## [26] "GO:0019899" "GO:0019959" "GO:0019966" "GO:0042802" "GO:0043120"
## [31] "GO:0048306" "GO:0048403" "GO:0048406"
##
## $`3`
## NULL
##
## $`9`
## [1] "GO:0006805" "GO:0005829" "GO:0004060" "GO:0004060"
##
## $`10`
## [1] "GO:0006805" "GO:0005829" "GO:0004060" "GO:0004060" "GO:0005515"
##
## $`11`
## NULL
The goAnnots object we created has GO terms per Entrez ID. Similar lists can be obtained from other species db packages in Bioconductor and some array annotation packages. We will now use the makeAnnotation function to create our annotation file. This file will have the names of this list (Entrez IDs) as gene identifiers, so any score or hitlist file you provide should have the Entrez IDs as well.
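If your annotations come as a two-column table rather than a Bioconductor object, a list of the same shape can be built with split(). The table below is a made-up example, and the column names gene and go are hypothetical.

```r
# Toy annotation table: one row per gene-term pair.
annotTable <- data.frame(
  gene = c("1", "1", "2", "9"),
  go   = c("GO:0005576", "GO:0005615", "GO:0001553", "GO:0006805"),
  stringsAsFactors = FALSE
)

# One list element per gene, holding that gene's GO terms.
goList <- split(annotTable$go, annotTable$gene)
goList

# The resulting list has the same shape as goAnnots and could be passed on:
# annotation <- makeAnnotation(goList, return = TRUE)
```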
makeAnnotation only needs the list with gene identifiers and GO terms to work, but if you want to have a complete annotation file you can also provide gene symbols and gene names. Gene names have no effect on the analysis. Gene symbols matter if you are providing custom gene sets and using “Option 2”, or if the same genes are represented by multiple gene identifiers (e.g. probes). Gene symbols will also be returned in the GeneMembers column of the output. If they are not provided, gene IDs will also be used as gene symbols.
Here we’ll set them both for good measure.
geneSymbols = as.list(org.Hs.egSYMBOL) %>% unlist
geneName = as.list(org.Hs.egGENENAME) %>% unlist
annotation = makeAnnotation(goAnnots,
                            symbol = geneSymbols,
                            name = geneName,
                            output = NULL, # you can choose to save the annotation to a file
                            return = TRUE) # if you only want to save it to a file, you don't need to return
Now that we have the annotation object, we can use it to run an analysis. We’ll try to generate a hitlist only composed of genes annotated with GO:0051082.
mockHitlist = goAnnots %>% sapply(function(x){'GO:0051082' %in% x}) %>%
    {goAnnots[.]} %>%
    names
mockHitlist %>% head
## [1] "325" "811" "821" "871" "908" "1047"
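The piped expression above can be harder to read than it needs to be. As a sketch, the same selection is written below in base R on a small made-up list, which also shows that genes without any annotation (NULL entries) are simply skipped.

```r
# Toy stand-in for goAnnots: GO terms per Entrez ID, one gene unannotated.
toyAnnots <- list(
  `1`   = c("GO:0008150", "GO:0005576"),
  `325` = c("GO:0051082", "GO:0005515"),
  `3`   = NULL,
  `811` = "GO:0051082"
)

# TRUE for every gene annotated with the term of interest.
hasTerm <- vapply(toyAnnots, function(x) "GO:0051082" %in% x, logical(1))

# Keep the identifiers of the matching genes.
toyHitlist <- names(toyAnnots)[hasTerm]
toyHitlist
```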
oraOut = ora(annotation = annotation,
             hitlist = mockHitlist)
head(oraOut$results) %>% knitr::kable()
Name | ID | NumProbes | NumGenes | RawScore | Pval | CorrectedPvalue | MFPvalue | CorrectedMFPvalue | Multifunctionality | Same as | GeneMembers |
---|---|---|---|---|---|---|---|---|---|---|---|
unfolded protein binding | GO:0051082 | 122 | 122 | 122.000 | 0E00 | 0E00 | 1.226E-306 | 5.253E-303 | 0.695 | NA | AFG3L2|AHSP|AIP|AIPL1|APCS|CALR|CALR3|CANX|CCAR2|CCDC115|CCT2|CCT3|CCT4|CCT5|CCT6A|CCT6B|CCT7|CCT8|CCT8L1P|CCT8L2|CDC37|CDC37L1|CHAF1A|CHAF1B|CLGN|CLPX|CLU|CRYAA|CRYAB|DNAJA1|DNAJA2|DNAJA3|DNAJA4|DNAJB1|DNAJB11|DNAJB13|DNAJB2|DNAJB3|DNAJB4|DNAJB5|DNAJB6|DNAJB7|DNAJB8|DNAJC4|ERLEC1|ERN1|ERN2|ERO1B|GRPEL1|GRPEL2|HEATR3|HSP90AA1|HSP90AA2P|HSP90AA4P|HSP90AA5P|HSP90AB1|HSP90AB2P|HSP90AB3P|HSP90AB4P|HSP90B1|HSP90B2P|HSPA1A|HSPA1B|HSPA1L|HSPA2|HSPA5|HSPA6|HSPA8|HSPA9|HSPB1|HSPB2|HSPB6|HSPD1|HSPE1|HTRA2|HYOU1|LMAN1|MKKS|NACA|NACA2|NACA4P|NACAD|NAP1L4|NDUFAF1|NPM1|NUDC|NUDCD2|NUDCD3|PDRG1|PET100|PFDN1|PFDN2|PFDN4|PFDN5|PFDN6|PPIA|PPIB|PTGES3|RP2|RUVBL2|SCAP|SCG5|SERPINH1|SHQ1|SIL1|SPG7|SRSF10|SRSF12|SSUH2|SYVN1|TAPBP|TCP1|TIMM10B|TMEM67|TOMM20|TOR1A|TRAP1|TTC1|TUBB4B|UGGT1|UGGT2|VBP1| |
protein folding chaperone | GO:0044183 | 69 | 69 | 47.000 | 2.823E-92 | 6.049E-89 | 1.565E-89 | 3.353E-86 | 0.573 | NA | ANP32E|APLF|CALR|CALR3|CCDC47|CCT2|CCT3|CCT4|CCT5|CCT6A|CCT6B|CCT7|CCT8|CCT8L1P|CCT8L2|CD74|CDC123|CLGN|CLPX|DFFA|DNAJB1|DNAJB6|DNAJB7|DNAJB8|FKBP8|HSP90AA1|HSP90AA2P|HSP90AA4P|HSP90AA5P|HSP90AB1|HSP90AB2P|HSP90AB3P|HSP90AB4P|HSP90B1|HSP90B2P|HSPA13|HSPA14|HSPA1A|HSPA1B|HSPA1L|HSPA2|HSPA4|HSPA4L|HSPA5|HSPA6|HSPA7|HSPA8|HSPA9|HSPB1|HSPB6|HSPD1|HSPE1|HSPH1|HYOU1|HYPK|KHSRP|LYRM7|PDCL3|PFDN1|PFDN2|RIC3|TAPBP|TCP1|TOR1A|TRAP1|WDR83OS|WIPF1|ZMYND10|ZPR1| |
ATP-dependent protein folding chaperone | GO:0140662 | 40 | 40 | 34.000 | 3.365E-72 | 4.807E-69 | 3.79E-69 | 5.413E-66 | 0.541 | NA | CCT2|CCT3|CCT4|CCT5|CCT6A|CCT6B|CCT7|CCT8|CCT8L1P|CCT8L2|CLPX|HSP90AA1|HSP90AA2P|HSP90AA4P|HSP90AA5P|HSP90AB1|HSP90AB2P|HSP90AB3P|HSP90AB4P|HSP90B1|HSP90B2P|HSPA13|HSPA14|HSPA1A|HSPA1B|HSPA1L|HSPA2|HSPA4|HSPA4L|HSPA5|HSPA6|HSPA7|HSPA8|HSPA9|HSPD1|HSPH1|HYOU1|TCP1|TOR1A|TRAP1| |
chaperone-mediated protein folding | GO:0061077 | 74 | 74 | 39.000 | 1.513E-69 | 1.621E-66 | 4.022E-67 | 4.308E-64 | 0.599 | NA | BAG1|CCT2|CCT3|CCT4|CCT5|CCT6A|CCT7|CCT8|CD74|CHORDC1|CLU|CRTAP|CSNK2A1|DFFA|DNAJB1|DNAJB12|DNAJB13|DNAJB14|DNAJB2|DNAJB3|DNAJB4|DNAJB5|DNAJB6|DNAJB7|DNAJB8|DNAJC18|DNAJC24|DNAJC5|DNAJC7|ERO1A|FKBP11|FKBP2|FKBP4|FKBP5|GAK|HSPA13|HSPA14|HSPA1A|HSPA1B|HSPA1L|HSPA2|HSPA5|HSPA6|HSPA7|HSPA8|HSPA9|HSPB1|HSPB6|HSPE1|HSPH1|P3H1|PDCL3|PDIA4|PEX19|PFDN1|PFDN2|PFDN4|PFDN5|PFDN6|PPIB|PPID|PTGES3|SDF2|SDF2L1|ST13|TCP1|TOR1A|TOR1B|TOR2A|TRAP1|UMOD|UNC45A|UNC45B|VBP1| |
protein-folding chaperone binding | GO:0051087 | 133 | 133 | 30.000 | 8.981E-40 | 7.697E-37 | 5.724E-38 | 4.906E-35 | 0.928 | NA | AHSA1|AHSA2P|ALB|AMFR|ATP1A1|ATP1A2|ATP1A3|ATP7A|BAG1|BAG2|BAG3|BAG4|BAG5|BAG6|BAK1|BAX|BIN1|BIRC2|BIRC5|CALR|CDC25A|CDC37|CDC37L1|CDK1|CDKN1B|CFTR|CLU|CP|CTSC|CYP1A1|CYP2E1|DNAAF6|DNAJA1|DNAJA2|DNAJA3|DNAJA4|DNAJB1|DNAJB12|DNAJB13|DNAJB14|DNAJB2|DNAJB3|DNAJB4|DNAJB5|DNAJB6|DNAJB7|DNAJB8|DNAJB9|DNAJC1|DNAJC10|DNAJC18|DNAJC2|DNAJC3|DNAJC8|DNAJC9|DNLZ|ERN1|ERP29|FGB|FGF1|FICD|FNIP1|FNIP2|GAK|GET4|GNB5|GPR37|GRN|GRPEL1|GRPEL2|HDAC8|HES1|HIKESHI|HLA-B|HSCB|HSPA2|HSPA5|HSPA8|HSPB6|HSPD1|HSPE1|IQCG|LRP2|MAPT|METTL21A|MVD|NOD2|NUP62|PACRG|PDPN|PFDN4|PFDN6|PGLYRP1|PLG|PPEF2|PPID|PRKN|PRNP|PTGES3|PTGES3L|RNF207|RPS3|SACS|SCARB2|SDF2L1|SGTB|SLC12A2|SLC25A17|SNCA|SOD1|SPN|ST13|STAU2|STUB1|SUGT1|SYVN1|TBCA|TBCC|TBCD|TBCE|TERT|TFRC|TIMM10|TIMM44|TIMM9|TP53|TSACC|TSC1|TTC4|UBL4A|USP13|VWF|WRAP53| |
protein refolding | GO:0042026 | 25 | 25 | 19.000 | 1.628E-38 | 1.163E-35 | 8.77E-36 | 6.263E-33 | 0.574 | NA | B2M|CRYAA|CRYAB|DNAJA1|DNAJA2|DNAJA4|DNAJB2|FKBP1A|FKBP1B|HSP90AA1|HSPA13|HSPA14|HSPA1A|HSPA1B|HSPA1L|HSPA2|HSPA5|HSPA6|HSPA7|HSPA8|HSPA9|HSPB1|HSPB2|HSPB6|HSPD1| |
We can see GO:0051082 is the top scoring hit as expected.