Stochastic GOEA Simulations

Stochastic simulations of large numbers of Gene Ontology Enrichment Analyses (GOEAs)
are used to generate simulated values of FDR, sensitivity, and specificity for GOEAs run using GOATOOLS.

This repo also contains stochastic simulations showing the FDR, sensitivity, and specificity of multiple-test correction methods, including FDR Benjamini/Hochberg (non-negative) and Bonferroni one-step correction. These simulations were used to design the overall simulation strategy and to develop an effective figure that displays several types of information, including:

- Study size
- Percentage of the study that was background (noise)
- FDR
- Sensitivity
- Specificity
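The quantities in the list above can be illustrated with a small, self-contained simulation. The sketch below is not the repo's simulation code. It assumes a hypothetical p-value model (true signals drawn from U(0, 0.001), background noise from U(0, 1)), applies hand-written Bonferroni and Benjamini/Hochberg corrections, and scores each one against the known truth. All function names are illustrative.

```python
import random


def bonferroni(pvals, alpha):
    """Bonferroni one-step correction: reject H0 when p * m <= alpha."""
    m = len(pvals)
    return [p * m <= alpha for p in pvals]


def benjamini_hochberg(pvals, alpha):
    """Benjamini/Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


def metrics(reject, truth):
    """Return (FDR, sensitivity, specificity) given calls and the known truth."""
    tp = sum(r and t for r, t in zip(reject, truth))
    fp = sum(r and not t for r, t in zip(reject, truth))
    fn = sum(t and not r for r, t in zip(reject, truth))
    tn = sum(not r and not t for r, t in zip(reject, truth))
    # Convention used in this sketch: an empty denominator yields 0.0.
    fdr = fp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return fdr, sensitivity, specificity


def simulate(num_tests, pct_null, alpha=0.05, seed=None):
    """Run one stochastic trial; pct_null is the fraction of tests that are noise."""
    rng = random.Random(seed)
    truth, pvals = [], []
    for _ in range(num_tests):
        is_signal = rng.random() >= pct_null
        truth.append(is_signal)
        # Hypothetical p-value model: signals are tiny, noise is uniform on [0, 1).
        pvals.append(rng.uniform(0.0, 0.001) if is_signal else rng.random())
    return {
        "bonferroni": metrics(bonferroni(pvals, alpha), truth),
        "fdr_bh": metrics(benjamini_hochberg(pvals, alpha), truth),
    }
```

For example, `simulate(200, 0.5, seed=1)` returns an (FDR, sensitivity, specificity) tuple for each method. Repeating it over many seeds, study sizes, and noise percentages gives distributions like the ones the simulations summarize. The statsmodels `multipletests` function provides the same corrections (`method='bonferroni'` or `method='fdr_bh'`) if you prefer a library implementation.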

Conclusions from Stochastic GOEA Simulations

- GO terms associated with huge numbers of genes (thousands, in human) cause FDR failures
- Removing just 30 of the 17,000+ (human) GO terms, the most highly annotated ones, results in passing FDRs
- A study size of 4 genes in a GOEA will likely return an unacceptable number of misses (false negatives)
- As study size increases, sensitivity improves (i.e., there are fewer false negatives)
- As the percentage of 'actually significant genes' in the study set rises, so does sensitivity
- Using a version of propagate counts (the GOATOOLS GOEA option propagate_counts) greatly improves sensitivity
- Before running a GOEA, remove selected highly annotated GO terms using these criteria:
  - Highly annotated GO terms (e.g., top 1%). Example in human: remove GO terms associated with thousands of genes
  - Low depth (near the top of the GO DAG)
  - High descendant count
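The selection criteria above could be applied roughly as follows. This is a minimal sketch on plain dictionaries, assuming `assoc` maps each gene to its set of GO IDs and `parents` maps each GO ID to its direct parent GO IDs. Depth is taken here as the longest path from a root, and the thresholds (`top_frac`, `max_depth`, `min_descendants`) are hypothetical parameters, not values from the manuscript.

```python
from collections import Counter


def genes_per_go(assoc):
    """Count how many genes are annotated to each GO ID."""
    counts = Counter()
    for gos in assoc.values():
        counts.update(gos)
    return counts


def go_depths(parents):
    """Depth = longest path from a root (a term with no parents)."""
    memo = {}

    def depth(go):
        if go not in memo:
            ps = parents.get(go, set())
            memo[go] = 1 + max(depth(p) for p in ps) if ps else 0
        return memo[go]

    return {go: depth(go) for go in parents}


def descendant_counts(parents):
    """Number of distinct descendants of each GO ID (simple, not optimized)."""
    children = {go: set() for go in parents}
    for go, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(go)

    def collect(go, seen):
        for child in children.get(go, ()):
            if child not in seen:
                seen.add(child)
                collect(child, seen)
        return seen

    return {go: len(collect(go, set())) for go in children}


def select_broad_terms(assoc, parents, top_frac=0.01, max_depth=2, min_descendants=1):
    """Return GO IDs that are highly annotated, shallow, and have many descendants."""
    counts = genes_per_go(assoc)
    n_top = max(1, int(len(counts) * top_frac))
    top = {go for go, _ in counts.most_common(n_top)}
    depth = go_depths(parents)
    ndesc = descendant_counts(parents)
    return {go for go in top
            if go in depth and depth[go] <= max_depth
            and ndesc.get(go, 0) >= min_descendants}
```

The returned GO IDs would then be dropped from the associations before the GOEA is run. In GOATOOLS, terms loaded from go-basic.obo already carry depth information (for example a `depth` attribute), so a real pipeline would likely read it from there instead of recomputing it.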

Recreating the stochastic simulations
Figures in manuscript:
- Manuscript Figures
- Supplemental Figures
  - Initial failing simulations
  - Exploratory simulations: Stress tests with associations shuffled stochastically:
    - Enriched-only viewed
    - 30 Broad GO terms removed

To Cite

Please cite the following paper if you mention the stochastic simulations in this repo in your research.

GOATOOLS: A Python library for Gene Ontology analyses
Klopfenstein DV, Zhang L, Pedersen BS, ... Tang H
2018 | Scientific Reports | PMID:30022098 | DOI:10.1038/s41598-018-28948-z

Details

Recreating the stochastic simulations

To recreate all five of our stochastic GOEA simulation plots (100,000 stochastic simulations in total) featured in the GOATOOLS manuscript and supplemental data, clone the repository, https://github.com/dvklopfenstein/goatools_simulation, and run this make target from the command line:

  $ make run_ms

Manuscript Figures

Results for 40,000 GOATOOLS GOEA stochastic simulations (20,000 simulations for each panel), showing varying sensitivity and consistently high specificity. GOEAs performed well on study groups of 8 or more genes when the GOATOOLS GOEA option propagate_counts was set to True.
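The idea behind propagate_counts is that a gene annotated to a GO term is also counted for every ancestor of that term. The sketch below shows that idea on plain dictionaries, assuming `parents` maps each GO ID to its direct parent GO IDs. It is an illustration, not GOATOOLS' own implementation; in GOATOOLS the behavior is switched on with the `propagate_counts=True` argument to `GOEnrichmentStudy`.

```python
def ancestors(go, parents):
    """All ancestors of a GO ID, found by walking parent links."""
    found = set()
    stack = list(parents.get(go, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(parents.get(parent, ()))
    return found


def propagate_counts(assoc, parents):
    """Return a new association in which each gene is also annotated to all ancestors."""
    return {
        gene: set(gos).union(*(ancestors(go, parents) for go in gos))
        for gene, gos in assoc.items()
    }
```

After propagation, general terms near the top of the DAG gain many genes. This raises sensitivity for specific study sets, and it is also why the most highly annotated terms deserve scrutiny.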

Supplemental Figures

Supplemental Figure 1) Initial failing simulations

The first GOATOOLS GOEA simulations fail in panels A3 and A4, with FDR values exceeding the alpha of 0.05 set by the researcher. Failing FDR values are shown in red text. The failures were caused by false positives for GO terms annotated with large numbers of gene products. For mouse annotations in the biological_process branch, the failures came from GO terms annotated with 1,000 or more genes.

Supplemental Figure 2) Enriched-only viewed

GOATOOLS GOEA stress tests with randomly shuffled associations nearly pass if only enriched GO terms are viewed. The associations are randomly shuffled while still maintaining the distribution of the number of GO terms per gene. Failing FDRs (above 0.05) are seen in panels A2 and A3 for gene groups having 96, 112, or 124 genes.
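One way to shuffle associations while keeping each gene's number of GO terms is to assign every gene a random set of GO IDs of the same size. This is a minimal sketch, and the repo's actual shuffling procedure may differ. In particular, this version does not preserve how many genes each GO term annotates. `shuffle_associations` is a hypothetical name.

```python
import random


def shuffle_associations(assoc, seed=None):
    """Give each gene a random set of GO IDs of the same size it had before."""
    rng = random.Random(seed)
    # Sort the pool so results are reproducible for a given seed.
    pool = sorted(set().union(*assoc.values()))
    return {gene: set(rng.sample(pool, len(gos))) for gene, gos in assoc.items()}
```

Because any real biological signal is destroyed by the shuffle, every significant result a GOEA reports on shuffled associations is a false positive, which makes this a useful stress test of FDR control.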

Supplemental Figure 3) 30 Broad GO terms removed

GOATOOLS GOEA stress tests with randomly shuffled associations pass in all cases if just 30 of the 17k+ GO terms, those associated with more than 1,000 genes, are removed. The median number of genes per GO term in the mouse associations is 3 genes/GO. The number of genes per GO term ranges from 1 to ~7k (mean=16 genes/GO, SD=128).
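Summary statistics like the ones above can be computed from an association dictionary. The sketch assumes `assoc` maps gene IDs to sets of GO IDs and contains at least two annotated GO IDs. It reports the sample standard deviation (the text does not say which SD was used) and lists the terms annotated with more than a hypothetical `threshold` number of genes.

```python
from collections import Counter
from statistics import mean, median, stdev


def summarize_genes_per_go(assoc, threshold=1000):
    """Summarize genes-per-GO-term counts and list the broad GO terms."""
    counts = Counter()
    for gos in assoc.values():
        counts.update(gos)
    values = list(counts.values())
    return {
        "median": median(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "broad": sorted(go for go, n in counts.items() if n > threshold),
    }
```

A strongly right-skewed distribution like the one described above, with a median of 3 but an SD of 128, shows that only a small number of very broad terms separate the failing runs from the passing runs.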
