
Downstreamer

Patrick Deelen edited this page Sep 28, 2024 · 54 revisions

Downstreamer can be used to perform key gene prioritization using GWAS summary statistics. We do this using 57 tissue-specific co-expression networks derived from the Recount3 data.

Content

1️⃣ Getting started
2️⃣ Running PascalX to obtain gene p-values
3️⃣ Tissue enrichment
4️⃣ Key gene enrichment
5️⃣ Code availability

1. Getting started

Download tool and reference data here: https://downloads.molgeniscloud.org/downloads/downstreamerRelease2.tar.gz

This download also includes the files that are needed for PascalX.
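
The download and unpacking step can be scripted. The following is a minimal sketch using only the Python standard library; the function names and the local archive file name are illustrative and not part of Downstreamer, and the layout of the extracted directory is not described here, so the sketch only lists the top-level entries.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the text above.
RELEASE_URL = "https://downloads.molgeniscloud.org/downloads/downstreamerRelease2.tar.gz"


def download_release(url=RELEASE_URL, archive_path="downstreamerRelease2.tar.gz"):
    """Download the release archive to archive_path and return that path."""
    urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract_release(archive_path, target_dir):
    """Unpack a .tar.gz archive into target_dir and return its top-level entries."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    return sorted(entry.name for entry in target.iterdir())
```

Calling `extract_release(download_release(), "downstreamer")` would fetch the archive and unpack it into a directory named `downstreamer` (a name chosen here purely for illustration).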

2. Running PascalX to obtain gene p-values

Downstreamer needs gene-level p-values for the analysis. PascalX can be used to convert the variant-level GWAS summary statistics to gene-level summary statistics.

The instructions to do so are listed here: PascalX for Downstreamer

Other sources of gene p-values

In principle, Downstreamer can also use gene p-values from another source. This is, however, not recommended, as you would then also need to create a new null distribution for the gene p-values.

The expected format of gene p-values is a tab-separated file with 4 columns:

Column name    Description
gene           The name of the gene
pvalue         The gene p-value
nsnps          The number of SNPs on which the p-value is based. Can be set to 1 for all genes if not applicable
min_pvalue     The smallest SNP p-value. Can be set to zero for all genes if not applicable
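
As an illustration of this format, the sketch below writes gene p-values from another source to a 4-column tab-separated file. It assumes that the file starts with a header row containing the column names, which the table suggests but does not state explicitly. The function name and the gene identifiers in the usage note are hypothetical, and the identifier type Downstreamer expects is not specified here.

```python
import csv

# Column names as listed in the table above.
COLUMNS = ["gene", "pvalue", "nsnps", "min_pvalue"]


def write_gene_pvalues(rows, path):
    """Write (gene, pvalue) pairs as a 4-column tab-separated file.

    nsnps is set to 1 and min_pvalue to 0 for every gene, the placeholder
    values allowed when this information is not applicable.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(COLUMNS)  # assumption: a header row is expected
        for gene, pvalue in rows:
            if not 0.0 <= pvalue <= 1.0:
                raise ValueError(f"p-value out of range for {gene}: {pvalue}")
            writer.writerow([gene, pvalue, 1, 0])
```

For example, `write_gene_pvalues([("GENE_A", 0.01), ("GENE_B", 0.5)], "gene_pvalues.txt")` produces a header line followed by one line per gene. Keep in mind that the null distributions shipped with the release were not built for such p-values.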

The gene-gene correlations of the null GWAS p-values are stored per chromosome arm and use the following naming scheme: NAME_1_q_correlations.datg

The .datg files and the corresponding .rows.txt.gz and .cols.txt.gz files can be created from a tab-separated .txt file using the CONVERT_TXT mode of Downstreamer.
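
To check that a set of correlation files follows this naming scheme, a small parser can help. The sketch below infers the pattern from the single example above: a free-form name, a chromosome, a chromosome arm and the fixed suffix. That the arm is always `p` or `q` is an assumption, and the function name is hypothetical.

```python
import re

# Pattern inferred from the example NAME_1_q_correlations.datg.
# Assumption: the arm is encoded as a single letter, p or q.
CORRELATION_FILE = re.compile(
    r"^(?P<name>.+)_(?P<chrom>[^_]+)_(?P<arm>[pq])_correlations\.datg$"
)


def parse_correlation_filename(filename):
    """Return (name, chromosome, arm) or None if the file name does not match."""
    match = CORRELATION_FILE.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("chrom"), match.group("arm")
```

With this pattern, `parse_correlation_filename("NAME_1_q_correlations.datg")` returns `("NAME", "1", "q")`, and a file name that does not follow the scheme returns `None`.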

Note: without updated null distributions the results might not be reliable.

3. Tissue enrichment

First, we use Downstreamer to determine which tissues express the genes implicated by the GWAS, using a tissue enrichment analysis. By doing this, we make sure that the key gene predictions are driven by relevant co-expression rather than by tissue-specific expression.

To do this, first run runDownstreamerTissueEnrichment.sh, followed by the R code in selectSignficantTissues.R

This will prepare a parameter that specifies which tissue-specific networks Downstreamer should use in the next step.

4. Key gene enrichment

We are now ready to run the actual key gene prioritization. This is done by running: runDownstreamerKeygenePrediction.sh

The resulting key gene prioritizations per tissue can be found in: _keygene_enrichtments.xlsx

If needed, the Z-scores of the different tissues can be meta-analyzed to obtain the final key gene prioritization score.
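
The meta-analysis method is not prescribed here. One common choice is the unweighted Stouffer method, which divides the sum of the Z-scores by the square root of their number. The sketch below shows it for the per-tissue Z-scores of a single gene. Both the choice of method and the function name are assumptions, and the method treats the tissues as independent, which may not hold for related tissues.

```python
import math


def stouffer_z(z_scores):
    """Combine per-tissue Z-scores for one gene with the unweighted Stouffer method."""
    z_scores = list(z_scores)
    if not z_scores:
        raise ValueError("at least one Z-score is required")
    # Sum of Z-scores divided by the square root of the number of tissues.
    return sum(z_scores) / math.sqrt(len(z_scores))
```

For a gene with a Z-score of 2.0 in each of four tissues, `stouffer_z([2.0, 2.0, 2.0, 2.0])` gives 4.0, while Z-scores of opposite sign cancel out.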

5. Code availability

https://github.com/molgenis/systemsgenetics/tree/master/Downstreamer
