Integration of proteomics with genomics and transcriptomics increases the diagnosis rate of Mendelian disorders
This project contains scripts that automate and visualize the analyses performed for the paper "Integration of proteomics with genomics and transcriptomics increases the diagnosis rate of Mendelian disorders".
The pipeline also produces a web server as one of its outputs.
This project is set up as a wBuild workflow. wBuild is an automatic build tool for R reports based on Snakemake.
- The `wbuild.yaml` file is the main configuration file for setting up the workflow.
- The `Scripts` folder contains scripts that are rendered as HTML reports.
- The `src` folder contains additional helper functions and scripts.
- The `Output` folder will contain all files produced by the analysis pipeline; `Output/html` contains the final HTML report.
This project depends on the packages wBuild and PROTRIDER, developed by the Gagneur lab.
The pipeline starts from a set of files available on Zenodo (DOI: 10.5281/zenodo.4501904):
- `raw_data`
  - `proteomics_annotation.tsv`: sample annotation
  - `proteomics_not_normalized.tsv`: proteomics intensity matrix
  - `raw_counts.tsv`: RNA-seq count matrix
  - `Patient_HPO_phenotypes.tsv`: phenotype data recorded using HPO terms for diagnosed cases
  - `enrichment_proportions_variants.tsv`: results of the rare-variant enrichment/proportion analysis calculated on the full dataset
  - `patient_variant_hpo_data.tsv`: gene annotation for all individuals. Since the genetic data cannot be shared publicly, we provide gene-level information for outlier genes only.
- `datasets`
  - `disease_genes.tsv`: list of Mendelian disease genes aggregated from several studies
  - `HGNC_mito_groups.tsv`: subset of HGNC gene groups related to mitochondria
  - Downloaded automatically:
    - `gencode.v29lift37.annotation.gtf.gz`: gene-level model based on the GENCODE 29 transcript model
    - `Table_S1_gene_info_at_protein_level.xlsx`: Supplementary Table 1 from the GTEx proteomics study (Jiang et al., 2020, Cell); the data are available on the GTEx page
    - `allComplexes.txt`: CORUM protein complexes, available on the CORUM web page
- The proteomic raw data and MaxQuant search files have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository and can be accessed using the dataset identifier PXD022803.
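Before launching the pipeline, it can help to confirm that the Zenodo files listed above were unpacked where the workflow expects them. The bash function below is a minimal sketch of such a check. The function name `check_input_files` is hypothetical, and it assumes the files sit under `<root>/raw_data` and `<root>/datasets`; adjust the paths to whatever you configured in `wbuild.yaml`. The automatically downloaded files are not checked.

```shell
# Hypothetical pre-flight check (not part of the pipeline): report missing or
# empty Zenodo input files. ASSUMPTION: files live under "$1/raw_data" and
# "$1/datasets"; change the list if your wbuild.yaml points elsewhere.
check_input_files() {
  local root="$1" f missing=0
  for f in \
    raw_data/proteomics_annotation.tsv \
    raw_data/proteomics_not_normalized.tsv \
    raw_data/raw_counts.tsv \
    raw_data/Patient_HPO_phenotypes.tsv \
    raw_data/enrichment_proportions_variants.tsv \
    raw_data/patient_variant_hpo_data.tsv \
    datasets/disease_genes.tsv \
    datasets/HGNC_mito_groups.tsv
  do
    # -s is true only for files that exist and are non-empty
    if [ ! -s "$root/$f" ]; then
      echo "missing or empty: $root/$f" >&2
      missing=$((missing + 1))
    fi
  done
  # exit status 0 only if every file was found
  [ "$missing" -eq 0 ]
}

# Usage: check_input_files /path/to/data && echo "all inputs present"
```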
First, clone the repository and install its dependencies:
# analysis code
git clone https://github.com/prokischlab/omicsDiagnostics
cd omicsDiagnostics
Then install wBuild with pip and initialize the project:
pip install wBuild
wbuild init
Because `wbuild init` overwrites the current Snakefile, readme.md and wbuild.yaml, restore them with git:
git checkout Snakefile
git checkout wbuild.yaml
git checkout readme.md
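As an alternative to the `git checkout` calls above, the files that `wbuild init` overwrites can be backed up beforehand and copied back afterwards, which also preserves local changes that `git checkout` would discard. The bash function below sketches this approach; `with_preserved_files` is a hypothetical helper, not part of wBuild.

```shell
# Hypothetical helper (not part of wBuild): back up the files that `wbuild init`
# overwrites, run the given command, then copy the backups back.
# Files that did not exist beforehand are left as the command created them.
with_preserved_files() {
  local backup f status
  backup=$(mktemp -d) || return 1
  for f in Snakefile wbuild.yaml readme.md; do
    if [ -e "$f" ]; then cp -p "$f" "$backup/"; fi
  done
  # run the wrapped command inside `if` so a failure still reaches the restore step
  if "$@"; then status=0; else status=$?; fi
  for f in Snakefile wbuild.yaml readme.md; do
    if [ -e "$backup/$f" ]; then cp -p "$backup/$f" "$f"; fi
  done
  rm -rf "$backup"
  # propagate the wrapped command's exit status
  return "$status"
}

# Usage (from the repository root): with_preserved_files wbuild init
```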
Next, clone the `outrider2` branch of the original OUTRIDER repository. OUTRIDER2 includes an implementation of the PROTRIDER algorithm.
# OUTRIDER2 to detect outliers in proteomics data
git clone --branch outrider2 https://github.com/gagneurlab/OUTRIDER.git
Specify the correct file and folder locations in `wbuild.yaml`.
For robustness, we recommend using full (absolute) paths.
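To obtain the full path of an existing file or folder for `wbuild.yaml`, a small bash function such as the one below can help. It is a sketch, not part of the project; the name `abs_path` is hypothetical. It avoids relying on `realpath`, which is not installed on every system (for example older macOS releases).

```shell
# Hypothetical helper: print the absolute, symlink-resolved path of an existing
# file or directory, without changing the caller's working directory.
abs_path() {
  local target="$1" dir
  if [ -d "$target" ]; then
    # directories: resolve in a subshell; empty CDPATH keeps cd from printing
    (CDPATH= cd -- "$target" && pwd -P)
  elif [ -e "$target" ]; then
    # files: resolve the parent directory, then re-attach the file name
    dir=$(CDPATH= cd -- "$(dirname "$target")" && pwd -P) || return 1
    if [ "$dir" = / ]; then dir=""; fi
    printf '%s/%s\n' "$dir" "$(basename "$target")"
  else
    echo "no such file or directory: $target" >&2
    return 1
  fi
}

# Usage: abs_path path/to/raw_data   # paste the printed path into wbuild.yaml
```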
- Create the Conda environment:
conda env create --name omicsDiagnosticsMinimal --file=environment.yml
- Install the R packages:
  - Make sure that `data.table` is installed, or install it with `install.packages("data.table")`.
  - Install the remaining packages listed in `src/requirementsR.txt`:
Rscript src/installRPackages.R src/requirementsR.txt
To run the full pipeline using 10 cores in parallel, execute the following commands:
conda activate omicsDiagnosticsMinimal
snakemake graph
snakemake -c 10
If generating the DAG does not work, run:
snakemake --snakefile Snakefile.dag --dag | dot -Tpng > dag.png
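The commands above request 10 cores. To adapt the core count to the machine at hand, the number passed to `snakemake -c` can be capped at the number of online CPUs. The bash function below is a minimal sketch; `pick_cores` is a hypothetical name, and it assumes `getconf _NPROCESSORS_ONLN` is available (it is on Linux and macOS), falling back to 1 core otherwise.

```shell
# Hypothetical helper: return min(requested, online CPUs); default request is 10.
pick_cores() {
  local requested="${1:-10}" available
  # fall back to a single core if getconf cannot report the CPU count
  available=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || available=1
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

# Usage, inside the activated conda environment:
#   snakemake -n                        # dry run: list jobs without executing them
#   snakemake -c "$(pick_cores 10)"
```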
The omicsDiagnosticsAPP Shiny app is a tool for analyzing and visualizing multi-omics data in the context of rare-disease diagnostics. Its features include:
- Integration of RNA and protein expression data
- Visualization of patient-specific omics profiles
- Interactive exploration of genetic variants
- Phenotype similarity analysis
- Protein complex analysis
The application is available online at: https://prokischlab.shinyapps.io/omicsDiagnosticsAPP/
- Clone this repository
- Install required R packages:
install.packages(c("shiny", "data.table", "plotly", "DT", "yaml", "gganatogram", "shinyjs", "shinybusy", "shinyWidgets", "shinythemes", "tippy", "bslib"))
- Run the app locally:
shiny::runApp("omicsDiagnosticsAPP")
- Or use the online version at https://prokischlab.shinyapps.io/omicsDiagnosticsAPP/
The app uses pre-processed data stored in the shiny_data directory. To prepare the data:
- Run the data preparation script:
source("omicsDiagnosticsAPP/prepare_shiny_data.R")
To deploy the app to ShinyApps.io:
- Make sure the `rsconnect` package is installed.
- Run the deployment script:
source("omicsDiagnosticsAPP/deploy.R")
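Since deployment relies on the pre-processed data, it can be useful to check that the preparation step actually produced files before deploying. The bash function below is a sketch of such a guard; `shiny_data_ready` is a hypothetical name, and the default location `omicsDiagnosticsAPP/shiny_data` is an assumption, so pass a different directory if your layout differs.

```shell
# Hypothetical guard: succeed only if the Shiny data directory exists and
# contains at least one regular file.
# ASSUMPTION: default location is omicsDiagnosticsAPP/shiny_data.
shiny_data_ready() {
  local dir="${1:-omicsDiagnosticsAPP/shiny_data}"
  [ -d "$dir" ] && [ -n "$(find "$dir" -type f -print -quit)" ]
}

# Usage from the repository root, chaining the two R scripts above:
#   Rscript -e 'source("omicsDiagnosticsAPP/prepare_shiny_data.R")'
#   shiny_data_ready && Rscript -e 'source("omicsDiagnosticsAPP/deploy.R")'
```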
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or support, please contact the development team.