-
Notifications
You must be signed in to change notification settings - Fork 0
Pathway analysis
🚧 This page is still under construction! 🚧
Genes involved in the 🔸same molecular pathway🔸 ("functional unit") are frequently involved in 🔹similar patient phenotypic presentations🔹. This means that we might draw statistical power from multiple independent, deleterious variants in the same functional units, rather than just in the same genes !
However, such an approach should be pursued with caution, as patient phenotypes stemming from changes to different genes in the same functional unit may vary to a great extent. Such differences in patient presentations may render the clinical evaluations and therapeutic potential of statistically significant findings virtually impossible.
To mitigate this issue, we first initially consider groups of patients with similar phenotypes, and then within each of these groups, assess the overrepresentation of deleterious mutations across established biological pathways.

We compute all-against-all semantic similarity of patients' HPO phenotype terms using Phrank, for which open-source code is available.
-
Download the relevant Phrank Python scripts from Bitbucket into this working directory as follows:
# Update this to reflect the full path to this repo on your machine: RAMEDIES_REPO=/full/path/to/github/repo/RaMeDiES mkdir -p $RAMEDIES_REPO/phrank wget -O $RAMEDIES_REPO/phrank/__init__.py https://bitbucket.org/bejerano/phrank/raw/7e0adbbe07cb12589ddba730565d283f6f06507e/phrank/__init__.py wget -O $RAMEDIES_REPO/phrank/utils.py https://bitbucket.org/bejerano/phrank/raw/7e0adbbe07cb12589ddba730565d283f6f06507e/phrank/utils.py
-
Create a tab-delimited file (called
test_patient_hpoterms.tsvin our implementation) with the following columns:-
Required:
patient_idcorresponding to a unique patient, e.g., UDN123456 -
Required:
hpo_idwith no spaces and prefixed with "HP:", e.g., HP:0004209 - (optional)
patient_statuswith a string indicatingaffected,unaffected, orunknown; only affected patients are considered - (optional)
hpo_namefull name of HPO term, e.g.,Clinodactyly of the 5th finger - (optional)
hpo_statusindicating if the HPO term is present (True) or absent (False); only True terms are considered - (optional)
quality_flagindicating PASS if the HPO terms should be considered; some artifactual HPO terms or those obtained from questionable procedures can be excluded; only PASS terms are considered
-
Required:
-
Compute all-against-all pairwise similarities between patients' sets of HPO terms and write out to file
test_pairwise_similarity.tsv:
python /full/path/to/github/repo/RaMeDiES/pathway_analysis.py \
--pairwise_similarity \
--hpo_file=test_patient_hpoterms.tsv \
--similarity_file=test_pairwise_similarity.tsvWe load pairwise similarity scores into a dissimilarity matrix in R to perform hierarchical clustering of patients and export the per-patient cluster assignments. You must have a local installation of R and you must have the cluster library installed (e.g., install.packages("cluster")).
We run the relevant R code from within the same Python script; the Python rpy library must be installed as well (e.g., pip3 install rpy2).
python /full/path/to/github/repo/RaMeDiES/pathway_analysis.py \
--cluster_patients \
--similarity_file=test_pairwise_similarity.tsv \
--cluster_assignments_file=test_cluster_assignments.tsvWe identify candidate genes per patient and assign these genes to each cluster. In our implementation, we consider candidate genes to be:
- all known diagnostic gene(s) as annotated in the Undiagnosed Diseases Network (UDN) researcher portal
- genes harboring a compelling de novo variant (see details in our bioRxiv preprint) if no certain and complete diagnostic gene(s) were listed
- genes harboring a compound heterozygous variant ranked in the top 100 using RaMeDiES individual-level test if no certain and diagnostic gene(s) were listed and fewer than two de novo candidates were identified.
You must create a tab-delimited list of candidate genes (called test_candidate_genes.tsv in our implementation) for all samples in your cohort with the following columns:
-
sample_idcorresponding to a unique patient and matching sample_id from the previous step, e.g., UDN123456 -
gene_namecorresponding to an HGNC gene symbol - (Optional)
gene_typein our case, corresponding toknown_diagnosis,denovo, orcomphet - (Optional)
diagnosisin our case, corresponding todiagnosis_match_{tentative/likely/certain}_{partial/complete} - (Optional)
pheno_matchin our case, corresponding to the Phrank similarity score between patient and gene
Run the following to generate a tab-delimited list of cluster assignments and patient details and query genes per cluster:
python /full/path/to/github/repo/RaMeDiES/pathway_analysis.py \
--query_genes \
--cluster_assignments_file=test_cluster_assignments.tsv \
--candidate_genes_file=test_candidate_genes.tsv \
--query_genes_file=test_cluster_details.tsv \Finally, we run gene set enrichment analysis on the sets of query genes created in the previous step, and we output tab-delimited results.
python /full/path/to/github/repo/RaMeDiES/pathway_analysis.py \
--run_gsea \
--query_genes_file=test_cluster_details.tsv \
--enrichment_output_file=test_cluster_enrichments.tsvRaMeDiES | Last updated: February 2024