This package estimates how likely it is that groups of individuals have similar phenotypes by chance. To estimate this probability, we need three things:
- a way to quantify phenotypic similarity of two individuals. We use the maximum information content of the most informative common ancestor for each pair of HPO terms from two probands.
- a way to quantify similarity across more than two probands. We sum phenotypic similarity scores from all pairs of probands.
- a null distribution of similarity scores for those probands, generated from randomly sampled groups of probands.
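The first two ingredients can be sketched in plain Python. This is a minimal illustration, not the package's implementation: the toy ontology, the term and proband names, and all function names are hypothetical, and information content is assumed to be the negative log of the fraction of probands annotated with a term or one of its descendants (the usual Resnik-style definition).

```python
import math
from itertools import combinations

# Hypothetical toy ontology: term -> set of parent terms.
PARENTS = {
    "root": set(),
    "neuro": {"root"},
    "seizure": {"neuro"},
    "intellectual_disability": {"neuro"},
    "heart": {"root"},
    "septal_defect": {"heart"},
}

# Hypothetical probands and their annotated terms.
PROBANDS = {
    "p1": ["seizure"],
    "p2": ["seizure", "septal_defect"],
    "p3": ["intellectual_disability"],
    "p4": ["septal_defect"],
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    found = {term}
    for parent in PARENTS[term]:
        found |= ancestors(parent)
    return found

def information_content(probands):
    """Assumed IC: -log(fraction of probands with the term or a descendant)."""
    counts = {term: 0 for term in PARENTS}
    for terms in probands.values():
        implied = set()
        for term in terms:
            implied |= ancestors(term)
        for term in implied:
            counts[term] += 1
    total = len(probands)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}

def pair_similarity(terms_a, terms_b, ic):
    """Maximum IC of the most informative common ancestor over all term pairs."""
    best = 0.0
    for a in terms_a:
        for b in terms_b:
            common = ancestors(a) & ancestors(b)
            best = max(best, max(ic.get(t, 0.0) for t in common))
    return best

def group_similarity(ids, probands, ic):
    """Sum of pairwise similarity scores across all pairs of probands."""
    return sum(
        pair_similarity(probands[x], probands[y], ic)
        for x, y in combinations(ids, 2)
    )

IC = information_content(PROBANDS)
score = group_similarity(["p1", "p2", "p3"], PROBANDS, IC)
```

In this toy data, p1 and p2 share the specific term `seizure`, so their pairwise score is higher than that of p1 and p3, whose closest common ancestor is the broader `neuro` term.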
The P value is calculated as the proportion of simulated scores greater than the observed probands' score.
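The simulation step can be sketched as follows. This is an illustrative outline under stated assumptions rather than the package's code: `permutation_p_value`, its parameters, and the toy `population` and scoring function are all hypothetical, and any function that maps a group of probands to a similarity score could be passed as `score`.

```python
import random

def permutation_p_value(observed, group_size, population, score,
                        iterations=1000, seed=0):
    """Proportion of randomly sampled groups scoring above the observed score."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(iterations):
        # Draw a random group of the same size as the observed group.
        sample = rng.sample(population, group_size)
        if score(sample) > observed:
            exceed += 1
    return exceed / iterations

# Hypothetical stand-in: probands are integers and a group's score is their sum.
population = list(range(20))
toy_score = sum

p_high = permutation_p_value(sum([17, 18, 19]), 3, population, toy_score)
p_low = permutation_p_value(-1, 3, population, toy_score)
p_mid = permutation_p_value(30, 3, population, toy_score)
```

With a finite number of iterations the estimate can be exactly zero, as for `p_high` here, which only means that no sampled group exceeded the observed score; the resolution of the estimate is limited to one over the number of iterations.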
Install the package with:
pip install hpo_similarity
Then run the analysis with:
hpo_similarity --genes genes.json --phenotypes phenotypes.json
The data directory on GitHub includes two example files: one with proband IDs per gene (data/example_genes.json) and one with HPO terms per proband (data/example_phenotypes.json).
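As a rough sketch of how such input files might be prepared, the snippet below writes two JSON files and checks that every proband listed under a gene also has HPO terms. The layout is an assumption inferred from the description above (a JSON object mapping each gene to a list of proband IDs, and another mapping each proband ID to a list of HPO terms); consult the example files for the exact format. The gene names, proband IDs, and helper functions are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout: gene -> list of proband IDs (names are hypothetical).
genes = {
    "GENE_A": ["person_01", "person_02"],
    "GENE_B": ["person_02", "person_03"],
}

# Assumed layout: proband ID -> list of HPO term IDs.
phenotypes = {
    "person_01": ["HP:0001249", "HP:0001250"],
    "person_02": ["HP:0001250"],
    "person_03": ["HP:0000252"],
}

def write_inputs(directory, genes, phenotypes):
    """Write both mappings as JSON files and return their paths."""
    genes_path = Path(directory) / "genes.json"
    phenotypes_path = Path(directory) / "phenotypes.json"
    genes_path.write_text(json.dumps(genes, indent=2))
    phenotypes_path.write_text(json.dumps(phenotypes, indent=2))
    return genes_path, phenotypes_path

def probands_without_terms(genes, phenotypes):
    """Return proband IDs listed under a gene but missing from the phenotypes."""
    listed = {proband for ids in genes.values() for proband in ids}
    return sorted(listed - set(phenotypes))

with tempfile.TemporaryDirectory() as tmp:
    genes_path, phenotypes_path = write_inputs(tmp, genes, phenotypes)
    reloaded_genes = json.loads(genes_path.read_text())
    reloaded_phenotypes = json.loads(phenotypes_path.read_text())

missing = probands_without_terms(genes, phenotypes)
```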
Additional options:
- --output PATH: send the output genes and P-values to a file.
- --ontology PATH: use an HPO ontology file other than the default.
- --iterations INTEGER: change the number of iterations (default: 100000).
You can also explore the HPO graph using the hpo_similarity package within Python, for example:
from hpo_similarity import open_ontology
graph, alt_ids, obsolete_ids = open_ontology()
# find all descendant terms
# get the text for the phenotypic abnormality
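As a package-independent illustration of what these two lookups involve, the sketch below uses a hypothetical toy ontology stored in plain dictionaries. It does not use the hpo_similarity graph API, and the hierarchy is simplified (intermediate HPO terms are omitted).

```python
# Simplified toy ontology: term -> list of child terms (intermediate terms omitted).
CHILDREN = {
    "HP:0000118": ["HP:0000707"],
    "HP:0000707": ["HP:0001249", "HP:0001250"],
}

# Term -> human-readable name.
NAMES = {
    "HP:0000118": "Phenotypic abnormality",
    "HP:0000707": "Abnormality of the nervous system",
    "HP:0001249": "Intellectual disability",
    "HP:0001250": "Seizure",
}

def descendants(term, children):
    """Collect every term reachable by following child links from `term`."""
    found = set()
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# Find all descendant terms of a node in the toy ontology.
nervous_system_terms = descendants("HP:0000707", CHILDREN)

# Get the text for the phenotypic abnormality root term.
root_name = NAMES["HP:0000118"]
```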
This code incorporates the following code and datasets:
- a Python ontology parser written by Tamás Nepusz.
- the hp.obo file from the Human Phenotype Ontology Consortium.