Releases: callahantiff/PheKnowLator
v3.1.2
Release: v3.1.2
Website: https://github.com/callahantiff/PheKnowLator/wiki/v3-Build-Details
Data Access: Archived Builds
Docker Container: DockerHub Dedicated Project Container
PyPI: pkt-kg 3.1.2
Description
This release provides bug fixes for merging ontologies and addresses issue #140. Thanks to @GuarinoValentina for helping point out this error!
v3.1.1
Release: v3.1.1
Website: https://github.com/callahantiff/PheKnowLator/wiki/v3-Build-Details
Data Access: Archived Builds
Docker Container: DockerHub Dedicated Project Container
PyPI: pkt-kg 3.1.1
Description
This release provides minor bug fixes for the updates that were made to the OWL-NETS workflow in v3.1.0. Thanks to @sanyabt for helping point out this error!
v3.1.0
Release: v3.1.0
Website: https://github.com/callahantiff/PheKnowLator/wiki/v3-Build-Details
Data Access: Archived Builds
Docker Container: DockerHub Dedicated Project Container
PyPI: pkt-kg 3.1.0
Description
This release includes updates to the OWL-NETS workflow, addresses deprecated functions associated with NetworkX v3.0, and removes the `pkt` namespace from the final OWL files.
Updated Jupyter Notebooks:
Updated Scripts:
.github/workflows/build-qa.yml
pkt_kg/__version__.py
pkt_kg/metadata.py
pkt_kg/construction_approach.py
pkt_kg/downloads.py
pkt_kg/edge_list.py
pkt_kg/knowledge_graph.py
pkt_kg/owlnets.py
pkt_kg/utils/kg_utils.py
pkt_kg/utils/data_utils.py
tests/test_metadata.py
tests/test_owlnets.py
v3.0.2
Release: v3.0.2
Website: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0
Data Access: Archived Builds
Docker Container: DockerHub Dedicated Project Container
PyPI: pkt-kg 3.0.2
Updated Jupyter Notebooks:
Updated Scripts:
builds/data_preprocessing.py
pkt_kg/metadata.py
pkt_kg/utils/kg_utils.py
builds/data_to_download.txt
pkt_kg/utils/data_utils.py
tests/test_data_utils_downloading.py
Updates
- Addresses issue #118 (PR: #119) by patching the prior functionality for obtaining labels and definitions from ontologies. Specifically, it now ensures that, whenever possible, the language encoding for these fields is English. Please see the details below for how to address nodes containing foreign characters in builds prior to this release.
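As an illustration of the language preference described above, the following minimal sketch picks an English value from a set of language-tagged strings. It is not the project's implementation: the function name `pick_english` and the `(text, lang)` tuple representation are assumptions made for this example.

```python
def pick_english(literals):
    """Return the preferred text from a list of (text, lang) tuples.

    Hypothetical helper: prefers English-tagged values, then untagged
    values, and falls back to the first value available.
    """
    # first choice: a value explicitly tagged as English ('en', 'en-US', ...)
    for text, lang in literals:
        if lang and (lang.lower() == 'en' or lang.lower().startswith('en-')):
            return text
    # second choice: a value with no language tag at all
    for text, lang in literals:
        if lang is None:
            return text
    # last resort: whatever comes first, or None when the list is empty
    return literals[0][0] if literals else None


labels = [('多细胞生物', 'zh'), ('multicellular organism', 'en')]
english_label = pick_english(labels)
```

The fallback order (English, then untagged, then anything) is a design choice for the sketch; a stricter variant could return `None` when no English value exists.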
Solution for Builds Prior to v3.0.2
The `bad_node_patch.json` file (attached) contains a dictionary where the outer keys are the `entity_uri` and the outer values are another dictionary where the inner keys are `label` and `description/definition` and the inner values for these inner keys are the updated strings without foreign characters. An example of this dictionary is shown below:

    key = '<http://purl.obolibrary.org/obo/UBERON_0000468>'
    print(bad_node_patch[key])
    {'label': 'multicellular organism', 'description/definition': 'Anatomical structure that is an individual member of a species and consists of more than one cell.'}

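Given a patch dictionary of the shape described above, applying it to a node-label table might look like the following sketch. The helper name `apply_node_patch`, the column name `entity_uri`, and the toy rows are assumptions for illustration; check them against the actual columns of your `NodeLabels.txt` file before use.

```python
import pandas as pd


def apply_node_patch(nodedf, patch, uri_col='entity_uri'):
    """Return a copy of nodedf with patched fields overwritten.

    Hypothetical helper: `patch` maps an entity URI to a dictionary of
    {column name: replacement string}, as in bad_node_patch.json.
    """
    patched = nodedf.copy()
    for uri, fields in patch.items():
        # select every row that belongs to this entity
        mask = patched[uri_col] == uri
        for column, value in fields.items():
            patched.loc[mask, column] = value
    return patched


# toy data standing in for the real NodeLabels.txt contents
nodedf = pd.DataFrame({
    'entity_uri': ['<http://purl.obolibrary.org/obo/UBERON_0000468>',
                   '<http://example.org/untouched>'],
    'label': ['placeholder label', 'other label'],
    'description/definition': ['placeholder text', 'other text'],
})
bad_node_patch = {
    '<http://purl.obolibrary.org/obo/UBERON_0000468>': {
        'label': 'multicellular organism',
        'description/definition': 'Anatomical structure that is an individual '
                                  'member of a species and consists of more than one cell.',
    }
}
patched = apply_node_patch(nodedf, bad_node_patch)
```

In practice the patch would be loaded with `json.load` from the attached file rather than written inline.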
The code to identify the nodes with erroneous foreign characters is shown below:
```python
import re
import pandas as pd

# path to the downloaded `NodeLabels.txt` file
input_file = 'NodeLabels.txt'

# load data as a Pandas DataFrame
nodedf = pd.read_csv(input_file, sep='\t', header=0)

# flag labels containing CJK Unified Ideographs (U+4E00 to U+9FFF)
nodedf['bad'] = nodedf['label'].apply(lambda x: isinstance(x, str) and re.search("[\u4e00-\u9FFF]", x) is not None)
# keep only the flagged rows
nodedf_bad_nodes = nodedf[nodedf['bad']].drop_duplicates()
```
v3.0.1
Release: v3.0.1
Website: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0
Data Access: Archived Builds
Docker Container: DockerHub Dedicated Project Container
PyPI: pkt-kg 3.0.1
Updated Jupyter Notebooks:
Updated Scripts:
pkt_kg/metadata.py
Updates
v3.0.0
Release: v3.0.0
Website: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0
Data Access: Archived Builds
Docker Container: DockerHub Dedicated Project Container
PyPI: pkt-kg 3.0.0
Updated Jupyter Notebooks:
Updated Scripts:
pkt_kg/utils/kg_utils.py
builds/data_preprocessing.py
builds/deploy/triple-store/docker-compose.yml
Updates
- The `gets_ontology_class_dbxrefs()` and `gets_ontology_class_synonyms()` functions were updated to account for classes and object properties that may have the same synonym and dbXref and/or multiple synonyms and dbXrefs. Originally, these functions were keyed by a synonym or dbXref string with class and object property URLs as values. This change maintains the same keys, but now includes a list of potential class and object property URLs for each key.
- Both notebooks and the `builds/data_preprocessing.py` script have been updated to reflect and account for this change.
- Updated the `docker-compose.yml` file to account for changes made in the DBCLS SPARQL proxy.
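The change in dictionary shape described in the first item can be pictured with a small sketch. This is not the library's code: `group_uris_by_key` and the example URLs are hypothetical, and the sketch only shows how one synonym or dbXref string can map to a list of URLs instead of a single URL.

```python
from collections import defaultdict


def group_uris_by_key(pairs):
    """Group (key string, URL) pairs into {key string: [URL, ...]}.

    Hypothetical helper: the key is a synonym or dbXref string, and each
    value is a list so that several classes or object properties can
    share the same key.
    """
    grouped = defaultdict(list)
    for key, uri in pairs:
        # keep insertion order and avoid repeating the same URL
        if uri not in grouped[key]:
            grouped[key].append(uri)
    return dict(grouped)


pairs = [
    ('example synonym', 'http://example.org/CLASS_1'),
    ('example synonym', 'http://example.org/PROPERTY_1'),
    ('another synonym', 'http://example.org/CLASS_2'),
]
lookup = group_uris_by_key(pairs)
```

Code that previously read a single URL per key would need to iterate over the list instead, which is presumably why the notebooks and preprocessing script were updated alongside this change.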
v2.1.1
Release: v2.1.1
Website: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0
Data Access: Archived Builds
Docker Container: DockerHub Dedicated Project Container
PyPI: pkt-kg 2.1.1
Updated Jupyter Notebooks:
Updated Scripts:
pkt_kg/owlnets.py
pkt_kg/utils/kg_utils.py
Updates
- For the `owlnets.py` script, three new hyperparameters were added to provide users with more flexibility in terms of what support, top, and relation ontology objects are included in an OWL-NETS graph. The pruning functions were also improved to make sure that metadata are not getting through (i.e., obsolete classes and XML Schema).
- For the `OWLNETS_Example_Application.ipynb` Jupyter Notebook, new functionality was added to include node and relation definitions.
- Added a new function to `pkt_kg/utils/kg_utils.py` to obtain the definitions for all `owl:Class` and `owl:ObjectProperty` objects.
v2.1.0
Release: v2.1.0
Website: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0
Data Access: Archived Builds
Docker Container: DockerHub Dedicated Project Container
PyPI: pkt-kg 2.1.0
New Jupyter Notebooks:
Updates
- Parallelized `edge_list.py`, `knowledge_graph.py`, and `owlnets.py` using `ray`.
- Moderate updates to the logic for how non-ontology data are added to the merged set of base ontologies. Please see the `resources/construction_approach/README.md` for additional details and updated examples.
- New functionality added for splitting the logical core of a graph from its annotation assertions.
- Changed the output files: `.owl` files are no longer generated.
- Cleaned up OWL-NETS helper functions and modified the logic for filtering OWL-specific annotations and axioms. Also added logic to enforce that the OWL-NETS graphs are all a single connected component.
- Added more extensive statistics to the logs and to the output printed at run time.
- Added arguments for progress/logging verbosity.
- Added a new method for better load balancing of the inputs passed to Ray.
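To make the single-connected-component requirement mentioned above concrete, here is a minimal, standard-library sketch that finds connected components in an edge list and keeps the largest one. It is an assumption that keeping the largest component is an acceptable way to enforce the constraint; the function names and toy edges are hypothetical and are not taken from `owlnets.py`.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return a list of node sets, one per connected component.

    Edges are (subject, relation, object) tuples and are treated as
    undirected for the purpose of connectivity.
    """
    adjacency = defaultdict(set)
    for subj, _, obj in edges:
        adjacency[subj].add(obj)
        adjacency[obj].add(subj)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # breadth-first search from an unvisited node
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def keep_largest_component(edges):
    """Drop every edge that lies outside the largest connected component."""
    components = connected_components(edges)
    if not components:
        return []
    largest = max(components, key=len)
    return [edge for edge in edges if edge[0] in largest and edge[2] in largest]


edges = [('a', 'rel', 'b'), ('b', 'rel', 'c'), ('x', 'rel', 'y')]
filtered = keep_largest_component(edges)
```

For large graphs, a graph library's built-in component routines would likely be a better fit than this hand-rolled search.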
Performance Stats used in Testing
GCP Instance:
- Machine Type: custom (24 vCPU, 500 GB memory)
- CPU Platform: Intel Haswell
- Image OS: Debian, Debian GNU/Linux, 10 (buster), amd64 built on 20210217
- Boot Disk: Balanced persistent disk (150 GB)
Graph Build Statistics:
Maximum Memory Use (GiB):
Runtime (minutes):
v2.0.0
First Official Release
Website: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0
Data Access: Archived Builds
Docker Container: DockerHub Dedicated Project Container
PyPI: pkt-kg 2.0.1
Jupyter Notebooks:
All changes between this and the last release are thoroughly documented on the project Wiki under the v2.0.0 release. Please see that page for all described changes and updates between this and the prior release. This page also contains a description of the data used for the build as well as the data files generated as part of the build.
Note: The version on PyPI has been bumped to v2.0.1 instead of v2.0.0. This is the result of a testing error that caused an early release of the software on PyPI. Please use the latest version of the library available on PyPI. This issue will be resolved to equate the version on GitHub and PyPI in a future release.
First pre-release
Release: v1.0.0
This is the first release of the PheKnowLator project. Additional information can be found here.
Data Sources
Ontologies
Classes
- Human Disease Ontology
- Gene Ontology: gene associations
- Reactome: gene associations
- Human Phenotype Ontology: all source annotations - genes to phenotypes
- Human Phenotype Ontology: all source annotations - diseases to genes to phenotypes
Instances
- CTD: chemicals-genes
- CTD: chemicals-pathways
- CTD: chemicals-diseases
- CTD: genes-pathways
- CTD: diseases-pathways
- STRING DB: Proteins
- STRING DB: Entrez gene mappings
Knowledge Representation
Results
- Deductively closed knowledge graph - a text file that contains all of the knowledge graph edges, labeled.
- Labeled embeddings - a text file containing an embedding for each node in the knowledge graph.