Laboratory Tests
- Adrianne Stefanski - Biocurator 🔬
- Tellen D. Bennett - Clinician 🏥
- James A. Feinstein - Clinician 🏥
- Blake Martin - Clinician 🏥
- Nicole Vasilevsky - Biocurator 🔬
- Xingmin Aaron Zhang - Biocurator 🔬
- Leigh Carmody - Biocurator 🔬
- Peter N. Robinson - Biocurator 🔬
The goal of this project was to map measurement results drawn from the Observational Medical Outcomes Partnership (OMOP) common data model to the Open Biomedical Ontologies (OBO). Specifically, we aimed to annotate every unique result of each LOINC-coded laboratory test assigned to at least one patient (n=902 codes; 2,706 test results) with an OBO ontology term.
“The Open Biological and Biomedical Ontology (OBO) Foundry is a collective of ontology developers that are committed to collaboration and adherence to shared principles. The mission of the OBO Foundry is to develop a family of interoperable ontologies that are both logically well-formed and scientifically accurate.” -OBO Foundry
Currently, very few annotations (i.e., mappings that connect similar concepts from different sources) exist between clinical terminologies and the OBO ontologies. Creating these mappings enables a transition to a reproducible research framework in which clinical observations can be viewed in the context of their underlying molecular mechanism(s).
This task used the Human Phenotype Ontology (HPO), the Uber-Anatomy Ontology (UBERON), Chemical Entities of Biological Interest (ChEBI), the NCBI Taxonomy (NCBITaxon), the Protein Ontology (PRO), and the Cell Ontology (CL):
The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as Atrial septal defect. - HPO
The Uber-Anatomy Ontology (UBERON) represents anatomy (i.e. body parts, organs and tissues) for multiple species. - UBERON
The Chemical Entities of Biological Interest (ChEBI) represents molecular entities, specifically, small chemical compounds. - ChEBI
The National Center for Biotechnology Information Taxonomy (NCBITaxon) ontology is an automatic translation of the NCBI taxonomy database into obo/owl. - NCBITaxon
The Protein Ontology (PRO) provides an ontological representation of protein-related entities by explicitly defining them and showing the relationships between them. Each PRO term represents a distinct class of entities (including specific modified forms, orthologous isoforms, and protein complexes) ranging from the taxon-neutral to the taxon-specific (e.g. the entity representing all protein products of the human SMAD2 gene is described in PR:Q15796; one particular human SMAD2 protein form, phosphorylated on the last two serines of a conserved C-terminal SSxS motif is defined by PR:000025934). - PRO
The Cell Ontology (CL) is designed as a structured controlled vocabulary for cell types. This ontology was constructed for use by the model organism and other bioinformatics databases, where there is a need for a controlled vocabulary of cell types. This ontology is not organism specific; it covers cell types from prokaryotes to mammals. However, it excludes plant cell types, which are covered by PO. - CL
Our goal was to map all unique LOINC laboratory test results (i.e., low, normal, or high) assigned to at least one pediatric patient to the HPO. Each result of a laboratory test is considered independently in order to find the best possible mapping to an ontology concept.
LOINC | Result | HPO |
---|---|---|
LOINC_28606-2 : 1-Methylhistidine/Creatinine [Ratio] in Urine | Low | Decreased urinary 1-methylhistidine (HP_0410314) |
LOINC_28606-2 : 1-Methylhistidine/Creatinine [Ratio] in Urine | Normal | NOT(Abnormal urinary 1-methylhistidine concentration) (HP_0410313) |
LOINC_28606-2 : 1-Methylhistidine/Creatinine [Ratio] in Urine | High | Increased urinary 1-methylhistidine (HP_0410315) |
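
The example table above can be read as a lookup keyed on a (LOINC code, result) pair. The following is a minimal Python sketch of that idea, not the project's actual data format: the `HpoAnnotation` type, the `LOINC_RESULT_TO_HPO` dictionary, and the `annotate` function are hypothetical names introduced for illustration, and only the three rows shown in the table are included.

```python
from typing import NamedTuple


class HpoAnnotation(NamedTuple):
    """One mapping target (hypothetical structure, for illustration only)."""
    hpo_id: str    # HPO identifier as written in the example table
    label: str     # HPO term label
    negated: bool  # True when a normal result asserts the absence of the abnormality


# Keys are (LOINC code, result category); values are copied from the example table.
LOINC_RESULT_TO_HPO = {
    ("28606-2", "Low"): HpoAnnotation(
        "HP_0410314", "Decreased urinary 1-methylhistidine", False),
    ("28606-2", "Normal"): HpoAnnotation(
        "HP_0410313", "Abnormal urinary 1-methylhistidine concentration", True),
    ("28606-2", "High"): HpoAnnotation(
        "HP_0410315", "Increased urinary 1-methylhistidine", False),
}


def annotate(loinc_code: str, result: str) -> str:
    """Return a readable annotation, or 'UnMapped' when no mapping is recorded."""
    annotation = LOINC_RESULT_TO_HPO.get((loinc_code, result))
    if annotation is None:
        return "UnMapped"
    # A normal result is expressed as the negation of the parent abnormality term.
    label = f"NOT({annotation.label})" if annotation.negated else annotation.label
    return f"{label} ({annotation.hpo_id})"


print(annotate("28606-2", "Normal"))
```

Keeping the negation as a flag rather than as part of the label lets each result be handled independently while still pointing at a single ontology term.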
The following tasks were performed to map LOINC laboratory test results to the HPO:
- Export each LOINC ID and its ancestors from a pediatric (CHCO) instance of the OMOP common data model (data exported October 2018). The SQL code that was used to retrieve these codes is stored as a GitHub Gist and can be found here. For convenience, the query is also shown below:
```sql
WITH
  -- Distinct measurement concepts that occur in the measurement table
  measurement_concepts AS (
    SELECT
      m.measurement_concept_id AS CONCEPT_ID,
      CONCAT(LOWER(c.vocabulary_id), ":", c.concept_code) AS CONCEPT_SOURCE_CODE,
      c.concept_name AS CONCEPT_LABEL,
      c.vocabulary_id AS CONCEPT_VOCAB,
      v.vocabulary_version AS CONCEPT_VOCAB_VERSION
    FROM
      CHCO_DeID_Oct2018.measurement m
      JOIN CHCO_DeID_Oct2018.concept c ON m.measurement_concept_id = c.concept_id
      JOIN CHCO_DeID_Oct2018.vocabulary v ON c.vocabulary_id = v.vocabulary_id
    WHERE
      c.concept_name != "No matching concept"
      AND c.domain_id = "Measurement"
    GROUP BY CONCEPT_ID, CONCEPT_SOURCE_CODE, CONCEPT_LABEL, CONCEPT_VOCAB, CONCEPT_VOCAB_VERSION),

  -- Ancestors of each measurement concept, aggregated into " | "-delimited strings
  measurement_ancestors AS (
    SELECT
      ca.descendant_concept_id AS CONCEPT_ID,
      STRING_AGG(DISTINCT(CAST(c1.concept_id as STRING)), " | ") AS ANCESTOR_CONCEPT_ID,
      STRING_AGG(DISTINCT(CONCAT(LOWER(c1.vocabulary_id), ":", c1.concept_code)), " | ") AS ANCESTOR_SOURCE_CODE,
      STRING_AGG(DISTINCT(c1.concept_name), " | ") AS ANCESTOR_LABEL,
      STRING_AGG(DISTINCT(c1.vocabulary_id), " | ") AS ANCESTOR_VOCAB,
      STRING_AGG(DISTINCT(v.vocabulary_version), " | ") AS ANCESTOR_VOCAB_VERSION
    FROM
      CHCO_DeID_Oct2018.concept_ancestor ca
      JOIN CHCO_DeID_Oct2018.concept c1 ON ca.ancestor_concept_id = c1.concept_id
      JOIN CHCO_DeID_Oct2018.vocabulary v ON c1.vocabulary_id = v.vocabulary_id
    WHERE
      ca.descendant_concept_id IN (SELECT CONCEPT_ID FROM measurement_concepts)
      AND c1.concept_name != "No matching concept"
      AND c1.concept_id IS NOT NULL
      AND c1.domain_id = "Measurement"
    GROUP BY CONCEPT_ID),

  -- Result type inferred from the reference range source values
  measurement_results AS (
    SELECT
      measurement_concept_id AS CONCEPT_ID,
      CASE WHEN REGEXP_CONTAINS(STRING_AGG(range_low_source_value, ""), r'(?i)(positive|negative)') IS TRUE THEN "Negative/Positive"
           WHEN REGEXP_CONTAINS(STRING_AGG(range_high_source_value, ""), r'(?i)(positive|negative)') IS TRUE THEN "Negative/Positive"
           WHEN REGEXP_CONTAINS(STRING_AGG(range_low_source_value, ""), r'[[:digit:]]') IS TRUE THEN "Normal/Low/High"
           WHEN REGEXP_CONTAINS(STRING_AGG(range_high_source_value, ""), r'[[:digit:]]') IS TRUE THEN "Normal/Low/High"
           ELSE NULL END AS RESULT_TYPE
    FROM CHCO_DeID_Oct2018.measurement
    WHERE measurement_concept_id in (SELECT CONCEPT_ID FROM measurement_concepts)
    GROUP BY CONCEPT_ID),

  -- Synonyms and LOINC scale type inferred from the concept synonyms
  measurement_scale AS (
    SELECT
      s.concept_id AS CONCEPT_ID,
      REPLACE(STRING_AGG(DISTINCT(s.concept_synonym_name), " | "), '; ', ' | ') AS CONCEPT_SYNONYM,
      STRING_AGG(s.concept_synonym_name, ""),
      CASE WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)ordinal') IS TRUE THEN "ORD"
           WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)nominal') IS TRUE THEN "NOM"
           WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)quantitative') IS TRUE THEN "QUANT"
           WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)qualitative') IS TRUE THEN "QUAL"
           WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)narrative') IS TRUE THEN "NAR"
           WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)doc') IS TRUE THEN "DOC"
           WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)(panel|pnl|panl)') IS TRUE THEN "PNL"
           ELSE "Unmapped Scale Type" END AS SCALE
    FROM CHCO_DeID_Oct2018.concept_synonym s
    WHERE s.concept_id in (SELECT CONCEPT_ID FROM measurement_concepts)
    GROUP BY CONCEPT_ID),

  -- Fill in missing result types using the scale type and synonyms
  measurement_metadata_update AS (
    SELECT
      r.CONCEPT_ID,
      CASE WHEN (r.RESULT_TYPE IS NULL AND s.SCALE = "ORD") AND REGEXP_CONTAINS(s.CONCEPT_SYNONYM, r'(?i)screen') IS TRUE THEN "Negative/Positive"
           WHEN (r.RESULT_TYPE IS NULL AND s.SCALE = "ORD") AND REGEXP_CONTAINS(s.CONCEPT_SYNONYM, r'(?i)presence') IS TRUE THEN "Negative/Positive"
           WHEN r.RESULT_TYPE IS NULL AND s.SCALE = "QUANT" THEN "Normal/Low/High"
           WHEN r.RESULT_TYPE IS NOT NULL THEN r.RESULT_TYPE
           ELSE "Unknown Result Type" END AS RESULT_TYPE,
      CASE WHEN s.SCALE IS NULL THEN "Other" # for non-LOINC scale types
           ELSE s.SCALE END AS SCALE
    FROM
      (SELECT * FROM measurement_results) r
      FULL JOIN (SELECT * FROM measurement_scale) s ON r.CONCEPT_ID = s.CONCEPT_ID)

-- One row per measurement concept with its synonyms, ancestors, scale, and result type
SELECT
  m.CONCEPT_ID,
  m.CONCEPT_SOURCE_CODE,
  m.CONCEPT_LABEL,
  m.CONCEPT_VOCAB,
  m.CONCEPT_VOCAB_VERSION,
  s.CONCEPT_SYNONYM,
  a.ANCESTOR_CONCEPT_ID,
  a.ANCESTOR_SOURCE_CODE,
  a.ANCESTOR_LABEL,
  a.ANCESTOR_VOCAB,
  a.ANCESTOR_VOCAB_VERSION,
  u.SCALE,
  u.RESULT_TYPE
FROM measurement_concepts m
  FULL JOIN measurement_ancestors a ON m.CONCEPT_ID = a.CONCEPT_ID
  FULL JOIN measurement_scale s ON m.CONCEPT_ID = s.CONCEPT_ID
  FULL JOIN measurement_metadata_update u ON m.CONCEPT_ID = u.CONCEPT_ID;
```
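
For readers less familiar with SQL, the result-type logic of the `measurement_results` and `measurement_metadata_update` steps can be restated in Python. This is a sketch that mirrors the `CASE` expressions in the query above under the assumption that NULL values are passed as `None`; the function names `result_type` and `resolve_result_type` are hypothetical and are not part of the original Gist.

```python
import re
from typing import Iterable, Optional

POS_NEG = re.compile(r"(positive|negative)", re.IGNORECASE)
DIGIT = re.compile(r"[0-9]")


def result_type(range_low: Iterable[Optional[str]],
                range_high: Iterable[Optional[str]]) -> Optional[str]:
    """Infer the result type from all reference range values of one concept."""
    # Like STRING_AGG(..., ""), concatenate the values and skip NULLs.
    low = "".join(v for v in range_low if v is not None)
    high = "".join(v for v in range_high if v is not None)
    if POS_NEG.search(low) or POS_NEG.search(high):
        return "Negative/Positive"
    if DIGIT.search(low) or DIGIT.search(high):
        return "Normal/Low/High"
    return None


def resolve_result_type(result: Optional[str],
                        scale: Optional[str],
                        synonyms: Optional[str]) -> str:
    """Fall back on the scale type and synonyms when no result type was inferred."""
    text = synonyms or ""
    if result is None and scale == "ORD" and re.search(r"screen|presence", text, re.IGNORECASE):
        return "Negative/Positive"
    if result is None and scale == "QUANT":
        return "Normal/Low/High"
    if result is not None:
        return result
    return "Unknown Result Type"


print(resolve_result_type(result_type(["0.5", None], ["1.2"]), "QUANT", None))
```

As in the query, the positive/negative check takes precedence over the numeric check, so a concept with mixed range values is treated as "Negative/Positive".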
Two verification approaches were applied: the first was survey-based, and the second involved manual verification of the mappings by a professional biocurator.
A subset (n=270) of pediatric-specific laboratory test result mappings was independently validated by five domain experts (i.e., three pediatric clinicians, a PhD-level molecular biologist, and a master’s-level epidemiologist). The study was approved by the Colorado Multiple Institutional Review Board (15-0445).
To perform this validation, a Qualtrics survey (see QR code) was designed so that each question featured a laboratory test description and set of reasonable HPO concepts.
The survey was completed by all experts between October and December 2018. After completion, any laboratory test mapping that was not agreed on by at least one clinician and by both the biologist and the epidemiologist was re-evaluated with one clinician until consensus was reached (n=58 lab results). These terms were additionally vetted on the loinc2hpoAnnotation GitHub tracker by the entire team of HPO biocurators.
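
The re-evaluation rule can be sketched as a small predicate. This is an illustration only: it assumes that "agreement" means the experts selected the same HPO concept for a survey question, and the function name `needs_reevaluation` and its arguments are hypothetical.

```python
from typing import Sequence


def needs_reevaluation(clinician_choices: Sequence[str],
                       biologist_choice: str,
                       epidemiologist_choice: str) -> bool:
    """Return True when a mapping lacks the required agreement.

    Agreement here requires that the biologist and the epidemiologist chose
    the same concept and that at least one clinician chose it as well.
    """
    if biologist_choice != epidemiologist_choice:
        return True
    return biologist_choice not in clinician_choices


# Two clinicians agree with the biologist and the epidemiologist: no re-evaluation.
print(needs_reevaluation(["HP_0410314", "HP_0410314", "HP_0410313"],
                         "HP_0410314", "HP_0410314"))
```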
Results. Agreement on mapping was 95.9% among the clinicians, 79.3% between the epidemiologist and the biologist, and 90.7% between the clinicians and the biologist and epidemiologist. The best mapping across all experts was in 92% agreement with existing LOINC2HPO mappings.
A subset of 691 randomly selected LOINC codes was verified by a professional biocurator. A screenshot of the verification table is shown below. Additional information on this mapping process, including the new terms we requested in order to complete this mapping, can be found in the Human Phenotype Ontology GitHub tracker.
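
The source does not state how the agreement percentages were calculated. One common choice is simple percent agreement, the share of survey questions on which two raters selected the same concept, which can be sketched as follows. The `percent_agreement` function and the toy inputs are hypothetical and are not the study's data.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Percentage of items on which two raters made the same choice."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must answer the same set of questions.")
    if not rater_a:
        raise ValueError("At least one answered question is required.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Toy example with four made-up answers, three of which match.
rater_1 = ["HP_0410314", "HP_0410313", "HP_0410315", "HP_0410314"]
rater_2 = ["HP_0410314", "HP_0410313", "HP_0410315", "HP_0410313"]
print(percent_agreement(rater_1, rater_2))  # 75.0
```

Percent agreement does not correct for agreement expected by chance, so a chance-corrected statistic may be preferable when the number of candidate concepts per question is small.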
To verify or search the ontologies for alternative terms, the biocurator was asked to use the following resource:
- Verify each of the mappings row by row, considering each LOINC lab code result within the context of the ontology mappings that have been provided.
- The goal is to find the best mapping between a single ontology term and a LOINC laboratory test result.
Mapping (10/2018-11/2019); Clinician verification (survey) (10/2018-12/2018); Biocurator verification (01/2019-03/2019); Mapping finalized (10/13/2019); Results update (05/08/2020).
We completed the mapping of 902 unique measurements and 2,706 unique measurement results.
Mapping Type | Count |
---|---|
Manually Mapped | 2616 |
UnMapped | 90 |
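
As a quick consistency check, the two counts in the table sum to the 2,706 test results reported above, and the mapped share follows directly. The sketch below assumes only the numbers in the table; the `mapping_coverage` helper is a hypothetical name.

```python
def mapping_coverage(counts: dict) -> tuple:
    """Return the total number of results and the percentage manually mapped."""
    total = sum(counts.values())
    mapped_pct = 100.0 * counts["Manually Mapped"] / total
    return total, round(mapped_pct, 1)


# Counts copied from the table above.
counts = {"Manually Mapped": 2616, "UnMapped": 90}
total, mapped_pct = mapping_coverage(counts)
print(total, mapped_pct)  # 2706 96.7
```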
Results Update: There were 1,606 unique tests and 4,358 test results that could be mapped.
Ontology | Manually Mapped | Manually Mapped - Constructor | UnMapped - None | UnMapped - Not Mapped Test Type | UnMapped - Unspecified Sample | N/A |
---|---|---|---|---|---|---|
HPO | 1380 | 4 | 54 | 93 | 74 |
UBERON | 946 | 411 | 54 | 93 | 74 |
CL | 157 | 2 | 54 | 93 | 74 |
CHEBI | 673 | 8 | 54 | 93 | 74 |
NCBITaxon | 279 | 1 | 54 | 93 | 74 |
PRO | 180 | 2 | 54 | 93 | 74 |