This repository contains code used to transform the National Library of Medicine’s Semantic Representation (SemRep) predications into open semantically-linked annotations. Please see the Wiki for more details.
To obtain an RDFized version of SemRep, download the zip file or fork the project repository. Additional instructions can be found under Installation.
This program was written on a system running OS X Sierra. Successful execution of this program requires Python version 2.7.
- Python 2.7.13 modules
- Modules: base64, hashlib, multiprocessing, os, sys (standard library) and MySQLdb (third-party)
- To download needed modules run the following from the working directory of the project folder:
pip install -r requirements.txt
- Semantic MEDLINE Database
- If using SemRepRDF-UMLS, you will need to obtain a free UMLS license
- Download and configure the latest SemMedDB MySQL data dump
- Although not a requirement, the program has been written to run in parallel on a supercomputer.
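As a rough illustration of what running in parallel can look like with the `multiprocessing` module listed above, here is a minimal sketch. It is not the code in `RDFizer.py`: the function names `predication_to_triple` and `convert_all`, the worker count, and the sample identifiers are all assumptions made for this example.

```python
from multiprocessing import Pool


def predication_to_triple(predication):
    """Format one (subject, predicate, object) tuple as an N-Triples-style line.

    Hypothetical helper: the real program's output format may differ.
    """
    subj, pred, obj = predication
    return "<{0}> <{1}> <{2}> .".format(subj, pred, obj)


def convert_all(predications, processes=2):
    """Convert predications in parallel using a pool of worker processes."""
    pool = Pool(processes=processes)
    try:
        # map() splits the input across workers and preserves input order.
        return pool.map(predication_to_triple, predications)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Placeholder identifiers, not real SemRep or UMLS values.
    sample = [
        ("subject-1", "predicate-1", "object-1"),
        ("subject-2", "predicate-2", "object-2"),
    ]
    for line in convert_all(sample):
        print(line)
```

Because each predication is converted independently, the work splits cleanly across processes; on a single machine the same code simply runs with fewer workers.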
The program can be run from the command line via argparse arguments.
# from project directory - find help menu
tiffanycallahan$ python RDFizer.py -h
# to run the program
tiffanycallahan$ python RDFizer.py
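For readers unfamiliar with `argparse`, the sketch below shows how a command-line interface of this kind is typically wired up. The options `--output` and `--processes` are hypothetical and chosen only for illustration; run `python RDFizer.py -h` to see the arguments the program actually accepts.

```python
import argparse


def build_parser():
    """Build a command-line parser with two illustrative (hypothetical) options."""
    parser = argparse.ArgumentParser(
        description="Illustrative parser; not the actual RDFizer.py interface."
    )
    # Hypothetical option: where to write the generated output.
    parser.add_argument("--output", default="output.nt",
                        help="path of the file to write (assumed name)")
    # Hypothetical option: how many worker processes to start.
    parser.add_argument("--processes", type=int, default=1,
                        help="number of worker processes (assumed name)")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the example does not depend on sys.argv.
    args = build_parser().parse_args(["--processes", "4"])
    print("{0} {1}".format(args.output, args.processes))
```

`argparse` generates the `-h` help menu automatically from the `description` and `help` strings, which is why the help command shown above works without extra code.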
We use SemVer for versioning.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This repository generates two different kinds of output that are subject to two different kinds of licensing (see details regarding these representations on the Wiki):
- The SemRepRDF-UMLS representation uses concepts that are part of the UMLS Metathesaurus and are thus subject to the UMLS license agreement.
- The SemRepRDF-LOD version has been generated in such a way that all annotated concept identifiers come only from terminologies, vocabularies, and ontologies with open license agreements. Minor modifications to the original SemRep predications include the mapping of UMLS CUIs to open resource concept identifiers.
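The mapping of UMLS CUIs to open resource concept identifiers mentioned above can be pictured as a dictionary lookup applied to each predication. The sketch below is an assumption-laden illustration, not the repository's implementation: the lookup table holds placeholder keys and values, the function names are hypothetical, and dropping predications with unmapped concepts is a design choice made here for simplicity (see the Wiki page on Resource Mapping for the actual procedure).

```python
# Hypothetical lookup table: keys and values are placeholders, not real
# UMLS CUIs or real open-vocabulary identifiers.
CUI_TO_OPEN_ID = {
    "CUI_A": "OPEN:0001",
    "CUI_B": "OPEN:0002",
}


def map_predication(predication, lookup=CUI_TO_OPEN_ID):
    """Replace the subject and object CUIs of one predication.

    Returns None when either concept has no open identifier (assumed policy).
    """
    subj, pred, obj = predication
    if subj not in lookup or obj not in lookup:
        return None
    return (lookup[subj], pred, lookup[obj])


def map_all(predications, lookup=CUI_TO_OPEN_ID):
    """Map every predication and keep only those that were fully mapped."""
    mapped = (map_predication(p, lookup) for p in predications)
    return [m for m in mapped if m is not None]


if __name__ == "__main__":
    sample = [("CUI_A", "TREATS", "CUI_B"), ("CUI_A", "TREATS", "CUI_X")]
    for triple in map_all(sample):
        print(triple)
```

Only the concept identifiers change in this sketch; the predicate is passed through untouched.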
Table of UMLS Metathesaurus vocabularies and NLM tools used when generating SemRepRDF-LOD. For details regarding how each of these sources will be used when generating SemRepRDF-LOD, see Wiki page on Resource Mapping.
Source | Version | License | Terms of Use | |
RESOURCES | ||||
Anatomical Therapeutic Chemical (ATC) | 2017AB | Non-Commercial | Use of all or parts of the material requires reference to the WHO Collaborating Centre for Drug Statistics Methodology. Copying and distribution for commercial purposes is not allowed. Additional Information. Changing or manipulating the material is not allowed. | |
DrugBank | 2017AB | CC BY-NC 4.0 | DrugBank is offered to the public as a freely available resource. Use and re-distribution of the data, in whole or in part, for commercial purposes (including internal use) requires a license. Additional Information | |
Gene Ontology (GO) | 2017AB | CC-BY 4.0 | The GOC wishes the users and consumers of GO data publicly display the date(s) and/or version number(s) of the relevant GO files, data, or software version. The GO is evolving and changes regularly--this information is critical to downstream consumers and users. Additional Information | |
HUGO Gene Nomenclature Committee (HGNC) | 2017AB | It is a condition of our funding from NIH and the Wellcome Trust that the nomenclature and information we provide is freely available to all. | Anyone may use the HGNC data, but we request that they reference the "HUGO Gene Nomenclature Committee at the European Bioinformatics Institute" and the website where possible. Additional Information | |
Human Phenotype Ontology (HPO) | 2017AB | The HPO vocabularies, annotation files, tools and documentation are freely available. | The HPO is copyrighted to protect the integrity of the vocabularies, which means that changes to the HPO vocabularies need to be done by HPO developers. However, anyone can download the HPO and use the ontologies or other HPO files under three conditions:
|
|
The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM) | 2017AB | Not specified in readme when downloaded | Available for download via FTP through the CDC. Additional Information | |
The International Classification of Diseases, Tenth Revision (ICD10) | 2017AB | WHO is able to issue internal licences to organizations wishing to incorporate WHO classifications into their internal information systems for use by employees for use for administrative purposes e.g. health records management. | For more information regarding guidelines for using resource, see link. Additional Information | |
The International Classification of Diseases, Tenth Revision, Procedure Coding System (ICD10-PCS) | 2017AB | WHO is able to issue internal licences to organizations wishing to incorporate WHO classifications into their internal information systems for use by employees for use for administrative purposes e.g. health records management. | For more information regarding guidelines for using resource, see link. Additional Information | |
The Logical Observation Identifiers Names and Codes terminology (LOINC) | 2017AB | Licensed and Copyrighted, free to use | The Terms of Use are very detailed, see link. Additional Information | |
NCBI Taxonomy | 2017AB | Databases of molecular data on the NCBI Web site include such examples as nucleotide sequences (GenBank), protein sequences, macromolecular structures, molecular variation, gene expression, and mapping data. They are designed to provide and encourage access within the scientific community to sources of current and comprehensive information. Therefore, NCBI itself places no restrictions on the use or distribution of the data contained therein. Nor do we accept data when the submitter has requested restrictions on reuse or redistribution. | For more information regarding guidelines for using resource, see link. Additional Information | |
National Drug File - Reference Terminology | 2017AB | UMLS Category 0 license; but vocabulary can be downloaded from NCI without license or registration. | For more information regarding guidelines for using resource, see link. Additional Information | |
Foundational Model of Anatomy (FMA) | 2017AB | Licensed through the University of Washington, which states "The Foundational Model of Anatomy ontology (FMA) is OPEN SOURCE and available for general use". | For more information regarding guidelines for using resource, see link. Additional Information | |
The Healthcare Common Procedure Coding System (HCPCS) | 2017AB | Subject to same licensing as Current Procedural Terminology (CPT) codes. The AMA licenses thousands of organizations to use CPT data in a broad array of applications. The AMA’s licensing model for CPT is based on individual users. In each of these cases, organizations that utilize CPT in one of these systems are required to obtain a license for each system and for each Individual user—regardless of the number of codes they use. | For more information regarding guidelines for using resource, see link. Additional Information | |
Online Mendelian Inheritance in Man (OMIM) | 2017AB | License prevents redistribution. | For more information regarding guidelines for using resource, see link. Additional Information | |
TOOLS | ||||
Semantic Knowledge Representation | 2017AB; v 1.7 | SKR resources are available to all applicants at no charge, both within and outside the United States. | Redistributions of SKR resources in source or binary form must include the following list of conditions in the documentation and other materials provided with the distribution.
|
|
Semantic Network | 2017AB; v 54 | The following Terms and Conditions apply for use of the UMLS Semantic Network. Using the UMLS Semantic Network indicates your acceptance of the following Terms and Conditions. These Terms and Conditions apply to all the UMLS Semantic Network files, independent of format and method of acquisition. | For more information regarding guidelines for using resource, see link. Additional Information |
- Project completed as part of the 4th annual Biomedical Linked Annotation Hackathon (BLAH) held in Kashiwa, Japan.
- README was generated from a modified markdown template originally created by Billie Thompson PurpleBooth.