A workflow- and provenance-based approach to produce human- and machine-oriented data summaries. Reuses the PROV-O, EDAM and MicroPublication ontologies, Nanopublications, and the Bio.Tools bioinformatics registry.

albangaignard/fresh-toolbox

fresh-toolbox

This notebook demonstrates how to combine workflow provenance (information on data processing chains) with a knowledge graph to produce human- and machine-oriented data summaries. We propose to reuse domain-specific annotations (EDAM ontology) from the bioinformatics tools registry Bio.Tools to automatically annotate workflow-processed data in the form of data summaries.

The whole process can be reproduced through the Binder online platform.

Contacts

Citation

Alban Gaignard, Hala Skaf-Molli and Khalid Belhajjame. Findable and Reusable Workflow DataProducts: A Genomic Workflow Case Study. Accepted at Semantic Web Journal, 2020. http://www.semantic-web-journal.net/content/findable-and-reusable-workflow-dataproducts-genomic-workflow-case-study

Approach

Here are the main steps of this demonstration:

  1. Knowledge graph loading (we assume that a provenance graph is already available)
  2. Machine-oriented provenance mining queries
  3. Human-oriented provenance mining queries

Results

Here is an example of a generated human-oriented data summary.

...
The file Samples/Sample1/BAM/Sample1.realign.bai results from 
tool gatk2_indel_realigner-IP which Locally align two or more molecular 
sequences.

It was produced in the context of Rare Coding Variants in ANGPTL6 Are 
Associated with Familial Forms of Intracranial Aneurysm
...
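
A summary like the one above could be rendered from query results with a simple text template. The following is a minimal sketch, assuming the four values (file path, tool name, EDAM operation description, study title) have already been retrieved from the knowledge graph. The function `render_summary` and its parameter names are hypothetical, and the template wording is inferred from the sample output rather than taken from the notebook.

```python
def render_summary(file_path, tool, operation, context):
    """Build a human-oriented sentence pair from provenance query results.

    All four arguments are plain strings; the template wording below is an
    assumption modeled on the sample summary shown in this README.
    """
    return (
        f"The file {file_path} results from tool {tool} which {operation}.\n"
        f"It was produced in the context of {context}"
    )


summary = render_summary(
    file_path="Samples/Sample1/BAM/Sample1.realign.bai",
    tool="gatk2_indel_realigner-IP",
    operation="Locally align two or more molecular sequences",
    context=(
        "Rare Coding Variants in ANGPTL6 Are Associated with "
        "Familial Forms of Intracranial Aneurysm"
    ),
)
print(summary)
```

Keeping the template separate from the query makes it easy to reuse the same provenance results for both the machine-oriented and the human-oriented outputs.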

Software dependencies

  • RdfLib for RDF data management and SPARQL querying
  • NetworkX for graph visualization
