BioInterchange is a tool for generating interchangable RDF data from non-RDF data sources.
Supported input file formats (see examples directory):
Supported RDF output formats:
Ontologies used in the RDF output:
- Generic Feature Format Version 3 Ontology (GFF3O)
- Genome Variation Format Version 1 Ontology (GVF1O)
- Semanticscience Integrated Ontology (SIO)
- Sequence Ontology Feature Annotation (SOFA)
If you like to contribute, you are more than welcome. For implementation ideas, please have a look here.
Four interfaces to BioInterchange are available:
- command-line tool-suite
- API (Ruby gem, Python egg)
- RESTful web-service
- interactive web-site
BioInterchange's command-line tool biointerchange can be installed as a command line tools as follows:
gem install biointerchange
Examples:
biointerchange --input dbcls.catanns.json --rdf rdf.bh12.sio --file examples/pubannotation.10096561.json --name 'Peter Smith' --name_id 'peter.smith@example.com'
biointerchange --input uk.ac.man.pdfx --rdf rdf.bh12.sio --file examples/gb-2007-8-3-R40.xml --name 'Peter Smith' --name_id 'peter.smith@example.com'
Input formats:
biointerchange.gff3biointerchange.gvfdbcls.catanns.jsonuk.ac.man.pdfx
Output formats:
rdf.biointerchange.gff3rdf.biointerchange.gvfrdf.bh12.sio
RDF data produced by BioInterchange can be directly loaded into a triple store. The following gives an example about loading and querying RDF data using Sesame; the commands are entered via Sesame's bin/console.sh:
> create memory.
Please specify values for the following variables:
Repository ID [memory]: testrepo
Repository title [Memory store]: Test Repository
Persist (true|false) [true]: false
Sync delay [0]:
Repository created
> open testrepo.
testrepo> load <path-to-your-rdf-data> .
testrepo> sparql select * where { ?s ?p ?o } .
To list all seqid entries from a GFF3/GVF-file conversion in the store, the following SPARQL query can be used:
testrepo> sparql select * where { ?s <http://www.biointerchange.org/gvf1o#GVF1_0004> ?o } .
Data consistency is verifyable for the output formats rdf.biointerchange.gff3 and rdf.biointerchange.gvf using the BioInterchange ontologies GFF3O and GVF1O. The following is an example of how Jena's command line tools and the HermiT reasoner can be used for conistency verification:
rdfcat <path-to-gff3o/gvf1o> <yourdata.n3> > merged.xml
java -d64 -Xmx4G -jar HermiT.jar -k -v merged.xml
Another approach is to load the data and its related GFF3O/GVF1O ontology into Protege, merge them, and then use the "Explain inconsistent ontology" menu item to inspect possible data inconsistencies.
The following list provides information on the origin of the example-data files in the examples directory:
BovineGenomeChrX.gff3.gz: Gzipped GFF3 file describing a Bos taurus chromosome X. Downloaded from http://bovinegenome.org/?q=download_chromosome_gff3chromosome_BF.gff: GFF3 file of floating contigs from the Baylor Sequencing Centre. Downloaded from http://dictybase.org/Downloadsestd176_Banerjee_et_al_2011.2012-11-29.NCBI36.gvf: GVF file of EBI's DGVa. Downloaded from ftp://ftp.ebi.ac.uk/pub/databases/dgva/estd176_Banerjee_et_al_2011/gvf/estd176_Banerjee_et_al_2011.2012-11-29.NCBI36.gvfgb-2007-8-3-R40.xml: Generated by pdfx from open-access source PDF Sense-antisense pairs in mammals: functional and evolutionary considerations
The Ruby gem is under active development, so the following may or may not work out of the box.
gem install biointerchange
To use BioInterchange in your Ruby projects, include the following line in your code:
require 'biointerchange'
Currently, there are only wrappers to the vocabularies of the ontologies that are used by BioInterchange available.
To install the BioInterchange egg, run:
sudo easy_install rdflib
sudo easy_install http://www.biointerchange.org/eggs/biointerchange-0.1.2-py2.7.egg
Usage examples:
import biointerchange
from biointerchange import *
# Get the URI of an ontology term by label:
GFF3O.seqid()
# Ambiguous labels will return an array of URIs:
# "start" can refer to a sub-property of "feature_properties" or "target_properties"
GFF3O.start()
# "feature_properties" can be either a datatype or object property
GFF3O.feature_properties()
# Use build-in method "is_datatype_property" to resolve ambiguity:
# (Note: there is exactly one item in the result set, so the selection of the first item is acceptable.)
feature_properties = filter(lambda uri: GFF3O.is_datatype_property(uri), GFF3O.feature_properties())[0]
# Use build-in method "with_parent" to pick properties based on their context:
GFF3O.with_parent(GFF3O.start(), feature_properties)
Only vocabulary wrapper classes are provided for the Java API. In order to make use of the RDF generation features in BioInterchange, either use the Ruby implementation or connect Java to BioInterchange's web-services.
To use the BioInterchange artifact, set-up add the following to your Maven POM file:
<repositories>
<repository>
<id>biointerchange</id>
<name>BioInterchange</name>
<url>http://www.biointerchange.org/artifacts</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.biointerchange</groupId>
<artifactId>vocabularies</artifactId>
<version>0.1.2</version>
</dependency>
</dependencies>
Usage examples:
package org.biointerchange;
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.*;
import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.collections.Predicate;
import java.util.Set;
import org.biointerchange.vocabulary.*;
/**
* Demo on how to make use of BioInterchange's vocabulary classes.
*
* @author Joachim Baran
*/
public class App
{
public static void main(String[] args) {
Resource seqid = GFF3O.seqid();
System.out.println("'seqid' property:");
printResource(seqid);
System.out.println("'start' properties:");
Set<Resource> start = GFF3O.start();
for (Resource startSynonym : start)
printResource(startSynonym);
System.out.println("'feature_properties' properties:");
Set<Resource> featureProperties = GFF3O.feature_properties();
for (Resource featurePropertiesSynonym : featureProperties)
printResource(featurePropertiesSynonym);
System.out.println("'feature_properties' properties, which are a datatype property:");
CollectionUtils.filter(featureProperties, new Predicate() {
public boolean evaluate(Object o) {
return GFF3O.isDatatypeProperty((Resource)o);
}
});
for (Resource featurePropertiesSynonym : featureProperties)
printResource(featurePropertiesSynonym);
System.out.println("'start' property with parent datatype property 'feature_properties':");
Set<Resource> startUnderDatatypeFeatureProperties = GFF3O.withParent(start, featureProperties.iterator().next());
for (Resource startSynonym : startUnderDatatypeFeatureProperties)
printResource(startSynonym);
}
private static void printResource(Resource resource) {
System.out.println(" " + resource.toString());
System.out.println(" Namespace: " + resource.getNameSpace());
System.out.println(" Local name: " + resource.getLocalName());
System.out.println(" Jena Property (rather than Resource): " + (resource instanceof Property));
System.out.println(" Ontology class: " + GFF3O.isClass(resource));
System.out.println(" Ontology object property: " + GFF3O.isObjectProperty(resource));
System.out.println(" Ontology datatype property: " + GFF3O.isDatatypeProperty(resource));
}
}
TODO
TODO
This section is only relevant if you are building newer versions of BioInterchange yourself. If you are using the Ruby gem, web-service or interactive web-site, then you can safely skip the steps explained here.
Note that the following set-up only works with Ruby 1.9.2p290 or newer.
Software requirements:
- Ruby 1.9.2p290 or newer
- Bundler gem 1.1.5 or newer
- Rake gem 0.8.7 or newer
With Ruby installed, the following commands install the additional packages:
sudo gem install bundler
sudo gem install rake
bundle
The last step, bundle, will install gem dependencies of BioInterchange automatically.
Building a new version of the Ruby vocabulary classes for GFF3, SIO, SOFA (requires that the OBO files are saves as RDF/XML using Protege):
sudo gem install rdf
sudo gem install rdf-rdfxml
echo -e "require 'rdf'\nmodule BioInterchange\n" > lib/biointerchange/gff3o.rb
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-gff3o> GFF3O >> lib/biointerchange/gff3o.rb
echo -e "\nend" >> lib/biointerchange/gff3o.rb
echo -e "module BioInterchange\n" > lib/biointerchange/gvf1o.rb
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-gvf1o> GVF1O >> lib/biointerchange/gvf1o.rb
echo -e "\nend" >> lib/biointerchange/gvf1o.rb
echo -e "module BioInterchange\n" > lib/biointerchange/sio.rb
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-sio> SIO >> lib/biointerchange/sio.rb
echo -e "\nend" >> lib/biointerchange/sio.rb
echo -e "module BioInterchange\n" > lib/biointerchange/sofa.rb
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-sofa> SOFA >> lib/biointerchange/sofa.rb
echo -e "\nend" >> lib/biointerchange/sofa.rb
A Geno Ontology external reference (GOxref) vocabulary can be created by directly downloading the latest version of GO.xrf_abbs:
echo -e "module BioInterchange\n" > lib/biointerchange/goxref.rb
curl ftp://ftp.geneontology.org/pub/go/doc/GO.xrf_abbs | ruby generators/GOxrefify.rb
echo -e "\nend" >> lib/biointerchange/goxref.rb
The source-code generation can be skipped, if none of the ontologies that are used by BioInterchange have been changed. Otherwise, the existing Python vocabulary class wrappers can be generated as follows:
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-gff3o> GFF3O | ruby generators/pythonify.rb > supplemental/python/biointerchange/gff3o.py
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-gvf1o> GVF1O | ruby generators/pythonify.rb > supplemental/python/biointerchange/gvf1o.py
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-sio> SIO | ruby generators/pythonify.rb > supplemental/python/biointerchange/sio.py
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-sofa> SOFA | ruby generators/pythonify.rb > supplemental/python/biointerchange/sofa.py
curl ftp://ftp.geneontology.org/pub/go/doc/GO.xrf_abbs | ruby generators/GOxrefify.rb | ruby generators/pythonify.rb > supplemental/python/biointerchange/goxref.py
Generate the BioInterchange Python vocabulary egg:
cd supplemental/python
python setup.py bdist_egg
The vocabulary wrapper makes used of RDFLib, which does not install automatically with the egg.
- (RDFLib)[https://github.com/RDFLib/rdflib]
The source-code generation can be skipped, if none of the ontologies that are used by BioInterchange have been changed. Otherwise, the existing Java vocabulary class wrappers can be generated as follows:
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-gff3o> GFF3O | ruby generators/javaify.rb > supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/GFF3O.java
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-gvf1o> GVF1O | ruby generators/javaify.rb > supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/GVF1O.java
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-sio> SIO | ruby generators/javaify.rb > supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/SIO.java
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-sofa> SOFA | ruby generators/javaify.rb "http://purl.obolibrary.org/obo/" > supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/SOFA.java
curl ftp://ftp.geneontology.org/pub/go/doc/GO.xrf_abbs | ruby generators/GOxrefify.rb | ruby generators/javaify.rb > supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/GOXRef.java
Generate the BioInterchange Java vocabulary artifact:
cd supplemental/java/biointerchange
mvn package
The following Java packages will automatically install alongside BioInterchange's Maven artifact:
- (Jena Core)[http://mvnrepository.com/artifact/org.apache.jena/jena-core]
- (Apache Commons Collections)[http://mvnrepository.com/artifact/org.apache.directory.studio/org.apache.commons.collections]
- (SLF4J)[http://mvnrepository.com/artifact/org.slf4j/slf4j-api]
- (Xerces)[http://mvnrepository.com/artifact/xerces/xerces]
- (JUnit)[http://mvnrepository.com/artifact/junit/junit]
bundle exec rake gemspec
bundle exec gem build biointerchange.gemspec
sudo bundle exec gem install biointerchange
If you encounter problems with gem dependencies, then you can try to explictly use Ruby 1.9:
bundle exec gem1.9 build biointerchange.gemspec
sudo bundle exec gem1.9 install biointerchange
BioInterchange uses unit testing using RSpec, where the unit tests are located in the spec directory.
Using bundler, a quick check can be carried out using:
bundle update
bundle exec rake spec
A more verbose is produced by calling rspec directly:
rspec -c -f d
bundle exec rake rdoc
Note: Only BioInterchange package maintainers can deploy the 'biointerchange' gem on Rubygems.
bundle exec rake version:bump:(major | minor | patch)
bundle exec rake gemspec
bundle exec gem build biointerchange.gemspec
bundle exec gem push biointerchange-VERSION.gem
On Mac OS X, you might see this error:
make: /usr/bin/gcc-4.2: No such file or directory
make: *** [generator.o] Error 1
This can be solved by executing:
sudo ln -s /usr/bin/llvm-gcc-4.2 /usr/bin/gcc-4.2
In alphabetical order of the last name:
If you use this software, please cite
- BioInterchange: An Open Source Framework for Transforming Heterogeneous Data Formats Into RDF (in preparation)
and one of the following Biogem publications
- BioRuby: bioinformatics software for the Ruby programming language
- Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics
This Biogem is published at #biointerchange and hosted on its primary site www.biointerchange.org.
The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
See LICENSE file.
