Note: All text snippets in the introduction are taken from Wikipedia.
The Semantic Web is an extension of the Web through standards by the World Wide Web Consortium (W3C). The standards promote common data formats and exchange protocols on the Web, most fundamentally the Resource Description Framework (RDF).
According to the W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries". The term was coined by Tim Berners-Lee for a web of data that can be processed by machines.
The RDF data model is similar to classical conceptual modeling approaches such as entity–relationship or class diagrams, as it is based upon the idea of making statements about resources (in particular web resources) in the form of subject–predicate–object expressions. These expressions are known as triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has", and an object denoting "the color blue". Therefore, RDF swaps object for subject that would be used in the classical notation of an entity–attribute–value model within object-oriented design; Entity (sky), attribute (color) and value (blue). RDF is an abstract model with several serialization formats (i.e., file formats), and so the particular way in which a resource or triple is encoded varies from format to format.
Apache Jena is an open source Semantic Web framework for Java. It provides an API to extract data from and write to RDF graphs. The graphs are represented as an abstract "model". A model can be sourced with data from files, databases, URLs or a combination of these. A Model can also be queried through SPARQL 1.1.
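To make the triple example above concrete, here is a minimal Jena sketch that builds the "sky has the color blue" statement and prints it as Turtle. The example.org namespace is made up purely for illustration.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class SkyTriple {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();

        // Hypothetical namespace, used only for this illustration.
        String ns = "http://example.org/";

        // Subject: the sky, predicate: hasColor, object: the literal "blue".
        Resource sky = model.createResource(ns + "sky");
        Property hasColor = model.createProperty(ns + "hasColor");
        sky.addProperty(hasColor, "blue");

        // Print the resulting graph as Turtle.
        model.write(System.out, "TURTLE");
    }
}
```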
SPARQL [..] is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web.
This project intends to provide samples demonstrating how to connect various tools to the linked production data cloud (lpdc). The tools in question are all used for film making. The aim is to extract as much metadata as possible from the tools and store it in the lpdc. Next, this metadata is exported from the lpdc back to the tools, selecting only the data required by each.
The lpdc stores all data in a triple store, using a custom ontology which is still subject to change. It supports SPARQL queries.
The tools which have been integrated into the project thus far are as follows. We are hoping there will be more as the project progresses.
If you want to use your own conversion tools, we provide a web service for the tailr versioning and triple store upload. Please see service/README.md for further details.
For version control we use tailr. Further information can be found at tailr.
This might be the easiest way to try the lpdc tools.
We provide a prepackaged version here.
You only need a current Java version (1.7) to run the jar file.
The structure is as follows:
config/ - contains DwerftConfig.properties
examples/ - some examples to try the tool
ontology/ - provides the ontology file
mapping/ - provides different mappings
To use it, run:
java -jar dwerft-tools.jar <your arguments>
For the available arguments see further down this readme.
See the wiki page on how to set up the dwerft tools on your own.
For all possible arguments see the CLI wiki page.
If you have never worked with Apache Jena before, a good start would be to take a look at the SparqlExample found in the package examples. It demonstrates how to issue requests to a known endpoint.
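As a rough idea of what such a request looks like, here is a minimal sketch using the Jena API against a placeholder endpoint URL; the SparqlExample in the examples package remains the authoritative reference.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class SparqlEndpointDemo {
    public static void main(String[] args) {
        // Placeholder endpoint URL; replace it with the SPARQL endpoint you actually use.
        String endpoint = "http://example.org/sparql";
        String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";

        // Issue the query against the remote endpoint and print each solution.
        try (QueryExecution qexec = QueryExecutionFactory.sparqlService(endpoint, query)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution solution = results.next();
                System.out.println(solution);
            }
        }
    }
}
```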
The package tools contains sample code showing how to transform various data formats such as XML into valid RDF and vice versa. As this involves a few steps, each package is dedicated to one task. As with all conversions, one of the main problems is creating an exhaustive mapping. Conveniently, this has already been done for you: for now, mappings exist for Dramaqueen and PreProducer.
tools.general: The file DwerftTools contains a main method, required for running the DWerft Suite with arguments. The sub commands can be found within tools.general.commands. Every sub command implements Runnable and is called directly from the airlift CLI processor (a minimal sketch of this pattern follows the package descriptions below).
tools.rmllib: This package provides access to the RML processor. There are also a number of preprocessors for the various input tools; see tools.rmllib.preprocessor.
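As referenced above, here is a minimal sketch of the sub command pattern used in tools.general.commands. The command name, option, and body are purely hypothetical; only the general shape (an annotated class implementing Runnable, picked up by the airlift CLI processor) reflects the description above.

```java
import io.airlift.airline.Command;
import io.airlift.airline.Option;

// Hypothetical sub command, shown only to illustrate the pattern described
// above; the actual dwerft commands in tools.general.commands will differ.
@Command(name = "example-convert", description = "Illustrative sub command")
public class ExampleConvertCommand implements Runnable {

    @Option(name = "--input", description = "Path to the input file")
    public String input;

    @Override
    public void run() {
        // A real command would delegate to one of the converters here.
        System.out.println("Would convert: " + input);
    }
}
```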
The DWERFT tools utilize the RML processor, which takes care of converting numerous structured formats into RDF. Some examples can be found in the resource/rml folder and in the mappings folder. All RML mappings end with .rml.ttl. For more detailed information see the RML examples.
At the time of writing we support Preproducer and Dramaqueen to RDF conversion, as well as CSV, ALE, XML, and JSON to RDF.
Some current limitations are:
- the input files are set in the mapping, so different file formats cannot be intermixed
- no axes within XPath, and the mappings are always relative to the iterator path
Note: While the framework supports numerous conversions, unfortunately not all of them can be invoked via the CLI mentioned above. The steps affected are marked with an exclamation mark (!) in the workflow described below.
For the following example, let's say our aim is to store a script that has been written using Dramaqueen in the lpdc, edit the script using PreProducer, and finally view the project in LockitNetwork. All interfaces work by storing generated RDF in the lpdc and generating tool-specific XML.
- (!) Dramaqueen to RDF: First, the challenge is to convert a Dramaqueen script to valid RDF. In order to achieve this, we simply feed our source, destination and mapping file to the DramaqueenToRdf converter in the package importer.dramaqueen. It will traverse the whole script and convert all attributes into RDF, making use of the mapping along the way. Make sure to take a look at the resulting file; it will give you an understanding of what RDF looks like.
- RDF to triple store: Now that we have proper RDF, we still need to get it into the triple store mentioned earlier. All classes responsible for uploading content to the various APIs can be found in the package sources and implement the same interface. This interface consists of merely two methods, get and send. In our case, we'd send our RDF file to the triple store by utilizing the send method of the applicable class (a small sketch of this pattern follows the workflow below).
- Triple store to PreProducer tool: Exporting RDF back to tool-specific formats basically boils down to issuing the right queries and generating valid XML. The abstract class RdfExporter found in the package exporter provides some handy methods for extracting information from the triple store. It is vital, however, that the correct ontology file is provided and the queries are free of errors. The PreproducerExporter extends the aforementioned abstract class and builds XML in a recursive manner. Again, take a look at the generated XML. Now, the XML file needs to be sent to the PreProducer API. As we learned before, the package sources contains all classes required for this kind of task. Since we can't publish any credentials, you'll need your own. Simply add these to the DwerftConfig.properties file hidden in the resources folder and pass said file to the constructor of the PreproducerSource. Calling the method send will then upload the previously generated XML to PreProducer. You can now log on to your PreProducer account and review your upload; the data you sent will require confirmation. Your script is now viewable in the PreProducer frontend.
- (!) PreProducer API to triple store: Assuming extensive project editing has taken place using PreProducer, it is now time to update the lpdc with the latest data. As mentioned before, this requires valid credentials. First, the data from PreProducer has to be converted to RDF. The class PreProducerToRdf is an extension of the basic AbstractXMLToRDFConverter. In order for all linking operations to work, the PreProducer API methods must be queried in a special order, which is conveniently stored in the PreProducerToRdf class and accessible via a getter method. Once the RDF file has been generated, it can be uploaded to the triple store. This step is not available via the CLI. However, there is a TripleStoreSource class within the sources package, which provides a send method for uploading the previously generated RDF file to the lpdc.
- Triple store to LockitNetwork: This is probably the easiest of all steps. Log in to your LockitNetwork account, click the import button, and finally the DWERFT tab. Now, simply copy the project's URI into the given field and enjoy editing the project data from within LockitNetwork.
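As referenced in the workflow above, the following sketch illustrates the get/send pattern of the sources package. The interface shown here is an assumption based on the description above (two methods, get and send, and classes such as TripleStoreSource and PreproducerSource implementing them); the actual dwerft signatures may differ.

```java
import java.io.File;

// Hedged sketch only: this interface mirrors the description above (every
// class in the sources package exposes get and send), but the real dwerft
// method signatures may differ.
interface Source {
    String get(String key);   // fetch data from the remote API
    void send(File payload);  // upload a generated RDF or XML file
}

public class UploadSketch {
    public static void upload(Source source, File generatedFile) {
        // In the workflow above this would be e.g. a TripleStoreSource
        // (RDF to the lpdc) or a PreproducerSource (XML to PreProducer,
        // constructed with the DwerftConfig.properties file).
        source.send(generatedFile);
    }
}
```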