SesameStream is an in-memory continuous SPARQL query engine built with the Sesame RDF framework. It implements an open-world subset (see below) of the SPARQL query language and uses an incremental technique based on the symmetric hash join, responding to streaming RDF statements as soon as possible with SPARQL query answers and discarding irrelevant statements. SesameStream uses time-to-live to process infinite streams of data: queries time out and are removed unless renewed, while partial solutions to queries exist in the query engine only as long as their shortest-lived statement, making room for fresh data as they expire. SesameStream integrates with LinkedDataSail, following links in response to join operations.
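To make the incremental technique concrete, here is a minimal sketch of a symmetric hash join written against the JDK only. It is not SesameStream's implementation: the class `SymmetricHashJoinSketch`, its methods, and the string-based tuples are hypothetical names chosen for illustration. Each input keeps its own hash table keyed by the join value; an arriving tuple is stored on its own side and immediately probed against the other side, so answers are produced as soon as both halves have been seen, in whichever order they arrive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy symmetric hash join over two inputs sharing a join key (illustrative only). */
class SymmetricHashJoinSketch {
    // One hash table per input, keyed by the join value.
    private final Map<String, List<String>> left = new HashMap<>();
    private final Map<String, List<String>> right = new HashMap<>();

    /** Stores a left tuple and returns the answers it completes right away. */
    List<String> addLeft(String key, String value) {
        left.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> answers = new ArrayList<>();
        for (String r : right.getOrDefault(key, Collections.<String>emptyList())) {
            answers.add(value + "|" + r);
        }
        return answers;
    }

    /** Stores a right tuple and returns the answers it completes right away. */
    List<String> addRight(String key, String value) {
        right.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> answers = new ArrayList<>();
        for (String l : left.getOrDefault(key, Collections.<String>emptyList())) {
            answers.add(l + "|" + value);
        }
        return answers;
    }

    public static void main(String[] args) {
        SymmetricHashJoinSketch join = new SymmetricHashJoinSketch();
        // A gesture event arrives first: nothing to join with yet.
        System.out.println(join.addLeft("dbr:The_Meaning_of_Liff", "event1"));
        // The authorship fact arrives later and completes the pending match.
        System.out.println(join.addRight("dbr:The_Meaning_of_Liff", "dbr:Douglas_Adams"));
    }
}
```

In this sketch nothing is ever evicted from the two tables; a streaming engine would additionally drop entries when they expire.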
Below is a usage example in Java. See the source code for the full example.
// A query for things written by Douglas Adams which are referenced with a pointing gesture
String query = "PREFIX activity: <http://fortytwo.net/2015/extendo/activity#>\n" +
"PREFIX dbo: <http://dbpedia.org/ontology/>\n" +
"PREFIX dbr: <http://dbpedia.org/resource/>\n" +
"PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n" +
"SELECT ?actor ?indicated WHERE {\n" +
"?a activity:thingIndicated ?indicated .\n" +
"?a activity:actor ?actor .\n" +
"?indicated dbo:author dbr:Douglas_Adams .\n" +
"}";
// An RDF graph representing an event. Normally, this would come from a dynamic data source.
// The example is from the Typeatron keyer (see http://github.com/joshsh/extendo)
String eventData = "@prefix activity: <http://fortytwo.net/2015/extendo/activity#> .\n" +
"@prefix dbr: <http://dbpedia.org/resource/> .\n" +
"@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n" +
"@prefix tl: <http://purl.org/NET/c4dm/timeline.owl#> .\n" +
"@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n" +
"\n" +
"<urn:uuid:e6f4c759-712c-448c-96f0-c2ecee2ccb97> a activity:Point ;\n" +
" activity:actor <http://fortytwo.net/josh/things/JdGwZ4n> ;\n" +
" activity:thingIndicated dbr:The_Meaning_of_Liff ;\n" +
" activity:recognitionTime <urn:uuid:a4a2fd8c-ea0d-43bb-bcad-6510f4c9b55a> .\n" +
"\n" +
"<urn:uuid:a4a2fd8c-ea0d-43bb-bcad-6510f4c9b55a> a tl:Instant ;\n" +
" tl:at \"2015-02-13T21:00:12-05:00\"^^xsd:dateTime .";
// Instantiate the query engine.
QueryEngineImpl queryEngine = new QueryEngineImpl();
// Define a time-to-live for the query. It will expire after this many seconds,
// freeing up resources and ceasing to match statements.
int queryTtl = 10 * 60;
// Define a handler for answers to the query.
BindingSetHandler handler = new BindingSetHandler() {
    public void handle(final BindingSet answer) {
        System.out.println("found an answer to the query: " + answer);
    }
};
// Submit the query to the query engine to obtain a subscription.
Subscription sub = queryEngine.addQuery(queryTtl, query, handler);
// create subscriptions for additional queries at any time; queries match in parallel
// Add some data with infinite (= 0) time-to-live.
// Results derived from this data will never expire.
int staticTtl = 0;
// Add some static background knowledge. Alternatively, let SesameStream discover this
// information as Linked Data (see LinkedDataExample.java).
Statement st = new StatementImpl(
        new URIImpl("http://dbpedia.org/resource/The_Meaning_of_Liff"),
        new URIImpl("http://dbpedia.org/ontology/author"),
        new URIImpl("http://dbpedia.org/resource/Douglas_Adams"));
queryEngine.addStatements(staticTtl, st);
// Now define a finite time-to-live of 30 seconds.
// This will be used for the short-lived data of gesture events.
int eventTtl = 30;
RDFFormat format = RDFFormat.TURTLE;
RDFParser parser = Rio.createParser(format);
parser.setRDFHandler(queryEngine.createRDFHandler(eventTtl));
// as new statements are added, computed query answers will be pushed to the BindingSetHandler
parser.parse(new ByteArrayInputStream(eventData.getBytes()), "");
// cancel the query subscription at any time;
// no further answers will be computed/produced for the corresponding query
sub.cancel();
// alternatively, instead of cancelling, renew the subscription for another 10 minutes
sub.renew(10 * 60);
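The example above mixes static data (time-to-live of 0) with 30-second event data. Since a partial solution lives only as long as its shortest-lived statement, the lifetimes have to be combined when statements are joined. The following is a minimal sketch of that rule using only the JDK. The class `TtlSketch` and its methods are hypothetical and not part of the SesameStream API; it assumes expiration is tracked as an absolute timestamp in milliseconds, with 0 meaning "never expires".

```java
/** Hypothetical helper showing how time-to-live values could combine in a join. */
class TtlSketch {
    /** Sentinel expiration time for data that never expires. */
    static final long NEVER = 0L;

    /** Converts a TTL in seconds (0 = infinite) into an absolute expiration time in ms. */
    static long expirationTime(int ttlSeconds, long nowMillis) {
        return ttlSeconds == 0 ? NEVER : nowMillis + ttlSeconds * 1000L;
    }

    /** A joined solution expires together with its shortest-lived input. */
    static long combine(long expirationA, long expirationB) {
        if (expirationA == NEVER) return expirationB;
        if (expirationB == NEVER) return expirationA;
        return Math.min(expirationA, expirationB);
    }

    /** True once a finite expiration time has been reached. */
    static boolean isExpired(long expiration, long nowMillis) {
        return expiration != NEVER && expiration <= nowMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long staticExpiration = expirationTime(0, now);  // background knowledge
        long eventExpiration = expirationTime(30, now);  // gesture event
        long joined = combine(staticExpiration, eventExpiration);
        System.out.println("expired after 31 s: " + isExpired(joined, now + 31_000L));
    }
}
```

Under these assumptions, a solution built only from static statements never expires, while any solution that involves an event statement disappears with that event.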
See also the Linked Data example; here, we replace the above "hard-coded" background semantics with discovered information which the query engine proactively fetches from the Web:
// Create a Linked Data client and metadata store. The Sesame triple store will be used for
// managing caching metadata, while the retrieved Linked Data will be fed into the continuous
// query engine, which will trigger the dereferencing of URIs in response to join operations.
MemoryStore sail = new MemoryStore();
sail.initialize();
LinkedDataCache.DataStore store = new LinkedDataCache.DataStore() {
    public RDFSink createInputSink(final SailConnection sc) {
        return queryEngine.createRDFSink(staticTtl);
    }
};
LinkedDataCache cache = LinkedDataCache.createDefault(sail);
cache.setDataStore(store);
queryEngine.setLinkedDataCache(cache, sail);
For projects which use Maven, SesameStream snapshots and release packages can be imported by adding configuration like the following to the project's POM:
<dependency>
    <groupId>edu.rpi.twc.sesamestream</groupId>
    <artifactId>sesamestream-impl</artifactId>
    <version>1.3-SNAPSHOT</version>
</dependency>
or, if you will implement the API (e.g. for a SesameStream proxy),
<dependency>
    <groupId>edu.rpi.twc.sesamestream</groupId>
    <artifactId>sesamestream-api</artifactId>
    <version>1.1-SNAPSHOT</version>
</dependency>
The latest Maven packages can be browsed here. See also:
Send questions or comments to:
SPARQL syntax currently supported by SesameStream includes:
- SELECT queries. SELECT subscriptions in SesameStream produce query answers indefinitely unless cancelled.
- ASK queries. ASK subscriptions produce at most one query answer (indicating a result of true) and then are cancelled automatically, similarly to a SELECT query with a LIMIT of 1.
- CONSTRUCT queries. Each query answer contains "subject", "predicate", and "object" bindings which may be turned into an RDF statement.
- basic graph patterns
- variable projection
- all RDF Term syntax and triple pattern syntax via Sesame
- FILTER constraints, with all SPARQL operator functions supported via Sesame except for EXISTS
- DISTINCT modifier. Use with care if the streaming data source may produce an unlimited number of solutions.
- REDUCED modifier. Similar to DISTINCT, but safe for long streams. Each subscription maintains a solution set which begins to recycle after it reaches a certain size, configurable with SesameStream.setReducedModifierCapacity().
- LIMIT clause. Once LIMIT number of answers have been produced, the subscription is cancelled.
- OFFSET clause. Since query answers roughly follow the order in which input statements are received, OFFSET can be practically useful even without ORDER BY (see below).
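The difference between DISTINCT and REDUCED in the list above can be illustrated with a bounded duplicate filter. This is a sketch under assumptions, not SesameStream's code: the class `ReducedFilterSketch` is hypothetical, and it assumes that "recycling" means evicting the oldest remembered solution once the capacity is reached. Because only a fixed number of solutions is remembered, memory stays bounded on an endless stream, at the cost that a duplicate may be emitted again after its original has been forgotten.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical bounded duplicate filter in the spirit of the REDUCED modifier. */
class ReducedFilterSketch<T> {
    private final int capacity;
    // LinkedHashSet keeps insertion order, so the first element is the oldest.
    private final Set<T> seen = new LinkedHashSet<>();

    ReducedFilterSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Returns true if the solution is new (as far as remembered) and should be emitted. */
    boolean offer(T solution) {
        if (seen.contains(solution)) {
            return false; // remembered duplicate: suppress
        }
        if (seen.size() >= capacity) {
            // Assumed recycling policy: forget the oldest remembered solution.
            Iterator<T> oldest = seen.iterator();
            oldest.next();
            oldest.remove();
        }
        seen.add(solution);
        return true;
    }

    public static void main(String[] args) {
        ReducedFilterSketch<String> filter = new ReducedFilterSketch<>(2);
        for (String s : new String[] {"a", "a", "b", "c", "a"}) {
            System.out.println(s + " emitted: " + filter.offer(s));
        }
    }
}
```

A DISTINCT filter would correspond to the same structure without the eviction step, which is why it needs care on streams that may produce an unlimited number of solutions.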
Syntax explicitly not supported:
- ORDER BY. This is a closed-world operation which requires a finite data set or window; SesameStream evaluates queries over a stream of data with an infinite window.
- SPARQL 1.1 aggregates. These are excluded for the same reason as ORDER BY.
Syntax not yet supported:
- DESCRIBE query form
- OPTIONAL and UNION patterns, group graph patterns
- RDF Dataset syntax, i.e. the FROM, FROM NAMED, and GRAPH keywords
- SPARQL 1.1's NOT, Property Paths, assignment (BIND / AS / VALUES), subqueries
- SPARQL 1.1 Federated Query syntax