Now with bitemporal modeling of vertices and edges.
Inspired by the popular graph query language Cypher, which is implemented in Neo4j, I started developing an ANTLR grammar to define property graphs. I added the concept of logical graphs into the language to support multiple, possible overlapping property graphs in one database.
GDL is used for unit testing and graph definition in Gradoop, a framework for distributed temporal graph analytics.
The project contains the grammar and a listener implementation which transforms GDL scripts into property graph model elements (i.e. graphs, vertices and edges).
The data model adapts concepts from the Temporal Property Graph Model (TPGM) and contains three elements: graphs, vertices and edges. Any element has an optional label and can have multiple attributes in the form of key-value pairs. Vertices and edges may be contained in an arbitrary number of graphs including zero graphs. Edges are binary and directed. Each graph element is associated with 2 intervals, transaction- (tx) and valid-time (val) interval, to achieve a bitemporal modeling. Each interval is defined by its start (from) and end (to) timestamp. Thus, vertices and edges have four additional timestamps (tx_from, tx_to, val_from, val_to).
Define a vertex:
()
Define a vertex and assign it to variable alice
:
(alice)
Define a vertex with label User
:
(:User)
Define a vertex with label User
, assign it to variable alice
and give it some properties:
(alice:User {name : "Alice", age : 23})
Property values can also be null:
(alice:User {name : "Alice", age : 23, city : NULL})
Numeric property values can have specific data types:
(alice:User {name : "Alice", age : 23L, height : 1.82f, weight : 42.7d})
Property values can also be ordered lists:
(alice:User {name : "Alice", age : 23, codes: ["Java", "Rust", "Scala"]})
Define an outgoing edge:
(alice)-->()
Define an incoming edge:
(alice)<--()
Define an edge with label knows
, assign it to variable e1
and give it some properties:
(alice)-[e1:knows {since : 2014}]->(bob)
Define multiple outgoing edges from the same source vertex (i.e. alice
):
(alice)-[e1:knows {since : 2014}]->(bob)
(alice)-[e2:knows {since : 2013}]->(eve)
Define paths (four vertices and three edges are created):
()-->()<--()-->()
Define a graph with one vertex (graphs can be empty):
[()]
Define a graph and assign it to variable g
:
g[()]
Define a graph with label Community
:
:Community[()]
Define a graph with label Community
, assign it to variable g
and give it some properties:
g:Community {title : "Graphs", memberCount : 42}[()]
Define mixed path and graph statements (elements in the paths don't belong to a specific graph):
()-->()<--()-->()
[()]
Define a fragmented graph with variable reuse:
g[(a)-->()]
g[(a)-->(b)]
g[(b)-->(c)]
Define three graphs with overlapping vertex sets (e.g. alice
is in g1
and g2
):
g1:Community {title : "Graphs", memberCount : 23}[
(alice:User)
(bob:User)
(eve:User)
]
g2:Community {title : "Databases", memberCount : 42}[
(alice)
]
g2:Community {title : "Hadoop", memberCount : 31}[
(bob)
(eve)
]
Define three graphs with overlapping vertex and edge sets (e
is in g1
and g2
):
g1:Community {title : "Graphs", memberCount : 23}[
(alice:User)-[:knows]->(bob:User),
(bob)-[e:knows]->(eve:User),
(eve)
]
g2:Community {title : "Databases", memberCount : 42}[
(alice)
]
g2:Community {title : "Hadoop", memberCount : 31}[
(bob)-[e]->(eve)
]
As part of his thesis, Max extended the grammar to support MATCH .. WHERE ..
statements analogous to Cypher. Besides defining a graph it is now also possible to formulate a query including
patterns, variable length paths and predicates:
MATCH (alice:Person)-[:knows]->(bob:Person)-[:knows*2..2]->(eve:Person)
WHERE (alice.name = "Alice" AND bob.name = "Bob")
OR (alice.age > bob.age)
OR (alice.age > eve.age)
Note that queries always start with the MATCH
keyword optionally followed by one or more
WHERE
clauses.
Several GDL extensions were added to support TPGM graphs as defined by Rost et al.. Here, valid and transaction intervals are defined for every graph element.
Every graph element (vertex/edge) is associated with 2 intervals, transaction (tx) and valid (val) interval. Each interval is defined by its start (from) and end (to) timestamp.
For a graph element with variable name a
, these four timestamps can be accessed in a property selector-like syntax:
a.tx_from
a.tx_to
a.val_from
a.val_to
Furthermore, the "global" transaction and valid interval are defined as the intersections of all elements' transaction/valid intervals. Their from and to stamps can be accessed by omitting the variable name:
tx_from
tx_to
val_from
val_to
Timestamp literals can be created in the following formats:
Timestamp(YYYY-MM-DDTHH:MM:SS)
whereT
stands for the literalT
Timestamp(YYYY-MM-DD)
(time is set to 00:00:00)Timestamp(Now)
(current timestamp)
Two types of "complex" timestamps can be created from a set of simple ones (selectors and literals):
MIN(t1,t2,...,tn)
MAX(t1,t2,...,tn)
Note that they can not be nested further, only selectors and literals are valid arguments for MIN
and MAX
.
Timestamps can be compared using the usual comparators <
, <=
, =
, !=
, >=
, and >
t1.before(t2)
: equal tot1 < t2
t1.after(t2)
: equal tot1 > t2
For a graph element with variable name a
, the two intervals can be accessed in a property selector-like syntax:
a.tx
a.val
Furthermore, the global transaction and valid interval are defined as the intersections of all elements' transaction/valid intervals. These intervals can be accessed by omitting the variable name:
tx
val
Custom intervals can be created from two timestamps denoting start and end of the interval:
Interval(t1,t2)
wheret1
andt2
are timestamps
When such an interval is created, a constraint t1 <= t2
is implicitly added.
Two intervals can be merged, i.e. intersected. This yields a new interval. The merge operation is only defined, if the two intervals overlap.
i1.merge(i2)
wherei1
andi2
are intervals. Implicitly, constraints to ensure thati1
andi2
overlap are added.
Two intervals can be merged, i.e. united. This yields a new interval. The join operation is only defined, if the two intervals overlap.
i1.join(i2)
wherei1
andi2
are intervals. Implicitly, constraints to ensure thati1
andi2
overlap are added.
Note that merge
and join
operations can not be nested any further, i.e. something like i1.merge(i2).join(a.tx)
is not possible.
Binary relations between intervals can be stated. They are inspired by the interval relations defined in SQL:2011.
i1.overlaps(i2)
i1.contains(i2)
i1.precedes(i2)
i1.succeeds(i2)
i1.immediatelyPrecedes(i2)
i1.immediatelySucceeds(i2)
i1.equals(i2)
These relations are syntactic sugar, as they can all be expressed as terms using only from and to selectors.
Additionally, relations between an interval and one or two timestamps are possible. Here, i
is an interval, t
, t1
and t2
are timestamps:
i.fromTo(t1,t2)
i.between(t1,t2)
t.precedes(i)
t.succeeds(i)
i.contains(t)
asOf
is a special constraint that refers to the transaction interval of a graph element.
a.asOf(t)
is true iffa.tx_from <= t
anda.tx_to >= t
, wherea
refers to a graph element
Durations of intervals can be referred to by simply referring to the interval, e.g. in the context of a duration predicate (see below) a.tx
would denote the length of a
's transaction time.
Duration constants can be created by the following keywords (number
is an integer literal):
Millis(number)
Seconds(number)
Minutes(number)
Hours(number)
Days(number)
Durations as defined above can be compared:
d1.longerThan(d2)
d1.shorterThan(d2)
d1.lengthAtLeast(d2)
d1.lengthAtMost(d2)
New temporal ComparableExpression
s are added.
The processing of all the temporal constraints described above is encapsulated in a GDLLoaderTemporal.java
that is used by the actual GDLLoader
.
Add dependency to your maven project:
<dependency>
<groupId>com.github.s1ck</groupId>
<artifactId>gdl</artifactId>
<version>0.4.0-SNAPSHOT</version>
</dependency>
Create a database from a GDL string:
GDLHandler handler = new GDLHandler.Builder().buildFromString("g[(alice)-[e1:knows {since : 2014}]->(bob)]");
for (Vertex v : handler.getVertices()) {
// do something
}
// access elements by variable
Graph g = handler.getGraphCache().get("g");
Vertex alice = handler.getVertexCache().get("alice");
Edge e = handler.getEdgeCache().get("e1");
Read predicates from a Cypher query:
GDLHandler handler = new GDLHandler.Builder().buildFromString("MATCH (a:Person)-[e:knows]->(b:Person) WHERE a.age > b.age");
// prints (((a.age > b.age AND a.__label__ = Person) AND b.__label__ = Person) AND e.__label__ = knows)
handler.getPredicates().ifPresent(System.out::println);
Create a database from an InputStream
or an input file:
GDLHandler handler1 = new GDLHandler.Builder().buildFromStream(stream);
GDLHandler handler2 = new GDLHandler.Builder().buildFromFile(fileName);
Append data to a given handler:
GDLHandler handler = new GDLHandler.Builder().buildFromString("g[(alice)-[e1:knows {since : 2014}]->(bob)]");
handler.append("g[(alice)-[:knows]->(eve)]");
Licensed under the Apache License, Version 2.0.