Explicit Updates
Update syntax for json-rql, which is redundant with wildcard @delete and @insert for a property.
E.g. typical incremental update from a starting value:
{
"@delete": { "@id": "counter", "count": 0 },
"@insert": { "@id": "counter", "count": 1 }
}
Using a binding, this can be expressed as an increment, allowing for any starting value:
{
"@delete": { "@id": "counter", "count": "?count" },
"@insert": { "@id": "counter", "count": "?newCount" },
"@where": {
"@graph": { "@id": "counter", "count": "?count" },
"@bind": { "?newCount": { "@plus": ["?count", 1] } }
}
}
Instead of binding in the @where, we could provide a syntax to do the binding in-line:
{
"@delete": { "@id": "counter", "count": "?count" },
"@insert": { "@id": "counter", "count": { "@value": "?count", "@plus": 1 } }
}
That is, the use of an infix operator in an insert binds an implicit variable with the result of the operation on the matching (deleted) data.
In this example, the @delete ... ?count could also be implicit, so the update can be further simplified to:
{
"@insert": { "@id": "counter", "count": { "@plus": 1 } }
}
BUT
- 'Anonymous bind is an Update' does not allow a shorthand update to a fixed value
- The use of the @insert key is jarring, since an operator may cause a pure delete
- Not easy to intercept with a plugin: implicit updates have to be detected
INSTEAD – explicit update syntax:
{
"@update": { "@id": "counter", "count": { "@plus": 1 } }
}
- Always deletes the old value (unless intercepted by a plugin)
- Easy to intercept the whole clause with a plugin
- BUT care needs to be taken not to accidentally delete/insert references, when naively using a nested subject
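To make the redundancy with wildcard @delete and @insert concrete, here is a minimal TypeScript sketch that rewrites the operator form of @update into the bound @delete / @insert / @where form shown earlier. The helper name `expandUpdate`, the `?newCount` variable-naming scheme and the restriction to `@plus` are assumptions made for illustration; none of them is part of json-rql or m-ld.

```typescript
// Hypothetical helper, not part of json-rql or m-ld.
type Plus = { '@plus': number };
type Value = string | number | Plus;
type Subject = { '@id': string; [property: string]: Value };

function isPlus(value: Value): value is Plus {
  return typeof value === 'object';
}

// Rewrites { "@update": { "@id": ..., prop: { "@plus": n } } } into the
// equivalent form with an explicit variable and @bind.
function expandUpdate(update: { '@update': Subject }) {
  const { '@id': id, ...properties } = update['@update'];
  const del: Subject = { '@id': id };
  const ins: Subject = { '@id': id };
  const graph: Subject = { '@id': id };
  const bind: { [variable: string]: { '@plus': [string, number] } } = {};
  for (const [property, value] of Object.entries(properties)) {
    if (!isPlus(value))
      throw new Error(`Only @plus is handled in this sketch: ${property}`);
    const oldVar = `?${property}`;
    const newVar = `?new${property.charAt(0).toUpperCase()}${property.slice(1)}`;
    del[property] = oldVar;   // delete whatever the prior value was
    graph[property] = oldVar; // match the prior value
    ins[property] = newVar;   // insert the result of the operator
    bind[newVar] = { '@plus': [oldVar, value['@plus']] };
  }
  return { '@delete': del, '@insert': ins, '@where': { '@graph': graph, '@bind': bind } };
}
```

Applied to the counter increment above, this reproduces the earlier update that uses `?count` and `?newCount`. Fixed values such as a plain number are deliberately rejected, to keep the sketch short.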
{
"@update": { "@id": "fred", "age": 41 }
}
In the absence of a constraint or an indirected data type (see below) none of these represent a merge – by default, the prior values are known and must match at other clones, else a new value results, as normal. If a previous update has incurred a 'conflict' (a value array), then all values are operated on.
The update notified to the app is therefore the usual concrete @delete and @insert (as above).
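As a rough illustration of the concrete notification, the TypeScript sketch below computes the @delete and @insert for an addition, given the current value or value array. The function name `concreteUpdate` is hypothetical, and the sketch assumes that "all values are operated on" means the operator is applied to each conflicting value independently.

```typescript
// Hypothetical helper, not an m-ld API.
type Values = number | number[];

function concreteUpdate(id: string, property: string, current: Values, addend: number) {
  const add = (value: number) => value + addend;
  // Assumption: with a value array (a prior conflict), every value is incremented
  const next: Values = Array.isArray(current) ? current.map(add) : add(current);
  return {
    '@delete': { '@id': id, [property]: current },
    '@insert': { '@id': id, [property]: next }
  };
}
```

For a counter at 0 and an addend of 1, this yields the same delete and insert as the first example in this document.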
Indirected datatypes are plugins for a new extension point that allows data to be stored separately from the SU-Set and graph, while appearing to have its full content in the app API.
Requirements for counter
- app sees an integer
- optionally receives addition updates as well as delete/insert
- declared to be a counter in schema
- as long as addition is used, it behaves as an integer sum register
- if some other operation is used, the current integer value is set by deriving an addition
- if the value is outright deleted, the counter no longer exists
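The counter requirements can be sketched as a small TypeScript class. This is only an illustration: the class name `CounterSketch` and its members are hypothetical, and the behaviour of an addition after deletion (ignored here) is an assumption that the requirements do not state.

```typescript
// Hypothetical sketch of the counter requirements; not an m-ld API.
class CounterSketch {
  private value: number | undefined; // undefined once the counter is deleted
  readonly additions: number[] = []; // the addition operations applied so far

  constructor(initial: number) { this.value = initial; }

  /** Addition is the native operation of an integer sum register. */
  add(delta: number): void {
    if (this.value === undefined) return; // assumption: nothing to add to
    this.value += delta;
    this.additions.push(delta);
  }

  /** Any other operation sets the value by deriving the equivalent addition. */
  set(newValue: number): void {
    if (this.value !== undefined) this.add(newValue - this.value);
  }

  /** An outright delete means the counter no longer exists. */
  delete(): void { this.value = undefined; }

  /** The app sees a plain integer. */
  get current(): number | undefined { return this.value; }
}
```

Setting a counter that holds 7 to the value 10 is recorded as an addition of 3, so the register only ever sees additions.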
Requirements for text
- app sees a string
- but may not be able to filter by its current value
- receives explicit splice updates (not delete/insert)
- declared to be a CRDT in schema
- as long as splice is used, it behaves as a text CRDT register
- if some other operation is used, the current text value is set by deriving a splice
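One simple way to derive a splice when the text is set by some other operation is to trim the common prefix and suffix of the old and new strings. The TypeScript sketch below does this; the function name `deriveSplice` is hypothetical, and a real implementation might use a proper diff. It compares UTF-16 code units, so it does not treat surrogate pairs specially.

```typescript
type Splice = [index: number, deleteCount: number, insert?: string];

// Hypothetical helper: derives a single splice that turns oldText into newText.
function deriveSplice(oldText: string, newText: string): Splice | undefined {
  if (oldText === newText) return undefined; // nothing to do
  // Skip the common prefix
  let start = 0;
  const maxStart = Math.min(oldText.length, newText.length);
  while (start < maxStart && oldText[start] === newText[start]) start++;
  // Skip the common suffix, without overlapping the prefix
  let oldEnd = oldText.length, newEnd = newText.length;
  while (oldEnd > start && newEnd > start && oldText[oldEnd - 1] === newText[newEnd - 1]) {
    oldEnd--;
    newEnd--;
  }
  const insert = newText.slice(start, newEnd);
  return insert ? [start, oldEnd - start, insert] : [start, oldEnd - start];
}
```

For example, changing "Hello world" to "Hi world" derives the splice `[1, 4, "i"]`: keep the "H", delete "ello" and insert "i".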
'Current' value is
- not in the SU-Set: that would lead to a TID/triple mismatch
- not purely entailed in the graph: an entailment must be derivable from current state
An indirected datatype is an RDF literal datatype, which may have several representations:
- id, which is the indirected lexical value in the SU-Set and is visible to graph queries. If the data is mutable, this is a stable identifier of the mutable state. Equality tests in a query will generally only work for immutable data; other json-rql built-in operators don't work for indirected data unless some comparable value is separately entailed.
- data, which is the live state corresponding to the id, and appears to be in the graph when retrieved and updated.
- Optionally value, an interface to substitute, which must be derivable from the data.
- A stored serialised representation, which must be available if lex !== data.
Binary:
- id is a hash of the binary (json-rql equality tests work)
- data is the binary
Counter:
- id is a UUID (see below; json-rql operators don't work)
- data is the integer value
Text:
- id is a UUID (see below; json-rql operators don't work)
- data is the TSeq
- value is the string
Additionally, occurrences may be entailed to a different datatype using constraints, usually a built-in, for improved querying.
Declaration: the indirected datatype is not "declared" in the API using @type in a value object, because this is a shallow homologue of an RDF literal, which expects the @value to be the lexical form of the data. In a value object "lexical" is softened to any JSON (even objects, if the type is @json). So, value objects cannot represent abstract datatypes.
External to the engine:
- TSeq values look like strings
- There's no @type; they're really just strings
Internal to the engine:
- TSeq values are typed literals, where the lexical form is some ID
Operators always operate on literals
Something declares up front that certain string values will be TSeqs internally
- Possibly using SHACL targeting
Shared datatypes (counters & text) are indirected datatypes with the following additional properties:
- The datatype recognises a set of json-rql operators, which may or may not be selected from the SPARQL operators.
- Use of such operators is detected in json-rql Updates and notified to the datatype implementation, which may translate the operation to a custom operation to be tunneled through the m-ld protocol.
- Custom operations must be fusable and revertible (for voiding).
- The entailed value is 'lazy':
  - its value is calculated on demand, from a prior calculated value and subsequent operations stored in a data-instance-specific oplog
  - updates are notified using a json-rql operator in an @update clause
  - the implementation can decide at any time to collapse the oplog to a 'current' value, because the oplog is redundant with the journal
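The 'lazy' entailed value can be pictured with a counter whose oplog is a list of additions. The TypeScript sketch below is illustrative only: the class `LazyCounterValue` and its members are hypothetical names, and persistence of the oplog is left out.

```typescript
// Hypothetical sketch of a lazily evaluated value backed by an oplog.
class LazyCounterValue {
  private baseline: number;      // the prior calculated value
  private oplog: number[] = [];  // additions received since the last collapse

  constructor(baseline: number) { this.baseline = baseline; }

  /** Records an operation without recalculating the value. */
  apply(delta: number): void { this.oplog.push(delta); }

  /** Calculated on demand from the prior value and the oplog. */
  get value(): number {
    return this.oplog.reduce((sum, delta) => sum + delta, this.baseline);
  }

  /** Can be called at any time; it never changes the observable value. */
  collapse(): void {
    this.baseline = this.value;
    this.oplog = [];
  }

  get pending(): number { return this.oplog.length; }
}
```

Because collapsing does not change the value, the implementation is free to choose when to do it, for example on snapshot.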
The id value is a UUID which is a stable unique identifier of the mutable state. This is necessary because a literal having the shared datatype can be deleted:
| | clone 1 | clone 2 |
|---|---|---|
| | INS c = 1 | |
| | | ↘︎ c = 1 |
| | ++c = 2 | |
| | | ↘︎ ++c = 2 |
| re-insert with the same identity | DEL c, INS c = 1 | ++c = 3 |
| leads to divergence | ↘︎ ++c = 2 | ↘︎ c = 1 |
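The scenario in the table can be replayed with a toy model in TypeScript. Everything here is a simplification for illustration: the `CloneSketch` class, the operation shapes and the string identities (standing in for UUIDs) are hypothetical, and increments addressed to a deleted identity are assumed to be dropped.

```typescript
type Op =
  | { type: 'ins'; id: string; value: number }
  | { type: 'del'; id: string }
  | { type: 'inc'; id: string };

// Hypothetical toy clone: a map from literal identity to counter value.
class CloneSketch {
  readonly literals = new Map<string, number>();

  apply(ops: Op[]): void {
    for (const op of ops) {
      if (op.type === 'ins') {
        this.literals.set(op.id, op.value);
      } else if (op.type === 'del') {
        this.literals.delete(op.id);
      } else {
        const current = this.literals.get(op.id);
        // Assumption: an increment to a deleted identity is dropped
        if (current !== undefined) this.literals.set(op.id, current + 1);
      }
    }
  }

  get values(): number[] { return Array.from(this.literals.values()); }
}

// Replays the table; reinsertId is the identity used for the re-insert.
function run(reinsertId: string): [number[], number[]] {
  const clone1 = new CloneSketch(), clone2 = new CloneSketch();
  const shared: Op[] = [{ type: 'ins', id: 'c', value: 1 }, { type: 'inc', id: 'c' }];
  clone1.apply(shared);
  clone2.apply(shared); // both clones now see c = 2
  const local1: Op[] = [{ type: 'del', id: 'c' }, { type: 'ins', id: reinsertId, value: 1 }];
  const local2: Op[] = [{ type: 'inc', id: 'c' }];
  clone1.apply(local1);
  clone2.apply(local2); // concurrent local updates
  clone1.apply(local2);
  clone2.apply(local1); // then exchanged
  return [clone1.values, clone2.values];
}
```

In this model `run('c')` ends with clone 1 at 2 and clone 2 at 1, as in the table, while a fresh identity such as `run('c2')` leaves both clones at 1.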
The UUID identity means that the property having the shared datatype does not automatically act as a register; it's possible for concurrent inserts to generate multiple shared literals. This is awkward for @update because the syntax has no way of expressing which literal to update, the UUID being internal and not visible to the app; likewise for any attempt to resolve the conflict.
However, since it's not possible to filter a shared literal on its value, all deletes must necessarily use a variable and blow away anything at the subject-property position. Therefore we also have to enforce an effective constraint applying register semantics to shared datatype literals, using the same entailment regime as SHACL maxCount to hide 'lesser' values from the graph.
This constraint has the subtlety that it can allow non-shared data to occupy the object position at the same time as shared data, because to prevent that would require every transaction to check the shared data state of every affected subject-predicate.
If there exists a constraint associated with the count property, which knows that addition is commutative, it re-writes updates as follows:
on check:
- Interim insert assertion is marked as evaluated by expression { "@plus": [1, 1] }; this is done by default in the JrqlGraph
- If expression is not addition by @plus or @minus, no custom operation is provided
- If expression is addition:
  - delete and insert assertions are removed
  - delete of prior value and insert of resultant value are entailed
  - Creates a custom operation like counter count "+1"
  - Custom operations must be fusable and reversible; they are themselves an extension point
for both check and apply:
- MAY entail removal of the old value and insertion of the new, so the app sees a normal update, and queries retrieve the new value (this makes sense for a counter).
- OR just notifies the operator syntax, leaving the app to maintain its own state. Queries retrieve the baseline value (not useful for a counter, but see below).
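The translation from an evaluated expression to a custom counter operation, and the fuse and revert requirements, can be sketched in TypeScript as follows. The function names and the "+1" string encoding of operations are assumptions based on the example above, as is the reading that the second argument of the expression is the addend.

```typescript
type Expression = { [operator: string]: unknown };

// Formats a signed delta such as "+1" or "-2"
function formatDelta(delta: number): string {
  return delta < 0 ? `${delta}` : `+${delta}`;
}

// Hypothetical helper: returns a custom operation such as "+1", or
// undefined if the expression is not addition by @plus or @minus.
function toCounterOperation(expression: Expression): string | undefined {
  for (const [operator, args] of Object.entries(expression)) {
    // Assumption: the second argument is the addend
    const delta = Array.isArray(args) ? args[1] : undefined;
    if (typeof delta !== 'number') return undefined;
    if (operator === '@plus') return formatDelta(delta);
    if (operator === '@minus') return formatDelta(-delta);
  }
  return undefined;
}

// Fusing two additions gives a single addition of their sum
function fuse(first: string, second: string): string {
  return formatDelta(Number(first) + Number(second));
}

// Reverting an addition is the opposite addition
function revert(operation: string): string {
  return formatDelta(-Number(operation));
}
```

Addition makes both requirements trivial, which is part of why a counter is a convenient first shared datatype.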
The use of entailment here ensures convergence in case of explicit assertions concurrent with the use of the operator:
- the SU-Set only knows about the 'baseline' value, prior to additions, if any
- explicit assertions change the baseline, which may incur a 'conflict' (a value array) if the prior baseline has been entailed away
- if a maxCount has been set for the property in a SHACL constraint, it will collapse the entailed and asserted values in the normal way
Text updates follow the same pattern as the counter. The update syntax uses a json-rql custom function which represents a splice [index: number, deleteCount: number, insert?: string]:
{
"@update": { "@id": "document", "text": { "@splice": [0, 5, "Hi"] } }
}
Note:
- We will eventually need a way to invoke custom infix functions, but a keyword is fine here.
- Multiple splices (operator applications) cannot be made atomic. This is consistent with the realtime principle, that updates are expected to be small and frequent. We could eventually offer an alternative 'patch' format à la diff-match-patch.
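For reference, the effect of a splice on a plain string can be sketched in TypeScript as below. The function name `applySplice` is hypothetical, and the sketch ignores concurrency, which is the job of the text CRDT.

```typescript
type Splice = [index: number, deleteCount: number, insert?: string];

// Hypothetical helper: applies a splice to a string, like Array.prototype.splice.
function applySplice(text: string, [index, deleteCount, insert = '']: Splice): string {
  return text.slice(0, index) + insert + text.slice(index + deleteCount);
}
```

With the update above, a document text of "Hello world" becomes "Hi world".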
The text CRDT constraint is configured to match the document text property, and:
- asserts the baseline value (see below), if it does not already exist
- derives an operation document text "<tseq op>"
- stores the operation in its oplog, in KVP
- on apply, applies incoming operations in memory and stores them in its oplog
- notifies the app using @splice
- on snapshot, and periodically as required, collapses the oplog to a current value
When querying CRDT text, the graph may only know the current entailed value, which may be behind some operations, or not present at all.
- For filtering, it's probably fine that the text appears to be an empty string, or not a string at all. We don't have text indexing anyway; the app can do its own thing if it wants to.
- For retrieval, we want to dynamically invoke the oplog collapse behaviour of the constraint. This could be done with a special RDF literal datatype, mld:proxied, which is used for the text CRDT baseline value and detected in query results. (Here we would need to ensure that the state lock is always available in the results streaming, which it is already for describes and constructs.)
[...pid, index, deleteCount | insertString]