Explicit Updates

George Svarovsky edited this page May 31, 2023 · 7 revisions

update syntax

An update syntax for json-rql, which is redundant with (that is, expressible as) a wildcard @delete and an @insert for a property.

default insert with infix operator

For example, a typical incremental update from a known starting value:

{
  "@delete": { "@id": "counter", "count": 0 },
  "@insert": { "@id": "counter", "count": 1 }
}

Using a binding, this can be expressed as an increment, allowing for any starting value:

{
  "@delete": { "@id": "counter", "count": "?count" },
  "@insert": { "@id": "counter", "count": "?newCount" },
  "@where": {
    "@graph": { "@id": "counter", "count": "?count" },
    "@bind": { "?newCount": { "@plus": ["?count", 1] } }
  }
}

Instead of binding in the @where, we could provide a syntax to do the binding in-line:

{
  "@delete": { "@id": "counter", "count": "?count" },
  "@insert": { "@id": "counter", "count": { "@value": "?count", "@plus": 1 } }
}

That is, the use of an infix operator in an insert binds an implicit variable with the result of the operation on the matching (deleted) data.

In this example, the @delete ... ?count could also be implicit, so the update can be further simplified to:

{
  "@insert": { "@id": "counter", "count": { "@plus": 1 } }
}
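
The in-line form can be read as shorthand for the bound form shown earlier. A minimal TypeScript sketch of that expansion follows; the helper `expandInlinePlus` and the variable names `?old` and `?new` are hypothetical, chosen for illustration, and are not part of json-rql or the m-ld API.

```typescript
// Hypothetical helper: not part of json-rql or m-ld.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

/**
 * Expands { "@insert": { "@id": id, [property]: { "@plus": n } } }
 * into an explicit delete/insert with a variable bound in the @where.
 */
function expandInlinePlus(id: string, property: string, n: number): Json {
  const oldVar = '?old'; // assumed variable names
  const newVar = '?new';
  return {
    '@delete': { '@id': id, [property]: oldVar },
    '@insert': { '@id': id, [property]: newVar },
    '@where': {
      '@graph': { '@id': id, [property]: oldVar },
      '@bind': { [newVar]: { '@plus': [oldVar, n] } }
    }
  };
}

console.log(JSON.stringify(expandInlinePlus('counter', 'count', 1), null, 2));
```

The output has the same shape as the bound example above, with `?old` and `?new` in place of `?count` and `?newCount`.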

BUT

  • 'Anonymous bind is an Update' does not allow a shorthand update to a fixed value
  • The use of the @insert key is jarring, since an operator may cause a pure delete
  • Not easy to intercept with a plugin – have to detect implicit updates

INSTEAD – explicit update syntax:

{
  "@update": { "@id": "counter", "count": { "@plus": 1 } }
}

  • Always deletes the old value (unless intercepted by a plugin)
  • Easy to intercept the whole clause with a plugin
  • BUT care needs to be taken not to accidentally delete/insert references, when naively using a nested subject

{
  "@update": { "@id": "fred", "age": 41 }
}

In the absence of a constraint or an indirected data type (see below) none of these represent a merge – by default, the prior values are known and must match at other clones, else a new value results, as normal. If a previous update has incurred a 'conflict' (a value array), then all values are operated on.

The update notified to the app is therefore the usual concrete @delete and @insert (as above).
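
As a rough illustration of how an @update resolves to a concrete @delete and @insert, here is a minimal TypeScript sketch. It models only the values at one subject-property position; the names `resolveUpdate`, `UpdateValue` and `ConcreteUpdate` are hypothetical. What happens when an operator is applied with no prior value is not specified in the text, so this sketch simply inserts nothing in that case.

```typescript
// Hypothetical in-memory model: the current values at one subject-property.
// A 'conflict' is represented as more than one value.
type UpdateValue = number | { '@plus': number };

interface ConcreteUpdate {
  '@delete': number[];
  '@insert': number[];
}

/** Resolves an @update value against the prior values of the property. */
function resolveUpdate(prior: number[], value: UpdateValue): ConcreteUpdate {
  let inserted: number[];
  if (typeof value === 'number') {
    inserted = [value]; // fixed value: replaces everything that was there
  } else {
    const delta = value['@plus'];
    inserted = prior.map(old => old + delta); // operator: applied to every prior value
  }
  return { '@delete': [...prior], '@insert': inserted };
}

console.log(resolveUpdate([0], { '@plus': 1 }));    // increment a single value
console.log(resolveUpdate([1, 5], { '@plus': 1 })); // conflict: both values are operated on
console.log(resolveUpdate([40], 41));               // fixed value
```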

indirected datatypes

Indirected datatypes are plugins for a new extension point that allows data to be stored separately from the SU-Set and graph, while appearing to have its full content in the app API.

Requirements for counter

  • app sees an integer
    • optionally receives addition updates as well as delete/insert
  • declared to be a counter in schema
  • as long as addition is used, it behaves as an integer sum register
  • if some other operation is used, the current integer value is set by deriving an addition
  • if the value is outright deleted, the counter no longer exists
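
The requirement that any other operation is applied "by deriving an addition" can be sketched as below. This is a minimal TypeScript illustration under assumed names (`CounterChange`, `deriveAddition`), not the m-ld implementation.

```typescript
// Hypothetical sketch: every change to the counter is expressed as an addition.
type CounterChange =
  | { kind: 'add'; amount: number } // e.g. from { "@plus": n }
  | { kind: 'set'; value: number }; // e.g. from a plain delete/insert

/** Returns the addition that takes the counter from `current` to its new value. */
function deriveAddition(current: number, change: CounterChange): number {
  return change.kind === 'add' ? change.amount : change.value - current;
}

let count = 5;
count += deriveAddition(count, { kind: 'add', amount: 2 }); // count is now 7
count += deriveAddition(count, { kind: 'set', value: 3 });  // derived addition is -4
console.log(count); // 3
```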

Requirements for text

  • app sees a string
    • but may not be able to filter by its current value
    • receives explicit splice updates (not delete/insert)
  • declared to be a CRDT in schema
  • as long as splice is used, it behaves as a text CRDT register
  • if some other operation is used, the current text value is set by deriving a splice

'Current' value is

  • not in the SU-Set: that would lead to a TID/triple mismatch
  • not purely entailed in the graph: an entailment must be derivable from current state

An indirected datatype is an RDF literal datatype, which may have several representations:

  1. id, which is the indirected lexical value in the SU-Set and is visible to graph queries. If the data is mutable, this is a stable identifier of the mutable state. Equality tests in a query will generally only work for immutable data; other json-rql built-in operators don't work for indirected data unless some comparable value is separately entailed.
  2. data, which is the live state corresponding to the id, and appears to be in the graph when retrieved and updated.
  3. Optionally value, an interface to substitute, which must be derivable from the data.
  4. A stored serialised representation, which must be available if lex !== data.

Binary:

  • id is a hash of the binary (json-rql equality tests work)
  • data is the binary

Counter:

  • id is a UUID (see below; json-rql operators don't work)
  • data is the integer value

Text:

  • id is a UUID (see below; json-rql operators don't work)
  • data is the TSeq
  • value is the string

Additionally, occurrences may be entailed to a different datatype using constraints, usually a built-in, for improved querying.

Declaration: the indirected datatype is not "declared" in the API using @type in a value object. A value object is a shallow homologue of an RDF literal, which expects the @value to be the lexical form of the data; in a value object, "lexical" is softened to any JSON (even objects, if the type is @json). So, value objects cannot represent abstract datatypes.

External to the engine:

  • TSeq values look like strings
  • There's no @type — they're really just strings

Internal to the engine:

  • TSeq values are typed literals, where the lexical form is some ID

Operators always operate on literals

Something declares up front that certain string values will be TSeqs internally

  • Possibly using SHACL targeting

shared datatypes

Shared datatypes (counters & text) are indirected datatypes with the following additional properties:

  1. The datatype recognises a set of json-rql operators, which may or may not be selected from the SPARQL operators.
  2. Use of such operators is detected in json-rql Updates and notified to the datatype implementation, which may translate the operation to a custom operation to be tunneled through the m-ld protocol.
  3. Custom operations must be fusable and revertible (for voiding).
  4. The entailed value is 'lazy':
    • its value is calculated on demand, from a prior calculated value and subsequent operations stored in a data-instance-specific oplog
    • updates are notified using a json-rql operator in an @update clause
    • the implementation can decide at any time to collapse the oplog to a 'current' value – it's redundant with the journal

The id value is a UUID which is a stable unique identifier of the mutable state. This is necessary because a literal having the shared datatype can be deleted:

                                   clone 1             clone 2
                                   INS c = 1
                                                       ↘︎ c = 1
                                   ++c = 2
                                                       ↘︎ ++c = 2
re-insert with the same identity   DEL c, INS c = 1    ++c = 3
leads to divergence                ↘︎ ++c = 2           ↘︎ c = 1 ⚠️
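
The divergence can be reproduced with a toy model. The TypeScript sketch below is an assumption-laden simplification: each clone maps a state identity to a value, an increment of a deleted identity is ignored, and fixed strings `'a'` and `'b'` stand in for UUIDs. The names `Op`, `applyOps` and `scenario` are hypothetical.

```typescript
// Hypothetical model: a clone maps the identity of each counter state to its value.
type Op =
  | { type: 'ins'; id: string; value: number }
  | { type: 'del'; id: string }
  | { type: 'inc'; id: string };

function applyOps(ops: Op[]): number[] {
  const states = new Map<string, number>();
  for (const op of ops) {
    if (op.type === 'ins') {
      states.set(op.id, op.value);
    } else if (op.type === 'del') {
      states.delete(op.id);
    } else {
      const current = states.get(op.id);
      // An increment of a state that no longer exists is ignored
      if (current !== undefined) states.set(op.id, current + 1);
    }
  }
  return Array.from(states.values());
}

/** Clone 1 re-inserts while clone 2 concurrently increments. */
function scenario(reinsertId: string): { clone1: number[]; clone2: number[] } {
  const shared: Op[] = [{ type: 'ins', id: 'a', value: 1 }, { type: 'inc', id: 'a' }];
  const reinsert: Op[] = [{ type: 'del', id: 'a' }, { type: 'ins', id: reinsertId, value: 1 }];
  const increment: Op[] = [{ type: 'inc', id: 'a' }];
  return {
    clone1: applyOps([...shared, ...reinsert, ...increment]), // local re-insert, then remote ++
    clone2: applyOps([...shared, ...increment, ...reinsert])  // local ++, then remote re-insert
  };
}

console.log(scenario('a')); // same identity: clone1 has [2], clone2 has [1]
console.log(scenario('b')); // fresh identity: both clones have [1]
```

With a fresh identity for the re-inserted state, the concurrent increment no longer applies to it, so both clones settle on the same value.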

The UUID identity means that the property having the shared datatype does not automatically act as a register; it's possible for concurrent inserts to generate multiple shared literals. This is awkward for @update because the syntax has no way of expressing which literal to update, the UUID being internal and not visible to the app; likewise for any attempt to resolve the conflict.

However, since it's not possible to filter a shared literal on its value, all deletes must necessarily use a variable and blow away anything at the subject-property position. Therefore we also have to enforce an effective constraint applying register semantics to shared datatype literals, using the same entailment regime as SHACL maxCount to hide 'lesser' values from the graph.

This constraint has the subtlety that it can allow non-shared data to occupy the object position at the same time as shared data, because to prevent that would require every transaction to check the shared data state of every affected subject-predicate.

archive

constrained insert with operator

If there exists a constraint associated with the count property, which knows that addition is commutative, it re-writes updates as follows:

on check:

  • Interim insert assertion is marked as evaluated by expression { "@plus": [1, 1] } – this is done by default in the JrqlGraph
  • If expression is not addition by @plus or @minus, no custom operation is provided
  • If expression is addition:
    • delete and insert assertions are removed
    • delete of prior value and insert of resultant value are entailed
    • Creates a custom operation like counter count "+1"
    • Custom operations must be fusable and reversible – they are themselves an extension point
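
The check-time decision above can be sketched in TypeScript as follows. The shapes `Expression` and `CounterOperation` and the function `toCounterOperation` are hypothetical. The sketch also assumes that the first operand of the expression is the prior value and the second is the amount, which the text does not state explicitly.

```typescript
// Hypothetical shapes, for illustration only; not the m-ld constraint API.
type Expression = { [operator: string]: [number, number] };

interface CounterOperation {
  subject: string;
  property: string;
  operation: string; // e.g. "+1"
}

/** Returns a custom operation if the expression is an addition, else undefined. */
function toCounterOperation(
  subject: string, property: string, expr: Expression
): CounterOperation | undefined {
  const plus = expr['@plus'];
  const minus = expr['@minus'];
  // Assumption: operand 0 is the prior value, operand 1 is the amount
  const amount = plus !== undefined ? plus[1] : minus !== undefined ? -minus[1] : undefined;
  if (amount === undefined) return undefined; // not an addition: no custom operation
  return { subject, property, operation: amount < 0 ? `${amount}` : `+${amount}` };
}

console.log(toCounterOperation('counter', 'count', { '@plus': [1, 1] }));
console.log(toCounterOperation('counter', 'count', { '@times': [2, 3] }));
```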

for both check and apply:

  • MAY entail removal of the old value and insertion of the new, so the app sees a normal update, and queries retrieve the new value (this makes sense for a counter).
  • OR just notifies the operator syntax, leaving the app to maintain its own state. Queries retrieve the baseline value (not useful for a counter, but see below).

The use of entailment here ensures convergence in case of explicit assertions concurrent with the use of the operator:

  • the SU-Set only knows about the 'baseline' value, prior to additions, if any
  • explicit assertions change the baseline, which may incur a 'conflict' (a value array) if the prior baseline has been entailed away
  • if a maxCount has been set for the property in a SHACL constraint, it will collapse the entailed and asserted values in the normal way

text CRDT

This follows the same pattern as the counter. The update syntax uses a json-rql custom function which represents a splice [index: number, deleteCount: number, insert?: string]:

{
  "@update": { "@id": "document", "text": { "@splice": [0, 5, "Hi"] } }
}

Note:

  • We will eventually need a way to invoke custom infix functions, but a keyword is fine here.
  • Multiple splices (operator applications) cannot be made atomic. This is consistent with the realtime principle, that updates are expected to be small and frequent. We could eventually offer an alternative 'patch' format à la diff-match-patch.
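
For reference, the effect of a splice on a plain string can be sketched in TypeScript as below. This shows only what the app would observe; it is not the CRDT itself, and the name `applySplice` is hypothetical.

```typescript
// The splice tuple from the text: [index, deleteCount, insert?]
type Splice = [index: number, deleteCount: number, insert?: string];

/** Removes deleteCount characters at index, then inserts the optional string there. */
function applySplice(text: string, [index, deleteCount, insert = '']: Splice): string {
  return text.slice(0, index) + insert + text.slice(index + deleteCount);
}

console.log(applySplice('Hello World', [0, 5, 'Hi'])); // "Hi World"
```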

The text CRDT constraint is configured to match the document text property, and:

  • asserts the baseline value (see below), if it does not already exist
  • derives an operation document text "<tseq op>"
  • stores the operation in its oplog, in KVP
  • on apply, applies incoming operations in memory and stores them in its oplog
  • notifies the app using @splice
  • on snapshot, and periodically as required, collapses the oplog to a current value
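
The oplog behaviour listed above can be sketched in TypeScript as follows. This is a simplification under stated assumptions: index-based string splices stand in for real TSeq operations, an in-memory array stands in for the KVP store, and `TextState` and its methods are hypothetical names.

```typescript
type TextSplice = [index: number, deleteCount: number, insert?: string];

const spliceText = (text: string, [index, deleteCount, insert = '']: TextSplice): string =>
  text.slice(0, index) + insert + text.slice(index + deleteCount);

class TextState {
  private baseline: string;
  private oplog: TextSplice[] = []; // stand-in for the stored oplog
  private readonly notify: (update: object) => void;

  constructor(baseline: string, notify: (update: object) => void) {
    this.baseline = baseline;
    this.notify = notify;
  }

  /** Stores an incoming operation and notifies the app using @splice. */
  apply(op: TextSplice): void {
    this.oplog.push(op);
    this.notify({ '@splice': op });
  }

  /** Derives the current text on demand, without changing stored state. */
  read(): string {
    return this.oplog.reduce<string>((text, op) => spliceText(text, op), this.baseline);
  }

  /** On snapshot, or periodically: folds the oplog into the baseline. */
  collapse(): void {
    this.baseline = this.read();
    this.oplog = [];
  }
}

const notified: object[] = [];
const doc = new TextState('Hello World', update => notified.push(update));
doc.apply([0, 5, 'Hi']);
doc.apply([8, 0, '!']);
console.log(doc.read()); // "Hi World!"
doc.collapse();
console.log(doc.read()); // "Hi World!"
```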

lazy entailment

When querying CRDT text, the graph may only know the current entailed value, which may be behind some operations, or not present at all.

  • For filtering, it's probably fine that the text appears to be an empty string, or not a string at all. We don't have text indexing anyway; the app can do its own thing if it wants to.
  • For retrieval, we want to dynamically invoke the oplog collapse behaviour of the constraint. This could be done with a special RDF literal datatype, mld:proxied, which is used for the text CRDT baseline value and detected in query results. (Here we would need to ensure that the state lock is always available in the results streaming – which it is already for describes and constructs.)

[...pid, index, deleteCount | insertString]
