Gestalt Link Schema Refactoring - Derived from JOSE replacement #492

CMCDragonkai · 2022-11-06T06:36:26Z

Specification

The major change to the GG is how the vertexes and edges are stored.

Originally the vertexes we stored as collection of records. Each record's key would be the neighbouring vertex and the record's value would be null.

This was put into the matrix level path, and this was suited to sparse adjacency matrices.

Now the GG stores the vertexes as pairs, where both vertexes are lifted to the DB key.

Some vertexes do not have any neighbours, in these cases, the GG stores only the vertex with the value being null.

  /**
   * Gestalt adjacency matrix represented as a collection vertex pairs.
   * Each vertex can be `GestaltNodeKey` or `GestaltIdentityKey`.
   * These are the allowable structures:
   * `GestaltGraph/matrix/{GestaltKey} -> null`
   * `GestaltGraph/matrix/{GestaltKey}/{GestaltKey} -> {raw(GestaltLinkId)}`
   */
  public readonly dbMatrixPath: Readonly<LevelPath> = [this.constructor.name, 'matrix'];

The allowed structures are:

Singleton node
Singleton identity
Node to node
Node to identity
Identity to node

Identity to identity is not allowed.

The previous GG design kept all the link information within the node information and identity information for each vertex. This is now replaced by having each vertex pair store a GestaltLinkId. This is a uniquely generated random ID that points to a GestaltLink.

The GestaltLink represents the edge information between vertexes. They wrap the SignedClaim<ClaimLinkIdentity> and SignedClaim<ClaimLinkNode> with additional required metadata. This is far more efficient than storing the entire chain of claims with each vertex, and this way we are able to provide the claim which "proves" the link between vertexes.

  /**
   * Gestalt links.
   * `GestaltGraph/links/{GestaltLinkId} -> {json(GestaltLink)}`
   */
  public readonly dbLinksPath: Readonly<LevelPath> = [this.constructor.name, 'links'];

From these 2 changes, we derive the full impact on the GestaltGraph and downstream usage. Let's go through each of them.

Internal `GestaltKey` Type

In the old GG, the GestaltKey was a JSON serialisation of the GestaltId. And the GestaltId was a tagged union of structures wrapping NodeId or ProviderId and IdentityId.

The GestaltKey is used because it is necessary to disambiguate between the 2 kinds of vertexes that would be stored in under the same level path matrix. The disambiguation relies on a "prefix". In the old GG, this was done through JSON serialisation of an ADT with type key. This has been changed to using a direct prefix with some buffer concatenation.

3 major changes here:

The DB does not store the encoded versions of the NodeId in its key path. This is important because, if we ever change the encoding algorithm, we would not be able to reference old NodeId. Therefore the DB must never use encoded IDs in its key paths.
Instead of using JSON serialisation everywhere, where possible we have used an tuple encoding instead with a prefix as the first element. This is contrasted to using a type key to form tagged unions for richer structures. I believe this tuple structure of ['node', NodeId] and ['identity', ProviderIdentityId] is more suitable for "linear" data, especially data to be used as keys.
The ProviderIdentityId is a tuple of [ProviderId, IdentityId].

The new GestaltKey is just an opaque buffer now. And we have toGestaltKey and fromGestaltKey utility functions to convert between the GestaltKey and the GestaltId.

So just remember GestaltKey is an "internal" type, only to be used within the GestaltGraph. It is never to be used outside the GestaltGraph, which means all input and output types with respect to the GestaltGraph is always the GestaltId or encoded versions of GestaltId. This is why the GestaltKey utility functions are part of gestalts/utils and not ids/index.

The Gestalt type now uses GestaltIdEncoded rather than GestaltKey, because that is just a prefixed versions of NodeIdEncoded and ProviderIdentityIdEncoded, and these are external types.

Binary Data in the DB

The storing of the links into the DB means dealing with the signature buffers in SignedClaim. The solution to this is the same as we do everywhere else. We have to use the JSON encoding of the buffer, but then create XJSON variants of the types we are storing in the DB. When putting to the DB, we can use X, but when we are getting from the B, we have to use XJSON types. Then use a utility function to convert the JSON buffers back to regular buffers. You can see in fromGestaltLinkJSON. This problem occurs because our DB doesn't support structured types with buffers. There's an issue on js-db to integrate a drop-in replacement to JSON, using something like BSON or CBOR so that we don't have to do this JSON encoding work for more common types in JS.

GestaltGraph's ACL

The way the GG uses the ACL does not change. However in the process of developing the above design, I realised that the ACL schema could be improved by introducing a GraphId (could also known as a GestaltId but this conflicts with the existing type GestaltId representing the union of GestaltNodeId and GestaltIdentityId). This GraphId would be a uniquely randomly generated ID that points to a disjoint subgraph in the GG. Then permissions can be assigned to each GraphId. This eliminates the need for PermId and having to maintain a map of all the NodeId in a gestalt to that common PermId.

This could have downstream impacts too, because now it's possible to have a single identifier that refers to a "gestalt", rather than trying to use "any of the vertexes in a gestalt". Many algorithms such as getGestalts can be made more efficient by using the the GraphId instead. This kind of improvement can be done later.

Traversing a Gestalt

Almost all the methods in the GG make use of traversing a gestalt. This is now centralised under 1 single protected method called: GestaltGraph.getGestaltByKey.

Originally this was done through 2 methods GestaltGraph.getGestaltByKey and GestaltGraph.traverseGestalt. These 2 methods are now longer necessary, both functionalities have become part of the new GestaltGraph.getGestaltByKey.

Take note that it takes a visited set which gets mutated during execution. This visited set can be used outside of the function such as in the case of getGestalts.

Setting and Unsetting Vertexes

Setting a vertex and unsetting vertexes are very very similar to before.

You can see this with the already working GestaltGraph.setNode and GestaltGraph.setIdentity.

When vertexes are first set, they are always set as a singleton gestalt.

This means under the matrix level path, only 1 GestaltKey is used and the value is null.

Note that setNode and setIdentity are capable of updating the node or identity information. But they only set the matrix level path if they haven't already been set.

This means even as vertexes are being linked or unlinked, one can independently update their node information or identity information.

However do note that concurreny conflicts can occur here, but this should be bubbled back up to the user for a retry policy.

When a vertex is unset, their permissions in the ACL must be deleted, and all their links must also be broken. This means you have to iterate over all its neighbours and remove those entries in matrix. But you must also remove themselves from the matrix if they are a singleton gestalt.

GestaltGraph.setNode(
  nodeInfo: GestaltNodeInfo,
  tran?: DBTransaction,
): Promise<['node', NodeId]>
;
GestaltGraph.setIdentity(
  identityInfo: GestaltIdentityInfo,
  tran?: DBTransaction,
): Promise<['identity', ProviderIdentityId]>;

GestaltGraph.unsetNode(
  nodeId: NodeId,
  tran?: DBTransaction,
): Promise<void>;

GestaltGraph.unsetIdentity(
  providerIdentityId: ProviderIdentityId,
  tran?: DBTransaction,
): Promise<void>;

// Calls one of `setNode` or `setIdentity`
GestaltGraph.setVertex(
  gestaltInfo: GestaltInfo,
  tran?: DBTransaction,
): Promise<GestaltId>;

// Calls one of `unsetNode` or `unsetIdentity`
GestaltGraph.unsetVertex(
  gestaltId: GestaltId,
  tran?: DBTransaction,
): Promise<void>;

Linking and Unlinking Vertexes

Linking vertexes establishes vertex pair in the matrix level path.

It is also capable of updating the node or identity information.

These are the methods to be updated:

GestaltGraph.linkNodeAndNode(
  nodeInfo1: GestaltNodeInfo,
  nodeInfo2: GestaltNodeInfo,
  linkNode: Omit<GestaltLinkNode, 'id'>,
  tran?: DBTransaction,
): Promise<GestaltLinkId>;

GestaltGraph.linkNodeAndIdentity(
  nodeInfo: GestaltNodeInfo,
  identityInfo: GestaltIdentityInfo,
  linkIdentity: Omit<GestaltLinkIdentity, 'id'>,
  tran?: DBTransaction,
): Promise<GestaltLinkId>;

// Overloaded version of linkNodeAndNode and linkNodeAndIdentity
GestaltGraph.linkVertexAndVertex(
  gestaltInfo1: GestaltInfo,
  gestaltInfo2: GestaltInfo,
  link: [GestaltLink[0], Omit<GestaltLink[1], 'id'>]
  tran?: DBTransaction,
): Promise<GestaltLinkId>;

GestaltGraph.unlinkNodeAndNode(
  nodeId1: NodeId,
  nodeId2: NodeId,
  tran?: DBTransaction,
): Promise<void>;

GestaltGraph.unlinkNodeAndIdentity(
  nodeId: NodeId,
  providerIdentityId: ProviderIdentityId,
  tran?: DBTransaction,
): Promise<void>;

// Overlaoded version of unlinkNodeAndNode and unlinkNodeAndIdentity
GestaltGraph.unlinkVertexAndVertex(
  gestaltId1: GestaltId,
  gestaltId2: GestaltId,
  tran?: DBTransaction,
): Promise<void>;

The linking methods now also take a GestaltLink and it will check that the link's claim data matches the vertexes being linked. Currently only checkLinkNodeMatches utility function is available. One must be built also for node to identity links.

However it will not be checking the signatures of the claims in the links. It is expected that the signatures will already be checked by whatever is submitting the link to the GG. This is likely the job of the discovery domain.

Linking vertexes together has several effects:

Update the permission with the union of the 2 gestalt permissions in the ACL.
Update the link information.
Set the vertex pair.
Remove any singleton entry.
Update node or identity information.
Return new GestaltLinkId.

In the case that the vertexes are already linked, it will only:

Update the link information.
Update node or identity information.
Return existing GestaltLinkId.

Gestalt Actions

The GG depends on the ACL. This means ACL changes actually occur in the GG, not directly on the ACL.

The gestalt actions should remain the same. Only adjust according to the new data schema in GG.

These are the relevant methods to be re-enabled:

GestaltGraph.getGestaltActions(
  gestaltId: GestaltId,
  tran?: DBTransaction
): Promise<GestaltActions | undefined>:

GestaltGraph.setGestaltActions(
  gestaltId: GestaltId,
  action: GestaltAction,
  tran?: DBTransaction
): Promise<void>:

GestaltGraph.unsetGestaltActions(
  gestaltId: GestaltId,
  action: GestaltAction,
  tran?: DBTransaction
): Promise<void>:

Getting Information

Methods to create/update:

GestaltGraph.getGestalts(
  tran?: DBTransaction
): AsyncGenerator<Gestalt>;

GestaltGraph.getGestalt(
  gestaltId: GestaltId,
  tran?: DBTransaction
): Promise<Gestalt | undefined>;

GestaltGraph.getNode(
  nodeId: NodeId,
  tran?: DBTransaction,
): Promise<GestaltNodeInfo | undefined>;

GestaltGraph.getIdentity(
  providerIdentityId: ProviderIdentityId,
  tran?: DBtransaction,
): Promise<GestaltIdentityInfo | undefined>;

// Overloaded getVertex

GestaltGraph.getVertex(
  gestaltId: ['node', NodeId],
  tran?: DBTransaction,
): Promise<['node', GestaltNodeInfo] | undefined>;
GestaltGraph.getVertex(
  gestaltId: ['identity', ProviderIdentityId],
  tran?: DBTransaction,
): Promise<['identity', GestaltIdentityInfo] | undefined>;
GestaltGraph.getVertex(
  gestaltId: GestaltId,
  tran?: DBTransaction,
): Promise<GestaltInfo | undefined>;

// Overloaded getLink

GestaltGraph.getLink(
  linkId: GestaltLinkId,
  tran?: DBTransaction,
): Promise<GestaltLink | undefined>;
GestaltGraph.getLink(
  gestaltId1: ['node', NodeId],
  gestaltId2: ['node', NodeId],
  tran?: DBTransaction,
): Promise<['node', GestaltNodeLink] | undefined>;
GestaltGraph.getLink(
  gestaltId1: ['identity', ProviderIdentityId],
  gestaltId2: ['node', NodeId],
  tran?: DBTransaction,
): Promise<['identity', GestaltIdentityLink] | undefined>;
GestaltGraph.getLink(
  gestaltId1: ['node', NodeId],
  gestaltId2: ['identity', ProviderIdentityId],
  tran?: DBTransaction,
): Promise<['identity', GestaltIdentityLink] | undefined>;
GestaltGraph.getLink(
  gestaltId1: GestaltId,
  gestaltId2: GestaltId,
  tran?: DBTransaction,
): Promise<GestaltLink | undefined>;

Usage

What we would expect is that the links are being discovered by discovery and these links are being checked, and then the necessary updates are made on the GG.

The GG is also how the discovery starts its discovery process. At the moment there is no timestamp to any of the GG data, except perhaps the links.

Future improvements should add an expiry/timestamp all the data in the GG so we know how old the data is. When it was last verified. This can aid the discovery and deciding what to prioritize for discovery.

Additional context

Replace JOSE with our own tokens domain and specialise tokens for Sigchain, Notifications, Identities and Sessions #481

Tasks

Refer to the spec above, and implement the spec with respect to the GestaltGraph.ts
Add in fast-check testing to the methods of GestaltGraph, attempt model testing similar to the CertManager.test.ts.
Remove all old tests that are unnecessary and clunky.
Update any relevant GRPC handlers that are using the GG.
Move parseGestaltId into parseGestaltIdHuman in gestalts/utils as per https://github.com/MatrixAI/MatrixAI-Graph/issues/44

The text was updated successfully, but these errors were encountered:

tegefaulkes · 2022-11-11T08:59:15Z

All the methods have been implemented now. Next step is updating the tests.

CMCDragonkai · 2022-11-15T07:46:19Z

Adding a note here about GraphId. The number of operations don't change between using ACL with NodeId and GestaltGraph with GraphId. You still have to store GraphId -> GestaltId and GestaltId -> GraphId. And when you are joining 2 gestalts together, one of the gestalt's GraphId has to be re-assigned in a transaction. Ideally you choose the smallest gestalt to do so, however it's not really possible to know which gestalt is the smallest unless you maintain a counter for the number of vertices in the gestalt. This is of course possible once you do have a notion of GraphId, whereas you couldn't do this with the current ACL design.

So in the future we should switch to using GraphId but it is not a priority right now.

CMCDragonkai · 2022-11-15T07:46:46Z

Testing the GG should use the fast check system. Pretty much from now on, all of our tests should be fast check driven. It's tooling is vastly superior to writing individual unit tests.

CMCDragonkai · 2022-11-25T05:37:46Z

Example of a test prop for setNode.

  testProp.only(
    'getNode, setNode and unsetNode',
    [
      testsIdsUtils.nodeIdArb.chain(testsGestaltsUtils.gestaltNodeInfoArb)
    ],
    async (gestaltNodeInfo) => {
      const gestaltGraph = await GestaltGraph.createGestaltGraph({
        db,
        acl,
        logger,
      });
      expect(await gestaltGraph.setNode(gestaltNodeInfo)).toEqual([
        'node',
        gestaltNodeInfo.nodeId,
      ]);
      expect(await gestaltGraph.getNode(gestaltNodeInfo.nodeId)).toEqual(
        gestaltNodeInfo
      );
      await gestaltGraph.unsetNode(gestaltNodeInfo.nodeId);
      expect(await gestaltGraph.getNode(gestaltNodeInfo.nodeId)).toBeUndefined();
      await gestaltGraph.stop();
    }
  );

CMCDragonkai · 2022-11-25T07:07:39Z

I've added GestaltGraph.getLinkById to get a gestalt link by an gestalt link id.

CMCDragonkai added the development Standard development label Nov 6, 2022

CMCDragonkai mentioned this issue Nov 6, 2022

Updating Crypto to use WebCrypto API and to replace RSA with ECC #446

Merged

75 tasks

CMCDragonkai assigned tegefaulkes Nov 6, 2022

tegefaulkes closed this as completed in #446 Dec 8, 2022

CMCDragonkai added the r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy label Jul 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gestalt Link Schema Refactoring - Derived from JOSE replacement #492

Gestalt Link Schema Refactoring - Derived from JOSE replacement #492

CMCDragonkai commented Nov 6, 2022 •

edited

Loading

tegefaulkes commented Nov 11, 2022

CMCDragonkai commented Nov 15, 2022

CMCDragonkai commented Nov 15, 2022

CMCDragonkai commented Nov 25, 2022 •

edited

Loading

CMCDragonkai commented Nov 25, 2022

Gestalt Link Schema Refactoring - Derived from JOSE replacement #492

Gestalt Link Schema Refactoring - Derived from JOSE replacement #492

Comments

CMCDragonkai commented Nov 6, 2022 • edited Loading

Specification

Internal GestaltKey Type

Binary Data in the DB

GestaltGraph's ACL

Traversing a Gestalt

Setting and Unsetting Vertexes

Linking and Unlinking Vertexes

Gestalt Actions

Getting Information

Usage

Additional context

Tasks

tegefaulkes commented Nov 11, 2022

CMCDragonkai commented Nov 15, 2022

CMCDragonkai commented Nov 15, 2022

CMCDragonkai commented Nov 25, 2022 • edited Loading

CMCDragonkai commented Nov 25, 2022

CMCDragonkai commented Nov 6, 2022 •

edited

Loading

Internal `GestaltKey` Type

CMCDragonkai commented Nov 25, 2022 •

edited

Loading