
ID standardisation for all projects and domains #1

Closed
@CMCDragonkai

Description


Specification

ID generation is used in many places in PK, but the IDs must have different properties depending on the use case.

The properties we care about are:

  • Decentralised vs Centralised - appending a "machine ID" makes the IDs decentralised: it prevents collisions between machines and requires no coordination
  • k-Sortable vs Random - k-sortable IDs sort lexically (roughly in creation order), which means they can be used as keys in an ordered key-value database. Random identifiers are intended to have no order, which is often important for unguessability. Sortable IDs can also have a random component, but with limited space there is a tradeoff between how random and how sortable they are
  • Limited byte size representation - UUIDs use 128 bits, and most ID schemes appear to use 128 bits (16 bytes), so all properties have to be encoded within 128 bits. Some properties may not fit: for example, decentralised IDs may need to use the machine ID, which may itself be larger than 128 bits, and slicing off a smaller portion would increase the likelihood of collision. In such cases compound ID formats will be required
  • Buffer Representation and Base Encoding - IDs should have an original binary form, which can then be encoded in various formats for display. The buffer representation is superior because it uses the full bitspace and is shorter than a base-encoded textual form. We could support different textual displays for different purposes, but the UUID display format is a nice way of showing IDs and can help with memorisation
  • Strict Monotonicity - for sortable IDs, it's essential that generated IDs are strictly monotonic to prevent collisions or ambiguity, and they must remain sortable across clock resets and process restarts
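To make the k-sortable and fixed-size properties concrete, here is a minimal sketch assuming a 128-bit layout of a 48-bit big-endian millisecond timestamp followed by 80 random bits (similar in spirit to UUIDv7, but without its version and variant bits). `sortableId` and `compareBytes` are hypothetical helpers, not this library's API, and the randomness comes from Node's `crypto.randomFillSync`.

```typescript
import { randomFillSync } from "node:crypto";

// Hypothetical 128-bit k-sortable ID layout:
//   bytes 0-5:  48-bit big-endian Unix timestamp in milliseconds
//   bytes 6-15: 80 bits from a CSPRNG
function sortableId(timestampMs: number): Uint8Array {
  if (!Number.isInteger(timestampMs) || timestampMs < 0 || timestampMs >= 2 ** 48) {
    throw new RangeError("timestamp must fit in 48 bits");
  }
  const id = new Uint8Array(16);
  // Big-endian: the most significant byte goes first, so byte-wise order
  // matches the numeric order of the timestamps.
  let t = timestampMs;
  for (let i = 5; i >= 0; i--) {
    id[i] = t % 256;
    t = Math.floor(t / 256);
  }
  // Fill the remaining 10 bytes (offset 6, size 10) with random bytes.
  randomFillSync(id, 6, 10);
  return id;
}

// Unsigned byte-wise (lexical) comparison, as an ordered key-value store would do.
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

const earlier = sortableId(1000);
const later = sortableId(2000);
console.log(compareBytes(earlier, later) < 0); // true
```

IDs created within the same millisecond are not ordered relative to each other in this sketch, which is the "k" in k-sortable. Strict monotonicity needs additional clock state on top of this.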

Compared to petnames, IDs give us the secure and decentralised properties, but they are not human-meaningful. Human-meaningful names can be generated by mapping IDs to a mnemonic, but that is outside the scope of this issue for now.

There are roughly 4 use cases in PK:

  • Decentralised Sortable IDs - the claim ID in the sigchain should be decentralised but also sortable. Claims are public information, and each ID points to a claim at a point in time
  • Centralised Sortable IDs - the notification ID in PK notifications. These IDs are local to the PK but should also be sortable. They are public information, so they don't have to be cryptographically random
  • Decentralised Random IDs - the vault ID. These should be random, as they should not leak information about the order of creation
  • Centralised Random IDs - the permission ID. These should be random because order doesn't matter, though leakage isn't really an issue either

To resolve decentralised vs centralised, rather than embedding the machine ID inside the identifier, we would always expect the machine ID to be appended as a suffix. Appending is superior to prepending because it keeps the sortable part of the ID at the front, so sorting still works.
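Here is a sketch of the suffix approach, assuming fixed-length base IDs and an arbitrary-length machine ID. `withMachineSuffix` and `compareBytes` are hypothetical helpers. The sketch shows that the suffix separates otherwise identical base IDs, while the base ID still dominates the sort order.

```typescript
// Hypothetical helper: a decentralised compound ID formed by appending a
// machine ID (any byte string) after a fixed-length base ID.
function withMachineSuffix(baseId: Uint8Array, machineId: Uint8Array): Uint8Array {
  const out = new Uint8Array(baseId.length + machineId.length);
  out.set(baseId, 0);
  out.set(machineId, baseId.length);
  return out;
}

// Unsigned byte-wise (lexical) comparison.
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// Two machines happen to produce the same 16-byte base ID...
const base = new Uint8Array(16).fill(0x42);
const machineA = Uint8Array.from([0x0a, 0x01]);
const machineB = Uint8Array.from([0x0b, 0x02]);
const idA = withMachineSuffix(base, machineA);
const idB = withMachineSuffix(base, machineB);
console.log(compareBytes(idA, idB) !== 0); // true: the suffix disambiguates

// ...while a base ID that sorts earlier still sorts earlier, whatever the suffix.
const earlierBase = new Uint8Array(16).fill(0x41);
const idC = withMachineSuffix(earlierBase, machineB);
console.log(compareBytes(idC, idA) < 0); // true
```

This relies on every base ID having the same length. With variable-length base IDs, a comparison could end up comparing base-ID bytes against machine-ID bytes.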

We can default to 128-bit sizes, but allow the user to specify larger or smaller sizes.

We can use a default CSPRNG, but also allow users to submit a custom CSPRNG for random number generation.
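The following is a minimal sketch of a default-128-bit random ID with an injectable random source. It assumes a synchronous byte-filling function as the CSPRNG interface. `randomId` and `RandomSource` are hypothetical names, and the default uses Node's `crypto.randomFillSync`.

```typescript
import { randomFillSync } from "node:crypto";

// Hypothetical interface: any function that fills a byte array with random bytes.
type RandomSource = (bytes: Uint8Array) => void;

// Default CSPRNG (Node's crypto module); callers may inject their own.
const defaultRandomSource: RandomSource = (bytes) => {
  randomFillSync(bytes);
};

// Hypothetical IdRandom-style generator: 128 bits by default, overridable size
// (restricted to whole bytes in this sketch).
function randomId(bits: number = 128, randomSource: RandomSource = defaultRandomSource): Uint8Array {
  if (!Number.isInteger(bits) || bits <= 0 || bits % 8 !== 0) {
    throw new RangeError("bits must be a positive multiple of 8");
  }
  const id = new Uint8Array(bits / 8);
  randomSource(id);
  return id;
}

const id128 = randomId();    // 16 bytes from the default CSPRNG
const id256 = randomId(256); // 32 bytes
console.log(id128.length, id256.length); // 16 32

// A deterministic source is handy in tests, but it is NOT cryptographically secure.
const fixed = randomId(64, (bytes) => bytes.fill(7));
console.log(fixed.every((b) => b === 7)); // true
```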

To ensure monotonicity, we want to allow the external system to save the clock state and give it back to us, so we can ensure that IDs are always monotonic.
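One simple way to honour an externally saved clock state is sketched below, assuming the state is just the last issued millisecond timestamp. `MonotonicClock` is a hypothetical class that never issues a timestamp less than or equal to the previous one, even after a restart or a backwards clock jump. Bumping the timestamp is only one strategy; a sequence counter in the low bits is another.

```typescript
// Hypothetical sketch: the caller persists `state` and passes it back on restart.
class MonotonicClock {
  private last: number;
  private readonly now: () => number;

  constructor(savedLast: number = -1, now: () => number = Date.now) {
    this.last = savedLast; // state restored from external storage (-1 = none)
    this.now = now;        // injectable wall clock, useful for testing
  }

  // Returns a strictly increasing millisecond timestamp. If the wall clock has
  // not advanced (or went backwards), bump past the last issued value instead.
  next(): number {
    const t = Math.max(this.now(), this.last + 1);
    this.last = t;
    return t;
  }

  // The value the caller should persist in order to resume monotonically later.
  get state(): number {
    return this.last;
  }
}

// Simulate a clock stuck at 1000 ms, then a restart with the clock reset to 500 ms.
const clock1 = new MonotonicClock(-1, () => 1000);
const a = clock1.next(); // 1000
const b = clock1.next(); // 1001, because the clock did not advance
const saved = clock1.state;
const clock2 = new MonotonicClock(saved, () => 500);
const c = clock2.next(); // 1002, despite the clock going backwards
console.log(a < b && b < c); // true
```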

We may expect our IDs to be encoded with multibase later, so this library should be composable with multibase.
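Keeping the ID library byte-oriented makes textual formats composable. The sketch below assumes an encoder is simply a function from bytes to a string. `Encoder`, `uuidDisplay` and the `multibase*` helpers are hypothetical, and the multibase prefix characters are hand-written here rather than taken from a multibase library.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical: an encoder maps raw ID bytes to text, so display formats are
// layered on top of the binary form.
type Encoder = (id: Uint8Array) => string;

const hex: Encoder = (id) => Buffer.from(id).toString("hex");

// UUID-style display: 32 hex chars grouped 8-4-4-4-12 (16-byte IDs only).
const uuidDisplay: Encoder = (id) => {
  if (id.length !== 16) throw new RangeError("UUID display needs 16 bytes");
  const h = hex(id);
  return [h.slice(0, 8), h.slice(8, 12), h.slice(12, 16), h.slice(16, 20), h.slice(20)].join("-");
};

// Multibase-style self-describing strings: a one-character prefix names the base
// ('f' = lowercase base16, 'u' = unpadded base64url in the multibase table).
const multibaseHex: Encoder = (id) => "f" + hex(id);
const multibaseBase64Url: Encoder = (id) => "u" + Buffer.from(id).toString("base64url");

const id = Uint8Array.from({ length: 16 }, (_, i) => i);
console.log(uuidDisplay(id));  // 00010203-0405-0607-0809-0a0b0c0d0e0f
console.log(multibaseHex(id)); // f000102030405060708090a0b0c0d0e0f
console.log(multibaseBase64Url(id));
```

Only some encodings preserve lexical order. Lowercase hex does, because '0'-'9' sort before 'a'-'f' in ASCII. The base64url alphabet (A-Z, a-z, 0-9, '-', '_') does not, so sorting base64url strings does not match sorting the underlying bytes.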

Note that ID generation is different when the ID is meant to be backed by a public key. That is outside the scope of this library. These IDs are not public keys!

There are places in PK where we use https://github.com/substack/lexicographic-integer, and in those cases we may keep using it instead of this library. However, those are cases where we are truly trying to store a number, like the inode indexes in EFS. In the sigchain, what we really want is IdSortable.
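For truly numeric keys such as inode indexes, the encoding has to preserve numeric order under byte-wise comparison. This sketch uses a hypothetical `encodeUintBE` helper with a simple fixed-width 8-byte big-endian layout. It is not lexicographic-integer's actual format, which is a more compact variable-length scheme.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical fixed-width encoding of a non-negative safe integer as
// 8 big-endian bytes, so byte-wise order equals numeric order.
function encodeUintBE(n: number): Uint8Array {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const out = new Uint8Array(8);
  const view = new DataView(out.buffer);
  view.setUint32(0, Math.floor(n / 2 ** 32)); // high 32 bits (DataView defaults to big-endian)
  view.setUint32(4, n >>> 0);                 // low 32 bits
  return out;
}

// Decimal strings sort lexically, not numerically: "10" comes before "9".
console.log(["9", "10"].sort()); // [ '10', '9' ]
// The big-endian encoding sorts numerically under byte-wise comparison.
console.log(Buffer.compare(encodeUintBE(9), encodeUintBE(10))); // -1
```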

Additional context

Tasks

  1. [x] Play around with the uuid library
  2. [x] Review the uuidv7 and uuidv8 spec
  3. [x] Review the https://github.com/kripod/uuidv7 implementation
  4. [x] Integrate multibase
  5. [x] Transform uuidv7 to a uuidv8 constructor
  6. [x] Use uuidv8 as a skeleton to build the multi-property IDs needed by PK
  7. [x] Integrate the performance.now() and performance.timeOrigin APIs (make it browser-compatible, possibly by testing with a dynamic import?)
  8. [x] Consider both synchronous and asynchronous APIs due to the asynchronous crypto CSPRNG
  9. [x] Implement IdRandom
  10. [x] Implement IdDeterministic
  11. [x] Implement IdSortable
  12. [x] Add tests for lexical order for both binary and encoded forms
  13. [ ] Port tests over from https://github.com/uuid6/prototypes/tree/main/python for IdSortable - created our own tests instead
  14. [x] Add multibase encoding utilities to allow an easy way of encoding and decoding the IDs using any base
  15. [ ] Test that it is actually returning ArrayBuffer - can't do this because it doesn't work in jest
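For the `performance.now()` / `performance.timeOrigin` task, this sketch shows how the two can be combined into a high-resolution Unix timestamp in milliseconds. `nowMs` and `splitTimestamp` are hypothetical helpers. The import is Node-specific; in browsers `performance` is a global.

```typescript
import { performance } from "node:perf_hooks";

// performance.timeOrigin is the Unix-epoch time (ms) at which the process started,
// and performance.now() is a monotonic offset from it. Their sum therefore keeps
// increasing even if the wall clock is adjusted after startup (unlike Date.now()).
function nowMs(): number {
  return performance.timeOrigin + performance.now();
}

// Hypothetical split into an integer millisecond part (e.g. for a 48-bit field)
// and a sub-millisecond fraction that an ID layout could use for extra precision.
function splitTimestamp(ms: number): { millis: number; fraction: number } {
  const millis = Math.floor(ms);
  return { millis, fraction: ms - millis };
}

const { millis, fraction } = splitTimestamp(nowMs());
console.log(millis, fraction);
```

Browsers may deliberately coarsen `performance.now()` resolution, so the sub-millisecond fraction should be treated as best-effort.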

Metadata



Labels

  • design - Requires design (architecture, protocol, specification and task list requires further work)
  • development - Standard development
  • r&d:polykey:core activity 3 - Peer to Peer Federated Hierarchy

