Change from JSON encoding to using CBOR/BSON encoding #58

Open
CMCDragonkai opened this issue Oct 26, 2022 · 7 comments
Labels
development Standard development r&d:polykey:supporting activity Supporting core activity

Comments

@CMCDragonkai
Member

CMCDragonkai commented Oct 26, 2022

Specification

Currently the DB uses JSON encoding by default for storing structured data.

This encoding is lossy: not all JS types can be represented in JSON, and binary data becomes quite bloated when encoded.

Sometimes we want to store structured data that may include binary data and other useful things like Dates.

Remember that things like undefined get turned into null inside arrays (and are dropped entirely from object properties), which can be surprising.
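
To make the lossiness concrete, here is a small sketch that pushes a value through JSON and back using only built-ins. The helper name `jsonRoundTrip` is made up for illustration and is not part of js-db. A plain `Uint8Array` is used for the binary field; a Node `Buffer` would instead come back as `{ type: 'Buffer', data: [...] }`.

```typescript
// Push a value through JSON and back, as a JSON-encoded DB would.
function jsonRoundTrip(value: unknown): unknown {
  return JSON.parse(JSON.stringify(value));
}

const lossy = jsonRoundTrip({
  list: [1, undefined, 3],
  missing: undefined,
  when: new Date(0),
  bytes: new Uint8Array([1, 2, 3]),
}) as { list: unknown; when: unknown; bytes: unknown };

console.log(lossy.list); // [ 1, null, 3 ]  (undefined became null)
console.log('missing' in lossy); // false  (the property was dropped)
console.log(lossy.when); // 1970-01-01T00:00:00.000Z  (a string, not a Date)
console.log(lossy.bytes); // { '0': 1, '1': 2, '2': 3 }  (a plain object)
```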

Consider checking out CBOR (which seems to be an evolution of MessagePack and BSON).

Additional context

Tasks

  1. Compare the encoding of buffers, typed arrays, dates, and undefined
  2. Compare the performance with JSON encoding
  3. Ensure that we get round-trip isomorphism: what goes in is what comes out, for random JS objects
  4. Ensure that CBOR supports additional JS "data types", and ultimately produces an ArrayBuffer that is accepted by the NAPI into rocksdb.
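
As a rough sketch of how task 3 could be checked, the harness below is codec-agnostic: any encoder/decoder pair can be plugged in. The names `Codec`, `jsonCodec`, `deepEqual` and `roundTrips` are hypothetical and not part of js-db. The JSON codec is only there as a baseline that is expected to fail for the interesting types; a CBOR codec would be wrapped in the same shape, with bytes as the wire format.

```typescript
// Hypothetical codec shape; W is the wire format (string for JSON, bytes for CBOR).
interface Codec<W> {
  encode(value: unknown): W;
  decode(wire: W): unknown;
}

// Baseline codec: the current JSON behaviour.
const jsonCodec: Codec<string> = {
  encode: (value) => JSON.stringify(value),
  decode: (wire) => JSON.parse(wire),
};

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return (
    typeof v === 'object' &&
    v !== null &&
    Object.getPrototypeOf(v) === Object.prototype
  );
}

// Structural equality that understands Date and Uint8Array (and so Buffer).
function deepEqual(a: unknown, b: unknown): boolean {
  if (Object.is(a, b)) return true;
  if (a instanceof Date && b instanceof Date) {
    return a.getTime() === b.getTime();
  }
  if (a instanceof Uint8Array && b instanceof Uint8Array) {
    return a.length === b.length && a.every((byte, i) => byte === b[i]);
  }
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((item, i) => deepEqual(item, b[i]));
  }
  if (isPlainObject(a) && isPlainObject(b)) {
    const keysA = Object.keys(a);
    const keysB = Object.keys(b);
    return (
      keysA.length === keysB.length &&
      keysA.every((k) => k in b && deepEqual(a[k], b[k]))
    );
  }
  return false;
}

// True only if decode(encode(value)) is structurally identical to value.
function roundTrips<W>(codec: Codec<W>, value: unknown): boolean {
  try {
    return deepEqual(codec.decode(codec.encode(value)), value);
  } catch {
    return false;
  }
}
```

With the JSON baseline, `roundTrips(jsonCodec, { a: [1, 'x', null] })` holds, while values containing `undefined`, a `Date` or a `Uint8Array` do not round-trip. Feeding randomly generated values through the same function would cover the "random JS objects" part of the task.
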
@CMCDragonkai
Member Author

Currently we are using things like:

  • SignatureJSON
  • TokenSignatureJSON
  • SignedClaimJSON
  • GestaltLinkNodeJSON
  • ...

And more, just to represent the type that actually comes out of the DB after we submit JSON, because of how buffers are encoded. This adds quite a bit of unnecessary noise.

If the DB could support binary data, and support types that are native to JS like Buffer, Uint8Array, etc., it would be easier to avoid needing these types, and it would also be possible to discard the raw option entirely, since data would be stored efficiently no matter what.

Non-native JS types like Buffer could be something that is explicitly supported by this DB, since it already uses Buffer a lot.
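
To illustrate the kind of noise meant here, the sketch below uses a made-up record. `ExampleSignature`, `ExampleSignatureJSON` and the two converters are hypothetical and are not the real Polykey types listed above. The only grounded part is `BufferJSON`, which is the shape Node's `Buffer` serialises to under `JSON.stringify`.

```typescript
// Shape produced by Node's Buffer.prototype.toJSON().
type BufferJSON = { type: 'Buffer'; data: number[] };

// Hypothetical in-memory record, not a real Polykey type.
type ExampleSignature = { signer: string; signature: Uint8Array };

// Shadow type needed only to describe what comes back out of a JSON-encoded DB.
type ExampleSignatureJSON = { signer: string; signature: BufferJSON };

// Hand-written conversion in each direction; this is the boilerplate
// that a binary-aware encoding would make unnecessary.
function exampleToJSON(sig: ExampleSignature): ExampleSignatureJSON {
  return {
    signer: sig.signer,
    signature: { type: 'Buffer', data: Array.from(sig.signature) },
  };
}

function exampleFromJSON(stored: ExampleSignatureJSON): ExampleSignature {
  return {
    signer: stored.signer,
    signature: Uint8Array.from(stored.signature.data),
  };
}
```

If the DB encoding preserved binary values directly, only the first record type would be needed and both converters could go away.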

Other JS types that could be supported include things like Set and Map... but that's unnecessary atm.

@CMCDragonkai
Member Author

This type would be particularly important:

/**
 * Strict JSON values.
 * These are the only types that JSON can represent.
 * All input values are encoded into JSON.
 * Take note that `undefined` values are not allowed.
 * `JSON.stringify` converts `undefined` to `null` in arrays and omits it from object properties.
 */
type JSONValue =
  { [key: string]: JSONValue } |
  Array<JSONValue> |
  string |
  number |
  boolean |
  null;

These types all need to be supported, and other kinds of values can be added to the list.

We could then create a DBValue type indicating all the types that are supported to be stored in the DB.
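
One possible shape for such a type is sketched below. Only the name `DBValue` comes from this issue; the choice of members (`Uint8Array`, which also covers Node's `Buffer` since it is a subclass, and `Date`) and the `isDBValue` guard are assumptions for illustration. `undefined` is left out here to match the `JSONValue` comment above.

```typescript
// Hypothetical: every JSON value, plus binary data and dates.
type DBValue =
  | { [key: string]: DBValue }
  | Array<DBValue>
  | string
  | number
  | boolean
  | null
  | Uint8Array // Node's Buffer is a subclass, so it is accepted too
  | Date;

// Runtime check mirroring the type, useful at the DB boundary.
function isDBValue(value: unknown): value is DBValue {
  if (value === null) return true;
  const kind = typeof value;
  if (kind === 'string' || kind === 'number' || kind === 'boolean') return true;
  if (kind !== 'object') return false; // undefined, function, symbol, bigint
  if (value instanceof Uint8Array || value instanceof Date) return true;
  if (Array.isArray(value)) return value.every(isDBValue);
  // Only plain objects are accepted; Map, Set and class instances are rejected.
  const proto = Object.getPrototypeOf(value);
  if (proto !== Object.prototype && proto !== null) return false;
  return Object.values(value as object).every(isDBValue);
}
```

Rejecting `Map` and `Set` follows the remark earlier in the thread that they are unnecessary for now; they could be added to the union later without changing the structure of the guard.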

@CMCDragonkai
Member Author

This coincides with #3.

@CMCDragonkai
Member Author

Protobuf btw is not suitable for this, because the encoding must be schemaless. MessagePack is another choice.

@CMCDragonkai
Member Author

BSON is old school and not suitable.

Protobuf requires a schema.

CBOR seems the best, but I think the libraries are sort of unmaintained.

This seems suitable: https://github.com/kriszyp/cbor-x

@CMCDragonkai
Member Author

This might be a breaking change in relation to #3. However, it will make js-db far more user friendly and reduce the number of encoding/decoding steps in Polykey, especially as we store a lot of binary data, such as IDs, in js-db. All of those encoding/decoding procedures could then be eliminated entirely as CBOR takes over.
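
To give a feel for the size difference on ID-like binary data, here is a hand-rolled sketch of how CBOR frames a byte string (major type 2 in RFC 8949). It is only an illustration: a real implementation would use a library, and `cborByteString` is a made-up helper that handles lengths below 65536 only. The 16-byte value is a stand-in, not an actual Polykey ID.

```typescript
// Encode bytes as a CBOR byte string: a short length header, then the raw bytes.
function cborByteString(bytes: Uint8Array): Uint8Array {
  const len = bytes.length;
  let header: number[];
  if (len < 24) {
    header = [0x40 | len]; // length fits in the initial byte
  } else if (len < 0x100) {
    header = [0x58, len]; // 1-byte length follows
  } else if (len < 0x10000) {
    header = [0x59, len >> 8, len & 0xff]; // 2-byte big-endian length follows
  } else {
    throw new RangeError('sketch only handles lengths below 65536');
  }
  const out = new Uint8Array(header.length + len);
  out.set(header, 0);
  out.set(bytes, header.length);
  return out;
}

const idBytes = new Uint8Array(16).fill(0xab); // stand-in for a 16-byte ID
const idAsCbor = cborByteString(idBytes); // 17 bytes: 1 header byte + 16 raw bytes
const idAsJson = JSON.stringify(Array.from(idBytes)); // 65 characters: "[171,171,...]"
```

The JSON form here is a plain array of numbers; other JSON representations of bytes (such as the `{ type: 'Buffer', data: [...] }` shape or a base64 string) have different overheads, but none store the bytes verbatim.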

@CMCDragonkai
Member Author

The CBOR library could also be shared with PK when it needs binary streaming for mixed messages or chunked processing.
