Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor encodings #102

Closed
14 of 15 tasks
vweevers opened this issue Sep 28, 2021 · 6 comments
Closed
14 of 15 tasks

Refactor encodings #102

vweevers opened this issue Sep 28, 2021 · 6 comments
Labels
discussion Discussion

Comments

@vweevers
Copy link
Member

vweevers commented Sep 28, 2021

As written in #58, there are too many forms of encoding now. Yet for an abstract-leveldown implementation there's no builtin primitive to decode/deserialize data.

So here's a WIP plan for refactoring. I'll be editing this.

  • Add a format property to each encoding
    • This specifies what it encodes to. One of utf8, buffer, view, id
    • For json it will be utf8, for utf8 it will be utf8, etc
  • Rename the binary encoding to buffer, with buffer as an alias (in limited places, i.e. where we already supported this)
  • Add a new view encoding that works with typed arrays (including Buffer because that's a UInt8Array)
    • Only support Uint8Array for now, maybe DataView later
  • Change the utf8 encoding to always return a string, because return types of .encode() must be predictable
  • Add encodings as a feature to abstract-leveldown
    • It will encode keys & values in the public API, and forward encoding.format to the private API
    • For example, put('a', 123, { valueEncoding: 'json' }) could result in _put('a', '123', { valueEncoding: 'utf8' })
    • Like level, the default encoding will be utf8
  • An implementation must advertise its "native" encodings to abstract-leveldown via the manifest
    • For example, memdown will set { encodings: { buffer: true } } which says "I want buffers"
    • This could also be configurable! If you want to use typed arrays instead: memdown({ storeEncoding: 'view' }) which will tell memdown to set { encodings: { view: true } } - and memdown won't have to do any translation itself
  • Implement "transcoder" encodings that translate from e.g. utf8 to buffer.
    • Their .type will be e.g. utf8+buffer, json+view
  • These will be also advertised in the manifest, but by abstract-leveldown.
    • So on memdown you'll see db.supports.encodings.json even though memdown itself doesn't handle json
  • Remove asBuffer, valueAsBuffer, keyAsBuffer
  • Remove _serializeKey, _serializeValue
    • The only remaining use case that I see for serialization is key prefixing
  • When needed, abstract-leveldown will lookup a transcoder.
    • For example on memdown, a db.get(9) will result in _get(<Buffer 39>) by passing it trough the transcoder utf8+buffer.
  • In theory we can also support native encodings. If an implementation advertises e.g. encodings: { json: true }, we can bypass abstract-leveldown encoding. But if the encoding is idempotent (meaning .type === .format) we won't, in order to still normalize user input.
  • Remove lesser-used encodings that can be moved to userland: ascii, ucs2 and utf16le
    • This makes it easier to support typed arrays and to make buffer optional, because utf8 is supported by both Buffer and TextEncoder
  • If an encoding is unsupported, throw an error with { code: LEVEL_ENCODING_NOT_SUPPORTED }, rather than falling back to the id (identity) encoding
  • Expand binary key tests abstract-leveldown#361
@vweevers vweevers added the discussion Discussion label Sep 28, 2021
@vweevers
Copy link
Member Author

vweevers commented Oct 21, 2021

Updated the above with a more complete plan, that removes the need for serialization and asBuffer options, and also paves the way for typed arrays.

Example of a transcoder (some should be hardcoded to optimize them, others like json+buffer can be dynamic):

exports['utf8+view'] = new Encoding({
  encode (data) {
    return ArrayBuffer.isView(data) ? data : textEncoder.encode(data)
  },
  decode (data) {
    return textDecoder.decode(data)
  },
  type: 'utf8+view',
  format: 'view'
})

@vweevers
Copy link
Member Author

Doing this in abstract-leveldown could have performance benefits too. For example, if we know that a particular value encoding on a get() operation is idempotent (like buffer or utf8) and we know that the implementation supports that encoding "natively", then we can do _get(..., callback) without wrapping the callback in another function that decodes the value.

@vweevers
Copy link
Member Author

@vweevers
Copy link
Member Author

Ugh, this works better than I expected. Now subleveldown can just forward operations to a db, without unwrapping or rewrapping that db with encoding-down or levelup. Doesn't matter what encoding options that db used, or whether it stores data as buffers or Uint8Array internally. On the subleveldown db you can use any encoding too, including Uint8Array even though subleveldown internally works with Buffers and strings. It just fucking works.

The following tests are passing (locally; haven't pushed yet):

Click to expand
const test = require('tape')
const suite = require('abstract-leveldown/test')
const memdown = require('memdown')
const subleveldown = require('subleveldown')

// Test abstract-leveldown compliance
function runSuite (factory) {
  suite({ test, factory })
}

// Test basic prefix
runSuite(function factory (opts) {
  return subleveldown(memdown(), 'test', opts)
})

// Test empty prefix
runSuite(function factory (opts) {
  return subleveldown(memdown(), '', opts)
})

// Test custom separator
runSuite(function factory (opts) {
  return subleveldown(memdown(), 'test', { ...opts, separator: '%' })
})

// Test on db with buffer encoding
runSuite(function factory (opts) {
  return subleveldown(memdown({ keyEncoding: 'buffer' }), 'test', opts)
})

// Test on db with view encoding (Uint8Array)
runSuite(function factory (opts) {
  return subleveldown(memdown({ keyEncoding: 'view' }), 'test', opts)
})

// Have memdown internally use views too
runSuite(function factory (opts) {
  return subleveldown(memdown({ keyEncoding: 'view', storeEncoding: 'view' }), 'test', opts)
})

// Lastly, for good measure:
runSuite(function factory (opts) {
  return subleveldown(memdown({ keyEncoding: 'buffer', storeEncoding: 'view' }), 'test', opts)
})

@ralphtheninja @juliangruber @MeirionHughes ARE YOU EXCITED? Because I am! Fuck!

@juliangruber
Copy link
Member

Sounds like this gets us a lot of flexibility and api simplicity as well. Well done! 👏

@vweevers
Copy link
Member Author

Added support of other ecosystem encodings (codecs, abstract-encoding, multiformats) to level-transcoder, fixed some bugs, and updated its README: https://github.com/Level/transcoder. That part is now done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussion
Projects
None yet
Development

No branches or pull requests

2 participants