Skip to content
This repository has been archived by the owner on Sep 3, 2021. It is now read-only.

Do something about JSON between repc and repliache-sdk-js #167

Closed
aboodman opened this issue Sep 6, 2020 · 5 comments
Closed

Do something about JSON between repc and repliache-sdk-js #167

aboodman opened this issue Sep 6, 2020 · 5 comments

Comments

@aboodman
Copy link
Contributor

aboodman commented Sep 6, 2020

Looks like right now the majority of the perf profile for scan is JSON encoding:

Screen Shot 2020-09-05 at 3 09 04 PM

I'm not sure if we can do better by using a better JSON library. But stepping back and thinking about what's happening here:

  1. We read an ArrayBuffer out of IDB
  2. We copy the contents of the AB into a Rust Vector (boo, but this is an easy one - see Avoid to_vec() copy in idbstore::ReadTransaction #166)
  3. We lay a FlatBuffer down onto the Rust Vector (yay)
  4. We iterate the FlatBuffer and create ScanItems. The keys and values inside the items are references. (yay!)
  5. For each ScanItem we create a temporary embed::ScanItem and allocate its key/value string (boo)
  6. We build up a big Vector<embed::ScanItem>(boo)
  7. We JSON serialize the whole shebang to a String (boo)
  8. Then we send the string out to JS
    ===
  9. JS SDK immediately parses the string as JSON (We should probably move away from JSON for our values. #162)
@aboodman
Copy link
Contributor Author

aboodman commented Sep 6, 2020

I have no idea how close we can get to this, but it would be really neat if we could:

  1. Read ArrayBuffer out of IDB
  2. Lay a Chunk view on top of this w/o copy (can def do this - Avoid to_vec() copy in idbstore::ReadTransaction #166)
  3. as today
  4. as today
  5. Send JS an array of views into the FB that keep the FB alive on the Rust side.
    ===
  6. JS SDK passes views up to customer w/o touching them
    ===
  7. Customer can parse data into JSON if they want but that's on them, not us

From looking at the profile we should expect this to be insanely fast, as basically the entire profile is doing things other than reading the thing out of IDB and FlatBufferizing it.

@aboodman
Copy link
Contributor Author

aboodman commented Sep 6, 2020

Update: I do not think it is realistically possible to avoid some of these copies for a few reasons:

However: the big thing in the profile appears to be JSON encoding, and we can get rid of all of that. We don't actually need to send JSON to JS. We're in process! wasm_bindgen supports relatively rich interop, we can send all kinds of things. We can create and integrate with all manner of JsValues, including ArrayBuffer.

First idea:

First of all, we don't need to build up a list of things to return to JS. wasm_bindgen supports reflecting stateful structs with methods. We could have an Iterator in Rust that we return from scan() to JS.

Once we have that, we can just return &[u8] directly to JS for each key/value pair. wasm_bindgen will copy the array into a Uint8Array, then JS will have to decode that into a JS string (sigh), then parse that into JSON (blech). But if we ever discover a way to go from Uint8Array directly to JSON that will get rid of one of those steps, and one can be avoided if user isn't using JSON too.

However, this will get rid of all the JSON serialization, which will probably help a ton. So after this it would look like:

  1. JS calls scan()
  2. Scan creates some a Rust embed::ScanIterator which is #[wasm_bindgen] and returns to JS (wasm_bindgen implicitly allocates a JS Value to wrap it)
  3. JS calls next() on the ScanIterator in a loop, for each pass, it calls key() and value(), which:
  • return a &[u8] from Rust
  • Rust copies the slice to a new JS ArrayBuffer
  • Then JS immediately decodes this ArrayBuffer to a string
  • Then immediately parses that string with JSON.parse

Idea the second

We can go farther. If JS is going to immediately parse the data anyway (into JSON, PB, whatever), it is possible to (unsafely) return a view directly into WASM memory to JS (https://rustwasm.github.io/wasm-bindgen/api/js_sys/struct.Uint8Array.html#method.view). JS must immediately use this data because any Rust allocations can invalidate it.

1-3. same as above. For each loop:

  • return a uint32 from Rust which is a pointer into wasm memory
  • Then JS immediately decodes this memory into a string
  • Then immediately parses that string with JSON.parse

We save one copy but because of the unsafe pointer this only works if JS SDK is in charge of decoding values, which conflicts with #162 .

Idea the third:

We can go even farther. Recall that this data ultimately comes from a JS ArrayBuffer that we got out of IndexedDB. We had to copy it into a Rust Vector so that we could use it with FlatBuffers (boo), but in JS land, ArrayBuffer supports copyless views: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/Uint8Array.

So.. instead of handing JS copies (idea #1) or unsafe rust pointers (idea #2), we can hand JS ArrayBuffers which are views onto the original ArrayBuffer we read.

This is pretty invasive to our design because it would require for every key/value pair we want to return being able to (a) get back to the original ArrayBuffer we read out of IDB, and (b) being able to find the byte offset for the beginning of that key/value pair in the original ArrayBuffer.

But, if we did this, then it would look like:

1-3. same as above. For each loop:

  • create a JS Uint8Array pointing to right location in original ArrayBuffer backing Chunk and return to JS
  • Then JS immediately decodes this memory into a string
  • Then immediately parses that string with JSON.parse

... and if we implement #162 then the last two steps move outside the JS SDK and are not our problem anymore. Users who don't use JSON can avoid the overhead.

@phritz
Copy link
Contributor

phritz commented Sep 15, 2020

For rn just fix scan and if it turns out to be more general yay

@aboodman
Copy link
Contributor Author

This is basically done.

@aboodman
Copy link
Contributor Author

Where by basically I mean:

For rn just fix scan and if it turns out to be more general yay

This part is done.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants