Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashes as hex strings or buffers? #533

Closed
chjj opened this issue Jul 14, 2018 · 6 comments
Closed

Hashes as hex strings or buffers? #533

chjj opened this issue Jul 14, 2018 · 6 comments

Comments

@chjj
Copy link
Member

chjj commented Jul 14, 2018

This is a huge design error on my part. I originally used hex strings for some things because they were more efficient for JS maps/objects than constantly doing buf.toString('hex'). So, currently in bcoin, anything that needs to be constantly inserted into a hash table is a hex string, anything else is usually a buffer. Dealing with this on a case-by-case basis lead to really good performance in node and the browser, but as a result, the code is more confusing: e.g. "is that parameter supposed to be a hex-hash or a buffer-hash?"

In the past, I tried to rectify this by using a binding to the C++ hash table. This custom hash table object would allow for Buffers as keys, but performance was so horribly bad (due to the overhead of calling out to C++) that I scrapped that idea entirely.

Anyway, it turns out I made a huge oversight: buf.toString('binary') is much more optimal than I thought...

This is what gets called internally during buf.toString('binary'):

https://github.com/nodejs/node/blob/3626944/src/string_bytes.cc#L665

    case LATIN1:
      return ExternOneByteString::NewFromCopy(isolate, buf, buflen, error);

This ends up calling:

https://github.com/nodejs/node/blob/3626944/src/string_bytes.cc#L167

  MaybeLocal<String> str =
      String::NewFromOneByte(isolate,
                             reinterpret_cast<const uint8_t*>(data),
                             v8::NewStringType::kNormal,
                             length);

Which ultimately calls:

https://github.com/nodejs/node/blob/3626944/deps/v8/src/heap/factory.cc#L472

MaybeHandle<String> Factory::NewStringFromOneByte(Vector<const uint8_t> string,
                                                  PretenureFlag pretenure) {
  int length = string.length();
  if (length == 0) return empty_string();
  if (length == 1) return LookupSingleCharacterStringFromCode(string[0]);
  Handle<SeqOneByteString> result;
  ASSIGN_RETURN_ON_EXCEPTION(isolate(), result,
                             NewRawOneByteString(string.length(), pretenure),
                             String);

  DisallowHeapAllocation no_gc;
  // Copy the characters into the new object.
  CopyChars(SeqOneByteString::cast(*result)->GetChars(), string.start(),
            length);
  return result;
}

It's a simple copy.

As for buf.write(str, 'binary') and Buffer.from(str, 'binary'):

https://github.com/nodejs/node/blob/3626944/src/string_bytes.cc#L321

    case LATIN1:
      if (str->IsExternalOneByte()) {
        auto ext = str->GetExternalOneByteStringResource();
        nbytes = std::min(buflen, ext->length());
        memcpy(buf, ext->data(), nbytes);
      } else {
        // The relevant code:
        uint8_t* const dst = reinterpret_cast<uint8_t*>(buf);
        nbytes = str->WriteOneByte(dst, 0, buflen, flags);
      }
      *chars_written = nbytes;
      break;

Another simple copy of bytes. This means buf.copy(out, 0) and out.write(str, 'binary') are roughly the same speed!

So where to go from here? I suggest using Buffers for all hashes in bcoin. We use a "BufferMap" implementation that calls toString('binary') on every key before insertion into the Map. The overhead is very minimal.

Concerns:

  1. A lot of bcoin must be rewritten to support this.
  2. This will result in a huge degradation of speed in the browser. The browser buffer implementation concatenates a string to create the binary string. Internally, v8 will dimension the string wider than it needs to be in order to accomodate future concatenations. This results in higher memory usage... not to mention, it's much slower.
  3. Lock, MapLock, and LRU must allow for this new BufferMap object.
  4. Validator must offer a method to convert a hash to a Buffer.
  5. BDB must return Buffers when parsing hashes in database keys.
  6. If the BufferMap object carries references to the original buffer keys, we must be very careful when slicing hashes out of serialized data (via bio.read(buf, true) or br.readBytes(n, true)).

The second concern is the only unsolvable one I think. If anybody has any ideas please post them. I'm going to start rewriting things on a separate branch, using only Buffers, and see how it does in terms of performance and memory.

cc @tuxcanfly @nodar-chkuaselidze @boymanjor @pinheadmz

@chjj
Copy link
Member Author

chjj commented Jul 14, 2018

All tests are passing. I'll let you guys be the judge, but this seems a bit better on memory and performance.

@nodech
Copy link
Member

nodech commented Jul 14, 2018

  1. This will result in a huge degradation of speed in the browser.

Does not hexSlice also concatenate strings ? Running some benchs in the browser, buffermap outperforms hex buffers.

@bucko13
Copy link
Contributor

bucko13 commented Jul 15, 2018

That's awesome that buffer-map was available on npm

@bucko13
Copy link
Contributor

bucko13 commented Jul 15, 2018

I think the consistency of everything being a buffer rather than depending on the case is a big win from the point of view of working with the API. Also makes things more consistent across modules. It seems like a slight extra annoyance to have to pass in the custom map to LRU, etc but this is something that most people won't be touching. I guess we might build in support for BufferMap directly to those as well if we commit to this?

chjj added a commit that referenced this issue Jul 19, 2018
chjj added a commit that referenced this issue Aug 10, 2018
@braydonf
Copy link
Contributor

This can be closed, was added to master at b92839c

@tuxcanfly
Copy link
Member

Closed in b92839c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants