refactor(protobuf): replace chunk-based BinaryWriter with growable Uint8Array buffer and in-place varint writes #1108
base: main
Conversation
Thanks for the PR! We'll allocate time to give this a closer look.
Left a couple of comments below.
I like this change - it's a bit cleaner, and should also make it easier to move to resizable array buffers in the future.
Looking at perf:
# before
toBinary perf-payload.bin x 5,680 ops/sec ±0.33% (96 runs sampled)
toBinary tiny example.User x 1,176,788 ops/sec ±0.19% (100 runs sampled)
toBinary normal example.User x 203,325 ops/sec ±0.54% (94 runs sampled)
toBinary scalar values x 292,358 ops/sec ±0.65% (98 runs sampled)
toBinary repeated scalar values x 101,041 ops/sec ±0.57% (96 runs sampled)
toBinary map with scalar keys and values x 69,991 ops/sec ±1.12% (99 runs sampled)
toBinary repeated field with 1000 messages x 3,812 ops/sec ±2.65% (96 runs sampled)
toBinary map field with 1000 messages x 771 ops/sec ±2.20% (94 runs sampled)
# after
toBinary perf-payload.bin x 5,162 ops/sec ±0.33% (99 runs sampled)
toBinary tiny example.User x 1,252,113 ops/sec ±0.50% (94 runs sampled)
toBinary normal example.User x 244,426 ops/sec ±1.18% (92 runs sampled)
toBinary scalar values x 353,611 ops/sec ±0.45% (99 runs sampled)
toBinary repeated scalar values x 129,307 ops/sec ±0.43% (99 runs sampled)
toBinary map with scalar keys and values x 89,141 ops/sec ±0.46% (96 runs sampled)
toBinary repeated field with 1000 messages x 7,059 ops/sec ±0.29% (100 runs sampled)
toBinary map field with 1000 messages x 1,126 ops/sec ±0.22% (98 runs sampled)
# ran with
cd packages/protobuf-test
npx turbo run build
npx tsx src/perf.ts benchmark 'toBinary'
Nice improvement overall, with a roughly 10% regression on perf-payload.bin. We've used this case for performance optimization in the past (for example #836), so it's unfortunate that it is getting slower with this change.
I think the payload fields repeated_long_string_field and repeated_long_bytes_field (see perf-payload.txt) are responsible. Would be great to understand why, and whether it can be improved.
/**
 * Writes a tag (field number and wire type).
 *
 * Equivalent to `uint32( (fieldNo << 3 | type) >>> 0 )`.
 *
 * Generated code should compute the tag ahead of time and call `uint32()`.
 */
Please restore the doc comment.
/**
 * Write a `int32` value, a signed 32 bit varint.
 */
Please restore the doc comment.
/**
 * Write a `float` value, 32-bit floating point number.
 */
Please restore the doc comment.
const tmp: number[] = [];
varint32write(value, tmp);
this.raw(Uint8Array.from(tmp));
Not now - we have enough moving parts - but this is worth a closer look later: Instead of creating an Array and a Uint8Array, we can allocate the max varint size (5 bytes for uint, 10 bytes for int), and encode directly into the buffer. varint32write is not exported from the package and we are free to change the signature.
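For the unsigned case, a rough sketch of what that could look like (the changed signature and the ensureCapacity/pos internals here are assumptions, not the actual code):

// Hypothetical changed signature: encode straight into a target buffer
// at an offset and return the new offset. Covers the unsigned case only;
// negative int32 values still need the 10-byte sign-extended path.
function varint32write(value: number, buf: Uint8Array, pos: number): number {
  value = value >>> 0;
  while (value > 0x7f) {
    buf[pos++] = (value & 0x7f) | 0x80; // low 7 bits + continuation bit
    value = value >>> 7;
  }
  buf[pos++] = value; // final byte, continuation bit clear
  return pos;
}

// Call site in the writer: reserve the worst case, then encode in place.
// this.ensureCapacity(5);
// this.pos = varint32write(value, this.buffer, this.pos);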
const out = this.buffer.subarray(0, this.pos);
// Return a copy to avoid mutation if writer is reused
const result = new Uint8Array(out);
Suggested change:
- const out = this.buffer.subarray(0, this.pos);
- // Return a copy to avoid mutation if writer is reused
- const result = new Uint8Array(out);
+ const result = this.buffer.slice(0, this.pos);
See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/slice
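For clarity, slice copies into a fresh buffer, while subarray only creates a view over the same memory:

const bytes = new Uint8Array([1, 2, 3, 4]);
const view = bytes.subarray(0, 2); // view: shares memory with bytes
const copy = bytes.slice(0, 2); // copy: independent buffer
bytes[0] = 9;
console.log(view[0]); // 9 - the mutation is visible through the view
console.log(copy[0]); // 1 - the copy is unaffected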
this.ensureCapacity(4);
new DataView(
  this.buffer.buffer,
  this.buffer.byteOffset,
  this.buffer.byteLength,
).setInt32(this.pos, value, true);
this.pos += 4;
Nice. Can you apply the same to sfixed64 and fixed64?
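For reference, DataView has matching 64-bit setters that take bigint values; a tiny standalone demo of the same technique (little-endian, as the wire format requires):

const buf = new Uint8Array(16);
const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
view.setBigInt64(0, -1n, true); // sfixed64-style write: eight 0xff bytes
view.setBigUint64(8, 0xdeadbeefn, true); // fixed64-style write: ef be ad de 00 00 00 00
console.log(buf);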
This refactor moves away from “chunks + push-to-array + concat at the end” toward a single, growable Uint8Array buffer with explicit capacity management and in-place writes. The main benefits are:

• Amortized O(1) writes → by doubling the buffer when it's full (ensureCapacity), you avoid frequent small allocations or large concat operations.
• Lower GC pressure → you no longer build many tiny Uint8Array slices or intermediate JS arrays.
• Faster varint encoding → the hot path for single-byte values now early-returns, and the multi-byte loop writes directly into the buffer instead of an intermediate array.
• Simpler fork/join → length-delimited framing is done by shifting bytes in place rather than flushing/collecting chunks.
• More predictable memory layout → everything lives contiguously in one buffer, so slice/subarray calls are just views.

Together these yield better throughput, reduced pauses for garbage collection, and (often) smaller peak working sets at runtime.
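A condensed sketch of this overall shape (names and sizes are illustrative, not the exact code in this PR):

// Illustrative sketch of the growable-buffer approach described above.
class GrowableWriter {
  private buffer = new Uint8Array(1024);
  private pos = 0;

  // Doubling growth gives amortized O(1) appends.
  private ensureCapacity(bytes: number): void {
    if (this.pos + bytes <= this.buffer.length) {
      return;
    }
    const grown = new Uint8Array(
      Math.max(this.buffer.length * 2, this.pos + bytes),
    );
    grown.set(this.buffer);
    this.buffer = grown;
  }

  // Unsigned 32-bit varint written in place: single-byte values take the
  // early-return hot path; multi-byte values loop directly into the buffer.
  uint32(value: number): this {
    value = value >>> 0;
    this.ensureCapacity(5); // worst case for a 32-bit varint
    if (value < 0x80) {
      this.buffer[this.pos++] = value;
      return this;
    }
    while (value > 0x7f) {
      this.buffer[this.pos++] = (value & 0x7f) | 0x80;
      value = value >>> 7;
    }
    this.buffer[this.pos++] = value;
    return this;
  }

  // Copy the written bytes out so the writer's buffer can be reused.
  finish(): Uint8Array {
    return this.buffer.slice(0, this.pos);
  }
}

For example, new GrowableWriter().uint32(300).finish() yields the two bytes 0xac 0x02.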
It's like #964, but with fewer API changes.