Conversation


@jlucaso1 jlucaso1 commented Apr 17, 2025

This refactor moves away from “chunks + push‐to‐array + concat at the end” toward a single, growable Uint8Array buffer with explicit capacity management and in‐place writes. The main benefits are:

• Amortized O(1) writes → by doubling the buffer when it’s full (ensureCapacity), you avoid
frequent small allocations or large concat operations.
• Lower GC pressure → you no longer build many tiny Uint8Array slices or intermediate JS arrays.
• Faster varint encoding → the hot path for single‐byte values now early‐returns, and the multi‑byte loop
writes directly into the buffer instead of an intermediate array.
• Simpler fork/join → length‑delimited framing is done by shifting bytes in place rather than flushing/collecting chunks.
• More predictable memory layout → everything lives contiguously in one buffer, so slice/subarray calls are just views.

Together these yield better throughput, reduced pauses for garbage collection, and (often) smaller peak working sets at runtime.

It's like #964, but with fewer API changes.
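The growth strategy described above can be sketched roughly as follows. This is a minimal illustration with hypothetical names (`GrowableWriter`, `raw`, `finish`), not the PR's actual code:

```typescript
// Sketch of the single growable-buffer approach: one contiguous
// Uint8Array, explicit capacity management, in-place writes.
class GrowableWriter {
  private buffer = new Uint8Array(256);
  private pos = 0;

  // Double the backing buffer until `bytes` more fit, copying the old
  // contents once per growth. Doubling gives amortized O(1) appends.
  private ensureCapacity(bytes: number): void {
    if (this.pos + bytes <= this.buffer.length) return;
    let newSize = this.buffer.length * 2;
    while (newSize < this.pos + bytes) newSize *= 2;
    const next = new Uint8Array(newSize);
    next.set(this.buffer);
    this.buffer = next;
  }

  raw(chunk: Uint8Array): this {
    this.ensureCapacity(chunk.length);
    this.buffer.set(chunk, this.pos);
    this.pos += chunk.length;
    return this;
  }

  finish(): Uint8Array {
    // slice() copies, so reusing the writer cannot mutate the result.
    return this.buffer.slice(0, this.pos);
  }
}
```

Compared to collecting chunks in an array and concatenating at the end, every write lands directly in its final position, and only the occasional growth pays a copy.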


CLAassistant commented Apr 17, 2025

CLA assistant check
All committers have signed the CLA.

@timostamm
Member

Thanks for the PR! We'll allocate time to give this a closer look.


@timostamm timostamm left a comment


Left a couple of comments below.

I like this change - it's a bit cleaner, and should also make it easier to move to resizable array buffers in the future.

Looking at perf:

# before
toBinary   perf-payload.bin x 5,680 ops/sec ±0.33% (96 runs sampled)
toBinary   tiny example.User x 1,176,788 ops/sec ±0.19% (100 runs sampled)
toBinary   normal example.User x 203,325 ops/sec ±0.54% (94 runs sampled)
toBinary   scalar values x 292,358 ops/sec ±0.65% (98 runs sampled)
toBinary   repeated scalar values x 101,041 ops/sec ±0.57% (96 runs sampled)
toBinary   map with scalar keys and values x 69,991 ops/sec ±1.12% (99 runs sampled)
toBinary   repeated field with 1000 messages x 3,812 ops/sec ±2.65% (96 runs sampled)
toBinary   map field with 1000 messages x 771 ops/sec ±2.20% (94 runs sampled)

# after
toBinary   perf-payload.bin x 5,162 ops/sec ±0.33% (99 runs sampled)
toBinary   tiny example.User x 1,252,113 ops/sec ±0.50% (94 runs sampled)
toBinary   normal example.User x 244,426 ops/sec ±1.18% (92 runs sampled)
toBinary   scalar values x 353,611 ops/sec ±0.45% (99 runs sampled)
toBinary   repeated scalar values x 129,307 ops/sec ±0.43% (99 runs sampled)
toBinary   map with scalar keys and values x 89,141 ops/sec ±0.46% (96 runs sampled)
toBinary   repeated field with 1000 messages x 7,059 ops/sec ±0.29% (100 runs sampled)
toBinary   map field with 1000 messages x 1,126 ops/sec ±0.22% (98 runs sampled)

# ran with
cd packages/protobuf-test
npx turbo run build
npx tsx src/perf.ts benchmark 'toBinary'

Nice improvement overall, but with a 10% regression on perf-payload.bin. We've used this case for performance optimization in the past (for example #836), so it's unfortunate that it gets slower with this change.

I think the payload fields repeated_long_string_field and repeated_long_bytes_field (see perf-payload.txt) are responsible. Would be great to understand why, and whether it can be improved.

Comment on lines -184 to -190
/**
* Writes a tag (field number and wire type).
*
* Equivalent to `uint32( (fieldNo << 3 | type) >>> 0 )`.
*
* Generated code should compute the tag ahead of time and call `uint32()`.
*/

Please restore the doc comment.

Comment on lines -223 to -225
/**
* Write a `int32` value, a signed 32 bit varint.
*/

Please restore the doc comment.

Comment on lines -257 to -259
/**
* Write a `float` value, 32-bit floating point number.
*/

Please restore the doc comment.

Comment on lines +287 to +289
const tmp: number[] = [];
varint32write(value, tmp);
this.raw(Uint8Array.from(tmp));

Not now - we have enough moving parts - but this is worth a closer look later:

Instead of creating an Array and a Uint8Array, we can allocate the max varint size (5 bytes for uint, 10 bytes for int), and encode directly into the buffer. varint32write is not exported from the package and we are free to change the signature.
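The direct-encoding idea could look roughly like this. A sketch only, with a hypothetical helper name; it assumes the caller has already reserved the maximum varint size, and is not the package's actual `varint32write` signature:

```typescript
// Sketch: encode an unsigned 32-bit varint directly into a preallocated
// buffer, returning the new write position. Caller must have ensured
// capacity for the worst case (5 bytes for a uint32).
function writeVarint32(buf: Uint8Array, pos: number, value: number): number {
  value >>>= 0; // force unsigned 32-bit
  while (value > 0x7f) {
    buf[pos++] = (value & 0x7f) | 0x80; // low 7 bits, continuation bit set
    value >>>= 7;
  }
  buf[pos++] = value; // final byte, continuation bit clear
  return pos;
}
```

This skips both the temporary `number[]` and the `Uint8Array.from` allocation on every varint.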

Comment on lines +125 to +127
const out = this.buffer.subarray(0, this.pos);
// Return a copy to avoid mutation if writer is reused
const result = new Uint8Array(out);

Suggested change
- const out = this.buffer.subarray(0, this.pos);
- // Return a copy to avoid mutation if writer is reused
- const result = new Uint8Array(out);
+ const result = this.buffer.slice(0, this.pos);

See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/slice
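The distinction the suggestion relies on, in a few lines: `slice()` copies into fresh memory, while `subarray()` returns a view over the same underlying buffer.

```typescript
const src = Uint8Array.from([1, 2, 3, 4]);
const copy = src.slice(0, 2);    // independent copy
const view = src.subarray(0, 2); // shares src's memory
src[0] = 9;
// copy[0] is still 1, but view[0] now reads 9.
```

So a single `slice()` already yields the defensive copy, without the intermediate `subarray` view plus `new Uint8Array(...)`.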

Comment on lines +271 to +277
this.ensureCapacity(4);
new DataView(
this.buffer.buffer,
this.buffer.byteOffset,
this.buffer.byteLength,
).setInt32(this.pos, value, true);
this.pos += 4;

Nice. Can you apply the same to sfixed64 and fixed64?
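Applying the same `DataView` pattern to the 64-bit fixed fields might look like this. A hypothetical helper for illustration (not the PR's code); it assumes capacity for 8 bytes was already ensured, and uses the standard `DataView.setBigUint64`:

```typescript
// Sketch: write a little-endian fixed64 using DataView.setBigUint64,
// mirroring the setInt32 pattern quoted above.
function writeFixed64(buf: Uint8Array, pos: number, value: bigint): number {
  new DataView(buf.buffer, buf.byteOffset, buf.byteLength).setBigUint64(
    pos,
    value & 0xffffffffffffffffn, // keep the low 64 bits
    true, // little-endian, per the protobuf wire format
  );
  return pos + 8;
}
```

`setBigInt64` would serve the same role for `sfixed64`.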
