refactor(protobuf): replace chunk-based BinaryWriter with growable Uint8Array buffer and in-place varint writes #1108
base: main
Conversation
Thanks for the PR! We'll allocate time to give this a closer look.
Left a couple of comments below.
I like this change - it's a bit cleaner, and should also make it easier to move to resizable array buffers in the future.
Looking at perf:
# before
toBinary perf-payload.bin x 5,680 ops/sec ±0.33% (96 runs sampled)
toBinary tiny example.User x 1,176,788 ops/sec ±0.19% (100 runs sampled)
toBinary normal example.User x 203,325 ops/sec ±0.54% (94 runs sampled)
toBinary scalar values x 292,358 ops/sec ±0.65% (98 runs sampled)
toBinary repeated scalar values x 101,041 ops/sec ±0.57% (96 runs sampled)
toBinary map with scalar keys and values x 69,991 ops/sec ±1.12% (99 runs sampled)
toBinary repeated field with 1000 messages x 3,812 ops/sec ±2.65% (96 runs sampled)
toBinary map field with 1000 messages x 771 ops/sec ±2.20% (94 runs sampled)
# after
toBinary perf-payload.bin x 5,162 ops/sec ±0.33% (99 runs sampled)
toBinary tiny example.User x 1,252,113 ops/sec ±0.50% (94 runs sampled)
toBinary normal example.User x 244,426 ops/sec ±1.18% (92 runs sampled)
toBinary scalar values x 353,611 ops/sec ±0.45% (99 runs sampled)
toBinary repeated scalar values x 129,307 ops/sec ±0.43% (99 runs sampled)
toBinary map with scalar keys and values x 89,141 ops/sec ±0.46% (96 runs sampled)
toBinary repeated field with 1000 messages x 7,059 ops/sec ±0.29% (100 runs sampled)
toBinary map field with 1000 messages x 1,126 ops/sec ±0.22% (98 runs sampled)
# ran with
cd packages/protobuf-test
npx turbo run build
npx tsx src/perf.ts benchmark 'toBinary'
Nice improvement overall, with a roughly 10% regression on perf-payload.bin. We've used this case for performance optimization in the past (for example #836), so it's unfortunate that it is getting slower with this change.
I think the payload fields repeated_long_string_field and repeated_long_bytes_field (see perf-payload.txt) are responsible. Would be great to understand why, and whether it can be improved.
/**
 * Writes a tag (field number and wire type).
 *
 * Equivalent to `uint32( (fieldNo << 3 | type) >>> 0 )`.
 *
 * Generated code should compute the tag ahead of time and call `uint32()`.
 */
Please restore the doc comment.
/**
 * Write a `int32` value, a signed 32 bit varint.
 */
Please restore the doc comment.
/**
 * Write a `float` value, 32-bit floating point number.
 */
Please restore the doc comment.
const tmp: number[] = [];
varint32write(value, tmp);
this.raw(Uint8Array.from(tmp));
Not now - we have enough moving parts - but this is worth a closer look later: Instead of creating an Array and a Uint8Array, we can allocate the max varint size (5 bytes for uint, 10 bytes for int), and encode directly into the buffer. varint32write is not exported from the package and we are free to change the signature.
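For the unsigned case, a rough sketch of what that could look like (the changed signature and the ensureCapacity/pos internals here are assumptions, not the actual code):

// Hypothetical changed signature: encode straight into a target buffer
// at an offset and return the new offset. Covers the unsigned case only;
// negative int32 values still need the 10-byte sign-extended path.
function varint32write(value: number, buf: Uint8Array, pos: number): number {
  value = value >>> 0;
  while (value > 0x7f) {
    buf[pos++] = (value & 0x7f) | 0x80; // low 7 bits + continuation bit
    value = value >>> 7;
  }
  buf[pos++] = value; // final byte, continuation bit clear
  return pos;
}

// Call site in the writer: reserve the worst case, then encode in place.
// this.ensureCapacity(5);
// this.pos = varint32write(value, this.buffer, this.pos);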
const out = this.buffer.subarray(0, this.pos);
// Return a copy to avoid mutation if writer is reused
const result = new Uint8Array(out);
Suggested change:
- const out = this.buffer.subarray(0, this.pos);
- // Return a copy to avoid mutation if writer is reused
- const result = new Uint8Array(out);
+ const result = this.buffer.slice(0, this.pos);
See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/slice
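For clarity, slice copies into a fresh buffer, while subarray only creates a view over the same memory:

const bytes = new Uint8Array([1, 2, 3, 4]);
const view = bytes.subarray(0, 2); // view: shares memory with bytes
const copy = bytes.slice(0, 2); // copy: independent buffer
bytes[0] = 9;
console.log(view[0]); // 9 - the mutation is visible through the view
console.log(copy[0]); // 1 - the copy is unaffected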
this.ensureCapacity(4);
new DataView(
  this.buffer.buffer,
  this.buffer.byteOffset,
  this.buffer.byteLength,
).setInt32(this.pos, value, true);
this.pos += 4;
Nice. Can you apply the same to sfixed64 and fixed64?
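For reference, DataView has matching 64-bit setters that take bigint values; a tiny standalone demo of the same technique (little-endian, as the wire format requires):

const buf = new Uint8Array(16);
const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
view.setBigInt64(0, -1n, true); // sfixed64-style write: eight 0xff bytes
view.setBigUint64(8, 0xdeadbeefn, true); // fixed64-style write: ef be ad de 00 00 00 00
console.log(buf);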
This refactor moves away from “chunks + push-to-array + concat at the end” toward a single, growable Uint8Array buffer with explicit capacity management and in-place writes. The main benefits are:

• Amortized O(1) writes → by doubling the buffer when it's full (ensureCapacity), you avoid frequent small allocations or large concat operations.
• Lower GC pressure → you no longer build many tiny Uint8Array slices or intermediate JS arrays.
• Faster varint encoding → the hot path for single-byte values now early-returns, and the multi-byte loop writes directly into the buffer instead of an intermediate array.
• Simpler fork/join → length-delimited framing is done by shifting bytes in place rather than flushing/collecting chunks.
• More predictable memory layout → everything lives contiguously in one buffer, so slice/subarray calls are just views.

Together these yield better throughput, reduced pauses for garbage collection, and (often) smaller peak working sets at runtime.
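A condensed sketch of this overall shape (names and sizes are illustrative, not the exact code in this PR):

// Illustrative sketch of the growable-buffer approach described above.
class GrowableWriter {
  private buffer = new Uint8Array(1024);
  private pos = 0;

  // Doubling growth gives amortized O(1) appends.
  private ensureCapacity(bytes: number): void {
    if (this.pos + bytes <= this.buffer.length) {
      return;
    }
    const grown = new Uint8Array(
      Math.max(this.buffer.length * 2, this.pos + bytes),
    );
    grown.set(this.buffer);
    this.buffer = grown;
  }

  // Unsigned 32-bit varint written in place: single-byte values take the
  // early-return hot path; multi-byte values loop directly into the buffer.
  uint32(value: number): this {
    value = value >>> 0;
    this.ensureCapacity(5); // worst case for a 32-bit varint
    if (value < 0x80) {
      this.buffer[this.pos++] = value;
      return this;
    }
    while (value > 0x7f) {
      this.buffer[this.pos++] = (value & 0x7f) | 0x80;
      value = value >>> 7;
    }
    this.buffer[this.pos++] = value;
    return this;
  }

  // Copy the written bytes out so the writer's buffer can be reused.
  finish(): Uint8Array {
    return this.buffer.slice(0, this.pos);
  }
}

For example, new GrowableWriter().uint32(300).finish() yields the two bytes 0xac 0x02.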
It's like #964, but with fewer API changes.