
Performance is not ideal #333

Open
dae opened this issue Dec 6, 2022 · 15 comments

Comments

@dae

dae commented Dec 6, 2022

We currently use protobufjs, and would love to switch over to protobuf-es for the TypeScript ergonomics and the ES6/tree shaking it offers. Parts of our app deal with large messages, so I ran a quick performance test, and unfortunately protobuf-es is rather behind at the moment. The test was decoding a 6MB binary message containing repeated fields with about 150k entries: protobufjs parses it in about 100ms; protobuf-es takes about 350ms. For reference, the same message as JSON is about 27MB, and JSON.parse() handles it in about 100ms as well. If providing the message/proto would be helpful, please let me know.
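(For reference, the shape of my test was roughly the following; MyMessage stands in for our actual generated type, and the file paths are placeholders.)

import { readFileSync } from "node:fs";
// Hypothetical import: the class protoc-gen-es generated for our schema.
import { MyMessage } from "./gen/my_message_pb.js";

const bytes = readFileSync("message.binpb"); // the ~6MB payload

let start = performance.now();
MyMessage.fromBinary(bytes);
console.log(`protobuf-es decode: ${(performance.now() - start).toFixed(0)}ms`);

// Same message as JSON (~27MB) for comparison.
const json = readFileSync("message.json", "utf8");
start = performance.now();
JSON.parse(json);
console.log(`JSON.parse: ${(performance.now() - start).toFixed(0)}ms`);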

@NfNitLoop

Hi @dae -- I'm not a developer on this project, but I'm a user and just saw your issue. Providing a concrete use case is always helpful in bug reports, so I thought I'd chime in and say: yes, that would be useful. :) Especially if you can also provide minimal code to reproduce the performance issue. It then becomes a de facto test case, so any fixes can be verified with before/after runs.

@felicio

felicio commented Feb 10, 2023

I'm adding Protobuf-ES to protons's benchmarking suite in ipfs/protons#89, where it also performs the slowest.

@dae
Author

dae commented Feb 11, 2023

I suspect any large message will reproduce the issue, but I can produce a sample one if the devs request it.

On a positive note, protobuf-es's code generation seems to be a lot faster than our current protobufjs setup, where we have to produce a static version, use that to generate the TypeScript, and then produce a separate JSON version because the static version's generated code is huge. protobuf-es is clearly sub-second, where protobufjs was taking around 5 seconds to rebuild each time a proto file changed.

@felicio

felicio commented Feb 13, 2023

For reference, see how protons dealt with performance issues in ipfs/protons#51.

@jcready

jcready commented Feb 13, 2023

Protons appears to use protobuf.js's UTF-8 decoder/encoder, which has correctness bugs and is likely the reason protobuf-es uses the native TextEncoder/TextDecoder. If you're using the Node.js runtime for your benchmarks, you can comment on nodejs/node#39879 to ask for a faster TextEncoder/TextDecoder; Bun and Deno appear to have much faster implementations. Chrome, Firefox, and Safari are also generally much faster with the native TextEncoder/TextDecoder than with protobuf.js's implementation (especially once you get above 32 characters): timostamm/protobuf-ts#184 (comment)
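(A rough way to see the native path's cost on your own runtime; numbers vary a lot between Node, Bun, Deno, and browsers. Sketch only.)

const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Time native UTF-8 decoding at several string lengths, since the
// crossover mentioned above is around 32 characters.
for (const len of [8, 32, 128, 4096]) {
  const bytes = encoder.encode("a".repeat(len));
  const start = performance.now();
  for (let i = 0; i < 100_000; i++) {
    decoder.decode(bytes);
  }
  console.log(`${len} chars: ${(performance.now() - start).toFixed(1)}ms`);
}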

@smaye81
Member

smaye81 commented Mar 20, 2023

Just to add some context: it was a design choice to focus on usability, developer experience, and conformance when creating Protobuf-ES.

Consequently, Protobuf-ES is nearly 100% compliant with the Protobuf conformance test suite. To illustrate this, we've created the Protobuf Conformance repo, which shows how other implementations fare against the suite. You can see that some implementations are super performant, but their conformance scores are not great.

Nevertheless, we have some ideas for improving performance that are on our roadmap for the future.

@timostamm
Member

Just a small update: Thanks to @kbongort, binary read performance received a ~2.5x bump in v1.2.1. See #459 for details.

@dimo414

dimo414 commented Oct 17, 2023

> Nevertheless, we have some ideas for improving performance that are on our roadmap for the future.

Any chance you could share some more context on the improvements you're considering, or any sense of a timeline for them landing? We haven't done a rigorous benchmark, but our initial tests have shown observable performance regressions if we migrate from protobuf.js. It would be helpful to know what's in progress (or already done, as @timostamm called out).

achingbrain added a commit to ipfs/protons that referenced this issue Oct 23, 2023
@smaye81
Member

smaye81 commented Oct 23, 2023

Hi @dimo414. We don't have any timeline to report, but this is something we plan to address relatively soon. One additional thing we want to explore is whether the changes made in #459 could also be applied to binary write performance, as well as to JSON read/write performance.

In addition, we'd like to investigate potentially supporting the optimize_for proto option to optimize for speed. The downside is that it would increase code/bundle size in exchange for the performance gain, and potentially make testing a bit more difficult, so we want to think through how best to implement it.
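(For reference, the option in question is set per .proto file; SPEED is what a speed-optimized codegen path would key off of.)

syntax = "proto3";

// FileOptions.optimize_for accepts SPEED, CODE_SIZE, or LITE_RUNTIME;
// generators may use it to trade bundle size for runtime speed.
option optimize_for = SPEED;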

@malcolmstill

malcolmstill commented Dec 21, 2023

I was going to open a new issue for this, but since writing/encoding performance has come up later in this thread, I'll add to this discussion.

We have an issue where we are encoding a large number of f64s (doubles). In one example we encode around 4 million floats overall, spread across ~100 protobuf messages (so each contains ~40,000 doubles), and that takes about 1600 milliseconds when running .enc (specifically protoDelimited.enc, if it makes a difference).

The proto definition is more or less the following (there are some other fields, but the overwhelming majority of the data in each message is in the data field):

message MyMessage {
  repeated double data = 1;
}

Profiling protoDelimited.enc for this series of messages shows that all the time is spent in double, with double being invoked separately for each f64 value (and allocating an 8-byte array each time).

In our case I'm able to get a ~30x encoding speedup (~50 milliseconds instead of ~1600 milliseconds) by extending the BinaryWriter class (packages/protobuf/src/binary-encoding.ts) with, say, an arrayDouble method:

arrayDouble(values: number[]): IBinaryWriter {
  // One allocation for the whole packed array instead of one per value.
  const chunk = new Uint8Array(8 * values.length);
  const view = new DataView(chunk.buffer);
  for (const [i, value] of values.entries()) {
    view.setFloat64(8 * i, value, true); // little-endian, per the wire format
  }
  return this.raw(chunk);
}

Then in writePacked (packages/protobuf/src/private/binary-format-common.ts), since I've only implemented this for double, I special-case the branch where scalarTypeInfo returns the double method, making a single arrayDouble call instead of n double calls:

export function writePacked(
  writer: IBinaryWriter,
  type: ScalarType,
  fieldNo: number,
  value: any[]
): void {
  if (!value.length) {
    return;
  }
  writer.tag(fieldNo, WireType.LengthDelimited).fork();
  const [, method] = scalarTypeInfo(type);
  if (method === "double") {
    // Special case: write the whole packed array in one call.
    writer["arrayDouble"](value as number[]);
  } else {
    for (let i = 0; i < value.length; i++) {
      (writer[method] as any)(value[i]);
    }
  }
  writer.join();
}

This seems to work in our case (I have a test where I fill a message with Math.random() values, encode it, immediately decode it, and check that the original and decoded data match), but this is my first time looking at the protobuf-es code, so I don't know enough to be sure there isn't some gotcha with doing this.
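The round-trip check is roughly this (a sketch; MyMessage is the generated class for the message above, and the import path is a placeholder):

import { MyMessage } from "./gen/my_message_pb.js";

// Fill a message with random doubles, encode, decode, and compare.
// Doubles are fixed 8-byte values on the wire, so the round trip is exact.
const data = Array.from({ length: 40_000 }, () => Math.random());
const bytes = new MyMessage({ data }).toBinary();
const decoded = MyMessage.fromBinary(bytes);

for (let i = 0; i < data.length; i++) {
  if (decoded.data[i] !== data[i]) {
    throw new Error(`mismatch at index ${i}`);
  }
}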

Assuming such a change is sensible, it would make sense at least for some of the other fixed-size primitive types, if not in general?

@timostamm
Member

Hey Malcolm, thanks for the comment! This could be applied to all packed protobuf types with a fixed size. There's no free lunch, though: the downsides are increased bundle size and a breaking change to the IBinaryWriter interface.

Editions will stir up the code paths a bit, so it doesn't make sense to pull in this performance improvement right now, but it does seem worthwhile once we have implemented support for editions. I wonder if an argument to fork() with a size to allocate in advance would be an alternative here. It's also likely that the perf improvement for parsing added in #459 applies to serialization as well.
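Hypothetically, a size-hinted fork could look like this (not the current IBinaryWriter API, just a sketch of the idea):

// Hypothetical signature: let callers pre-size the buffer that fork()
// starts, so packed fixed-size fields land in a single allocation.
fork(sizeHint?: number): IBinaryWriter;

// A packed double field knows its exact byte size up front:
writer.tag(fieldNo, WireType.LengthDelimited).fork(8 * value.length);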

@timostamm
Member

It looks like the perf improvement applied to binary read (see #333 (comment)) can also be applied to binary write. Some details are noted in #674 (comment).

@Ekrekr

Ekrekr commented Aug 21, 2024

> We currently use protobufjs, and would love to switch over to protobuf-es

Just to add a +1: this is the same for us; we can't switch over because the slowdown for encode and decode we would experience is around 15x.

This library is awesome though! You write very clean code that's nice to interface with.

@Ekrekr

Ekrekr commented Aug 22, 2024

I set up a repo for comparing protobuf encode speeds: https://github.com/Ekrekr/prototiming.

Looking into the flame graphs, even the UTF-8 string encoding is faster in ProtobufJS, although they do some crazy stuff there. I think the ECMA (TextEncoder) encoding itself is fast; it's the memory copying that slows protobuf-es down.

It seems like protobufjs is a lot more efficient at writing directly to the buffer, rather than chunking and copying array buffers around.
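In simplified form, the pattern looks something like this (not the actual protobuf-es source, just the shape of it):

// Each write allocates its own chunk...
class ChunkedWriter {
  private chunks: Uint8Array[] = [];

  string(value: string): this {
    this.chunks.push(new TextEncoder().encode(value)); // allocation per field
    return this;
  }

  // ...and finish() allocates again and copies every chunk over.
  finish(): Uint8Array {
    const len = this.chunks.reduce((n, c) => n + c.length, 0);
    const bytes = new Uint8Array(len);
    let offset = 0;
    for (const chunk of this.chunks) {
      bytes.set(chunk, offset);
      offset += chunk.length;
    }
    return bytes;
  }
}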

I'll dig into this a bit more.

@Ekrekr

Ekrekr commented Aug 22, 2024

The clearest place to optimize seems to be finish.

An array buffer is made during chunking:

let chunk = this.encodeUtf8(value);

And a new array buffer is made at finish, with the contents of the chunks copied into it iteratively:

let bytes = new Uint8Array(len);

If, instead of creating a new array buffer at finish, one array buffer were allocated at the start and written to directly, this would avoid the duplicated writes.
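A minimal sketch of that idea (hypothetical, and simplified: real protobuf string fields also need a varint length prefix before the payload):

// One growable buffer, written to directly; finish() is just a view.
class DirectWriter {
  private buf = new Uint8Array(1024);
  private len = 0;

  private ensure(extra: number): void {
    if (this.len + extra <= this.buf.length) return;
    const grown = new Uint8Array(Math.max(this.buf.length * 2, this.len + extra));
    grown.set(this.buf.subarray(0, this.len));
    this.buf = grown;
  }

  string(value: string): this {
    // encodeInto() writes straight into our buffer, skipping the
    // intermediate Uint8Array that encode() would allocate.
    this.ensure(value.length * 3); // UTF-8 worst case per UTF-16 unit
    const res = new TextEncoder().encodeInto(value, this.buf.subarray(this.len));
    this.len += res.written ?? 0;
    return this;
  }

  finish(): Uint8Array {
    return this.buf.subarray(0, this.len);
  }
}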

Edit: my experiments with this are in #964.
