Document (or fix?) v8.deserialize's 2 GB limitation for input buffer #40059

Closed
@beaugunderson

Version

v16.9.0

Platform

Darwin theodore 20.6.0 Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64 x86_64

Subsystem

v8

What steps will reproduce the bug?

Here is a script that will demonstrate the issue:

#!/usr/bin/env node --max-old-space-size=32768

const v8 = require('v8');

const PADDING_STRING = `Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin non quam in diam laoreet rhoncus condimentum quis neque. Sed luctus arcu eget velit tincidunt rhoncus. Mauris eros libero, lobortis et dolor quis, interdum sagittis dolor. Maecenas sit amet nulla at risus ullamcorper gravida. Sed varius nulla vel faucibus accumsan. Sed luctus purus felis, sagittis vehicula justo sollicitudin sed. Duis laoreet lobortis condimentum. Nunc ac nisi quis dolor malesuada aliquet eget in quam.

Vivamus malesuada leo et nisi feugiat varius. Vivamus ut dapibus tellus. Nunc interdum metus eget odio accumsan efficitur. Donec ac nisl id justo ullamcorper porta. Aliquam molestie dictum purus, non tincidunt mi facilisis non. Praesent lorem felis, pretium at consectetur et, elementum nec nisi. Etiam placerat lorem at maximus vulputate. Donec sollicitudin pretium ligula. Curabitur eu porta leo, sit amet tempor leo. Vivamus venenatis massa metus, at tempor dolor pharetra vel. Cras eget turpis eu nisi elementum dapibus sit amet sit amet nibh. Aliquam rhoncus eros et mauris aliquam, rhoncus condimentum purus placerat. Nam mollis sollicitudin ante, non imperdiet nulla commodo vitae. Mauris sollicitudin quam ut ipsum dignissim, in mollis augue placerat. Morbi suscipit auctor hendrerit. Morbi dictum sagittis nulla nec posuere.

Suspendisse potenti. Proin vehicula est blandit, euismod velit sed, maximus augue. Nullam id rhoncus risus. Donec cursus lobortis porttitor. Donec fringilla, sem ac vehicula finibus, sem nisl ultricies leo, eu finibus sapien metus eu nibh. Fusce ut erat eu arcu aliquet tincidunt. Maecenas tristique enim non ante varius, quis semper justo efficitur. Integer maximus ultrices nisl at molestie.

Proin interdum, quam ut pellentesque congue, magna urna tristique felis, malesuada porta orci metus a purus. Morbi porttitor ex nec arcu mollis luctus. Ut quis tortor purus. Ut eu odio pharetra, fringilla ligula sit amet, sagittis lectus. Aenean ac quam vel ex mollis rutrum. Aliquam nulla leo, varius at mauris eu, porttitor egestas libero. Praesent sed feugiat augue, iaculis feugiat libero.

Suspendisse potenti. Suspendisse blandit ex quis nunc elementum pellentesque. Nam sagittis dui id faucibus faucibus. Pellentesque faucibus augue sit amet lorem cursus cursus. Pellentesque ut venenatis nisl, in placerat quam. Sed in enim at eros condimentum porttitor quis et lectus. Praesent sit amet pulvinar sapien. Quisque sodales mi ante, eu fermentum enim rhoncus sed. Praesent ac arcu eu erat cursus vulputate. Morbi cursus libero lectus, at tempus odio varius eget. Quisque finibus urna sed cursus rhoncus. Vivamus faucibus cursus imperdiet. Phasellus interdum sapien in odio rutrum, ut ornare turpis dictum. Sed mauris dui, molestie in odio nec, vulputate venenatis ex.`;

for (let i = 1; i < 64; i++) {
  const toSerialize = {};

  console.log(`2 ** ${i} (${2 ** i}) attributes`);

  for (let j = 0; j < 2 ** i; j++) {
    toSerialize[j] = PADDING_STRING;
  }

  const buffer = v8.serialize(toSerialize);

  console.log(`buffer length: ${buffer.length}`);

  v8.deserialize(buffer);
}

And here is the output on my system:

$ ./test-case.js

2 ** 1 (2) attributes
buffer length: 5593
2 ** 2 (4) attributes
buffer length: 11181
2 ** 3 (8) attributes
buffer length: 22357
2 ** 4 (16) attributes
buffer length: 44709
2 ** 5 (32) attributes
buffer length: 89413
2 ** 6 (64) attributes
buffer length: 178821
2 ** 7 (128) attributes
buffer length: 357702
2 ** 8 (256) attributes
buffer length: 715462
2 ** 9 (512) attributes
buffer length: 1430982
2 ** 10 (1024) attributes
buffer length: 2862022
2 ** 11 (2048) attributes
buffer length: 5724102
2 ** 12 (4096) attributes
buffer length: 11448262
2 ** 13 (8192) attributes
buffer length: 22896582
2 ** 14 (16384) attributes
buffer length: 45801415
2 ** 15 (32768) attributes
buffer length: 91611079
2 ** 16 (65536) attributes
buffer length: 183230407
2 ** 17 (131072) attributes
buffer length: 366469063
2 ** 18 (262144) attributes
buffer length: 732946375
2 ** 19 (524288) attributes
buffer length: 1465900999
2 ** 20 (1048576) attributes
buffer length: 2931810247
node:v8:344
  der.readHeader();
      ^

Error: Unable to deserialize cloned data.
    at DefaultDeserializer.readHeader (<anonymous>)
    at Object.deserialize (node:v8:344:7)
    at Object.<anonymous> (/Users/beau/p/zed-run/test-case.js:28:6)
    at Module._compile (node:internal/modules/cjs/loader:1101:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:79:12)
    at node:internal/main/run_main_module:17:47

How often does it reproduce? Is there a required condition?

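100% of the time if the buffer to deserialize exceeds 2 GB. In the output above, the 1,465,900,999-byte buffer deserializes fine and the 2,931,810,247-byte buffer does not.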
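Until the limit is documented or removed, a size check before deserializing can at least turn the opaque "Unable to deserialize cloned data" error into something readable. The following is a minimal sketch, not part of the Node.js API: `checkedDeserialize` and `ASSUMED_MAX_DESERIALIZE_BYTES` are hypothetical names, and the threshold of 2 ** 31 - 1 bytes is an assumption that is consistent with the output above but not confirmed by it.

```javascript
const v8 = require('v8');

// ASSUMPTION: the cutoff is 2 ** 31 - 1 bytes. The output above only shows
// that a ~1.47 GB buffer works and a ~2.93 GB buffer fails; the exact
// boundary has not been verified.
const ASSUMED_MAX_DESERIALIZE_BYTES = 2 ** 31 - 1;

// Hypothetical helper: fail early with a descriptive error instead of
// letting v8.deserialize throw "Unable to deserialize cloned data".
function checkedDeserialize(buffer, maxBytes = ASSUMED_MAX_DESERIALIZE_BYTES) {
  if (buffer.length > maxBytes) {
    throw new RangeError(
      `buffer is ${buffer.length} bytes; v8.deserialize appears to fail ` +
        `above ${maxBytes} bytes`
    );
  }
  return v8.deserialize(buffer);
}
```

The `maxBytes` parameter is there so the assumed threshold can be adjusted once the real limit is known.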

What is the expected behavior?

Either the documentation is updated to reflect the limit or the limit is removed.

What do you see instead?

Please see the output above.

Additional information

v8's serialize and deserialize are fantastic and very performant for large datasets, much faster than alternatives such as msgpackr. I'd love to keep using them as my dataset grows so that I can put off re-architecting things a bit longer, though I realize this is probably a very extreme use case. Documenting the limitation will save the next person who hits it some time, however! I spent a lot of time assuming that my file-writing code was broken in some way (I had to rewrite it because it also has a 2 GB limitation). I hit both limits at the same time and didn't realize deserialize had the same limit.

In the short term I will probably shard keys across multiple files.
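For anyone considering the same workaround, here is a rough sketch of what sharding keys across multiple files could look like. The helper names `writeShards` and `readShards`, the `shard-N.bin` file naming, and the round-robin key assignment are all illustrative assumptions, not anything from Node.js itself. It also assumes the shard count is chosen large enough that every shard stays under the limit.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const v8 = require('v8');

// Hypothetical helper: distribute keys round-robin over `shardCount` objects
// and write each one to its own v8-serialized file. Returns the file paths.
function writeShards(data, dir, shardCount) {
  const shards = Array.from({ length: shardCount }, () => ({}));
  Object.keys(data).forEach((key, index) => {
    shards[index % shardCount][key] = data[key];
  });
  return shards.map((shard, index) => {
    const file = path.join(dir, `shard-${index}.bin`);
    fs.writeFileSync(file, v8.serialize(shard));
    return file;
  });
}

// Hypothetical helper: deserialize each shard and merge the keys back
// into a single object.
function readShards(files) {
  const merged = {};
  for (const file of files) {
    Object.assign(merged, v8.deserialize(fs.readFileSync(file)));
  }
  return merged;
}

// Usage with a tiny object and a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'shards-'));
const files = writeShards({ a: 1, b: 2, c: 3 }, dir, 2);
console.log(readShards(files));
```

Round-robin assignment keeps shards roughly equal in size only when values are similar in size, as they are in the test case above. With uneven values, a size-aware split would be needed.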
