
TextDecoder: ERR_ENCODING_INVALID_ENCODED_DATA on very long array buffer #47645

Open
@martian17

Description

Version

v18.14.1

Platform

Linux 5.19.0-38-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 21:16:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

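When I try to decode a long UTF-16LE-encoded buffer, ERR_ENCODING_INVALID_ENCODED_DATA is thrown even though the data is valid and the resulting string would be shorter than the maximum string length. When that limit really is exceeded, I would expect ERR_STRING_TOO_LONG instead.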

new TextDecoder("utf-16le").decode(new Uint16Array(2**27).fill(48))
// Uncaught TypeError: The encoded data was not valid for encoding utf-16le
//     at TextDecoder.decode (node:internal/encoding:448:14) {
//   code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
// }

The default encoding (UTF-8) works correctly and throws the appropriate error:

new TextDecoder().decode(new Uint8Array(2**29).fill(48))
// Uncaught Error: Cannot create a string longer than 0x1fffffe8 characters
//     at TextDecoder.decode (node:internal/encoding:433:16) {
//   code: 'ERR_STRING_TOO_LONG'
// }
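To run both reproductions from a script file instead of the REPL, the calls can be wrapped in a small helper that reports the error code rather than throwing. This is only a sketch: decodeOutcome is a hypothetical name, not a Node.js API, and it relies on Node.js attaching a code property to these errors.

```javascript
// Hypothetical helper (not a Node.js API): decode `input` with the given
// encoding label and report the outcome instead of throwing.
function decodeOutcome(label, input) {
  try {
    // The constructor is inside the try so unsupported labels are reported too.
    const text = new TextDecoder(label).decode(input);
    return { ok: true, length: text.length };
  } catch (err) {
    // Node.js errors carry a string code such as 'ERR_STRING_TOO_LONG'.
    return { ok: false, code: err.code };
  }
}

// Small input: four '0' characters (48 = 0x30), decoded on a little-endian host.
console.log(decodeOutcome('utf-16le', new Uint16Array(4).fill(48)));

// Large inputs from the report (each allocates hundreds of MB):
// console.log(decodeOutcome('utf-16le', new Uint16Array(2 ** 27).fill(48)));
// console.log(decodeOutcome('utf-8', new Uint8Array(2 ** 29).fill(48)));
```

On v18.14.1, uncommenting the two large calls should report ERR_ENCODING_INVALID_ENCODED_DATA and ERR_STRING_TOO_LONG respectively, matching the REPL output above.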

I also noticed that TextDecoder() (UTF-8) seems able to consume an array buffer twice as long as TextDecoder("utf-16le") can without throwing, producing a string four times as long.
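The 2x/4x relationship follows from the bytes per character of each encoding, assuming ASCII-only content like the '0' (48) fill used above: UTF-8 spends one byte per character and UTF-16LE spends two. A minimal sketch of that arithmetic; decodedLength is a hypothetical helper, it assumes even byte lengths for UTF-16LE, and the byte counts are illustrative, not measured thresholds.

```javascript
// Hypothetical helper: JS string length produced when decoding
// `byteLength` bytes of ASCII-only data in the given encoding.
function decodedLength(encoding, byteLength) {
  switch (encoding) {
    case 'utf-8':
      return byteLength; // one byte per ASCII character
    case 'utf-16le':
      return byteLength / 2; // two bytes per code unit (even lengths assumed)
    default:
      throw new RangeError(`unsupported encoding: ${encoding}`);
  }
}

// If UTF-8 accepts twice as many bytes as UTF-16LE...
const utf16Bytes = 1000;
const utf8Bytes = 2 * utf16Bytes;
// ...its string is 2 (more bytes) x 2 (fewer bytes per char) = 4 times as long.
console.log(decodedLength('utf-8', utf8Bytes) / decodedLength('utf-16le', utf16Bytes)); // 4
```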

How often does it reproduce? Is there a required condition?

Confirmed both when running a script file and in the Node.js REPL.

What is the expected behavior? Why is that the expected behavior?

new TextDecoder("utf-16le") should be able to create strings of up to 0x1fffffe8 characters, as the default UTF-8 decoder does, and should throw ERR_STRING_TOO_LONG only when that length is exceeded.
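Stated as a rule, the expected behavior depends only on the length of the decoded string, not on the encoding. A minimal sketch under an ASCII-only assumption; expectedError is a hypothetical name, and the limit defaults to buffer.constants.MAX_STRING_LENGTH, which I believe is the same limit quoted in the ERR_STRING_TOO_LONG message (0x1fffffe8 on v18.14.1 x64, per the output above).

```javascript
const { constants } = require('node:buffer');

// Hypothetical check (not a Node.js API): the error code the expected
// behavior calls for when decoding `byteLength` bytes of ASCII-only data.
// Returns null when decoding should succeed.
function expectedError(encoding, byteLength, maxLength = constants.MAX_STRING_LENGTH) {
  // Any label other than 'utf-16le' is treated as UTF-8 here.
  const bytesPerChar = encoding === 'utf-16le' ? 2 : 1;
  // Rounding up covers a trailing odd UTF-16LE byte, which decodes to one U+FFFD.
  const length = Math.ceil(byteLength / bytesPerChar);
  return length > maxLength ? 'ERR_STRING_TOO_LONG' : null;
}

const LIMIT = 0x1fffffe8; // limit reported in this issue
console.log(expectedError('utf-16le', 2 ** 28, LIMIT)); // null (2**27 characters fit)
console.log(expectedError('utf-8', 2 ** 29, LIMIT)); // ERR_STRING_TOO_LONG
```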

What do you see instead?

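ERR_ENCODING_INVALID_ENCODED_DATA is thrown when the input Uint16Array has 2**27 elements (2**28 bytes), even though the decoded string would be only 2**27 characters long, well under the 0x1fffffe8 limit: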

Uncaught TypeError: The encoded data was not valid for encoding utf-16le
    at TextDecoder.decode (node:internal/encoding:448:14) {
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}

Additional information

No response

Labels: util (issues and PRs related to the built-in util module)
