Description
The byte sequence 237, 166, 164
is not valid utf8, since it encodes a surrogate code point, which is not a valid unicode scalar value. So Buffer.from([237, 166, 164]).toString('utf8')
should error. But instead, it returns a string, effectively implementing wtf-8 rather than utf-8.
Or does Buffer.toString
simply not provide any validity guarantees at all, returning garbage strings if the buffer contains invalid input? In that case, please document this as expected behavior, since it makes the function completely useless for a bunch of use cases.
node -v
: v10.11.0
uname -a
: Linux aljoscha-laptop 4.18.10-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 26 09:48:22 UTC 2018 x86_64 GNU/Linux
See also rust-lang/rust#54845
edit: This also leaks into JSON.parse
, which can accept garbage strings even though ECMA-404 (the json standard prescribed for JSON.parse as defined in ECMAScript) only allows valid utf8 input.