Description
The browser's TextDecoder.prototype.decode treats surrogates (U+D800 through U+DFFF) differently to Utf8Decoder.
This makes it difficult to use TextDecoder to accelerate conversion.
Acceleration is highly desirable - it improves one binary protobuf benchmark by 8x.
The main difference is that Utf8Decoder converts surrogates into a code point, whereas TextDecoder considers a surrogate to be an error and, depending on the fatal option, either throws an error or decodes the surrogate to U+FFFD REPLACEMENT CHARACTER.
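The TextDecoder side of this difference can be reproduced with a short sketch. It assumes an environment with a global TextDecoder (a browser or a recent Node.js) and uses the bytes ED A0 80, the three-byte encoding of the lone surrogate U+D800, as an illustrative input. It does not exercise Utf8Decoder.

```javascript
// ED A0 80 is the three-byte, UTF-8-style encoding of the lone surrogate U+D800.
// (Illustrative input chosen for this sketch, not taken from the benchmark.)
const encodedSurrogate = new Uint8Array([0xed, 0xa0, 0x80]);

// Default mode (fatal: false): malformed input is replaced with U+FFFD.
const lenient = new TextDecoder('utf-8').decode(encodedSurrogate);

// fatal: true: malformed input makes decode() throw a TypeError.
let strictError = null;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(encodedSurrogate);
} catch (e) {
  strictError = e;
}

console.log('contains U+FFFD:', lenient.includes('\uFFFD'));
console.log('contains a surrogate:', /[\uD800-\uDFFF]/.test(lenient));
console.log('fatal mode threw TypeError:', strictError instanceof TypeError);
```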
It is not possible to get acceptable performance for allowMalformed: true by first trying {fatal: true}, catching the Error, and re-decoding with the slow code. Throwing the error is ~1000x more expensive.
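For reference, the try-then-fall-back pattern described above looks roughly like the following sketch. The names decodeWithFallback and slowDecode are hypothetical: slowDecode stands in for the existing non-accelerated decoder and is passed in as a parameter. The sketch only shows the shape of the approach; as noted, the cost of the thrown error makes it unattractive when malformed input is common.

```javascript
// One fatal decoder, reused for every call.
const fatalDecoder = new TextDecoder('utf-8', { fatal: true });

// Hypothetical wrapper: `slowDecode` is a placeholder for the existing
// slow decoder that implements the allowMalformed behaviour.
function decodeWithFallback(bytes, slowDecode) {
  try {
    // Fast path: succeeds only when the input is well-formed UTF-8.
    return fatalDecoder.decode(bytes);
  } catch (e) {
    // Unrelated errors are rethrown; a TypeError signals malformed input.
    if (!(e instanceof TypeError)) throw e;
    // Slow path: re-decode the same bytes from the start.
    return slowDecode(bytes);
  }
}

// Usage with a stub slow decoder, for illustration only.
const wellFormed = decodeWithFallback(new Uint8Array([0x68, 0x69]), () => 'unused');
const malformed = decodeWithFallback(new Uint8Array([0xed, 0xa0, 0x80]), () => 'from slow path');
console.log(wellFormed, '|', malformed);
```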
Everything would be simpler if Utf8Decoder were completely aligned with TextDecoder.decode.
I have also verified that for other malformed inputs, TextDecoder and Utf8Decoder disagree on the number of U+FFFD replacements generated.