base64 decoder could be 2x faster when decoding wrapped base64 #12114

Closed
@jorangreef

Description

Node's base64 decoder currently uses a fast decoder and a slow decoder.

The fast decoder decodes a 32-bit word (four input characters) at a time. As soon as it sees a line break, whitespace, or garbage, it switches permanently to the slow decoder, which decodes one byte at a time, with a conditional branch per byte instead of per 32-bit word.
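
For context, here is a rough, self-contained sketch of that two-path structure. The names and code are illustrative only, not the actual src/base64.h implementation:

```cpp
#include <cstddef>
#include <cstdint>

// Map one base64 character to its 6-bit value, or -1 for anything else
// ('=', CR, LF, whitespace, garbage).
static int8_t unbase64(uint8_t c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Slow path: one character at a time, a conditional branch per byte.
static size_t decode_slow(const char* src, size_t srclen,
                          char* dst, size_t dstlen) {
  size_t i = 0, k = 0;
  uint32_t acc = 0;
  int bits = 0;
  while (i < srclen && k < dstlen) {
    int8_t v = unbase64(static_cast<uint8_t>(src[i++]));
    if (v < 0) continue;  // skip line breaks, whitespace, garbage
    acc = (acc << 6) | static_cast<uint32_t>(v);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      dst[k++] = static_cast<char>((acc >> bits) & 0xFF);
    }
  }
  return k;
}

// Fast path: one 32-bit word (four input characters, three output bytes)
// per iteration.
static size_t decode_fast(const char* src, size_t srclen,
                          char* dst, size_t dstlen) {
  size_t i = 0, k = 0;
  while (i + 4 <= srclen && k + 3 <= dstlen) {
    int v0 = unbase64(static_cast<uint8_t>(src[i + 0]));
    int v1 = unbase64(static_cast<uint8_t>(src[i + 1]));
    int v2 = unbase64(static_cast<uint8_t>(src[i + 2]));
    int v3 = unbase64(static_cast<uint8_t>(src[i + 3]));
    if ((v0 | v1 | v2 | v3) < 0) {
      // An unexpected character somewhere in this word: hand the *rest*
      // of the input to the slow path and never come back.
      return k + decode_slow(src + i, srclen - i, dst + k, dstlen - k);
    }
    uint32_t word = (v0 << 18) | (v1 << 12) | (v2 << 6) | v3;
    dst[k++] = static_cast<char>(word >> 16);
    dst[k++] = static_cast<char>((word >> 8) & 0xFF);
    dst[k++] = static_cast<char>(word & 0xFF);
    i += 4;
  }
  // Tail: fewer than four characters left.
  return k + decode_slow(src + i, srclen - i, dst + k, dstlen - k);
}
```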

I did some rough benchmarking to compare decoding a 4 MB random buffer encoded as base64 with decoding the same base64 with CRLFs inserted every 76 characters, as per MIME base64:

Decode Fast: 8ms
Decode Fast: 9ms
Decode Fast: 9ms
Decode Slow (wrapped every 76 chars): 30ms
Decode Slow (wrapped every 76 chars): 20ms
Decode Slow (wrapped every 76 chars): 22ms
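
A comparable measurement can be reproduced against the decode_fast() sketch above (rather than against Node's actual decoder) with a harness along these lines; the roughly 4 MB input, the RNG seed, and the 76-character CRLF wrapping are illustrative choices following the description above:

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>
#include <vector>

// Minimal base64 encoder for test input whose size is a multiple of three,
// so no '=' padding is needed.
static std::string encode(const std::vector<char>& in) {
  static const char tab[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  for (size_t i = 0; i + 3 <= in.size(); i += 3) {
    uint32_t w = (static_cast<uint8_t>(in[i]) << 16) |
                 (static_cast<uint8_t>(in[i + 1]) << 8) |
                 static_cast<uint8_t>(in[i + 2]);
    out += tab[w >> 18];
    out += tab[(w >> 12) & 63];
    out += tab[(w >> 6) & 63];
    out += tab[w & 63];
  }
  return out;
}

int main() {
  std::vector<char> data(3 * 1400000);  // roughly 4 MB, a multiple of three
  std::mt19937 rng(42);
  for (char& c : data) c = static_cast<char>(rng());

  std::string plain = encode(data);
  std::string wrapped;
  for (size_t i = 0; i < plain.size(); i += 76) {
    wrapped.append(plain, i, 76);  // last chunk may be shorter; append clamps
    wrapped += "\r\n";
  }

  auto run = [&](const char* name, const std::string& input) {
    std::vector<char> out(data.size());
    auto t0 = std::chrono::steady_clock::now();
    size_t n = decode_fast(input.data(), input.size(), out.data(), out.size());
    auto t1 = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
    std::printf("Decode %s: %lldms (%zu bytes)\n", name,
                static_cast<long long>(ms.count()), n);
  };
  run("Fast", plain);
  run("Slow (wrapped every 76 chars)", wrapped);
}
```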

As far as I can see, there's no reason to switch permanently to the slow decoder. If the fast decoder detects that the 32-bit word contains an invalid character, it could decode the next few bytes byte-by-byte and switch back to fast mode as soon as it has consumed 4 valid base64 characters and emitted 3 bytes. This could all live in a sub-branch after the 32-bit word check, so it should not affect the performance of the fast decoder at all.
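
A minimal sketch of what that sub-branch could look like, reusing the unbase64() helper from the sketch above (again illustrative, and glossing over '=' padding and the trailing partial group):

```cpp
// Proposed shape: on an unexpected character, resynchronize by gathering
// the next four valid characters one byte at a time, emit three bytes,
// then carry on in fast mode instead of bailing out for good.
static size_t decode_fast_resync(const char* src, size_t srclen,
                                 char* dst, size_t dstlen) {
  size_t i = 0, k = 0;
  while (i + 4 <= srclen && k + 3 <= dstlen) {
    int v0 = unbase64(static_cast<uint8_t>(src[i + 0]));
    int v1 = unbase64(static_cast<uint8_t>(src[i + 1]));
    int v2 = unbase64(static_cast<uint8_t>(src[i + 2]));
    int v3 = unbase64(static_cast<uint8_t>(src[i + 3]));
    uint32_t word;
    if ((v0 | v1 | v2 | v3) >= 0) {
      // Common case: four valid characters, no branch per byte.
      word = (v0 << 18) | (v1 << 12) | (v2 << 6) | v3;
      i += 4;
    } else {
      // Rare sub-branch: skip line breaks/whitespace/garbage until four
      // valid characters have been consumed, then resume fast mode.
      int vals[4];
      int n = 0;
      while (n < 4 && i < srclen) {
        int v = unbase64(static_cast<uint8_t>(src[i++]));
        if (v >= 0) vals[n++] = v;
      }
      if (n < 4) break;  // trailing partial group; real code handles padding
      word = (vals[0] << 18) | (vals[1] << 12) | (vals[2] << 6) | vals[3];
    }
    dst[k++] = static_cast<char>(word >> 16);
    dst[k++] = static_cast<char>((word >> 8) & 0xFF);
    dst[k++] = static_cast<char>(word & 0xFF);
  }
  // A trailing group of fewer than four characters would be handled here.
  return k;
}
```

With MIME-wrapped input the sub-branch only fires on the word that straddles a CRLF, so the bulk of each 76-character line still goes through the word-at-a-time path.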

For base64-decoding MIME data, this should nearly double throughput, since the slow case is triggered only once per 76-character line.

Labels

buffer: Issues and PRs related to the buffer subsystem.
c++: Issues and PRs that require attention from people who are familiar with C++.
performance: Issues and PRs related to the performance of Node.js.
