The C# standard library has fast (SIMD-based) base64 encoding functions, but it lacks a really fast base64 decoding function. The initial work that led to the fast functions in the runtime was carried out by gfoidl.
- There are accelerated base64 functions for UTF-8 inputs in the .NET runtime, but they are not optimal: we can make them 50% faster or more.
- There are no accelerated base64 functions for UTF-16 inputs (e.g., string types). We can be several times faster.
The goal of this project is to provide the fast WHATWG forgiving-base64 algorithm already used in major JavaScript runtimes (Node.js and Bun) to C#.
Importantly, we only focus on base64 decoding. It is a more challenging problem than base64 encoding because of the presence of allowable white space characters and the need to validate the input. Indeed, all inputs are valid for encoding, but only some inputs are valid for decoding. Having to skip white space characters makes accelerated decoding somewhat difficult.
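The asymmetry between encoding and decoding can be seen with the standard Convert API alone. The following is a minimal sketch that does not use this library; the sample inputs are illustrative choices, not taken from the benchmark data.

```csharp
using System;
using System.Text;

// Encoding never fails: every byte sequence has a base64 representation.
byte[] anyBytes = { 0x00, 0xFF, 0x10, 0x80 };
string encoded = Convert.ToBase64String(anyBytes);
Console.WriteLine(encoded);

// Decoding must skip white space that may appear anywhere in the input...
Span<byte> buffer = stackalloc byte[16];
bool okWithSpaces = Convert.TryFromBase64String("SGVs bG8=\r\n", buffer, out int written);
string decoded = Encoding.ASCII.GetString(buffer.Slice(0, written));
Console.WriteLine($"{okWithSpaces}: {decoded}");

// ...and it must reject characters outside the base64 alphabet.
bool okWithStar = Convert.TryFromBase64String("SGVs*G8=", buffer, out _);
Console.WriteLine(okWithStar);
```

A SIMD decoder has to perform both checks on whole blocks of characters at once, which is where most of the difficulty lies.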
We use the Enron base64 data for benchmarking; see benchmark/data/email.
We process the data as UTF-8 (ASCII) using the .NET accelerated functions
as a reference (System.Buffers.Text.Base64.DecodeFromUtf8).
| processor | SimdBase64 (GB/s) | .NET speed (GB/s) | speed up |
|---|---|---|---|
| Apple M2 processor (ARM) | 6.5 | 3.8 | 1.7 x |
| Intel Ice Lake (AVX2) | 6.6 | 3.4 | 1.9 x |
| Intel Ice Lake (SSSE3) | 4.9 | 3.4 | 1.4 x |
Our results are more impressive when comparing against the standard base64 string decoding
function (Convert.FromBase64String(mystring)), but this is explained in part by the fact
that the .NET team did not accelerate it using SIMD instructions. Thus we omit these results, only
comparing with the SIMD-accelerated .NET functions.
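For readers unfamiliar with the two .NET baselines mentioned above, here is a minimal sketch of how each is called. It uses only standard .NET APIs; the short input string is an illustrative choice and is not the benchmark data.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

string base64 = "SGVsbG8sIFdvcmxkIQ==";

// UTF-8 path: the SIMD-accelerated function used as the reference in the table.
byte[] utf8 = Encoding.ASCII.GetBytes(base64);
byte[] output = new byte[Base64.GetMaxDecodedFromUtf8Length(utf8.Length)];
OperationStatus status = Base64.DecodeFromUtf8(utf8, output, out int consumed, out int written);
string fromUtf8 = Encoding.UTF8.GetString(output, 0, written);
Console.WriteLine($"{status}, consumed {consumed}, wrote {written}: {fromUtf8}");

// UTF-16 path: the standard string decoding function, omitted from the table.
byte[] decoded = Convert.FromBase64String(base64);
string fromString = Encoding.UTF8.GetString(decoded);
Console.WriteLine(fromString);
```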
We require .NET 9 or better: https://dotnet.microsoft.com/en-us/download/dotnet/9.0
The library only provides Base64 decoding functions, because the .NET library already has fast Base64 encoding functions.
string base64 = "SGVsbG8sIFdvcmxkIQ==";
byte[] buffer = new byte[SimdBase64.Base64.MaximalBinaryLengthFromBase64(base64.AsSpan())];
int bytesConsumed; // gives you the number of characters consumed
int bytesWritten;
var result = SimdBase64.Base64.DecodeFromBase64(base64.AsSpan(), buffer, out bytesConsumed, out bytesWritten, false); // false is for regular base64, true for base64url
// result == OperationStatus.Done
// Encoding.UTF8.GetString(buffer.AsSpan().Slice(0, bytesWritten)) == "Hello, World!"
To run the tests, enter the command:
dotnet test
To get a list of available tests, enter the command:
dotnet test --list-tests
To run specific tests, it is helpful to use the filter parameter:
dotnet test -c Release --filter DecodeBase64CasesScalar
To run the benchmarks, run the following command:
cd benchmark
dotnet run -c Release
To run just one benchmark, use a filter:
cd benchmark
dotnet run -c Release --filter "SimdUnicodeBenchmarks.RealDataBenchmark.AVX2DecodingRealDataUTF8(FileName: \"data/email/\")"
If you are under macOS or Linux, you may want to run the benchmarks in privileged mode:
cd benchmark
sudo dotnet run -c Release
To build the library, run:
cd src
dotnet build
We recommend you use dotnet format. E.g.,
cd test
dotnet format
You can print the content of a vector register like so:
public static void ToString(Vector256<byte> v)
{
Span<byte> b = stackalloc byte[32];
v.CopyTo(b);
Console.WriteLine(Convert.ToHexString(b));
}
public static void ToString(Vector128<byte> v)
{
Span<byte> b = stackalloc byte[16];
v.CopyTo(b);
Console.WriteLine(Convert.ToHexString(b));
}
You can convert an integer to a hex string like so: $"0x{MyVariable:X}".
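The two debugging tips above can be combined into a small self-contained program. This is only a sketch: the vector contents and the variable name mask are arbitrary values picked for illustration.

```csharp
using System;
using System.Runtime.Intrinsics;

// Fill a 128-bit register with a recognizable byte and dump it as hex.
Vector128<byte> v = Vector128.Create((byte)0xAB);
Span<byte> bytes = stackalloc byte[Vector128<byte>.Count];
v.CopyTo(bytes);
string vectorHex = Convert.ToHexString(bytes);
Console.WriteLine(vectorHex);

// Format an integer (for example a bitmask) as hexadecimal.
int mask = 0x00FF00FF;
string maskHex = $"0x{mask:X}";
Console.WriteLine(maskHex);
```

Note that the X format specifier drops leading zeros; use X8 if you want a fixed width of eight hex digits.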
- Be careful: Vector128.Shuffle is not the same as Ssse3.Shuffle nor is Vector256.Shuffle the same as Avx2.Shuffle. Prefer the latter.
- Similarly Vector128.Shuffle is not the same as AdvSimd.Arm64.VectorTableLookup, use the latter.
- stackalloc arrays should probably not be used in class instances.
- In C#, struct might be preferable to class instances as it makes it clear that the data is thread local.
- You can ask for an asm dump: DOTNET_JitDisasm=NEON64HTMLScan dotnet run -c Release. See Viewing JIT disassembly and dumps.
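The first caveat in the list can be checked with a short experiment. The sketch below assumes that the relevant difference is how out-of-range indices are handled, which may not be the only reason for the advice (code generation can differ as well). As far as we understand, Vector128.Shuffle yields zero for any index of 16 or more, whereas Ssse3.Shuffle only zeroes a lane when the high bit of the index is set and otherwise uses the low four bits. The table values are arbitrary.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Sixteen distinct table bytes so that each lane is recognizable.
Vector128<byte> table = Vector128.Create(
    (byte)0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7,
    0xA8, 0xA9, 0xAA, 0xAB, 0xAC, 0xAD, 0xAE, 0xAF);

// Every index is 0x10: out of range for a 16-byte table, high bit clear.
Vector128<byte> indices = Vector128.Create((byte)0x10);

// Portable API: expected to produce zero for out-of-range indices.
Vector128<byte> portable = Vector128.Shuffle(table, indices);
Console.WriteLine($"Vector128.Shuffle lane 0: 0x{portable.GetElement(0):X2}");

// Hardware intrinsic (x64 only): expected to wrap to table[0x10 & 0x0F].
if (Ssse3.IsSupported)
{
    Vector128<byte> native = Ssse3.Shuffle(table, indices);
    Console.WriteLine($"Ssse3.Shuffle lane 0:     0x{native.GetElement(0):X2}");
}
else
{
    Console.WriteLine("SSSE3 is not available on this machine.");
}
```

If the two printed values differ on your machine, code that relies on one behavior cannot be switched to the other function without also adjusting the indices.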
- Wojciech Muła, Daniel Lemire, Base64 encoding and decoding at almost the speed of a memory copy, Software: Practice and Experience 50 (2), 2020.
- Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding using AVX2 Instructions, ACM Transactions on the Web 12 (3), 2018.
- base64 encoding with simd-support
- gfoidl.Base64: original code that led to the SIMD-based code in the runtime
- simdutf's base64 decode
- WHATWG forgiving-base64 decode