Closed
Description
Prerequisites
- I have written a descriptive issue title
- I have verified that I am running the latest version of ImageSharp
- I have verified if the problem exist in both DEBUG and RELEASE mode
- I have searched open and closed issues to ensure it has not already been reported
ImageSharp version
v3 alpha +
Other ImageSharp packages and versions
NA
Environment (Operating system, version and so on)
NA
.NET Framework version
NA
Description
As described here, there are several performance opportunities that can be implemented in many of our pixel format types. This should be fairly low-hanging fruit with a good return.
Notably on .NET 6/7, you could make this even more efficient by doing something like:
public void Pack(Vector4 vector)
{
    vector *= MaxBytes;
    vector += Half;
    vector = Vector4.Clamp(vector, Vector4.Zero, MaxBytes);

    Vector128<byte> result = Sse2.ConvertToVector128Int32WithTruncation(vector.AsVector128()).AsByte();
    // In .NET 7+ the above can be `result = Vector128.ConvertToInt32(vector.AsVector128()).AsByte()` so it works on Arm64 too

    R = result.GetElement(0);
    G = result.GetElement(4);
    B = result.GetElement(8);
    A = result.GetElement(12);
}
This converts all 4 elements at once and then extracts the truncated bytes directly:
vzeroupper
vmovupd xmm0, [0x7ffd160105c0]
vmovaps xmm1, xmm0
vmulps xmm1, xmm1, [rdx]
vmovupd [rdx], xmm1
vmovupd xmm1, [0x7ffd160105d0]
vaddps xmm1, xmm1, [rdx]
vmovupd [rdx], xmm1
vmovupd xmm1, [rdx]
vxorps xmm2, xmm2, xmm2
vmaxps xmm1, xmm1, xmm2
vminps xmm0, xmm1, xmm0
vmovupd [rdx], xmm0
vcvttps2dq xmm0, [rdx]
vpextrb eax, xmm0, 0
mov [rcx+2], al
vpextrb eax, xmm0, 4
mov [rcx+1], al
vpextrb eax, xmm0, 8
mov [rcx], al
vpextrb eax, xmm0, 0xc
mov [rcx+3], al
ret
- We'll also be improving the codegen around `vpextrb` more in the future so it can be just `vpextrb [rcx+2], xmm0, 0` instead of `vpextrb eax, xmm0, 0` followed by `mov [rcx+2], al`.
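For reference, the arithmetic that Pack performs can be written one component at a time. The following is only a scalar sketch for comparison, not ImageSharp code: the class name PackSketch and the method PackScalar are hypothetical, and the constants are inlined rather than taken from MaxBytes and Half. After the clamp every component lies in the range 0 to 255, so the truncated 32-bit integer fits in its lowest byte, which is why the vectorized version can read byte elements 0, 4, 8 and 12 directly on little-endian hardware.

```csharp
using System;
using System.Numerics;

public static class PackSketch
{
    // Hypothetical helper (not part of ImageSharp): scalar equivalent of the
    // scale, round, clamp and truncate steps in the vectorized Pack method.
    public static (byte R, byte G, byte B, byte A) PackScalar(Vector4 vector)
    {
        vector *= 255f;                   // scale 0..1 to 0..255
        vector += new Vector4(0.5f);      // add 0.5 so truncation rounds to nearest
        vector = Vector4.Clamp(vector, Vector4.Zero, new Vector4(255f));

        // Casting a float to byte truncates toward zero, like the
        // "WithTruncation" conversion used in the vectorized version.
        return ((byte)vector.X, (byte)vector.Y, (byte)vector.Z, (byte)vector.W);
    }
}
```

For example, PackSketch.PackScalar(new Vector4(1f, 0.5f, 0f, 2f)) yields (255, 128, 0, 255): 0.5 scales to 127.5 and rounds up to 128, and the out-of-range 2 is clamped to 255.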
You can also optimize in .NET 6+ by directly using `Vector128.Create()`. This creates a method-local constant and avoids the static initializer entirely:
private static Vector4 MaxBytes => Vector128.Create(255f).AsVector4();
private static Vector4 Half => Vector128.Create(0.5f).AsVector4();
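To make the contrast explicit, here is a minimal sketch of the two ways to declare such a constant. The class name ConstantsSketch and the field name MaxBytesField are hypothetical and only illustrate the pattern; the claim about codegen is the one made in the text above, not something this snippet measures.

```csharp
using System.Numerics;
using System.Runtime.Intrinsics;

public static class ConstantsSketch
{
    // Static readonly field: assigned once by the type's static initializer
    // and then read back from the field wherever it is used.
    public static readonly Vector4 MaxBytesField = new Vector4(255f);

    // Expression-bodied property: no backing field and no static initializer.
    // The value is produced at each use site from a constant argument.
    public static Vector4 MaxBytes => Vector128.Create(255f).AsVector4();
}
```

Both members produce the same value, so switching from the field to the property form should not change results, only how the constant is materialized.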
Steps to Reproduce
NA
Images
No response