Performance optimization opportunities in common pixel formats. #2232

Closed
@JimBobSquarePants

Prerequisites

  • I have written a descriptive issue title
  • I have verified that I am running the latest version of ImageSharp
  • I have verified that the problem exists in both DEBUG and RELEASE mode
  • I have searched open and closed issues to ensure it has not already been reported

ImageSharp version

v3 alpha +

Other ImageSharp packages and versions

NA

Environment (Operating system, version and so on)

NA

.NET Framework version

NA

Description

As described here, several performance optimizations can be implemented in many of our pixel format types. This should be fairly low-hanging fruit with a good return.

Notably on .NET 6/7, you could make this even more efficient by doing something like:

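// Requires System.Numerics, System.Runtime.Intrinsics and System.Runtime.Intrinsics.X86.
public void Pack(Vector4 vector)
{
    // Scale [0, 1] to [0, 255], add 0.5 so truncation rounds to nearest, then clamp.
    vector *= MaxBytes;
    vector += Half;
    vector = Vector4.Clamp(vector, Vector4.Zero, MaxBytes);

    // Convert all four floats to int32 in one instruction, then view the result as 16 bytes.
    Vector128<byte> result = Sse2.ConvertToVector128Int32WithTruncation(vector.AsVector128()).AsByte();
    // In .NET 7+ the above can be `result = Vector128.ConvertToInt32(vector.AsVector128()).AsByte()` so it works on Arm64 too

    // Each int32 is in [0, 255], so its value is the lowest byte: offsets 0, 4, 8 and 12.
    R = result.GetElement(0);
    G = result.GetElement(4);
    B = result.GetElement(8);
    A = result.GetElement(12);
}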

This converts all 4 elements at once and then extracts the truncated bytes directly:

vzeroupper
vmovupd xmm0, [0x7ffd160105c0]
vmovaps xmm1, xmm0
vmulps xmm1, xmm1, [rdx]
vmovupd [rdx], xmm1
vmovupd xmm1, [0x7ffd160105d0]
vaddps xmm1, xmm1, [rdx]
vmovupd [rdx], xmm1
vmovupd xmm1, [rdx]
vxorps xmm2, xmm2, xmm2
vmaxps xmm1, xmm1, xmm2
vminps xmm0, xmm1, xmm0
vmovupd [rdx], xmm0
vcvttps2dq xmm0, [rdx]
vpextrb eax, xmm0, 0
mov [rcx+2], al
vpextrb eax, xmm0, 4
mov [rcx+1], al
vpextrb eax, xmm0, 8
mov [rcx], al
vpextrb eax, xmm0, 0xc
mov [rcx+3], al
ret
  • We'll also be improving the codegen around vpextrb more in the future so it can be just vpextrb [rcx+2], xmm0, 0 instead of vpextrb eax, xmm0, 0 followed by mov [rcx+2], al.
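
To make it easier to check that the vectorized path matches a plain per-component conversion, here is a minimal, self-contained sketch. `Rgba32Sketch` and `PackScalar` are hypothetical names introduced only for illustration and are not ImageSharp types. The sketch assumes .NET 7+ (for `Vector128.ConvertToInt32`) and a little-endian target, so that the low byte of each converted int sits at byte offsets 0, 4, 8 and 12.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

// Hypothetical stand-in for an RGBA pixel type; not ImageSharp's actual struct.
public struct Rgba32Sketch
{
    public byte R;
    public byte G;
    public byte B;
    public byte A;

    private static Vector4 MaxBytes => Vector128.Create(255f).AsVector4();
    private static Vector4 Half => Vector128.Create(0.5f).AsVector4();

    // Vectorized path: scale, round via +0.5 and truncation, clamp, convert all lanes at once.
    public void Pack(Vector4 vector)
    {
        vector *= MaxBytes;
        vector += Half;
        vector = Vector4.Clamp(vector, Vector4.Zero, MaxBytes);

        // Cross-platform conversion (assumes .NET 7+).
        Vector128<byte> result = Vector128.ConvertToInt32(vector.AsVector128()).AsByte();

        // Assumes little-endian: the low byte of each int32 lane.
        R = result.GetElement(0);
        G = result.GetElement(4);
        B = result.GetElement(8);
        A = result.GetElement(12);
    }

    // Scalar reference: the same arithmetic applied one component at a time.
    public void PackScalar(Vector4 vector)
    {
        R = ToByte(vector.X);
        G = ToByte(vector.Y);
        B = ToByte(vector.Z);
        A = ToByte(vector.W);
    }

    private static byte ToByte(float component)
        => (byte)Math.Clamp((component * 255f) + 0.5f, 0f, 255f);
}
```

Comparing `Pack` against `PackScalar` over a range of inputs, including out-of-range values, is one way to guard against regressions when swapping in the intrinsic version.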

You can also optimize in .NET 6+ by directly using Vector128.Create(). This creates a method local constant and avoids the static initializer entirely:

    private static Vector4 MaxBytes => Vector128.Create(255f).AsVector4();
    private static Vector4 Half => Vector128.Create(0.5f).AsVector4();

Steps to Reproduce

NA

Images

No response
