PGO for Memmove unrolling #84192

Closed

Description

Profile the size passed to Buffer.Memmove in Tier0 so that Tier1 can optimize the call for the most common length at the given call site. A simulated scenario:

byte[] _src = new byte[50];
byte[] _dst = new byte[50];

[Benchmark]
public void CopyUnknownSize()
{
    _src.AsSpan().CopyTo(_dst);
}

[Benchmark]
public void CopyUnknownSize_PGO()
{
    // simulated PGO (Tier0 realized that we mostly do Memmove here for 50 elements)
    if (_src.Length == 50)
    {
        _src.AsSpan(0, 50).CopyTo(_dst); // unrolled for constant size via AVX
    }
    else
    {
        // fallback
        _src.AsSpan().CopyTo(_dst);
    }
}
| Method              | Mean      |
|---------------------|-----------|
| CopyUnknownSize     | 1.6005 ns |
| CopyUnknownSize_PGO | 0.5583 ns |

The actual difference should be bigger because my simulation adds some overhead with AsSpan(0, 50).
I think it's pretty straightforward to implement in the JIT; the only part I'm not sure about is the new PGO schema on the VM side. cc @AndyAyersMS
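
To make the idea concrete, here is a rough sketch of the kind of data a per-call-site length profile would need to carry, and how a dominant length could be picked from it. This is only an illustration under my own assumptions: `LengthProfile`, `Record`, `TryGetDominantLength`, and the 90% threshold are hypothetical names and values, not the runtime's PGO schema or any JIT API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: not the runtime's PGO schema or a JIT API.
public sealed class LengthProfile
{
    private readonly Dictionary<int, int> _counts = new Dictionary<int, int>();
    private int _total;

    // Would be called from the (simulated) Tier0 instrumented call site.
    public void Record(int length)
    {
        _counts.TryGetValue(length, out int seen); // seen is 0 for a new length
        _counts[length] = seen + 1;
        _total++;
    }

    // Finds the most frequent length and reports whether it covers at least
    // minShare of all samples. 'length' is only meaningful when this returns true.
    public bool TryGetDominantLength(double minShare, out int length)
    {
        length = 0;
        int best = 0;
        foreach (KeyValuePair<int, int> pair in _counts)
        {
            if (pair.Value > best)
            {
                best = pair.Value;
                length = pair.Key;
            }
        }
        return _total > 0 && (double)best / _total >= minShare;
    }
}

public static class Demo
{
    public static void Main()
    {
        var profile = new LengthProfile();
        for (int i = 0; i < 95; i++) profile.Record(50); // hot size
        for (int i = 0; i < 5; i++) profile.Record(7);   // occasional other size

        if (profile.TryGetDominantLength(0.9, out int hot))
            Console.WriteLine($"Specialize for length {hot}");
        else
            Console.WriteLine("No dominant length; keep the generic call");
    }
}
```

With a result like this, Tier1 could emit the guarded form from the benchmark above (a length check plus a constant-size copy) and fall back to the generic Memmove otherwise. A real implementation would presumably use a small fixed-size table rather than a dictionary, but that is a guess on my part.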

PS: As far as I know, native compilers do this sort of optimization for memset/memcpy with PGO.

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)