Closed
Description
Opened on Mar 31, 2023
Poll (profile) the size passed to Buffer.Memmove in Tier0 so that Tier1 can optimize the call for the most common length at a given call site. A simulated scenario:
```csharp
byte[] _src = new byte[50];
byte[] _dst = new byte[50];

[Benchmark]
public void CopyUnknownSize()
{
    _src.AsSpan().CopyTo(_dst);
}

[Benchmark]
public void CopyUnknownSize_PGO()
{
    // simulated PGO (Tier0 realized that we mostly do Memmove here for 50 elements)
    if (_src.Length == 50)
    {
        _src.AsSpan(0, 50).CopyTo(_dst); // unrolled for constant size via AVX
    }
    else
    {
        // fallback
        _src.AsSpan().CopyTo(_dst);
    }
}
```
| Method | Mean |
|---|---|
| CopyUnknownSize | 1.6005 ns |
| CopyUnknownSize_PGO | 0.5583 ns |
The actual difference should be bigger because my simulation adds some overhead with AsSpan(0, 50).
I think it's pretty straightforward to implement in the JIT; the only part I'm not sure about is the new PGO schema on the VM side. cc @AndyAyersMS
PS: As far as I know, native compilers do this sort of optimization for memset/memcpy with PGO.
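
To make the Tier0 polling idea a bit more concrete, here is a rough sketch of what a per-call-site size histogram could look like. Everything in it is hypothetical: SizeProfile, Record, TryGetDominantSize and the 90% threshold are names and values made up for illustration, and this is not how the runtime's PGO schema is actually laid out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-call-site profile; NOT the runtime's actual PGO schema.
sealed class SizeProfile
{
    private readonly Dictionary<int, int> _counts = new();
    private int _total;

    // Would be called on each Tier0 execution of the call site
    // with the length passed to the copy.
    public void Record(int size)
    {
        _counts.TryGetValue(size, out int count);
        _counts[size] = count + 1;
        _total++;
    }

    // Returns true when a single size accounts for at least
    // minShare of all recorded samples.
    public bool TryGetDominantSize(double minShare, out int size)
    {
        size = 0;
        int best = 0;
        foreach (KeyValuePair<int, int> entry in _counts)
        {
            if (entry.Value > best)
            {
                best = entry.Value;
                size = entry.Key;
            }
        }
        return _total > 0 && best >= minShare * _total;
    }
}

static class Demo
{
    static void Main()
    {
        var profile = new SizeProfile();
        for (int i = 0; i < 95; i++) profile.Record(50); // hot length
        for (int i = 0; i < 5; i++) profile.Record(13);  // occasional other length

        // Assumed threshold of 90%; the right value would need measuring.
        if (profile.TryGetDominantSize(0.9, out int hot))
            Console.WriteLine($"Specialize for length {hot}");
        else
            Console.WriteLine("No dominant length; keep the generic call");
    }
}
```

With a dominant length in hand, Tier1 could emit the same guarded shape as CopyUnknownSize_PGO above: a check against the hot length, an unrolled constant-size copy on the fast path, and the generic call as the fallback.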