Description
I have some code that reads from and writes to a given memory region at an offset. I had one method for each integer type, plus a generic one for structs and similar types.
With spans available, I decided to check whether I could remove all the integer read/write methods and use the generic method instead. So instead of:
int x = ReadInt32(offset);
I could just do:
int x = Read<int>(offset);
After checking the code generated for both, I noticed that the generic variant was almost as good as the specialized one, with one odd exception: the bounds check was not fully eliminated.
Here's some test code that highlights the issue:
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public class C
{
    private IntPtr _ptr;

    public C(IntPtr ptr)
    {
        _ptr = ptr;
    }

    public int ReadInt32(int offset)
    {
        return Read<int>(offset);
    }

    public unsafe int ReadInt32Fast(int offset)
    {
        return *(int*)(_ptr + offset);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public T Read<T>(int offset) where T : struct
    {
        return MemoryMarshal.Cast<byte, T>(GetDataSpan(offset, Unsafe.SizeOf<T>()))[0];
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private unsafe Span<byte> GetDataSpan(int offset, int size)
    {
        return new Span<byte>((void*)(_ptr + offset), size);
    }
}

SharpLab link here.
Code generated for the ReadInt32 method:
C.ReadInt32(Int32)
L0000: sub rsp, 0x28
L0004: mov rax, [rcx+0x8]
L0008: movsxd rdx, edx
L000b: add rax, rdx
L000e: mov edx, 0x1
L0013: cmp edx, 0x0
L0016: jbe L0025
L0018: mov eax, [rax]
L001a: add rsp, 0x28
L001e: ret
L001f: call 0x7fff231aed50
L0024: int3
L0025: call 0x7fff231aef00
L002a: int3
This is the odd part:
L000e: mov edx, 0x1
L0013: cmp edx, 0x0
L0016: jbe L0025
It's doing a comparison between two constant values. As far as I can tell, 1 is the length and 0 is the index. Since both are constant, constant folding should evaluate 1U <= 0U to false, which means the branch is never taken. The comparison and jump should therefore be eliminated, and the call at L0025 would then become unreachable. The call at L001f already looks unreachable as well, so I don't know why that code is still there. It looks like whichever optimization pass is responsible for eliminating length checks runs too late in the compiler pipeline, so the other optimizations are not kicking in.
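To spell out the folding argument, here is a minimal sketch of the check as I read it from the disassembly. The class and the IsOutOfRange helper are hypothetical names made up for illustration and are not runtime APIs; the sketch assumes the emitted test is an unsigned "length <= index" comparison, which is what cmp edx, 0x0 followed by jbe suggests.

```csharp
using System;

public static class BoundsCheckSketch
{
    // Hypothetical helper mirroring the comparison seen in the disassembly:
    // the throw path is taken when (uint)length <= (uint)index.
    public static bool IsOutOfRange(int index, int length)
    {
        return (uint)length <= (uint)index;
    }

    public static void Main()
    {
        // After inlining Read<int>, both operands are known constants:
        // the 4-byte span cast to Span<int> has length 1, and the index is 0.
        const int length = 1;
        const int index = 0;

        // 1U <= 0U is false, so the throw path can never be reached.
        Console.WriteLine(IsOutOfRange(index, length)); // prints "False"
    }
}
```

Since the result is known at JIT time for these operands, I would expect the compare, the branch, and the throw call to fold away entirely.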
The goal is for ReadInt32 to generate the same code as ReadInt32Fast, or code that is equally efficient.
category:cq
theme:value-numbering
skill-level:intermediate
cost:medium