Span bounds checking not fully eliminated in some cases when index is constant and length is known #1115

@gdkchan

I have some code that reads from and writes to a given memory region at an offset. I had one method for each integer type, and also a generic one for structs etc.

With spans available, I decided to check whether I could remove all the integer read/write methods and just use the generic method instead. So instead of:

int x = ReadInt32(offset);

I could just do:

int x = Read<int>(offset);

After checking the code generated for both, I noticed that the code for the generic variant was almost as good as the specialized one, with one odd exception: the bounds check was not fully eliminated.

Here's some test code that highlights the issue:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public class C
{
    private IntPtr _ptr;
    
    public C(IntPtr ptr)
    {
        _ptr = ptr;
    }
    
    public int ReadInt32(int offset)
    {
        return Read<int>(offset);
    }
    
    public unsafe int ReadInt32Fast(int offset)
    {
        return *(int*)(_ptr + offset);
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public T Read<T>(int offset) where T : struct
    {
        return MemoryMarshal.Cast<byte, T>(GetDataSpan(offset, Unsafe.SizeOf<T>()))[0];
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private unsafe Span<byte> GetDataSpan(int offset, int size)
    {
        return new Span<byte>((void*)(_ptr + offset), size);
    }
}

SharpLab link here.
Code generated for the ReadInt32 method:

C.ReadInt32(Int32)
    L0000: sub rsp, 0x28
    L0004: mov rax, [rcx+0x8]
    L0008: movsxd rdx, edx
    L000b: add rax, rdx
    L000e: mov edx, 0x1
    L0013: cmp edx, 0x0
    L0016: jbe L0025
    L0018: mov eax, [rax]
    L001a: add rsp, 0x28
    L001e: ret
    L001f: call 0x7fff231aed50
    L0024: int3
    L0025: call 0x7fff231aef00
    L002a: int3

This is the odd part:

    L000e: mov edx, 0x1
    L0013: cmp edx, 0x0
    L0016: jbe L0025

It's doing a comparison between constant values. As far as I can tell, 1 is the length and 0 is the index. Since both are constants, constant folding should evaluate 1U <= 0U to false, which means the branch is never taken. The comparison and jump should therefore be eliminated, and the call at L0025 would then become unreachable. L001f already looks unreachable as well, so I don't know why that code is still there. It looks like whichever optimization pass is responsible for optimizing length checks runs too late in the compiler pipeline, so the other optimizations don't kick in.
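To make the folding argument concrete, here is a minimal sketch of the unsigned comparison that the `cmp edx, 0x0` / `jbe` pair appears to encode. The class `BoundsCheckSketch` and the helper `IsOutOfRange` are hypothetical names used only for illustration; this is not the runtime's actual `Span<T>` indexer code, just the same test written in C#.

```csharp
using System;

public static class BoundsCheckSketch
{
    // Hypothetical helper (not a runtime API): returns true when the
    // "jbe" branch would be taken, i.e. when length <= index as unsigned.
    public static bool IsOutOfRange(int index, int length)
    {
        return (uint)length <= (uint)index;
    }

    public static void Main()
    {
        // Length 1 and index 0 are the two constants from the disassembly.
        // 1U <= 0U is false, so the throw path can never be reached.
        Console.WriteLine(IsOutOfRange(0, 1));  // False

        // A negative index wraps to a large unsigned value, which is why a
        // single unsigned compare covers both "too large" and "negative".
        Console.WriteLine(IsOutOfRange(-1, 1)); // True
    }
}
```

With both operands known at compile time, the first call always yields the same result, which is the evaluation the JIT would be expected to do itself before dropping the branch.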

The goal is for ReadInt32 to generate the same code as ReadInt32Fast, or code that is equally efficient.

category:cq
theme:value-numbering
skill-level:intermediate
cost:medium

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)