
unreliable codegen around vbroadcastss in .NET 8 release builds #96156

Closed
@twest820

Description

Code that runs correctly in .NET 7 produces incorrect results in .NET 8 release builds.

Reproduction Steps

This affects a couple of helper methods that have been in place for years.

        public unsafe static Vector128<float> BroadcastScalarToVector128(float value)
        {
            return Avx.BroadcastScalarToVector128(&value);
        }

        public unsafe static Vector256<float> BroadcastScalarToVector256(float value)
        {
            return Avx.BroadcastScalarToVector256(&value);
        }

Expected behavior

Numerical simulations remain correct.

Actual behavior

Simulations diverge numerically as timesteps proceed. They ultimately either complete with wrong results or crash with ArgumentOutOfRangeExceptions raised by internal consistency checks that detect NaNs or other incorrect values, such as those produced by numerical overflow in exponentiation.

Regression?

Yes. .NET 8 debug builds work fine. Flipping projects back to .NET 7 removes this problem. Rolling either of the affected repos I've detected so far back to .NET 7 and then moving projects forward to .NET 8 with no other changes also reproduces the error.

Known Workarounds

Using Vector128.Create(value) or Vector256.Create(value) instead of calling Avx.BroadcastScalarToVectorNNN() directly has, so far, avoided this issue.
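
For anyone wanting to apply this to both vector widths, a minimal sketch of the two helpers rewritten around Create() might look like the following. The class name BroadcastHelpers is hypothetical (the original helper class isn't named in this report), and the sketch assumes the existing method signatures are kept so call sites need no changes.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical class name; the real helper class is not shown in this issue.
public static class BroadcastHelpers
{
    // Same signature as the original helper, but no pointer to the
    // parameter is taken, so the method no longer needs to be unsafe.
    public static Vector128<float> BroadcastScalarToVector128(float value)
    {
        return Vector128.Create(value);
    }

    // 256-bit counterpart of the same change.
    public static Vector256<float> BroadcastScalarToVector256(float value)
    {
        return Vector256.Create(value);
    }
}
```

Keeping the method names means this can be dropped in as a stopgap before removing the helpers in favor of inline VectorNNN.Create() calls.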

Configuration

.NET 8, Win 10 22H2, AMD Zen 3. Yes, specific to .NET 8 and x64 intrinsics.

Other information

In .NET 7 the disassembly is

vmovss      dword ptr [rbp+18h],xmm1  
vbroadcastss xmm0,dword ptr [rbp+18h]
vmovss      dword ptr [rbp+18h],xmm1
vbroadcastss ymm0,dword ptr [rbp+18h]

In .NET 8 it's of the form

vmovss      dword ptr [rbp+18h],xmm1  
cmp         dword ptr [7FFE0A69A0F0h],0  
je          BroadcastScalarToVector128(Single)+025h (07FFE0A744805h)  
call        00007FFE697D0A10  
vbroadcastss xmm0,dword ptr [rbp+18h]  
vmovaps     xmmword ptr [rbp-20h],xmm0  
mov         rax,qword ptr [rbp+10h]  
vmovaps     xmm0,xmmword ptr [rbp-20h]  
vmovups     xmmword ptr [rax],xmm0  

However, the code the JIT generates for the apparent quick workaround of

        public static Vector128<float> BroadcastScalarToVector128(float value)
        {
            return Vector128.Create(value);
        }

appears to be functionally identical, both locally

vmovss      dword ptr [rbp+18h],xmm1  
cmp         dword ptr [7FFDEE0E45C0h],0  
je          BroadcastScalarToVector128(Single)+025h (07FFDEE338F15h)  
call        00007FFE4D290A10  
vbroadcastss xmm0,dword ptr [rbp+18h]  
vmovaps     xmmword ptr [rbp-20h],xmm0  
mov         rax,qword ptr [rbp+10h]  
vmovaps     xmm0,xmmword ptr [rbp-20h]  
vmovups     xmmword ptr [rax],xmm0  

and in the surrounding call sites I've checked. After several hours of investigation, my current best guess is that in some small fraction of the hundreds of calls rbp isn't stable, so vbroadcastss sometimes loads the wrong value.

I'm not particularly concerned about this issue (looking at what the .NET 8 JIT does, removing the entire helper class as legacy code and changing to inline VectorNNN.Create() calls makes sense), but it strikes me as strange enough to be worth documenting, particularly should someone else happen to hit it. Our code is nearly all floating point, so I don't have a read on whether the vpbroadcast instructions are also affected.

Metadata

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
