Description
Code that runs fine in .NET 7 fails in .NET 8.
Reproduction Steps
This affects a couple of helper methods that have been in place for years:
public unsafe static Vector128<float> BroadcastScalarToVector128(float value)
{
    return Avx.BroadcastScalarToVector128(&value);
}
public unsafe static Vector256<float> BroadcastScalarToVector256(float value)
{
    return Avx.BroadcastScalarToVector256(&value);
}
Expected behavior
Numerical simulations remain correct.
Actual behavior
Simulations experience numerical divergence as timesteps proceed, ultimately either completing with wrong results or crashing on ArgumentOutOfRangeExceptions raised by internal consistency checks that detect NaNs or other incorrect values resulting from things like numerical overflows in exponentiation.
Regression?
Yes. .NET 8 debug builds work fine. Flipping projects back to .NET 7 removes this problem. Rolling either of the affected repos I've detected so far back to .NET 7 and then moving projects forward to .NET 8 with no other changes also reproduces the error.
Known Workarounds
Using Vector128.Create(value) or Vector256.Create(value) instead of calling Avx.BroadcastScalarToVectorNNN() directly has, so far, been successful in avoiding this issue.
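For anyone wanting to apply the workaround without touching call sites, a minimal sketch of drop-in replacements follows. The class name BroadcastHelpers is hypothetical (the original helper class is not named here), and the method names and signatures are assumed to match the helpers shown under Reproduction Steps. Because no pointer is taken, the unsafe modifier is no longer needed.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical container class; the name is an assumption for illustration only.
public static class BroadcastHelpers
{
    // Same signature as the original 128-bit helper, but without taking
    // the address of the argument.
    public static Vector128<float> BroadcastScalarToVector128(float value)
    {
        return Vector128.Create(value);
    }

    // Same signature as the original 256-bit helper.
    public static Vector256<float> BroadcastScalarToVector256(float value)
    {
        return Vector256.Create(value);
    }
}
```

Vector128.Create(float) and Vector256.Create(float) fill every element with the given value, so callers should see the same lane contents the Avx broadcast was expected to produce.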
Configuration
.NET 8, Win 10 22H2, AMD Zen 3. Yes, specific to .NET 8 and x64 intrinsics.
Other information
In .NET 7 the disassembly for the two helpers (128-bit, then 256-bit) is
vmovss dword ptr [rbp+18h],xmm1
vbroadcastss xmm0,dword ptr [rbp+18h]
vmovss dword ptr [rbp+18h],xmm1
vbroadcastss ymm0,dword ptr [rbp+18h]
In .NET 8 it's of the form
vmovss dword ptr [rbp+18h],xmm1
cmp dword ptr [7FFE0A69A0F0h],0
je BroadcastScalarToVector128(Single)+025h (07FFE0A744805h)
call 00007FFE697D0A10
vbroadcastss xmm0,dword ptr [rbp+18h]
vmovaps xmmword ptr [rbp-20h],xmm0
mov rax,qword ptr [rbp+10h]
vmovaps xmm0,xmmword ptr [rbp-20h]
vmovups xmmword ptr [rax],xmm0
However, the JIT output for the apparent quick workaround of
public static Vector128<float> BroadcastScalarToVector128(float value)
{
    return Vector128.Create(value);
}
appears to be functionally identical, both locally
vmovss dword ptr [rbp+18h],xmm1
cmp dword ptr [7FFDEE0E45C0h],0
je BroadcastScalarToVector128(Single)+025h (07FFDEE338F15h)
call 00007FFE4D290A10
vbroadcastss xmm0,dword ptr [rbp+18h]
vmovaps xmmword ptr [rbp-20h],xmm0
mov rax,qword ptr [rbp+10h]
vmovaps xmm0,xmmword ptr [rbp-20h]
vmovups xmmword ptr [rax],xmm0
and in the surrounding call sites I've checked. After several hours' investigation, my current best guess is that there's some small fraction out of hundreds of calls where rbp isn't stable, leading vbroadcastss to sometimes load the wrong value.
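One way to probe that guess might be a checked wrapper that compares the Avx broadcast against Vector128.Create on every call and throws at the first mismatch. This is only a sketch: the class and method names are hypothetical, it needs AllowUnsafeBlocks, and adding the check changes the generated code, so it may well hide the problem rather than catch it. It would also have to run in a release build, since debug builds work fine.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical diagnostic helper; not part of the original code base.
public static class BroadcastCheck
{
    public static unsafe Vector128<float> CheckedBroadcast128(float value)
    {
        if (!Avx.IsSupported)
        {
            throw new PlatformNotSupportedException("AVX is required.");
        }

        Vector128<float> viaAvx = Avx.BroadcastScalarToVector128(&value);
        Vector128<float> expected = Vector128.Create(value);

        // Compare bit patterns rather than float values so that a NaN
        // input does not register as a spurious mismatch.
        if (!Vector128.EqualsAll(viaAvx.AsInt32(), expected.AsInt32()))
        {
            throw new InvalidOperationException(
                $"Broadcast mismatch for {value}: got {viaAvx}, expected {expected}.");
        }

        return viaAvx;
    }
}
```

If the wrapper never throws while the unwrapped helper still diverges, that would suggest the fault depends on the surrounding code generation rather than on the broadcast instruction alone.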
I'm not particularly concerned about this issue (looking at what the .NET 8 JIT does, removing the entire helper class as legacy code and changing to inline VectorNNN.Create() calls makes sense), but it strikes me as strange enough to be worth documenting, particularly should someone else happen to hit it. Our code's nearly all floating point, so I don't have a read on whether the vpbroadcast instructions are also affected.