Skip to content

Commit 98ddb75

Browse files
lgritzscott-wilson
authored andcommitted
fix(simd.h): fix longstanding probem with 16-wide bitcast for 8-wide HW (AcademySoftwareFoundation#4268)
We've been plagued on and off for a long time with spurious wrong results in our SIMD versions of exp and log. It was really squirrelly, came and went with changes to seemingly unrelated code, and had long periods of time with no symptoms or ability to reproduce it. It just started popping up again in CI in the dev-2.5 branch, consistently. Today I think I finally tracked it down for real. I believe the problem was actually in the SIMD bitcast_to_int and bitcast_to_float functions, for the variety that was 8-wide when running on 4-wide hardware, and the 16-wide one when running on 8-wide (or less) hardware. Background: When real 16-wide HW is not available, we define the 16-wide vectors as an array of two 8-wide vectors. (Same for 8 vs 4, I'll spare you explaining it all twice.) So most of the 16 wide functions boil down to: vfloat16 op (vfloat16 arg, ...) { #if 16 wide HW is available return real_16_wide_intrinsic(arg); #else /* recursively define in terms of 8-wide halves */ return vfloat16(op(arg.lo()), op(arg.hi())); #endif } But for the bitcast operators, we did something different, a shade too clever: we just did a pointer cast, dereference and return: vint16 bitcast_to_int (const vfloat16& x) { #if OIIO_SIMD_AVX >= 512 return _mm512_castps_si512 (x.simd()); #else return *(vint16 *)&x; #endif } I believe that the problem is that this cast is undefined behavior for the case where the vector x was broken into two halves, and by being in the middle of a long code sequence of SIMD ops, the two halves happened to be in registers at that moment. Basically, we are taking the address, and it just doesn't seem to always understand that it needs to materialize the whole thing in memory for the pointer cast to work properly. I removed the pointer cast and replaced with return vfloat16(bitcast_to_float(x.lo()), bitcast_to_float(x.hi())); basically falling back on the usual recursive definition of 16-wide operations to being in terms of 8-wide operations when 16-wide HW is not available. This totally clears up the problem since we're no longer potentially relying on needing the address of something that has no address at that moment. A couple other spots (andnot) also had dubious casts that I fixed in the same way. N.B.: The 16-wide cast from bool to int had to do something slightly different, because it's represented differently and isn't a true bitcast. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Scott Wilson <scott@propersquid.com>
1 parent 6fa37f5 commit 98ddb75

File tree

2 files changed

+12
-12
lines changed

2 files changed

+12
-12
lines changed

src/include/OpenImageIO/fmath.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1852,7 +1852,7 @@ template<typename T>
18521852
OIIO_FORCEINLINE OIIO_HOSTDEVICE T fast_exp2 (const T& xval) {
18531853
using namespace simd;
18541854
typedef typename T::vint_t intN;
1855-
#if OIIO_SIMD_SSE && !OIIO_MSVS_BEFORE_2022
1855+
#if OIIO_SIMD_SSE || OIIO_SIMD_NEON
18561856
// See float specialization for explanations
18571857
T x = clamp (xval, T(-126.0f), T(126.0f));
18581858
intN m (x); x -= T(m);

src/include/OpenImageIO/simd.h

Lines changed: 11 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -5686,7 +5686,7 @@ OIIO_FORCEINLINE vint8 bitcast_to_int (const vbool8& x)
56865686
#if OIIO_SIMD_AVX
56875687
return _mm256_castps_si256 (x.simd());
56885688
#else
5689-
return *(vint8 *)&x;
5689+
return vint8(bitcast_to_int(x.lo()), bitcast_to_int(x.hi()));
56905690
#endif
56915691
}
56925692

@@ -7738,9 +7738,9 @@ OIIO_FORCEINLINE vfloat4 andnot (const vfloat4& a, const vfloat4& b) {
77387738
#if OIIO_SIMD_SSE
77397739
return _mm_andnot_ps (a.simd(), b.simd());
77407740
#else
7741-
const int *ai = (const int *)&a;
7742-
const int *bi = (const int *)&b;
7743-
return bitcast_to_float (vint4(~(ai[0]) & bi[0],
7741+
vint4 ai = bitcast_to_int(a);
7742+
vint4 bi = bitcast_to_int(b);
7743+
return bitcast_to_float(vint4(~(ai[0]) & bi[0],
77447744
~(ai[1]) & bi[1],
77457745
~(ai[2]) & bi[2],
77467746
~(ai[3]) & bi[3]));
@@ -9157,7 +9157,7 @@ OIIO_FORCEINLINE vint8 bitcast_to_int (const vfloat8& x)
91579157
#if OIIO_SIMD_AVX
91589158
return _mm256_castps_si256 (x.simd());
91599159
#else
9160-
return *(vint8 *)&x;
9160+
return vint8(bitcast_to_int(x.lo()), bitcast_to_int(x.hi()));
91619161
#endif
91629162
}
91639163

@@ -9166,7 +9166,7 @@ OIIO_FORCEINLINE vfloat8 bitcast_to_float (const vint8& x)
91669166
#if OIIO_SIMD_AVX
91679167
return _mm256_castsi256_ps (x.simd());
91689168
#else
9169-
return *(vfloat8 *)&x;
9169+
return vfloat8(bitcast_to_float(x.lo()), bitcast_to_float(x.hi()));
91709170
#endif
91719171
}
91729172

@@ -9395,9 +9395,9 @@ OIIO_FORCEINLINE vfloat8 andnot (const vfloat8& a, const vfloat8& b) {
93959395
#if OIIO_SIMD_AVX
93969396
return _mm256_andnot_ps (a.simd(), b.simd());
93979397
#else
9398-
const int *ai = (const int *)&a;
9399-
const int *bi = (const int *)&b;
9400-
return bitcast_to_float (vint8(~(ai[0]) & bi[0],
9398+
vint8 ai = bitcast_to_int(a);
9399+
vint8 bi = bitcast_to_int(b);
9400+
return bitcast_to_float(vint8(~(ai[0]) & bi[0],
94019401
~(ai[1]) & bi[1],
94029402
~(ai[2]) & bi[2],
94039403
~(ai[3]) & bi[3],
@@ -10030,7 +10030,7 @@ OIIO_FORCEINLINE vint16 bitcast_to_int (const vfloat16& x)
1003010030
#if OIIO_SIMD_AVX >= 512
1003110031
return _mm512_castps_si512 (x.simd());
1003210032
#else
10033-
return *(vint16 *)&x;
10033+
return vint16(bitcast_to_int(x.lo()), bitcast_to_int(x.hi()));
1003410034
#endif
1003510035
}
1003610036

@@ -10039,7 +10039,7 @@ OIIO_FORCEINLINE vfloat16 bitcast_to_float (const vint16& x)
1003910039
#if OIIO_SIMD_AVX >= 512
1004010040
return _mm512_castsi512_ps (x.simd());
1004110041
#else
10042-
return *(vfloat16 *)&x;
10042+
return vfloat16(bitcast_to_float(x.lo()), bitcast_to_float(x.hi()));
1004310043
#endif
1004410044
}
1004510045

0 commit comments

Comments
 (0)