SIMDe: update to v0.8.4-rc1+0faa907b2 #1022

mr-c · 2025-07-27T11:50:13Z

For the post v0.8.4-rc1 changes, see simd-everywhere/simde-no-tests@v0.8.4-rc1...0faa907

mr-c · 2025-07-27T12:21:08Z

I'm working on a -pedantic fix over in simd-everywhere/simde#1302

0faa907b2 gcc pedantic: fp16 is not part of ISO C, silence the warning
f53a9cf79 gcc pedantic: also silence this other warning about __int128
59f779845 arm neon: Add float16 multi-vectors to native aliases
4b279d62e https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927 was fixed in GCC 15.x
5c8f50ec1 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95782 was fixed in GCC 13
677f2cbee Avoid undefined behaviour with signed integer multiplication (#1296)
2096f755e arm64 gcc FRINT: skip native call on GCC
3ea330475 x86 sse2 for loongarch: fix GCC build failure (#1287)
a532a12ca riscv64: Fallback to autovec without mrvv-vector-bits flag. (#1282)
85632ca82 arm neon riscv64: add min.h and max.h RVV implementations. (#1283)
ca1e942d9 neon riscv64: Enable RVV segment load/store only when we have `__riscv_zvlsseg` flag. (#1285)
cf8e6a73d riscv64: Enable V feature when both zve64d and zvl128b are present (#1284)
c7f26b73b x86 avx for loongarch: use vfcmp_clt to save one instruction in `_mm_cmp_{sd,ss}` and `_mm256_cmp_pd`
a8ae10d96 x86 sse2,avx2 loongarch impl: let compiler to generate instructions based on imm8
bb0282e3b x86 misc fixes for AVX512{F,VL}_NATIVE
d458d8fdd x86 sse2,sse3, avx: silence some false-positive warnings about unitialized structs
4184e0d42 start preparing to release SIMDe 0.8.4
87ecd64a5 x86 sse2: fix overflow error detected by clang scan-build in simde_mm_srl_epi{16,32,64} when count is too high
ca9449c1e Fix incorrect UQRSHL implementation.
8d90b0411 arm neon: fix `cmla{_rot{90,180,270},}_lane` with correct test-suite on ARMv8.3 system
500454a2a arm neon: replace use of SIMDE_ARCH_ARM_CHECK(8+) with feature checks.
02ba92220 arm neon gcc-12 FRINT workaround
02d815773 arm neon FCMLA with 16-bit floats, requires the FP16 feature
8caaee795 arm neon: FRINT{32,64}{X,Z} native calls require ARMv8.5
438ddcff6 remove extraneous semicolons from many macro-defined functions
9f73373ff wasm simd128: fix a FAST_NANS error on arm64
0bd19a993 Fix vqdmulhs_s32 native alias.
62f40d4b8 x86 avx2: small fixes for loongarch
d656b4d7e x86 sse2: small fixes for loongarch
8f56d4ff1 Remove incorrect qrdmulh SSE code.
8c421df17 arm neon: define native alias only under the inverse of the conditions of a pass-through
25e70ce71 simde-aes: gcc 13.2+ ignore unused variable warnings
69c9cd5c3 arm neon qdmlal: fix saturation (#1194)
34136823c Fix vqshlud_n_s64 implementation to be 64-bit.
483a4bccf Fix qdmlsl instructions
f275fffd9 arm neon qshl: Fix UQSHL to match hardware. Add extensive test vectors. (#1256)
d95bd9d76 arm neon qdmull: Fix SQDMULL implementation for 32-bit inputs. (#1255)
4b9007046 x86 sse2: fix `_mm_pause` for RISCV systems
0be41ec7c risc64 gcc-14: Disable uninitialized variable warnings for some ARM neon SM3 functions
70fc574b2 arm: Rename ARM ROL/ROR functions with a SIMDE prefix.
a39bd6dde arm neon sli_n: Fix invalid shift warnings (#1253)
7bd2bb70e arm neon `_vext_p6`: reverse logic to avoid GCC14 i586 bug (#1251)
b4bf72e14 x86 clmul simde_x_bitreverse_u64: add loongarch implementation (#1249)
04f9b4ca6 x86 avx: reoptimized simde_mm256_addsub_ps/d with lasx
54d352981 x86 fma: add loongarch lasx optimized implementations
adefb8dcb x86 f16c: add loongarch lasx optimized implementations
b720dcb7d x86 avx512f: added fmaddsub implementation (#1246)
5c9f6aa19 x86 sse4.2: add loongarch lsx optimized implementations
783703714 x86 sse4.1: add loongarch lsx optimized implementations
0bfc2312f x86 ssse3: add loongarch lsx optimized implementations
fcae0eee0 x86 sse3: add loongarch lsx optimized implementations
af6467260 x86 sse: add loongarch lsx optimized implementations
2ad64c9f7 x86 avx2: add loongarch lasx optimized implementations (#1241)
5cae2261b x86 avx: add loongarch lasx optimized implementations (#1239)
484fcce25 x86 avx: use INT64_C when the destination is i64 (#1238)
5e225b1c6 loongarch: add lsx support for sse2.h
665d7f93b fix clang type redef error
b0fcc6176 Whoops, missing comma
fe262fb0e loongarch float16: use a portable version to avoid compilation errors
1a09d3bc9 x86: move definition of 'value' to correct branch in _mm_loadl_epi64
aac583326 x86: some better implementations for MSVC and others without SIMDE_STATEMENT_EXPR_
d1afb3db1 arm crc32: define SIMDE_ARCH_ARM_CRC32 and consistently use it
592f8f0c4 _mm256_storeu_pd and _mm256_loadu_pd using 128 bit lanes
de4337e8d gcc-14 -O3 complained about some possible unitialized values
8b0937a3e neon/cvz z/Arch: stop using deprecated functions.
e18dcd7d0 arm neon: avoid GCC 11 vst1_*_x4 built-in functions
848fb7777 arm neon: fix arm64 gcc11 build excess elements in vector failure
0aaf78298 x86/sse: Fix type convert error for LSX.
29c96207c arm wasm: add vst2_u8 translation to Wasm SIMD
375ad48fd arm wasm: add vshll translations to Wasm SIMD
d5697fa99 arm wasm: add vst4_u8 translation to Wasm SIMD
e235b2eb1 math: typo fix, check SIMDE_MATH_NANF instead of the old-style SIMDE_NANF
cb4b08c47 wasm AltiVec: add u16x8 and u8x16 avgr translations
90237caba wasm NEON: add u16x8 and u8x16 avgr translations
6050906e9 arm neon vminnmv_f16: remove duplicate statement (#1208)
a3d20d145 x86 wasm: Wasm SIMD version of `_mm_sad_epu8`
32650204e msvc: add simde_MemoryBarrier to avoid including <windows.h>
7ca5a3e0b x86/fma: Use 128 bit fnmadd_pd to do 256 bit fnmadd_pd (#1197)
2ec1f51f8 pow: consistently use simde_math_pow
80f655739 x86: remove redundant mm_add_pd translation for WASM (#1190)
249b9dc03 arm/neon riscv64: additional RVV implementations - part 2. (#1189)
408d06a35 arm/neon riscv64: additional RVV implementations - part1 (#1188)
da5cf1f54 Use _Float16 in C++ on aarch64 with GCC 13+
39f436a9e Don't use _Float16 on non-SSE2 x86
985c27100 Don't use _Float16 on s390x
787830467 x86: Apply half tabular method in _mm_crc32 family
d8a0c764f arm: improve performance in vqadd and vmvn in risc-v
99c63a427 neon: avoid warnings when "__ARM_NEON_FP" is not defined.
e98cbcc70 start next development cycle: v0.8.3
3442dbf2d prepare to release 0.8.0
e6afb7bec arm neon: Fully remove the problematic FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics
fb73a3182 arm: improve performance in vabd_xxx for risc-v
8a4ff7a8b arm: improve performance in vhadd_xxx for risc-v
52f1087ad arm: Add neon2rvv support in vand series intrinsics
737e3b33f arm: fix some neon2rvv intrinsic function error
5242a77dc arm: enable more intrinsic function for armv7
8f123e5c0 wasm x86 impl: some were incorrectly marked SSE instead of SSE2
2b9b01269 arm x86 implementations: allow _m128 access from SSE
6679ff018 svml: SSE is good enough for native m128i and m128d types & functions
68aac3b9a sse2 MSVC `_mm_pause` implementaiton for x86
e76f4331e typo fixes from codespell
73160356b x86 xop: fix some native functions
4ecf271be emscripten; use `__builtin_roundeven{f,}` from version 3.1.43 onwards
347e2b699 arm 32 bits: native def fixes; workarounds for gcc
61d1addce apple clang arm64: ignore SHA2
b58359225 arm platform: cleanup feature detection.
e38f25685 arm neon sm3: check constant range
ac2b229a1 arm neon: disable some FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics
1d7848cf9 arm neon clang: skip vrnd native before clang v18
bb11054b5 clang: detect versions 18 & 19
647bb87de Initial Support for the RISC-V Vector Extension in ARM NEON (#1130)
83479bd70 start next development cycle: v0.8.1
22a493c26 arm/neon abs: negating INT_MIN is undefined behavior
453dec209 simde-detect-clang.h: add clang 17 detection (#1132)
e6fab1296 Update simde-detect-clang.h (#1131)
e29a4fab5 typo: XCode -> Xcode (#1129)
8392c69a1 Improve performance of simde_mm512_add_epi32 (#1126)
ddaab3759 neon {u,s}addh apply arm64 windows workaround only on msvc<1938 (#1121)
8e9d432a6 correction of simde_mm256_sign_epi{8,16,32}. (#1123)
43ec909bb avx512 abs: refine GCC compiler checks for `_mm512{,_mask}_abs_pd` (#1118)
24be11d00 gh-actions: test mips64el using qemu on gcc12/clang16
f0bd155cf wasm relaxed: add f{32x4,64x2}_relaxed_{min,max}
459abf9f7 wasm simd128/relaxed: begin MIPS implementations
ffe050ce9 wasm relaxed: updated names; reordered FMA operations
762d7ad22 wasm: detect support for Relaxed SIMD mode
e96949e3f prepare to release 0.8.0
f73d72e4e NEON: implement all bf16-related intrinsics (#1110)
72f6d30fe neon: add enable vmlaq_laneq_f32 and vcvtq_n_f64_u64
d6271b3fe NEON: implement all intrinsics supported by architecture A64-remaining part (#1093)
7904fc3cf sse2 mm_pause: more archs, add a basic test
260adca59 arm neon ld2: silence warnings at -O3 on gcc risc-v
d1578a0ce simde_float16: prefer __fp16 if available
064b80493 svml: don't enable SIMDE_X86_SVML_NATIVE for ClangCl
790e8d6c3 fp16: don't use _Float16 on ClangCL if not supported
87f7d3317 neon: Modified simde_float16 to simde_float16_t (#1100)
2c58d6d05 Reuse unoptimized implementations of vaesimcq_u8 from x86
be7e377cb [NEON] Add AES instructions.
a686efde4 x86 sse4.1 mm_testz_si128: fix backwards short circuit logic
bc206e4fa wasm f{32x4,64x2}_min: add workaround for a gcc<6 issue
f2e82c961 x86 pclmul: fix natives, some require VPCLMULQDQ
0adef454b avx512 gather: add MSVC native fallbacks
ecc469297 avx512 set: add simde_x_mm512_set_m256{,d}
70f702627 NEON: part 1 of implement all intrinsics supported by architecture A64 (#1090)
aefb342e9 avx512 types: avoid using native AVX512 types on MSVC unless required
833a87750 svml: enable SIMDE_X86_SVML_NATIVE for MSVC 2019+
db326c75c sse{,2,4.1}, avx{,2} *_stream_{,load}: use __builtin_nontemporal_{load,store}
3f0321ba2 sse _mm_movemask_ps: remove unused code
4c7c77217 gh-actions: test with clang-16
87cf105ab neon/st1{,q}_*_x{2,3,4}: initial implementation (#1082)
5634cec09 NEON: more fp16 using intrinsics supported by architecture v7 (skip version) (#1081)
cfd917230 arm neon: Complex operations from Armv8.3-a (#1077)
389f360a6 arm aes: add neon implementation using the crypto extension
4adb6591f arm: use SIMDE_ARCH_ARM_FMA
faeb00a70 avx512: fix many native aliases
98eb64b85 sse: implement _mm_movelh_ps for Arm64
57197e8db aes: initial implementation of most aes instructions (#1072)
2fbc63391 NEON: Implement some f16XN types and f16 related intrinsics. (#1071)
64f94c681 avx: simde_mm256_shuffle_pd fix for natural vector size < 128
d41408997 Add workaround for GCC bug 111609
665676042 Extend constant range in simde_vshll_n_XXX intrinsics (#1064)
33c4480de Remove non-working MMX specialization from simde_vmin_s16
6f4afd634 Fix issues related to MXCSR register (#1060)
82be3395b fix SIMDE_ARCH_X86_SSE4_2 define
38580983e riscv64 clang: doesn't support _Float16 or __fp16 properly yet
a39f2c3b3 avx512/shuffle: mm512_{shuffle_epi32,shuffle{hi,lo}_epi16}
a202d0116 avx512/gather: mm512_{mask_,}i64gather_{epi32,epi64,ps,pd)
95a6d0813 avx512 new families started: gather/reduce + other additional funcs
ef8931287 avx512 cmp,cvt,cvts,cvtt,cvtus,gather,kand,permutex,rcp: new ops for intgemm
4a29d21ff avx512: start supporting AVX512FP16 / m512h
f686d38f1 clang wasm: SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release
7760aabd1 GCC AVX512F: SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+
436dd4cc1 GCC x86/x64: SIMDE_BUG_GCC_98521 was fixed in 10.3
843112308 GCC x86: SIMDE_BUG_GCC_94482 was fixed in 8.5, 9.4, 10+
e140ac4e2 x86/avx512 fpclass: improve fallback implementation
5950c402c gh-actions: re-order ccache; add old clang/gcc versions
faf228937 avx512/loadu: fix native detection
b3341922b simde-f16: improve _Float16 usage; better INFHF/NANHF defs
5e632b09d avx512: naive implementation of fpclass
b71b58c27 [NEON/A32V7]: Don't trust clang for load multiple on A32V7
c5de4d090 neon: Add qtbl/qtbx polyfills for A32V7
3bda0d7c6 neon/cvtn: vcvtnq_u32_f32 is a V8 function
73910b60c msa neon impl: float64x2_t is not avail in A32V7
0540d7fc2 clang aarch64: optimization bug 45541 was fixed in clang-15
d315aac71 clmul: aarch64 clang has difficulties with poly64x1_t
a2eeb9ef1 sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128
e676d9982 clang powerpc: vec_bperm bug was fixed in clang-14
0e3290e86 neon/st1: disable last remaining AltiVec implementation
db0649e1d wasm simd128: more powerpc fixes
bbdb2a1f5 sse2,wasm simd128: skip SIMDE_CONVERT_VECTOR_ impementations on PowerPC
c6b6ac500 wasm/simd128: add missing unsigned functions
78faeab11 wasm/simd128: fix altivec_p7 version of wasm_f64x2_pmin
1f359106b We are in a dev period again: v0.7.7
9135bd049 neon/cvtn: vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations
6a1db3a5a neon/cvtn: basic implementation of a few functions
1cf65cb0a mmx: loogson impl promotions over SIMDE_SHUFFLE_VECTOR_
4ab8749df sse{,2,3,4.1},avx: more WASM shuffle implementations
b49fa29d5 avx512: arghhh: really fix typedef of __mmask64
6244ab92e avx512: typo fix for typedef of __mmask64
20c5200d6 avx512/madd: fix native alias arguments for _mm512_madd_epi16
cc476f364 neon/qabs: restore SSE2 impl for vqabsq_s8
a7682611d neon/abd,ext,cmla{,_rot{180,270,90}}: additional wasm128 implementations
ca523adb7 sse: allow native _mm_loadh_pi on MSVC x64
ac526659e test: appease GCC 5.x & clang
01ea9a8d3 start release process for 0.7.6
28a6001f6 x86/sse*,avx: add additional SIMD128 implementations
aca2f0ae6 neon/shl,rshl: fix avx include to unbreak amalgamated hearders
f60a9d8df neon/mla_lane: initial implementation using mla+dup
f982cfd51 Update clang version detection for 14..16 and add link
b45a14ccc simde-arch: include hedley for setting F16C for MSVC 2022+ with AVX2
3ce91d4cd 0.7.5 dev cycle on the road to 0.7.6/0.8.0
02c7a67ed sse: remove unbalanced HEDLEY_DIAGNOSTIC_PUSH
b0b370a4b x86/sse: Add LoongArch LSX support
2338f175d arch: Add LoongArch LASX/LSX support
90d95fae4 avx512: define __mask64 & __mask32 if not yet defined
42a43fa57 sve/true,whilelt,cmplt,ld1,st1,sel,and: skip AVX512 native implementations on MSVC 2017
20f98da6f sve/whilelt: correct type-o in __mmask32 initialization
47a1500f7 sve/ptest: _BitScanForward64 and __builtin_ctzll is not available in MSVC 2017
cd93fcc9e avx512/knot,kxor: native calls not availabe on MSVC 2017
ba6324b6b avx512/loadu: _mm{,256}_loadu_epi{8,16,32,64} skip native impl on MSVC < 2019
2f6fe9c64 sse2/avx: move some native aliases around to satisfy MSVC 2017 /ARCH:AVX512
91fda2cc9 axv512/insert: unroll SIMDE_CONSTIFY for testing macro implemented functions
a397b74b3 __builtin_signbit: add cast to double for old Clang versions
e016050b2 clmul: _mm512_clmulepi64_epi128 implicitly requires AVX512F
7e353c009 Wasm q15mulr_sat_s: match Wasm spec
ce375861c Wasm f32/f64 nearest: match Wasm spec
96d5e0346 Wasm f32/f64 floor/ceil/trunc/sqrt: match Wasm spec
5676a1ba7 Wasm f32/f64 abs: match Wasm spec
aa299c08b Wasm f32/f64 max: match Wasm spec
433d2b951 Wasm f32/f64 min: match Wasm spec
cf1ac40b8 avx{,2}: some intrinsics are missing from older MSVC versions
bff9b1b3c simd128: move unary minus to appease msvc native arm64
efc512a49 neon/ext: unroll SIMDE_CONSTIFY for testing macro implemented functions
091250e81 neon/addlv: disable SSSE3 impl of _vaddlvq_s16 for MSVC
4b3053606 neon/ext: simde_*{to,from}_m64 reqs MMX_NATIVE
2dedbd9bf skip many mm{,_mask,_maskz}_roundscale_round_{ss,sd} testing on MSVC + AVX
a04ea7bc9 f16c: rounding not yet implemented for simde_mm{256,}_cvtps_ph
e8ee041ab ci appveyor: build tests with AVX{,2}, but don't run them
2188c9728 arm/neon/add{l,}v: SSE2/SSSE3 opts _vadd{lvq_s8, lvq_s16, lvq_u8, vq_u8}
186f12f17 axv512: add simde_mm512_{cvtepi32_ps,extractf32x8_ps,_cmpgt_epi16_mask}
6a40fdeb5 arm/neon/rnd: use correct SVML function for simde_vrndq_f64
9a0705b06 svml: simde_mm256_{clog,csqrt}_ps native reqs AVX not SSE
c298a7ec2 msvc avx512/roundscale_round: quiet a false positive warning
01d9c5def sse: remove errant MMX requirement from simde_mm_movemask_ps
c675aa08d x86/avx{,2}: use SIMDE_FLOAT{32,64}_C to fix warnings from msvc
097af509e msvc 2022: enable F16C if AVX2 present
91cd7b64b avx{,2}: fix maskload illegal mem access
2caa25b85 Fixed simde_mm_prefetch warnings
96bdf5234 Fixed parameters to _mm_clflush
4d560e418 emscripten; don't use __builtin_roundeven{f,} even if defined
511a01e7d avx512/compress: Mitigate poor compressstore performance on AMD Zen 4
a22b63dc9 avx512/{knot,kxor,cmp,cmpeq,compress,cvt,loadu,shuffle,storeu} Additional AVX512{F,BW,VBMI2,VL} ops
3d87469f6 wasm simd128: correct trunc_sat _FAST_CONVERSION_RANGE target type
56ca5bd89 Suppress min/max macro definitions from windows.h
f2cea4d33 arm/neon/qdmulh s390 gcc-12: __builtin_shufflevector is misbehaving
3698cef9b neon/cvt: clang bug 46844 was fixed in clang 12.0
9369cea4a simd128: clang 13 fixed bugs affecting simde_wasm_{v128_load8_lane,i64x2_load32x2}
ce27bd09a gcc power: vec_cpsgn argument reversal fixed in 12.0
20fd5b94b gcc power: bugs 1007[012] fixed in GCC 12.1
5e25de133 gcc sse2: bug 99754 was fixed in GCC 12.1
e69796025 gcc i686 mm*_dpbf16_ps: skip vector ops due to rounding error
359c3ff47 clang wasm simde: add workaround to fix wasm_i64x2_shl bug
b767f5edc arm/neon: workaround on ARM64 windows bug
599b1fbf4 mips/msa: fix for Windows ARM64
c6f4821ed arm64 windows: fix simd128.h build error
782e7c73e prepare to release 0.7.4
6e9ac2457 fix A32V7 version of _mm_test{nz,}c_si128
776f7a699 test with Debian default flags, also for armel
a240d951a x86: fix AVX native → SSE4.2 native
5a73c2ce5 _mm_insert_ps: incorrect handling of the control
597a1c9e4 neon/ld1[q]_*_x2: initial implementation
4550faeac wasm: f32x4 and f64x2 nearest roundeven
5e0686459 Add missing `static const` in simde-math.h. NFC
da02f2cee avx512/setzero: fix native aliases
89762e11b Fixed FMA detection macro on msvc
b0fda5cf2 avx512/load_pd: initial implementation
a61af0778 avx512/load_ps: initial implementation
4126bde01 Properly map __mm functions to __simde_mm
2e76b7a69 neon ld2: gcc-12 fixes
604a53de3 fix wrong size
e5e085ff8 AVX: add native calls for _mm256_insertf128_{pd,ps,si256}
ee3bd005b aarch64 + clang-1[345] fix for "implicit conversion changes signedness"
a060c461a wasm: load lane memcpy instead of cast to address UBSAN issues

git-subtree-dir: lib/simde/simde
git-subtree-split: 0faa907b261001f89ac89becaea20beddd675468


mr-c added 2 commits July 27, 2025 18:08

Merge commit 'b1caca297f0495371edaaeaafe13754c25fa62e8' into simde_v0.8.4-rc1+59f7798

55699b9

mr-c force-pushed the simde_v0.8.4-rc1+59f7798 branch from b8c6585 to 55699b9 July 27, 2025 16:09

mr-c changed the title ~~SIMDe: update to v0.8.4-rc1+59f7798~~ SIMDe: update to v0.8.4-rc1+0faa907b2 Jul 27, 2025

remove unused code

0ab36ec
