@mr-c mr-c commented Jul 27, 2025

mr-c commented Jul 27, 2025

I'm working on a -pedantic fix over in simd-everywhere/simde#1302

mr-c added 2 commits July 27, 2025 18:08
0faa907b2 gcc pedantic: fp16 is not part of ISO C, silence the warning
f53a9cf79 gcc pedantic: also silence this other warning about __int128
59f779845 arm neon: Add float16 multi-vectors to native aliases
4b279d62e https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927 was fixed in GCC 15.x
5c8f50ec1 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95782 was fixed in GCC 13
677f2cbee Avoid undefined behaviour with signed integer multiplication (#1296)
2096f755e arm64 gcc FRINT: skip native call on GCC
3ea330475 x86 sse2 for loongarch: fix GCC build failure (#1287)
a532a12ca riscv64: Fallback to autovec without mrvv-vector-bits flag. (#1282)
85632ca82 arm neon riscv64: add min.h and max.h RVV implementations. (#1283)
ca1e942d9 neon riscv64: Enable RVV segment load/store only when we have `__riscv_zvlsseg` flag. (#1285)
cf8e6a73d riscv64: Enable V feature when both zve64d and zvl128b are present (#1284)
c7f26b73b x86 avx for loongarch: use vfcmp_clt to save one instruction in `_mm_cmp_{sd,ss}` and `_mm256_cmp_pd`
a8ae10d96 x86 sse2,avx2 loongarch impl: let compiler to generate instructions based on imm8
bb0282e3b x86 misc fixes for AVX512{F,VL}_NATIVE
d458d8fdd x86 sse2,sse3, avx: silence some false-positive warnings about unitialized structs
4184e0d42 start preparing to release SIMDe 0.8.4
87ecd64a5 x86 sse2: fix overflow error detected by clang scan-build in simde_mm_srl_epi{16,32,64} when count is too high
ca9449c1e Fix incorrect UQRSHL implementation.
8d90b0411 arm neon: fix `cmla{_rot{90,180,270},}_lane` with correct test-suite on ARMv8.3 system
500454a2a arm neon: replace use of SIMDE_ARCH_ARM_CHECK(8+) with feature checks.
02ba92220 arm neon gcc-12 FRINT workaround
02d815773 arm neon FCMLA with 16-bit floats, requires the FP16 feature
8caaee795 arm neon: FRINT{32,64}{X,Z} native calls require ARMv8.5
438ddcff6 remove extraneous semicolons from many macro-defined functions
9f73373ff wasm simd128: fix a FAST_NANS error on arm64
0bd19a993 Fix vqdmulhs_s32 native alias.
62f40d4b8 x86 avx2: small fixes for loongarch
d656b4d7e x86 sse2: small fixes for loongarch
8f56d4ff1 Remove incorrect qrdmulh SSE code.
8c421df17 arm neon: define native alias only under the inverse of the conditions of a pass-through
25e70ce71 simde-aes: gcc 13.2+ ignore unused variable warnings
69c9cd5c3 arm neon qdmlal: fix saturation (#1194)
34136823c Fix vqshlud_n_s64 implementation to be 64-bit.
483a4bccf Fix qdmlsl instructions
f275fffd9 arm neon qshl: Fix UQSHL to match hardware. Add extensive test vectors. (#1256)
d95bd9d76 arm neon qdmull: Fix SQDMULL implementation for 32-bit inputs. (#1255)
4b9007046 x86 sse2: fix `_mm_pause` for RISCV systems
0be41ec7c risc64 gcc-14: Disable uninitialized variable warnings for some ARM neon SM3 functions
70fc574b2 arm: Rename ARM ROL/ROR functions with a SIMDE prefix.
a39bd6dde arm neon sli_n: Fix invalid shift warnings (#1253)
7bd2bb70e arm neon `_vext_p6`: reverse logic to avoid GCC14 i586 bug (#1251)
b4bf72e14 x86 clmul simde_x_bitreverse_u64: add loongarch implementation (#1249)
04f9b4ca6 x86 avx: reoptimized simde_mm256_addsub_ps/d with lasx
54d352981 x86 fma: add loongarch lasx optimized implementations
adefb8dcb x86 f16c: add loongarch lasx optimized implementations
b720dcb7d x86 avx512f: added fmaddsub implementation (#1246)
5c9f6aa19 x86 sse4.2: add loongarch lsx optimized implementations
783703714 x86 sse4.1: add loongarch lsx optimized implementations
0bfc2312f x86 ssse3: add loongarch lsx optimized implementations
fcae0eee0 x86 sse3: add loongarch lsx optimized implementations
af6467260 x86 sse: add loongarch lsx optimized implementations
2ad64c9f7 x86 avx2: add loongarch lasx optimized implementations (#1241)
5cae2261b x86 avx: add loongarch lasx optimized implementations (#1239)
484fcce25 x86 avx: use INT64_C when the destination is i64 (#1238)
5e225b1c6 loongarch: add lsx support for sse2.h
665d7f93b fix clang type redef error
b0fcc6176 Whoops, missing comma
fe262fb0e loongarch float16: use a portable version to avoid compilation errors
1a09d3bc9 x86: move definition of 'value' to correct branch in _mm_loadl_epi64
aac583326 x86: some better implementations for MSVC and others without SIMDE_STATEMENT_EXPR_
d1afb3db1 arm crc32: define SIMDE_ARCH_ARM_CRC32 and consistently use it
592f8f0c4 _mm256_storeu_pd and _mm256_loadu_pd using 128 bit lanes
de4337e8d gcc-14 -O3 complained about some possible unitialized values
8b0937a3e neon/cvz z/Arch: stop using deprecated functions.
e18dcd7d0 arm neon: avoid GCC 11 vst1_*_x4 built-in functions
848fb7777 arm neon: fix arm64 gcc11 build excess elements in vector failure
0aaf78298 x86/sse: Fix type convert error for LSX.
29c96207c arm wasm: add vst2_u8 translation to Wasm SIMD
375ad48fd arm wasm: add vshll translations to Wasm SIMD
d5697fa99 arm wasm: add vst4_u8 translation to Wasm SIMD
e235b2eb1 math: typo fix, check SIMDE_MATH_NANF instead of the old-style SIMDE_NANF
cb4b08c47 wasm AltiVec: add u16x8 and u8x16 avgr translations
90237caba wasm NEON: add u16x8 and u8x16 avgr translations
6050906e9 arm neon vminnmv_f16: remove duplicate statement (#1208)
a3d20d145 x86 wasm: Wasm SIMD version of `_mm_sad_epu8`
32650204e msvc: add simde_MemoryBarrier to avoid including <windows.h>
7ca5a3e0b x86/fma: Use 128 bit fnmadd_pd to do 256 bit fnmadd_pd (#1197)
2ec1f51f8 pow: consistently use simde_math_pow
80f655739 x86: remove redundant mm_add_pd translation for WASM (#1190)
249b9dc03 arm/neon riscv64: additional RVV implementations - part 2. (#1189)
408d06a35 arm/neon riscv64: additional RVV implementations - part1 (#1188)
da5cf1f54 Use _Float16 in C++ on aarch64 with GCC 13+
39f436a9e Don't use _Float16 on non-SSE2 x86
985c27100 Don't use _Float16 on s390x
787830467 x86: Apply half tabular method in _mm_crc32 family
d8a0c764f arm: improve performance in vqadd and vmvn in risc-v
99c63a427 neon: avoid warnings when "__ARM_NEON_FP" is not defined.
e98cbcc70 start next development cycle: v0.8.3
3442dbf2d prepare to release 0.8.0
e6afb7bec arm neon: Fully remove the problematic FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics
fb73a3182 arm: improve performance in vabd_xxx for risc-v
8a4ff7a8b arm: improve performance in vhadd_xxx for risc-v
52f1087ad arm: Add neon2rvv support in vand series intrinsics
737e3b33f arm: fix some neon2rvv intrinsic function error
5242a77dc arm: enable more intrinsic function for armv7
8f123e5c0 wasm x86 impl: some were incorrectly marked SSE instead of SSE2
2b9b01269 arm x86 implementations: allow _m128 access from SSE
6679ff018 svml: SSE is good enough for native m128i and m128d types & functions
68aac3b9a sse2 MSVC `_mm_pause` implementaiton for x86
e76f4331e typo fixes from codespell
73160356b x86 xop: fix some native functions
4ecf271be emscripten; use `__builtin_roundeven{f,}` from version 3.1.43 onwards
347e2b699 arm 32 bits: native def fixes; workarounds for gcc
61d1addce apple clang arm64: ignore SHA2
b58359225 arm platform: cleanup feature detection.
e38f25685 arm neon sm3: check constant range
ac2b229a1 arm neon: disable some FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics
1d7848cf9 arm neon clang: skip vrnd native before clang v18
bb11054b5 clang: detect versions 18 & 19
647bb87de Initial Support for the RISC-V Vector Extension in ARM NEON (#1130)
83479bd70 start next development cycle: v0.8.1
22a493c26 arm/neon abs: negating INT_MIN is undefined behavior
453dec209 simde-detect-clang.h: add clang 17 detection (#1132)
e6fab1296 Update simde-detect-clang.h (#1131)
e29a4fab5 typo: XCode -> Xcode (#1129)
8392c69a1 Improve performance of simde_mm512_add_epi32 (#1126)
ddaab3759 neon {u,s}addh apply arm64 windows workaround only on msvc<1938 (#1121)
8e9d432a6 correction of simde_mm256_sign_epi{8,16,32}. (#1123)
43ec909bb avx512 abs: refine GCC compiler checks for `_mm512{,_mask}_abs_pd` (#1118)
24be11d00 gh-actions: test mips64el using qemu on gcc12/clang16
f0bd155cf wasm relaxed: add f{32x4,64x2}_relaxed_{min,max}
459abf9f7 wasm simd128/relaxed: begin MIPS implementations
ffe050ce9 wasm relaxed: updated names; reordered FMA operations
762d7ad22 wasm: detect support for Relaxed SIMD mode
e96949e3f prepare to release 0.8.0
f73d72e4e NEON: implement all bf16-related intrinsics (#1110)
72f6d30fe neon: add enable vmlaq_laneq_f32 and vcvtq_n_f64_u64
d6271b3fe NEON: implement all intrinsics supported by architecture A64-remaining part (#1093)
7904fc3cf sse2 mm_pause: more archs, add a basic test
260adca59 arm neon ld2: silence warnings at -O3 on gcc risc-v
d1578a0ce simde_float16: prefer __fp16 if available
064b80493 svml: don't enable SIMDE_X86_SVML_NATIVE for ClangCl
790e8d6c3 fp16: don't use _Float16 on ClangCL if not supported
87f7d3317 neon: Modified simde_float16 to simde_float16_t (#1100)
2c58d6d05 Reuse unoptimized implementations of vaesimcq_u8 from x86
be7e377cb [NEON] Add AES instructions.
a686efde4 x86 sse4.1 mm_testz_si128: fix backwards short circuit logic
bc206e4fa wasm f{32x4,64x2}_min: add workaround for a gcc<6 issue
f2e82c961 x86 pclmul: fix natives, some require VPCLMULQDQ
0adef454b avx512 gather: add MSVC native fallbacks
ecc469297 avx512 set: add simde_x_mm512_set_m256{,d}
70f702627 NEON: part 1 of implement all intrinsics supported by architecture A64 (#1090)
aefb342e9 avx512 types: avoid using native AVX512 types on MSVC unless required
833a87750 svml: enable SIMDE_X86_SVML_NATIVE for MSVC 2019+
db326c75c sse{,2,4.1}, avx{,2} *_stream_{,load}: use __builtin_nontemporal_{load,store}
3f0321ba2 sse _mm_movemask_ps: remove unused code
4c7c77217 gh-actions: test with clang-16
87cf105ab neon/st1{,q}_*_x{2,3,4}: initial implementation (#1082)
5634cec09 NEON: more fp16 using intrinsics supported by architecture v7 (skip version) (#1081)
cfd917230 arm neon: Complex operations from Armv8.3-a (#1077)
389f360a6 arm aes: add neon implementation using the crypto extension
4adb6591f arm: use SIMDE_ARCH_ARM_FMA
faeb00a70 avx512: fix many native aliases
98eb64b85 sse: implement _mm_movelh_ps for Arm64
57197e8db aes: initial implementation of most aes instructions (#1072)
2fbc63391 NEON: Implement some f16XN types and f16 related intrinsics. (#1071)
64f94c681 avx: simde_mm256_shuffle_pd fix for natural vector size < 128
d41408997 Add workaround for GCC bug 111609
665676042 Extend constant range in simde_vshll_n_XXX intrinsics (#1064)
33c4480de Remove non-working MMX specialization from simde_vmin_s16
6f4afd634 Fix issues related to MXCSR register (#1060)
82be3395b fix SIMDE_ARCH_X86_SSE4_2 define
38580983e riscv64 clang: doesn't support _Float16 or __fp16 properly yet
a39f2c3b3 avx512/shuffle: mm512_{shuffle_epi32,shuffle{hi,lo}_epi16}
a202d0116 avx512/gather: mm512_{mask_,}i64gather_{epi32,epi64,ps,pd)
95a6d0813 avx512 new families started: gather/reduce + other additional funcs
ef8931287 avx512 cmp,cvt,cvts,cvtt,cvtus,gather,kand,permutex,rcp: new ops for intgemm
4a29d21ff avx512: start supporting AVX512FP16 / m512h
f686d38f1 clang wasm: SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release
7760aabd1 GCC AVX512F: SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+
436dd4cc1 GCC x86/x64: SIMDE_BUG_GCC_98521 was fixed in 10.3
843112308 GCC x86: SIMDE_BUG_GCC_94482 was fixed in 8.5, 9.4, 10+
e140ac4e2 x86/avx512 fpclass: improve fallback implementation
5950c402c gh-actions: re-order ccache; add old clang/gcc versions
faf228937 avx512/loadu: fix native detection
b3341922b simde-f16: improve _Float16 usage; better INFHF/NANHF defs
5e632b09d avx512: naive implementation of fpclass
b71b58c27 [NEON/A32V7]: Don't trust clang for load multiple on A32V7
c5de4d090 neon: Add qtbl/qtbx polyfills for A32V7
3bda0d7c6 neon/cvtn: vcvtnq_u32_f32 is a V8 function
73910b60c msa neon impl: float64x2_t is not avail in A32V7
0540d7fc2 clang aarch64: optimization bug 45541 was fixed in clang-15
d315aac71 clmul: aarch64 clang has difficulties with poly64x1_t
a2eeb9ef1 sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128
e676d9982 clang powerpc: vec_bperm bug was fixed in clang-14
0e3290e86 neon/st1: disable last remaining AltiVec implementation
db0649e1d wasm simd128: more powerpc fixes
bbdb2a1f5 sse2,wasm simd128: skip SIMDE_CONVERT_VECTOR_ impementations on PowerPC
c6b6ac500 wasm/simd128: add missing unsigned functions
78faeab11 wasm/simd128: fix altivec_p7 version of wasm_f64x2_pmin
1f359106b We are in a dev period again: v0.7.7
9135bd049 neon/cvtn: vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations
6a1db3a5a neon/cvtn: basic implementation of a few functions
1cf65cb0a mmx: loogson impl promotions over SIMDE_SHUFFLE_VECTOR_
4ab8749df sse{,2,3,4.1},avx: more WASM shuffle implementations
b49fa29d5 avx512: arghhh: really fix typedef of __mmask64
6244ab92e avx512: typo fix for typedef of __mmask64
20c5200d6 avx512/madd: fix native alias arguments for _mm512_madd_epi16
cc476f364 neon/qabs: restore SSE2 impl for vqabsq_s8
a7682611d neon/abd,ext,cmla{,_rot{180,270,90}}: additional wasm128 implementations
ca523adb7 sse: allow native _mm_loadh_pi on MSVC x64
ac526659e test: appease GCC 5.x & clang
01ea9a8d3 start release process for 0.7.6
28a6001f6 x86/sse*,avx: add additional SIMD128 implementations
aca2f0ae6 neon/shl,rshl: fix avx include to unbreak amalgamated hearders
f60a9d8df neon/mla_lane: initial implementation using mla+dup
f982cfd51 Update clang version detection for 14..16 and add link
b45a14ccc simde-arch: include hedley for setting F16C for MSVC 2022+ with AVX2
3ce91d4cd 0.7.5 dev cycle on the road to 0.7.6/0.8.0
02c7a67ed sse: remove unbalanced HEDLEY_DIAGNOSTIC_PUSH
b0b370a4b x86/sse: Add LoongArch LSX support
2338f175d arch: Add LoongArch LASX/LSX support
90d95fae4 avx512: define __mask64 & __mask32 if not yet defined
42a43fa57 sve/true,whilelt,cmplt,ld1,st1,sel,and: skip AVX512 native implementations on MSVC 2017
20f98da6f sve/whilelt: correct type-o in __mmask32 initialization
47a1500f7 sve/ptest: _BitScanForward64 and __builtin_ctzll is not available in MSVC 2017
cd93fcc9e avx512/knot,kxor: native calls not availabe on MSVC 2017
ba6324b6b avx512/loadu: _mm{,256}_loadu_epi{8,16,32,64} skip native impl on MSVC < 2019
2f6fe9c64 sse2/avx: move some native aliases around to satisfy MSVC 2017 /ARCH:AVX512
91fda2cc9 axv512/insert: unroll SIMDE_CONSTIFY for testing macro implemented functions
a397b74b3 __builtin_signbit: add cast to double for old Clang versions
e016050b2 clmul: _mm512_clmulepi64_epi128 implicitly requires AVX512F
7e353c009 Wasm q15mulr_sat_s: match Wasm spec
ce375861c Wasm f32/f64 nearest: match Wasm spec
96d5e0346 Wasm f32/f64 floor/ceil/trunc/sqrt: match Wasm spec
5676a1ba7 Wasm f32/f64 abs: match Wasm spec
aa299c08b Wasm f32/f64 max: match Wasm spec
433d2b951 Wasm f32/f64 min: match Wasm spec
cf1ac40b8 avx{,2}: some intrinsics are missing from older MSVC versions
bff9b1b3c simd128: move unary minus to appease msvc native arm64
efc512a49 neon/ext: unroll SIMDE_CONSTIFY for testing macro implemented functions
091250e81 neon/addlv: disable SSSE3 impl of _vaddlvq_s16 for MSVC
4b3053606 neon/ext: simde_*{to,from}_m64 reqs MMX_NATIVE
2dedbd9bf skip many mm{,_mask,_maskz}_roundscale_round_{ss,sd} testing on MSVC + AVX
a04ea7bc9 f16c: rounding not yet implemented for simde_mm{256,}_cvtps_ph
e8ee041ab ci appveyor: build tests with AVX{,2}, but don't run them
2188c9728 arm/neon/add{l,}v: SSE2/SSSE3 opts _vadd{lvq_s8, lvq_s16, lvq_u8, vq_u8}
186f12f17 axv512: add simde_mm512_{cvtepi32_ps,extractf32x8_ps,_cmpgt_epi16_mask}
6a40fdeb5 arm/neon/rnd: use correct SVML function for simde_vrndq_f64
9a0705b06 svml: simde_mm256_{clog,csqrt}_ps native reqs AVX not SSE
c298a7ec2 msvc avx512/roundscale_round: quiet a false positive warning
01d9c5def sse: remove errant MMX requirement from simde_mm_movemask_ps
c675aa08d x86/avx{,2}: use SIMDE_FLOAT{32,64}_C to fix warnings from msvc
097af509e msvc 2022: enable F16C if AVX2 present
91cd7b64b avx{,2}: fix maskload illegal mem access
2caa25b85 Fixed simde_mm_prefetch warnings
96bdf5234 Fixed parameters to _mm_clflush
4d560e418 emscripten; don't use __builtin_roundeven{f,} even if defined
511a01e7d avx512/compress: Mitigate poor compressstore performance on AMD Zen 4
a22b63dc9 avx512/{knot,kxor,cmp,cmpeq,compress,cvt,loadu,shuffle,storeu} Additional AVX512{F,BW,VBMI2,VL} ops
3d87469f6 wasm simd128: correct trunc_sat _FAST_CONVERSION_RANGE target type
56ca5bd89 Suppress min/max macro definitions from windows.h
f2cea4d33 arm/neon/qdmulh s390 gcc-12: __builtin_shufflevector is misbehaving
3698cef9b neon/cvt: clang bug 46844 was fixed in clang 12.0
9369cea4a simd128: clang 13 fixed bugs affecting simde_wasm_{v128_load8_lane,i64x2_load32x2}
ce27bd09a gcc power: vec_cpsgn argument reversal fixed in 12.0
20fd5b94b gcc power: bugs 1007[012] fixed in GCC 12.1
5e25de133 gcc sse2: bug 99754 was fixed in GCC 12.1
e69796025 gcc i686 mm*_dpbf16_ps: skip vector ops due to rounding error
359c3ff47 clang wasm simde: add workaround to fix wasm_i64x2_shl bug
b767f5edc arm/neon: workaround on ARM64 windows bug
599b1fbf4 mips/msa: fix for Windows ARM64
c6f4821ed arm64 windows: fix simd128.h build error
782e7c73e prepare to release 0.7.4
6e9ac2457 fix A32V7 version of _mm_test{nz,}c_si128
776f7a699 test with Debian default flags, also for armel
a240d951a x86: fix AVX native → SSE4.2 native
5a73c2ce5 _mm_insert_ps: incorrect handling of the control
597a1c9e4 neon/ld1[q]_*_x2: initial implementation
4550faeac wasm: f32x4 and f64x2 nearest roundeven
5e0686459 Add missing `static const` in simde-math.h. NFC
da02f2cee avx512/setzero: fix native aliases
89762e11b Fixed FMA detection macro on msvc
b0fda5cf2 avx512/load_pd: initial implementation
a61af0778 avx512/load_ps: initial implementation
4126bde01 Properly map __mm functions to __simde_mm
2e76b7a69 neon ld2: gcc-12 fixes
604a53de3 fix wrong size
e5e085ff8 AVX: add native calls for _mm256_insertf128_{pd,ps,si256}
ee3bd005b aarch64 + clang-1[345] fix for "implicit conversion changes signedness"
a060c461a wasm: load lane memcpy instead of cast to address UBSAN issues

git-subtree-dir: lib/simde/simde
git-subtree-split: 0faa907b261001f89ac89becaea20beddd675468
mr-c force-pushed the simde_v0.8.4-rc1+59f7798 branch from b8c6585 to 55699b9 July 27, 2025 16:09
mr-c changed the title from "SIMDe: update to v0.8.4-rc1+59f7798" to "SIMDe: update to v0.8.4-rc1+0faa907b2" Jul 27, 2025