SIMDe: update to v0.8.4-rc1+0faa907b2 #1022
Open
mr-c wants to merge 3 commits into soedinglab:master from mr-c:simde_v0.8.4-rc1+59f7798
I'm working on a |
0faa907b2 gcc pedantic: fp16 is not part of ISO C, silence the warning
f53a9cf79 gcc pedantic: also silence this other warning about __int128
59f779845 arm neon: Add float16 multi-vectors to native aliases
4b279d62e https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927 was fixed in GCC 15.x
5c8f50ec1 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95782 was fixed in GCC 13
677f2cbee Avoid undefined behaviour with signed integer multiplication (#1296)
2096f755e arm64 gcc FRINT: skip native call on GCC
3ea330475 x86 sse2 for loongarch: fix GCC build failure (#1287)
a532a12ca riscv64: Fallback to autovec without mrvv-vector-bits flag. (#1282)
85632ca82 arm neon riscv64: add min.h and max.h RVV implementations. (#1283)
ca1e942d9 neon riscv64: Enable RVV segment load/store only when we have `__riscv_zvlsseg` flag. (#1285)
cf8e6a73d riscv64: Enable V feature when both zve64d and zvl128b are present (#1284)
c7f26b73b x86 avx for loongarch: use vfcmp_clt to save one instruction in `_mm_cmp_{sd,ss}` and `_mm256_cmp_pd`
a8ae10d96 x86 sse2,avx2 loongarch impl: let compiler to generate instructions based on imm8
bb0282e3b x86 misc fixes for AVX512{F,VL}_NATIVE
d458d8fdd x86 sse2,sse3, avx: silence some false-positive warnings about unitialized structs
4184e0d42 start preparing to release SIMDe 0.8.4
87ecd64a5 x86 sse2: fix overflow error detected by clang scan-build in simde_mm_srl_epi{16,32,64} when count is too high
ca9449c1e Fix incorrect UQRSHL implementation.
8d90b0411 arm neon: fix `cmla{_rot{90,180,270},}_lane` with correct test-suite on ARMv8.3 system
500454a2a arm neon: replace use of SIMDE_ARCH_ARM_CHECK(8+) with feature checks.
02ba92220 arm neon gcc-12 FRINT workaround
02d815773 arm neon FCMLA with 16-bit floats, requires the FP16 feature
8caaee795 arm neon: FRINT{32,64}{X,Z} native calls require ARMv8.5
438ddcff6 remove extraneous semicolons from many macro-defined functions
9f73373ff wasm simd128: fix a FAST_NANS error on arm64
0bd19a993 Fix vqdmulhs_s32 native alias.
62f40d4b8 x86 avx2: small fixes for loongarch
d656b4d7e x86 sse2: small fixes for loongarch
8f56d4ff1 Remove incorrect qrdmulh SSE code.
8c421df17 arm neon: define native alias only under the inverse of the conditions of a pass-through
25e70ce71 simde-aes: gcc 13.2+ ignore unused variable warnings
69c9cd5c3 arm neon qdmlal: fix saturation (#1194)
34136823c Fix vqshlud_n_s64 implementation to be 64-bit.
483a4bccf Fix qdmlsl instructions
f275fffd9 arm neon qshl: Fix UQSHL to match hardware. Add extensive test vectors. (#1256)
d95bd9d76 arm neon qdmull: Fix SQDMULL implementation for 32-bit inputs. (#1255)
4b9007046 x86 sse2: fix `_mm_pause` for RISCV systems
0be41ec7c risc64 gcc-14: Disable uninitialized variable warnings for some ARM neon SM3 functions
70fc574b2 arm: Rename ARM ROL/ROR functions with a SIMDE prefix.
a39bd6dde arm neon sli_n: Fix invalid shift warnings (#1253)
7bd2bb70e arm neon `_vext_p6`: reverse logic to avoid GCC14 i586 bug (#1251)
b4bf72e14 x86 clmul simde_x_bitreverse_u64: add loongarch implementation (#1249)
04f9b4ca6 x86 avx: reoptimized simde_mm256_addsub_ps/d with lasx
54d352981 x86 fma: add loongarch lasx optimized implementations
adefb8dcb x86 f16c: add loongarch lasx optimized implementations
b720dcb7d x86 avx512f: added fmaddsub implementation (#1246)
5c9f6aa19 x86 sse4.2: add loongarch lsx optimized implementations
783703714 x86 sse4.1: add loongarch lsx optimized implementations
0bfc2312f x86 ssse3: add loongarch lsx optimized implementations
fcae0eee0 x86 sse3: add loongarch lsx optimized implementations
af6467260 x86 sse: add loongarch lsx optimized implementations
2ad64c9f7 x86 avx2: add loongarch lasx optimized implementations (#1241)
5cae2261b x86 avx: add loongarch lasx optimized implementations (#1239)
484fcce25 x86 avx: use INT64_C when the destination is i64 (#1238)
5e225b1c6 loongarch: add lsx support for sse2.h
665d7f93b fix clang type redef error
b0fcc6176 Whoops, missing comma
fe262fb0e loongarch float16: use a portable version to avoid compilation errors
1a09d3bc9 x86: move definition of 'value' to correct branch in _mm_loadl_epi64
aac583326 x86: some better implementations for MSVC and others without SIMDE_STATEMENT_EXPR_
d1afb3db1 arm crc32: define SIMDE_ARCH_ARM_CRC32 and consistently use it
592f8f0c4 _mm256_storeu_pd and _mm256_loadu_pd using 128 bit lanes
de4337e8d gcc-14 -O3 complained about some possible unitialized values
8b0937a3e neon/cvz z/Arch: stop using deprecated functions.
e18dcd7d0 arm neon: avoid GCC 11 vst1_*_x4 built-in functions
848fb7777 arm neon: fix arm64 gcc11 build excess elements in vector failure
0aaf78298 x86/sse: Fix type convert error for LSX.
29c96207c arm wasm: add vst2_u8 translation to Wasm SIMD
375ad48fd arm wasm: add vshll translations to Wasm SIMD
d5697fa99 arm wasm: add vst4_u8 translation to Wasm SIMD
e235b2eb1 math: typo fix, check SIMDE_MATH_NANF instead of the old-style SIMDE_NANF
cb4b08c47 wasm AltiVec: add u16x8 and u8x16 avgr translations
90237caba wasm NEON: add u16x8 and u8x16 avgr translations
6050906e9 arm neon vminnmv_f16: remove duplicate statement (#1208)
a3d20d145 x86 wasm: Wasm SIMD version of `_mm_sad_epu8`
32650204e msvc: add simde_MemoryBarrier to avoid including <windows.h>
7ca5a3e0b x86/fma: Use 128 bit fnmadd_pd to do 256 bit fnmadd_pd (#1197)
2ec1f51f8 pow: consistently use simde_math_pow
80f655739 x86: remove redundant mm_add_pd translation for WASM (#1190)
249b9dc03 arm/neon riscv64: additional RVV implementations - part 2. (#1189)
408d06a35 arm/neon riscv64: additional RVV implementations - part1 (#1188)
da5cf1f54 Use _Float16 in C++ on aarch64 with GCC 13+
39f436a9e Don't use _Float16 on non-SSE2 x86
985c27100 Don't use _Float16 on s390x
787830467 x86: Apply half tabular method in _mm_crc32 family
d8a0c764f arm: improve performance in vqadd and vmvn in risc-v
99c63a427 neon: avoid warnings when "__ARM_NEON_FP" is not defined.
e98cbcc70 start next development cycle: v0.8.3
3442dbf2d prepare to release 0.8.0
e6afb7bec arm neon: Fully remove the problematic FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics
fb73a3182 arm: improve performance in vabd_xxx for risc-v
8a4ff7a8b arm: improve performance in vhadd_xxx for risc-v
52f1087ad arm: Add neon2rvv support in vand series intrinsics
737e3b33f arm: fix some neon2rvv intrinsic function error
5242a77dc arm: enable more intrinsic function for armv7
8f123e5c0 wasm x86 impl: some were incorrectly marked SSE instead of SSE2
2b9b01269 arm x86 implementations: allow _m128 access from SSE
6679ff018 svml: SSE is good enough for native m128i and m128d types & functions
68aac3b9a sse2 MSVC `_mm_pause` implementaiton for x86
e76f4331e typo fixes from codespell
73160356b x86 xop: fix some native functions
4ecf271be emscripten; use `__builtin_roundeven{f,}` from version 3.1.43 onwards
347e2b699 arm 32 bits: native def fixes; workarounds for gcc
61d1addce apple clang arm64: ignore SHA2
b58359225 arm platform: cleanup feature detection.
e38f25685 arm neon sm3: check constant range
ac2b229a1 arm neon: disable some FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics
1d7848cf9 arm neon clang: skip vrnd native before clang v18
bb11054b5 clang: detect versions 18 & 19
647bb87de Initial Support for the RISC-V Vector Extension in ARM NEON (#1130)
83479bd70 start next development cycle: v0.8.1
22a493c26 arm/neon abs: negating INT_MIN is undefined behavior
453dec209 simde-detect-clang.h: add clang 17 detection (#1132)
e6fab1296 Update simde-detect-clang.h (#1131)
e29a4fab5 typo: XCode -> Xcode (#1129)
8392c69a1 Improve performance of simde_mm512_add_epi32 (#1126)
ddaab3759 neon {u,s}addh apply arm64 windows workaround only on msvc<1938 (#1121)
8e9d432a6 correction of simde_mm256_sign_epi{8,16,32}. (#1123)
43ec909bb avx512 abs: refine GCC compiler checks for `_mm512{,_mask}_abs_pd` (#1118)
24be11d00 gh-actions: test mips64el using qemu on gcc12/clang16
f0bd155cf wasm relaxed: add f{32x4,64x2}_relaxed_{min,max}
459abf9f7 wasm simd128/relaxed: begin MIPS implementations
ffe050ce9 wasm relaxed: updated names; reordered FMA operations
762d7ad22 wasm: detect support for Relaxed SIMD mode
e96949e3f prepare to release 0.8.0
f73d72e4e NEON: implement all bf16-related intrinsics (#1110)
72f6d30fe neon: add enable vmlaq_laneq_f32 and vcvtq_n_f64_u64
d6271b3fe NEON: implement all intrinsics supported by architecture A64-remaining part (#1093)
7904fc3cf sse2 mm_pause: more archs, add a basic test
260adca59 arm neon ld2: silence warnings at -O3 on gcc risc-v
d1578a0ce simde_float16: prefer __fp16 if available
064b80493 svml: don't enable SIMDE_X86_SVML_NATIVE for ClangCl
790e8d6c3 fp16: don't use _Float16 on ClangCL if not supported
87f7d3317 neon: Modified simde_float16 to simde_float16_t (#1100)
2c58d6d05 Reuse unoptimized implementations of vaesimcq_u8 from x86
be7e377cb [NEON] Add AES instructions.
a686efde4 x86 sse4.1 mm_testz_si128: fix backwards short circuit logic
bc206e4fa wasm f{32x4,64x2}_min: add workaround for a gcc<6 issue
f2e82c961 x86 pclmul: fix natives, some require VPCLMULQDQ
0adef454b avx512 gather: add MSVC native fallbacks
ecc469297 avx512 set: add simde_x_mm512_set_m256{,d}
70f702627 NEON: part 1 of implement all intrinsics supported by architecture A64 (#1090)
aefb342e9 avx512 types: avoid using native AVX512 types on MSVC unless required
833a87750 svml: enable SIMDE_X86_SVML_NATIVE for MSVC 2019+
db326c75c sse{,2,4.1}, avx{,2} *_stream_{,load}: use __builtin_nontemporal_{load,store}
3f0321ba2 sse _mm_movemask_ps: remove unused code
4c7c77217 gh-actions: test with clang-16
87cf105ab neon/st1{,q}_*_x{2,3,4}: initial implementation (#1082)
5634cec09 NEON: more fp16 using intrinsics supported by architecture v7 (skip version) (#1081)
cfd917230 arm neon: Complex operations from Armv8.3-a (#1077)
389f360a6 arm aes: add neon implementation using the crypto extension
4adb6591f arm: use SIMDE_ARCH_ARM_FMA
faeb00a70 avx512: fix many native aliases
98eb64b85 sse: implement _mm_movelh_ps for Arm64
57197e8db aes: initial implementation of most aes instructions (#1072)
2fbc63391 NEON: Implement some f16XN types and f16 related intrinsics. (#1071)
64f94c681 avx: simde_mm256_shuffle_pd fix for natural vector size < 128
d41408997 Add workaround for GCC bug 111609
665676042 Extend constant range in simde_vshll_n_XXX intrinsics (#1064)
33c4480de Remove non-working MMX specialization from simde_vmin_s16
6f4afd634 Fix issues related to MXCSR register (#1060)
82be3395b fix SIMDE_ARCH_X86_SSE4_2 define
38580983e riscv64 clang: doesn't support _Float16 or __fp16 properly yet
a39f2c3b3 avx512/shuffle: mm512_{shuffle_epi32,shuffle{hi,lo}_epi16}
a202d0116 avx512/gather: mm512_{mask_,}i64gather_{epi32,epi64,ps,pd)
95a6d0813 avx512 new families started: gather/reduce + other additional funcs
ef8931287 avx512 cmp,cvt,cvts,cvtt,cvtus,gather,kand,permutex,rcp: new ops for intgemm
4a29d21ff avx512: start supporting AVX512FP16 / m512h
f686d38f1 clang wasm: SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release
7760aabd1 GCC AVX512F: SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+
436dd4cc1 GCC x86/x64: SIMDE_BUG_GCC_98521 was fixed in 10.3
843112308 GCC x86: SIMDE_BUG_GCC_94482 was fixed in 8.5, 9.4, 10+
e140ac4e2 x86/avx512 fpclass: improve fallback implementation
5950c402c gh-actions: re-order ccache; add old clang/gcc versions
faf228937 avx512/loadu: fix native detection
b3341922b simde-f16: improve _Float16 usage; better INFHF/NANHF defs
5e632b09d avx512: naive implementation of fpclass
b71b58c27 [NEON/A32V7]: Don't trust clang for load multiple on A32V7
c5de4d090 neon: Add qtbl/qtbx polyfills for A32V7
3bda0d7c6 neon/cvtn: vcvtnq_u32_f32 is a V8 function
73910b60c msa neon impl: float64x2_t is not avail in A32V7
0540d7fc2 clang aarch64: optimization bug 45541 was fixed in clang-15
d315aac71 clmul: aarch64 clang has difficulties with poly64x1_t
a2eeb9ef1 sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128
e676d9982 clang powerpc: vec_bperm bug was fixed in clang-14
0e3290e86 neon/st1: disable last remaining AltiVec implementation
db0649e1d wasm simd128: more powerpc fixes
bbdb2a1f5 sse2,wasm simd128: skip SIMDE_CONVERT_VECTOR_ impementations on PowerPC
c6b6ac500 wasm/simd128: add missing unsigned functions
78faeab11 wasm/simd128: fix altivec_p7 version of wasm_f64x2_pmin
1f359106b We are in a dev period again: v0.7.7
9135bd049 neon/cvtn: vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations
6a1db3a5a neon/cvtn: basic implementation of a few functions
1cf65cb0a mmx: loogson impl promotions over SIMDE_SHUFFLE_VECTOR_
4ab8749df sse{,2,3,4.1},avx: more WASM shuffle implementations
b49fa29d5 avx512: arghhh: really fix typedef of __mmask64
6244ab92e avx512: typo fix for typedef of __mmask64
20c5200d6 avx512/madd: fix native alias arguments for _mm512_madd_epi16
cc476f364 neon/qabs: restore SSE2 impl for vqabsq_s8
a7682611d neon/abd,ext,cmla{,_rot{180,270,90}}: additional wasm128 implementations
ca523adb7 sse: allow native _mm_loadh_pi on MSVC x64
ac526659e test: appease GCC 5.x & clang
01ea9a8d3 start release process for 0.7.6
28a6001f6 x86/sse*,avx: add additional SIMD128 implementations
aca2f0ae6 neon/shl,rshl: fix avx include to unbreak amalgamated hearders
f60a9d8df neon/mla_lane: initial implementation using mla+dup
f982cfd51 Update clang version detection for 14..16 and add link
b45a14ccc simde-arch: include hedley for setting F16C for MSVC 2022+ with AVX2
3ce91d4cd 0.7.5 dev cycle on the road to 0.7.6/0.8.0
02c7a67ed sse: remove unbalanced HEDLEY_DIAGNOSTIC_PUSH
b0b370a4b x86/sse: Add LoongArch LSX support
2338f175d arch: Add LoongArch LASX/LSX support
90d95fae4 avx512: define __mask64 & __mask32 if not yet defined
42a43fa57 sve/true,whilelt,cmplt,ld1,st1,sel,and: skip AVX512 native implementations on MSVC 2017
20f98da6f sve/whilelt: correct type-o in __mmask32 initialization
47a1500f7 sve/ptest: _BitScanForward64 and __builtin_ctzll is not available in MSVC 2017
cd93fcc9e avx512/knot,kxor: native calls not availabe on MSVC 2017
ba6324b6b avx512/loadu: _mm{,256}_loadu_epi{8,16,32,64} skip native impl on MSVC < 2019
2f6fe9c64 sse2/avx: move some native aliases around to satisfy MSVC 2017 /ARCH:AVX512
91fda2cc9 axv512/insert: unroll SIMDE_CONSTIFY for testing macro implemented functions
a397b74b3 __builtin_signbit: add cast to double for old Clang versions
e016050b2 clmul: _mm512_clmulepi64_epi128 implicitly requires AVX512F
7e353c009 Wasm q15mulr_sat_s: match Wasm spec
ce375861c Wasm f32/f64 nearest: match Wasm spec
96d5e0346 Wasm f32/f64 floor/ceil/trunc/sqrt: match Wasm spec
5676a1ba7 Wasm f32/f64 abs: match Wasm spec
aa299c08b Wasm f32/f64 max: match Wasm spec
433d2b951 Wasm f32/f64 min: match Wasm spec
cf1ac40b8 avx{,2}: some intrinsics are missing from older MSVC versions
bff9b1b3c simd128: move unary minus to appease msvc native arm64
efc512a49 neon/ext: unroll SIMDE_CONSTIFY for testing macro implemented functions
091250e81 neon/addlv: disable SSSE3 impl of _vaddlvq_s16 for MSVC
4b3053606 neon/ext: simde_*{to,from}_m64 reqs MMX_NATIVE
2dedbd9bf skip many mm{,_mask,_maskz}_roundscale_round_{ss,sd} testing on MSVC + AVX
a04ea7bc9 f16c: rounding not yet implemented for simde_mm{256,}_cvtps_ph
e8ee041ab ci appveyor: build tests with AVX{,2}, but don't run them
2188c9728 arm/neon/add{l,}v: SSE2/SSSE3 opts _vadd{lvq_s8, lvq_s16, lvq_u8, vq_u8}
186f12f17 axv512: add simde_mm512_{cvtepi32_ps,extractf32x8_ps,_cmpgt_epi16_mask}
6a40fdeb5 arm/neon/rnd: use correct SVML function for simde_vrndq_f64
9a0705b06 svml: simde_mm256_{clog,csqrt}_ps native reqs AVX not SSE
c298a7ec2 msvc avx512/roundscale_round: quiet a false positive warning
01d9c5def sse: remove errant MMX requirement from simde_mm_movemask_ps
c675aa08d x86/avx{,2}: use SIMDE_FLOAT{32,64}_C to fix warnings from msvc
097af509e msvc 2022: enable F16C if AVX2 present
91cd7b64b avx{,2}: fix maskload illegal mem access
2caa25b85 Fixed simde_mm_prefetch warnings
96bdf5234 Fixed parameters to _mm_clflush
4d560e418 emscripten; don't use __builtin_roundeven{f,} even if defined
511a01e7d avx512/compress: Mitigate poor compressstore performance on AMD Zen 4
a22b63dc9 avx512/{knot,kxor,cmp,cmpeq,compress,cvt,loadu,shuffle,storeu} Additional AVX512{F,BW,VBMI2,VL} ops
3d87469f6 wasm simd128: correct trunc_sat _FAST_CONVERSION_RANGE target type
56ca5bd89 Suppress min/max macro definitions from windows.h
f2cea4d33 arm/neon/qdmulh s390 gcc-12: __builtin_shufflevector is misbehaving
3698cef9b neon/cvt: clang bug 46844 was fixed in clang 12.0
9369cea4a simd128: clang 13 fixed bugs affecting simde_wasm_{v128_load8_lane,i64x2_load32x2}
ce27bd09a gcc power: vec_cpsgn argument reversal fixed in 12.0
20fd5b94b gcc power: bugs 1007[012] fixed in GCC 12.1
5e25de133 gcc sse2: bug 99754 was fixed in GCC 12.1
e69796025 gcc i686 mm*_dpbf16_ps: skip vector ops due to rounding error
359c3ff47 clang wasm simde: add workaround to fix wasm_i64x2_shl bug
b767f5edc arm/neon: workaround on ARM64 windows bug
599b1fbf4 mips/msa: fix for Windows ARM64
c6f4821ed arm64 windows: fix simd128.h build error
782e7c73e prepare to release 0.7.4
6e9ac2457 fix A32V7 version of _mm_test{nz,}c_si128
776f7a699 test with Debian default flags, also for armel
a240d951a x86: fix AVX native → SSE4.2 native
5a73c2ce5 _mm_insert_ps: incorrect handling of the control
597a1c9e4 neon/ld1[q]_*_x2: initial implementation
4550faeac wasm: f32x4 and f64x2 nearest roundeven
5e0686459 Add missing `static const` in simde-math.h. NFC
da02f2cee avx512/setzero: fix native aliases
89762e11b Fixed FMA detection macro on msvc
b0fda5cf2 avx512/load_pd: initial implementation
a61af0778 avx512/load_ps: initial implementation
4126bde01 Properly map __mm functions to __simde_mm
2e76b7a69 neon ld2: gcc-12 fixes
604a53de3 fix wrong size
e5e085ff8 AVX: add native calls for _mm256_insertf128_{pd,ps,si256}
ee3bd005b aarch64 + clang-1[345] fix for "implicit conversion changes signedness"
a060c461a wasm: load lane memcpy instead of cast to address UBSAN issues

git-subtree-dir: lib/simde/simde
git-subtree-split: 0faa907b261001f89ac89becaea20beddd675468
b8c6585 to 55699b9
For the post v0.8.4-rc1 changes, see simd-everywhere/simde-no-tests@v0.8.4-rc1...0faa907
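For readers skimming the commit list, entry 677f2cbee ("Avoid undefined behaviour with signed integer multiplication") names a class of fix that is easy to illustrate. The sketch below is not taken from SIMDe; `mul_wrap_i32` and `mullo_i32x4` are hypothetical helpers, and it assumes a platform where `int` is at most 32 bits wide and integers are two's complement. It shows one common way to get a wrapping 32-bit product without relying on signed overflow, which is undefined in C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not SIMDe's API): 32-bit multiply that wraps on
 * overflow. The product is computed on unsigned operands, where overflow
 * is defined to wrap modulo 2^32, and the bits are then copied back into
 * a signed value. Assumes int is no wider than 32 bits. */
static int32_t mul_wrap_i32(int32_t a, int32_t b) {
  uint32_t product = (uint32_t) a * (uint32_t) b;
  int32_t result;
  memcpy(&result, &product, sizeof result); /* reinterpret the bits */
  return result;
}

/* Hypothetical portable fallback in the style of a "low 32 bits of each
 * lane product" operation, applied lane by lane over four elements. */
static void mullo_i32x4(const int32_t a[4], const int32_t b[4], int32_t r[4]) {
  for (int i = 0; i < 4; i++) {
    r[i] = mul_wrap_i32(a[i], b[i]);
  }
}
```

With a plain `a * b` on `int32_t`, inputs such as `INT32_MAX` and `2` overflow and the compiler may assume that never happens; routing the multiply through `uint32_t` keeps the low 32 bits well defined.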