iv-m commented Jan 15, 2026

Fix a couple of errors I found when testing current simde master on my loongarch64 machine with GCC 14.3.1 and -Ofast -mlsx -mlasx in CFLAGS and CXXFLAGS.

iv-m added 2 commits January 15, 2026 14:04
__lsx_vftintrz_w_d accepts two __m128d arguments, so it
should be called with zero_f64, which is declared (zero_i64 is not).

This fixes the following compilation error that I get when
compiling current simde master for loongarch64-linux-gnu
with gcc 14.3.1 and `-Ofast -mlsx -mlasx` in CFLAGS:

../test/x86/avx512/../../../simde/x86/sse2.h: In function ‘simde__m128i simde_mm_cvttpd_epi32(simde__m128d)’:
../test/x86/avx512/../../../simde/x86/sse2.h:3736:39: error: ‘zero_i64’ was not declared in this scope; did you mean ‘zero_f64’?
 3736 |       r_.lsx_i64 = __lsx_vftintrz_w_d(zero_i64, simde__m128d_to_private(a).lsx_f64);
      |                                       ^~~~~~~~
      |                                       zero_f64

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
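
The fix described in this commit can be illustrated with a small, hedged sketch. The helper name `sketch_cvttpd_epi32` is hypothetical and is not SIMDe's code; the LSX branch assumes that `__lsx_vftintrz_w_d` places the converted second operand in lanes 0..1 and the converted first operand in lanes 2..3, so passing a zero double vector first leaves the upper lanes at zero. On targets without LSX the portable branch is compiled instead.

```c
#include <stdint.h>
#include <string.h>
#if defined(__loongarch_sx)
#include <lsxintrin.h>
#endif

/* Hypothetical helper, not SIMDe's code: truncate two doubles to int32 in
 * lanes 0..1 and zero lanes 2..3, as _mm_cvttpd_epi32 does for in-range input. */
static void sketch_cvttpd_epi32(const double a[2], int32_t r[4]) {
#if defined(__loongarch_sx)
  /* Both operands are __m128d, so the zero operand must be a double vector. */
  const __m128d zero_f64 = { 0.0, 0.0 };
  __m128d va;
  memcpy(&va, a, sizeof(va));
  /* Assumed lane layout: second operand -> lanes 0..1, first -> lanes 2..3. */
  __m128i vr = __lsx_vftintrz_w_d(zero_f64, va);
  memcpy(r, &vr, sizeof(vr));
#else
  /* Portable fallback; only valid for values that fit in int32_t. */
  r[0] = (int32_t) a[0];
  r[1] = (int32_t) a[1];
  r[2] = 0;
  r[3] = 0;
#endif
}
```

Calling it with `{1.9, -2.7}` should truncate toward zero and leave the two upper lanes cleared.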
Similarly to what other architectures do, the result of
__lsx_vftintrz_w_s should be used when both SIMDE_FAST_CONVERSION_RANGE
and SIMDE_FAST_NANS are defined, not just stored to a temporary
and lost.

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
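
A minimal sketch of the branch structure this commit message describes, under stated assumptions: `sketch_cvttps_epi32`, `SKETCH_FAST_CONVERSION_RANGE`, and `SKETCH_FAST_NANS` are hypothetical stand-ins for the SIMDe names, and the slow path simply mimics the x86 convention of returning `INT32_MIN` for NaN or out-of-range input. The point is only that the fast path writes the intrinsic's result to the output instead of discarding it.

```c
#include <stdint.h>
#include <string.h>
#if defined(__loongarch_sx)
#include <lsxintrin.h>
#endif

/* Hypothetical helper, not SIMDe's code: truncating float -> int32 conversion. */
static void sketch_cvttps_epi32(const float a[4], int32_t r[4]) {
#if defined(__loongarch_sx) && defined(SKETCH_FAST_CONVERSION_RANGE) && \
    defined(SKETCH_FAST_NANS)
  /* Fast path: the caller promises in-range, non-NaN input. */
  __m128 va;
  memcpy(&va, a, sizeof(va));
  __m128i vr = __lsx_vftintrz_w_s(va);
  memcpy(r, &vr, sizeof(vr)); /* the result reaches the output, it is not dropped */
#else
  /* Slow path: NaN and out-of-range lanes become INT32_MIN, as on x86. */
  for (int i = 0; i < 4; i++) {
    int in_range = (a[i] >= -2147483648.0f) && (a[i] < 2147483648.0f);
    r[i] = in_range ? (int32_t) a[i] : INT32_MIN;
  }
#endif
}
```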
iv-m (author) commented Jan 15, 2026

@HecaiYuan, @mr-c, please take a look.

mr-c (collaborator) left a comment

iv-m (author) commented Jan 15, 2026

> Please address the issues at [...]

Well, I just submitted fixes for two of the most obvious issues I found.

Other issues require more investigation. For example, the error from the CI:

 ../test/x86/avx512/../../../simde/x86/sse2.h: In function ‘simde_mm_cvtps_epi32’:
../test/x86/avx512/../../../simde/x86/sse2.h:3316:7: note: use ‘-flax-vector-conversions’ to permit conversions between vectors with differing element types or numbers of subparts
 3316 |       r_.lsx_i32 = __lsx_vftintrne_w_s(a_.lsx_f32);
      |       ^~
../test/x86/avx512/../../../simde/x86/sse2.h:3316:20: error: incompatible types when assigning to type ‘v4i32’ from type ‘__m128i’
 3316 |       r_.lsx_i32 = __lsx_vftintrne_w_s(a_.lsx_f32);
      |                    ^~~~~~~~~~~~~~~~~~~

It looks like a GCC bug to me: __lsx_vftintrne_w_s, as far as I know, returns four 32-bit integers packed into an __m128i, but GCC 14 for some reason thinks otherwise.
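
One possible reading of the diagnostic, offered as an assumption rather than a diagnosis: if the header declares `__m128i` as a vector of two `long long` values while `v4i32` is a vector of four 32-bit integers, GCC treats them as distinct types of the same size and rejects a plain assignment unless `-flax-vector-conversions` is given. The sketch below uses hypothetical stand-in typedefs (`sketch_v4i32`, `sketch_m128i`) and the GCC/Clang builtin `__builtin_types_compatible_p` to show that situation on any target.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two vector types named in the error message. */
typedef int32_t   sketch_v4i32 __attribute__((vector_size(16)));
typedef long long sketch_m128i __attribute__((vector_size(16)));

/* Returns 1 when both vector types occupy the same number of bytes. */
static int sketch_same_size(void) {
  return sizeof(sketch_v4i32) == sizeof(sketch_m128i);
}

/* Returns 1 only if the compiler considers the two types compatible,
 * which is what a plain assignment between them would require. */
static int sketch_types_compatible(void) {
  return __builtin_types_compatible_p(sketch_v4i32, sketch_m128i);
}
```

Both types are 16 bytes wide, yet they differ in element type, which matches the wording of the note about "vectors with differing element types".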

I'm not sure I'm qualified enough or have the time to address all the issues there right now. I can probably disable the preprocessor branches for optimizations that break CI -- that's more or less what I did for my local simde copy -- and hope that the LoongArch community (me included) will eventually implement them back. If this is the way to go, what is the preferred convention for that? Something along the lines of #if 0 && ... or just plain removal?

mr-c (collaborator) commented Jan 15, 2026

> If this is the way to go, what is the preferred convention for that?

Please file issues with GCC; then we can add an entry near

# if defined(SIMDE_ARCH_POWER)
to define SIMDE_BUG_GCC_NNNN only for circumstances where it is active.

Then we can add && !defined(SIMDE_BUG_GCC_NNNN) to

#elif defined(SIMDE_LOONGARCH_LSX_NATIVE) && defined(SIMDE_FAST_CONVERSION_RANGE) && defined(SIMDE_FAST_ROUND_TIES)
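
The guard pattern described above could look roughly like the following sketch. All `SKETCH_`-prefixed macros are hypothetical stand-ins for the `SIMDE_`-prefixed ones, `NNNN` remains a placeholder because no GCC bug number exists yet, and a real entry would presumably also be narrowed to the affected compiler versions.

```c
/* Hypothetical stand-ins; the real macros in SIMDe are SIMDE_-prefixed. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__loongarch__)
#  define SKETCH_BUG_GCC_NNNN /* NNNN: placeholder until a GCC bug is filed */
#endif

#if defined(__loongarch_sx)
#  define SKETCH_LOONGARCH_LSX_NATIVE
#endif

/* Returns 1 when the LSX fast path would be compiled in, 0 otherwise. */
static int sketch_uses_lsx_fast_path(void) {
#if defined(SKETCH_LOONGARCH_LSX_NATIVE) && \
    defined(SKETCH_FAST_CONVERSION_RANGE) && \
    defined(SKETCH_FAST_ROUND_TIES) && \
    !defined(SKETCH_BUG_GCC_NNNN)
  return 1; /* native intrinsic branch */
#else
  return 0; /* portable branch, also taken while the bug macro is defined */
#endif
}
```

Compared with `#if 0 && ...`, this keeps the optimized branch in the source and re-enables it automatically once the bug macro is no longer defined for a fixed compiler.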

You can also experiment with adding a cast using HEDLEY_REINTERPRET_CAST if you'd like to use this function before a fix is released for GCC.
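
A minimal sketch of the cast workaround, assuming nothing about the final patch: `SKETCH_REINTERPRET_CAST` mirrors what `HEDLEY_REINTERPRET_CAST` expands to in C (a plain cast), and `sketch_convert` is a hypothetical stand-in for an intrinsic that returns its four 32-bit results in a `long long` vector. The typedefs are illustrative, not the real LSX types.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the destination and intrinsic return types. */
typedef int32_t   sketch_v4i32 __attribute__((vector_size(16)));
typedef long long sketch_m128i __attribute__((vector_size(16)));

/* In C, HEDLEY_REINTERPRET_CAST(T, expr) is a plain cast like this one. */
#define SKETCH_REINTERPRET_CAST(T, expr) ((T)(expr))

/* Stand-in for the intrinsic: converts whole-number floats (so the rounding
 * mode does not matter) and returns them packed in a long long vector. */
static sketch_m128i sketch_convert(const float a[4]) {
  int32_t lanes[4];
  sketch_m128i v;
  for (int i = 0; i < 4; i++) {
    lanes[i] = (int32_t) a[i];
  }
  memcpy(&v, lanes, sizeof(v));
  return v;
}

/* The explicit cast reinterprets the 16 bytes as four int32 lanes, which a
 * plain assignment between the two vector types would not be allowed to do. */
static void sketch_cvtps_epi32(const float a[4], int32_t out[4]) {
  sketch_v4i32 r = SKETCH_REINTERPRET_CAST(sketch_v4i32, sketch_convert(a));
  memcpy(out, &r, sizeof(r));
}
```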
