aarch64 op: Enable two‑stage SVE detection in component configuration #13204

vogma · 2025-04-22T10:23:15Z

This pull request refactors the build configuration for the OpenMPI aarch64 op component, aligning it with the existing approach used by the avx component.

Currently, the avx component configuration systematically tests SIMD instruction support (e.g., AVX2, AVX512) by incrementally applying compiler flags until compilation succeeds, independently of the host CPU's capabilities. In contrast, the existing aarch64 configuration lacks this mechanism: it performs only basic compilation checks, without additional flags or function attributes.

To address this gap, this pull request enhances the aarch64 configuration by adding checks that use compiler function attributes (specifically, __attribute__((__target__("+sve")))). This allows SVE code to be compiled and included in OpenMPI regardless of the compiler flags passed explicitly on the command line, just as the avx component already does.

If compilation via the function attribute succeeds, the attribute is automatically inserted via a macro into the SVE function templates within op_aarch64_functions.c. Runtime detection of processor capabilities (NEON or SVE) remains unchanged. No APIs have changed; only the build system is modified, along with a macro definition in the source files.
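
To make the macro mechanism described above concrete, here is a minimal sketch, not the PR's actual code. It assumes that configure defines OMPI_MCA_OP_SVE_EXTRA_FLAGS only when the attribute-based compile test succeeded, and that SVE_ATTR is defined roughly as shown. The function example_sum_float and its plain C loop are hypothetical placeholders for the real SVE function templates in op_aarch64_functions.c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: configure defines OMPI_MCA_OP_SVE_EXTRA_FLAGS only when the
 * compiler accepted the "+sve" function attribute. In this sketch it stays
 * undefined unless the build passes -DOMPI_MCA_OP_SVE_EXTRA_FLAGS=1. */
#if defined(OMPI_MCA_OP_SVE_EXTRA_FLAGS) && defined(__aarch64__)
#define SVE_ATTR __attribute__((__target__("+sve")))
#else
#define SVE_ATTR /* expands to nothing: build for the baseline target */
#endif

/* Hypothetical stand-in for one SVE function template: inout[i] += in[i].
 * With SVE_ATTR active, the compiler may use SVE instructions inside this
 * function only, without -march=...+sve on the command line. */
SVE_ATTR
void example_sum_float(const float *in, float *inout, size_t count)
{
    assert(in != NULL && inout != NULL);
    for (size_t i = 0; i < count; ++i) {
        inout[i] += in[i];
    }
}
```

With the macro left undefined, the function is built for the baseline target, so calling it is safe on any CPU. When the attribute is active, the compiler may emit SVE instructions in that function, so callers should reach it only after a runtime check has confirmed SVE support.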

There is a discussion that goes into more detail in the developer user group.

jsquyres

Unless I'm missing something, it looks ok to squash the code changes in this PR down to a single commit.

If you'd like to preserve your whitespace changes, please separate those into a separate commit and label them as such (e.g., 11ff7f0).

ompi/mca/op/aarch64/configure.m4

ompi/mca/op/aarch64/op_aarch64_functions.c

vogma · 2025-04-22T19:16:16Z

I'm working on the failed test. On my AWS Graviton instance the tests run fine, but I get an Illegal Instruction error when OpenMPI was compiled with lower -march flags. Looking into it.
EDIT: Actually, I had accidentally set the -march parameter too high; with -march=armv8-a the examples run.

jsquyres

@vogma The macOS CI error appears to be legitimate -- I can reproduce it manually. When I build on my Mac (M3) and try to run the ring_c example:

$ mpirun --map-by ppr:1:core examples/ring_c
--------------------------------------------------------------------------
prterun noticed that process rank 8 with PID 60263 on node Little-Beezle exited on
signal 4 (Illegal instruction: 4).
--------------------------------------------------------------------------

Can you investigate?

vogma · 2025-05-04T15:45:57Z

Thank you for testing it on your machine! I will try to find out why this happens.

vogma · 2025-05-05T15:23:26Z

I do not have access to Apple hardware, but the macOS failure likely stems from executing this line in user mode:

__asm__("mrs %0, ID_AA64PFR0_EL1" : "=r"(id_aa64pfr0_el1));

On AArch64, MRS …, ID_AA64PFR0_EL1 is a privileged EL1 register access and will raise SIGILL if executed from EL0.

The Linux kernel traps such reads for certain ID registers (including the one carrying the SVE feature bits) and emulates them, returning a sanitized value to user space (see https://www.kernel.org/doc/html/latest/arch/arm64/cpu-feature-registers.html#implementation). To my knowledge, macOS offers no equivalent emulation of EL1 register reads, so any direct MRS ID_AA64PFR0_EL1 from user code will fault. In this PR, the build-system refactoring enabled runtime SVE detection even when the hardware used during compilation did not support it, which exercises the inline-asm path that was previously bypassed; hence the new SIGILL.

I updated the configure logic to skip the EL1-based runtime feature probe on Apple platforms. I also tested the examples that failed in the CI/CD pipeline on all available AWS Graviton generations (Graviton 2-4) and had no problems.
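
The Linux-only guard described above could look roughly like the following sketch. The helper names (example_id_field, example_probe_sve, EXAMPLE_PFR0_SVE_SHIFT) are hypothetical, and the PR's actual guard may be structured differently. The sketch relies on the architectural layout of ID_AA64PFR0_EL1, where the SVE field occupies bits [35:32] and a nonzero value indicates that SVE is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Bit offset of the SVE field inside ID_AA64PFR0_EL1 (bits [35:32]). */
#define EXAMPLE_PFR0_SVE_SHIFT 32

/* Extract one 4-bit ID field from a feature register value. */
unsigned example_id_field(uint64_t reg, unsigned shift)
{
    assert(shift <= 60 && shift % 4 == 0); /* ID fields are 4-bit aligned */
    return (unsigned)((reg >> shift) & 0xF);
}

/* Returns 1 if the probe reports SVE, 0 otherwise. The MRS read is only
 * compiled on aarch64 Linux, where the kernel emulates it for user space.
 * Everywhere else (including macOS) the probe is skipped and SVE is
 * treated as unavailable, so no privileged instruction is executed. */
int example_probe_sve(void)
{
#if defined(__aarch64__) && defined(__linux__)
    uint64_t id_aa64pfr0_el1 = 0;
    __asm__("mrs %0, ID_AA64PFR0_EL1" : "=r"(id_aa64pfr0_el1));
    return example_id_field(id_aa64pfr0_el1, EXAMPLE_PFR0_SVE_SHIFT) != 0;
#else
    return 0;
#endif
}
```

Keeping the field decoding separate from the register read means the decoding can be checked on any host, while the privileged read stays confined to the one platform known to emulate it.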

Could you rerun the tests please?

ompi/mca/op/aarch64/op_aarch64.h

ompi/mca/op/aarch64/configure.m4

jsquyres · 2025-05-05T17:48:44Z

Also, could you squash down to 1 commit? We prefer rebasing and squashing over merging from main. Thanks!

jsquyres · 2025-05-05T17:54:37Z

I forgot to mention: testing with the PR code today works on my Mac M3. 🎉

- Introduce AC_CACHE_CHECK probes for ARM Scalable Vector Extension (SVE) using both a default compile test and a second test with __attribute__((__target__("+sve"))).
- Define variables op_cv_sve_support and op_cv_sve_add_flags
- Update AM_CONDITIONAL and AC_DEFINE to expose SVE support macros (OMPI_MCA_OP_HAVE_SVE, OMPI_MCA_OP_SVE_EXTRA_FLAGS).
- Extend final AS_IF to enable the component when either NEON or SVE is available.
- Add a preprocessor guard around SVE-specific function attributes
- Encapsulate the +sve attribute behind OMPI_MCA_OP_SVE_EXTRA_FLAGS, ensuring that only builds which detected and enabled compiler SVE support will compile with SVE-targeted code paths.
- Simplifies later code by using SVE_ATTR in function declarations instead of repeating the attribute clause.
- apply SVE_ATTR macro in C source for conditional +sve targeting
- sve feature detection only on linux
- code review feedback

Signed-off-by: Marco Vogel <marco.vogel@fernuni-hagen.de>

jsquyres · 2025-05-05T20:12:43Z

Latest PR works for me on my Mac.

@bosilca Can you review?

bosilca

Seems to run just fine on both M3 and Grace.

jsquyres · 2025-05-07T21:21:52Z

@vogma Can you file a PR with a cherry-pick to the v5.0.x branch? (git cherry-pick -x ...hash from this PR...)

Thanks!

github-actions bot added the Target: main label Apr 22, 2025

jsquyres reviewed Apr 22, 2025

bosilca reviewed Apr 22, 2025

ompi/mca/op/aarch64/op_aarch64_functions.c (outdated, resolved)

vogma force-pushed the aarch64_build_update branch 3 times, most recently from 4d6bf1d to 1268c51 Compare April 22, 2025 17:12

bosilca previously approved these changes Apr 22, 2025

jsquyres requested changes May 3, 2025

vogma dismissed bosilca’s stale review via 08ccb4b May 5, 2025 15:19

vogma requested a review from jsquyres May 5, 2025 17:01

jsquyres reviewed May 5, 2025

ompi/mca/op/aarch64/op_aarch64.h (outdated, resolved)

ompi/mca/op/aarch64/configure.m4 (outdated, resolved)

vogma force-pushed the aarch64_build_update branch from 08ccb4b to 16cb214 Compare May 5, 2025 18:58

vogma requested a review from jsquyres May 5, 2025 19:01

jsquyres requested review from bosilca, ggouaillardet and bwbarrett May 5, 2025 20:12

jsquyres approved these changes May 5, 2025

bosilca approved these changes May 7, 2025

bosilca merged commit c346328 into open-mpi:main May 7, 2025
15 checks passed

This was referenced May 9, 2025

v5.0.x: Add SVE detection alongside NEON in aarch64 op component #13244

Merged

mca/op: always define aarch64 macros #13246

Merged
