
[OpenMP] Enable simd in non-reduction composite constructs #146097


Open
wants to merge 4 commits into base: main

Conversation

mrkajetanp
Contributor

Despite currently being ignored with a warning, simd as a leaf in composite constructs behaves as expected when the construct does not contain a reduction. Enable it for those non-reduction constructs.
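
As a hedged illustration (not one of the tests added in this patch), a small Fortran program using the kind of non-reduction composite construct whose simd leaf is now translated instead of being discarded with a warning:

! Illustrative only: a composite 'parallel do simd' with no reduction clause.
program parallel_do_simd_example
  implicit none
  integer :: i
  real :: a(1000)
  a = 1.0
  !$omp parallel do simd
  do i = 1, 1000
    a(i) = a(i) * 2.0
  end do
  !$omp end parallel do simd
  print *, a(1)
end program parallel_do_simd_example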

@llvmbot
Member

llvmbot commented Jun 27, 2025

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-openmp

@llvm/pr-subscribers-mlir-llvm

Author: Kajetan Puchalski (mrkajetanp)

Changes

Despite currently being ignored with a warning, simd as a leaf in composite constructs behaves as expected when the construct does not contain a reduction. Enable it for those non-reduction constructs.


Full diff: https://github.com/llvm/llvm-project/pull/146097.diff

4 Files Affected:

  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+2-5)
  • (added) mlir/test/Target/LLVMIR/openmp-parallel-do-simd.mlir (+30)
  • (added) mlir/test/Target/LLVMIR/openmp-teams-distribute-parallel-do-simd.mlir (+33)
  • (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (-14)
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 23140f22555a5..de51ca49649f9 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -2847,11 +2847,8 @@ convertOmpSimd(Operation &opInst, llvm::IRBuilderBase &builder,
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
   auto simdOp = cast<omp::SimdOp>(opInst);
 
-  // TODO: Replace this with proper composite translation support.
-  // Currently, simd information on composite constructs is ignored, so e.g.
-  // 'do/for simd' will be treated the same as a standalone 'do/for'. This is
-  // allowed by the spec, since it's equivalent to using a SIMD length of 1.
-  if (simdOp.isComposite()) {
+  // TODO: Replace this once simd + reduction is properly supported
+  if (simdOp.isComposite() && simdOp.getReductionByref().has_value()) {
     if (failed(convertIgnoredWrapper(simdOp, moduleTranslation)))
       return failure();
 
diff --git a/mlir/test/Target/LLVMIR/openmp-parallel-do-simd.mlir b/mlir/test/Target/LLVMIR/openmp-parallel-do-simd.mlir
new file mode 100644
index 0000000000000..9ea0ff590144f
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-parallel-do-simd.mlir
@@ -0,0 +1,30 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// Check that omp.simd as a leaf of a composite construct still generates
+// the appropriate loop vectorization attribute.
+
+// CHECK-LABEL: define internal void @test_parallel_do_simd..omp_par
+// CHECK: ![[VAL:.*]] = !{!"llvm.loop.vectorize.enable", i1 true}
+
+llvm.func @test_parallel_do_simd() {
+  %0 = llvm.mlir.constant(1 : i64) : i64
+  %1 = llvm.alloca %0 x i32 {bindc_name = "i"} : (i64) -> !llvm.ptr
+  %2 = llvm.mlir.constant(1000 : i32) : i32
+  %3 = llvm.mlir.constant(1 : i32) : i32
+  %4 = llvm.mlir.constant(1 : i64) : i64
+  omp.parallel {
+    %5 = llvm.mlir.constant(1 : i64) : i64
+    %6 = llvm.alloca %5 x i32 {bindc_name = "i", pinned} : (i64) -> !llvm.ptr
+    %7 = llvm.mlir.constant(1 : i64) : i64
+    omp.wsloop {
+      omp.simd {
+        omp.loop_nest (%arg0) : i32 = (%3) to (%2) inclusive step (%3) {
+          llvm.store %arg0, %6 : i32, !llvm.ptr
+          omp.yield
+        }
+      } {omp.composite}
+    } {omp.composite}
+    omp.terminator
+  }
+  llvm.return
+}
diff --git a/mlir/test/Target/LLVMIR/openmp-teams-distribute-parallel-do-simd.mlir b/mlir/test/Target/LLVMIR/openmp-teams-distribute-parallel-do-simd.mlir
new file mode 100644
index 0000000000000..108a99cf7c40f
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-teams-distribute-parallel-do-simd.mlir
@@ -0,0 +1,33 @@
+// RUN: mlir-translate --mlir-to-llvmir %s | FileCheck %s
+
+// Check that omp.simd as a leaf of a composite construct still generates
+// the appropriate loop vectorization attribute.
+
+// CHECK-LABEL: define internal void @test_teams_distribute_parallel_do_simd..omp_par
+// CHECK: ![[VAL:.*]] = !{!"llvm.loop.vectorize.enable", i1 true}
+
+omp.private {type = private} @_QFEi_private_i32 : i32
+llvm.func @test_teams_distribute_parallel_do_simd() {
+  %0 = llvm.mlir.constant(1 : i64) : i64
+  %1 = llvm.alloca %0 x i32 {bindc_name = "i"} : (i64) -> !llvm.ptr
+  %2 = llvm.mlir.constant(1000 : i32) : i32
+  %3 = llvm.mlir.constant(1 : i32) : i32
+  %4 = llvm.mlir.constant(1 : i64) : i64
+  omp.teams {
+    omp.parallel {
+      omp.distribute {
+        omp.wsloop {
+          omp.simd private(@_QFEi_private_i32 %1 -> %arg0 : !llvm.ptr) {
+            omp.loop_nest (%arg1) : i32 = (%3) to (%2) inclusive step (%3) {
+              llvm.store %arg1, %arg0 : i32, !llvm.ptr
+              omp.yield
+            }
+          } {omp.composite}
+        } {omp.composite}
+      } {omp.composite}
+      omp.terminator
+    } {omp.composite}
+    omp.terminator
+  }
+  llvm.return
+}
diff --git a/mlir/test/Target/LLVMIR/openmp-todo.mlir b/mlir/test/Target/LLVMIR/openmp-todo.mlir
index 97608ca3b4df1..c515e6e70b264 100644
--- a/mlir/test/Target/LLVMIR/openmp-todo.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-todo.mlir
@@ -26,20 +26,6 @@ llvm.func @atomic_hint(%v : !llvm.ptr, %x : !llvm.ptr, %expr : i32) {
 
 // -----
 
-llvm.func @do_simd(%lb : i32, %ub : i32, %step : i32) {
-  omp.wsloop {
-    // expected-warning@below {{simd information on composite construct discarded}}
-    omp.simd {
-      omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) {
-        omp.yield
-      }
-    } {omp.composite}
-  } {omp.composite}
-  llvm.return
-}
-
-// -----
-
 llvm.func @distribute_allocate(%lb : i32, %ub : i32, %step : i32, %x : !llvm.ptr) {
   // expected-error@below {{not yet implemented: Unhandled clause allocate in omp.distribute operation}}
   // expected-error@below {{LLVM Translation failed for operation: omp.distribute}}

@llvmbot
Member

llvmbot commented Jun 27, 2025

@llvm/pr-subscribers-flang-openmp

@mrkajetanp
Contributor Author

Question for reviewers who have worked on this before: the current behaviour is fairly inconsistent. Using "simd reduction" on its own results in a crash because it is not implemented, whereas using "simd reduction" as part of a composite construct simply ignores the simd. Is it desirable to keep this behaviour? For instance, the following program will crash:

integer :: sum
sum = 0
!$omp simd reduction(+:sum)
do i=1, 1000
    sum = sum + 1
end do

While this one will only emit a warning.

integer :: sum
sum = 0
!$omp do simd reduction(+:sum)
do i=1, 1000
    sum = sum + 1
end do

There are several tests that rely on this, which is why I kept the current warning for composite constructs with reduction.

Additionally, if someone knows of any scenarios which I've overlooked that this might break for, please point them out. I was expecting something else to require fixing in order to do this, but I've not found anything that breaks so far.

Contributor

@tblah tblah left a comment


Please could you elaborate on the testing you've done to be sure this is correctly supported?

@mrkajetanp
Contributor Author

I ran both gfortran and Fujitsu test suites (alongside the LLVM one) to make sure this does not introduce regressions in those. For a number of test Fortran programs using composite constructs with simd, I manually inspected the IR and the final binary to make sure that with this change, vectorisable instructions inside these constructs turn into vectorised instructions even with -fno-vectorize, where without this change they do not.

@tblah
Contributor

tblah commented Jun 30, 2025

I believe the reason for not applying the "not yet implemented" message to simd reduction for composite constructs is that if the reduction clause is used on the composite construct, that reduction is logically applied to most of the constructs comprising the composite construct. So disabling simd reduction would disable reduction for any composite construct containing simd.

SIMD is unusual because not implementing the reduction won't produce incorrect results (except some subtle effects around the representable range of numbers); it is just less likely to vectorize. I will post a patch implementing this soon, so I don't think we need to worry.

@skatrak
Member

skatrak commented Jun 30, 2025

Question for reviewers who have worked on this before - the current behaviour is fairly inconsistent, that is to say, using "simd reduction" on its own results in a crash because it's not implemented, whereas using "simd reduction" as part of a composite construct simply ignores the simd. Is it desirable to keep this behaviour? For instance, the following program will crash:

I was actually planning on proposing changes to this soon, so I can just explain from my side what the thinking behind some of these differences is. Currently, we entirely ignore simd within a composite construct, which means that we don't check whether there are unimplemented simd clauses being specified. In the case of standalone simd, though, there is some existing support, so there are checks for unimplemented features. The result of that is that do simd reduction(+: x) compiles and runs as if it were just do reduction(+: x), whereas simd reduction(+: x) triggers a not-yet-implemented error for the reduction clause.

The reason for ignoring simd information on composite constructs was to prevent the introduction of simd from causing not-yet-implemented errors when it would be legal by the spec to always codegen for SIMD lanes of 1 element, especially when there are reductions involved, as @tblah described. So, composite constructs behave as if they always resulted in 1-element SIMD lanes. This is very arbitrary, so I like your proposal to handle simd in composite constructs. At the moment I'm most concerned about what to do about unsupported clauses.

How much of an issue would it be to always emit warnings instead of errors for all unsupported simd clauses? Does this become an issue only after we generate 2+ element SIMD lanes? I'm thinking that from the target side we don't currently want to be doing anything with simd constructs, so we might still want to have the option to ignore them.

Despite currently being ignored with a warning, simd as a leaf in
composite constructs behaves as expected when the construct does not
contain a reduction. Enable it for those non-reduction constructs.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
@mrkajetanp
Contributor Author

Thank you for that background; the reasoning seems sensible to me.

How much of an issue would it be to always emit warnings instead of errors for all unsupported simd clauses?

Based on my testing, the clauses that are currently not supported are reduction, if, linear, nontemporal and order. Of those, with the change I just pushed, both if and reduction are ignored with a warning. The remaining three clauses result in either a crash or a semantics error, but that is also the case without this PR, as the crash happens before the convertOmpSimd codepath is reached. Since that is unaffected by this PR, I think addressing it would best be done in a separate one.
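
To make that concrete, a hedged Fortran example (not from this PR's test suite; the if-clause syntax with the simd modifier assumes OpenMP 5.0) of a composite construct carrying one of the clauses that, per the above, is now ignored with a warning rather than rejected:

! Illustrative only: the if clause on the simd leaf is expected to be
! ignored with a warning, while the rest of the composite construct is
! still translated normally.
program do_simd_if_example
  implicit none
  integer :: i, n
  real :: a(1000)
  n = 1000
  a = 1.0
  !$omp parallel do simd if(simd: n > 100)
  do i = 1, n
    a(i) = a(i) + 1.0
  end do
  !$omp end parallel do simd
  print *, a(n)
end program do_simd_if_example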

Contributor

@tblah tblah left a comment


LGTM but please wait for approval from @skatrak

@skatrak
Member

skatrak commented Jul 1, 2025

Currently not supported clauses are reduction, if, linear, nontemporal and order based on my testing. Of those, with the change I just pushed, both if and reduction are ignored with a warning.

I think that's a reasonable middle ground. My concern is that this still results in different handling for !$omp {do, distribute} simd reduction(...) vs !$omp simd reduction(...). Would it be an issue to only emit warnings for reduction and if clauses when processing standalone simd constructs too?

That should only require small tweaks to the checkImplementationStatus function and forwarding reduction variables (our downstream fork already does it, in case it serves as reference).

@mrkajetanp
Contributor Author

mrkajetanp commented Jul 1, 2025

Would it be an issue to only emit warnings for reduction and if clauses when processing standalone simd constructs too?

The current behaviour is that if we have simd with one of those clauses in a composite construct, then simd gets ignored along with the clauses. Are you saying we should do the same for standalone simd? That would effectively make standalone simd with one of said clauses a no-op. If so, then it's very easy to implement; we can achieve it with this:

-  if (simdOp.isComposite() &&
-      (simdOp.getReductionByref().has_value() || simdOp.getIfExpr())) {
+  if (simdOp.getReductionByref().has_value() || simdOp.getIfExpr()) {

@skatrak
Member

skatrak commented Jul 2, 2025

The current behaviour is that if we have simd with one of those clauses in a composite construct, then simd gets ignored along with the clauses. Are you saying we should do the same for standalone simd? That would effectively make standalone simd with one of said clauses a no-op.

What I'm suggesting is not making simd a no-op in any of these cases, but rather warning about missing support for the reduction and if clauses when they are present while still processing the construct. That would be handled by making some slight changes to checkImplementationStatus and by forwarding reduction arguments, rather than calling convertIgnoredWrapper (we would only want to call that one when compiling for the device, at least for now). And we would do this for both standalone and composite simd.

I'm fine with something else if this is not the behavior that we want. I'm just thinking that this would remove the current divergence between the handling of standalone and composite simd while being flexible enough to not trigger not-yet-implemented errors for constructs like !$omp do simd reduction(...) or !$omp distribute parallel do simd if(...), even though that means !$omp simd if(.false.) would just result in a warning and always produce vector code.

tblah added a commit to tblah/llvm-project that referenced this pull request Jul 2, 2025
This replicates clang's implementation. Basically:
  - A private copy of the reduction variable is created, initialized to
    the reduction neutral value (using regions from the reduction
    declaration op).
  - The body of the loop is lowered as usual, with accesses to the
    reduction variable mapped to the private copy.
  - After the loop, we inline the reduction region from the declaration
    op to combine the privatized variable into the original variable.
  - As usual with the SIMD construct, attributes are added to encourage
    vectorization of the loop and to assert that memory accesses in the
    loop don't alias across iterations.

I have verified that simple scalar examples do vectorize at -O3 and the
tests I could find in the Fujitsu test suite produce correct results. I
tested on top of llvm#146097 and this seemed to work for composite constructs
as well.

Fixes llvm#144290
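
At the source level, the lowering described in that commit message roughly amounts to the sketch below. This is a hedged illustration: the names total and total_priv are hypothetical, and the actual transformation operates on the MLIR/LLVM IR representation rather than on Fortran source.

program simd_reduction_sketch
  implicit none
  integer :: i, total, total_priv
  total = 0
  ! Private copy initialized to the reduction-neutral value (0 for +).
  total_priv = 0
  ! The loop body is lowered as usual, with accesses to the reduction
  ! variable redirected to the private copy; simd attributes still apply.
  !$omp simd
  do i = 1, 1000
    total_priv = total_priv + 1
  end do
  ! After the loop, the private copy is combined into the original variable.
  total = total + total_priv
  print *, total
end program simd_reduction_sketch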
@mrkajetanp
Contributor Author

I think that's a reasonable approach, but could we do that in a separate PR since it's a bit out of scope for this one? I'll be away for a week and there's this simd reduction PR that should ideally go on top of this one.

Member

@skatrak skatrak left a comment


LGTM, thanks!

Co-authored-by: Sergio Afonso <safonsof@amd.com>
tblah added a commit that referenced this pull request Jul 2, 2025