[OpenMP] Enable simd in non-reduction composite constructs #146097
base: main
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-llvm

Author: Kajetan Puchalski (mrkajetanp)

Changes

Despite currently being ignored with a warning, simd as a leaf in composite constructs behaves as expected when the construct does not contain a reduction. Enable it for those non-reduction constructs.

Full diff: https://github.com/llvm/llvm-project/pull/146097.diff

4 Files Affected:
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 23140f22555a5..de51ca49649f9 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -2847,11 +2847,8 @@ convertOmpSimd(Operation &opInst, llvm::IRBuilderBase &builder,
llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
auto simdOp = cast<omp::SimdOp>(opInst);
- // TODO: Replace this with proper composite translation support.
- // Currently, simd information on composite constructs is ignored, so e.g.
- // 'do/for simd' will be treated the same as a standalone 'do/for'. This is
- // allowed by the spec, since it's equivalent to using a SIMD length of 1.
- if (simdOp.isComposite()) {
+ // TODO: Replace this once simd + reduction is properly supported
+ if (simdOp.isComposite() && simdOp.getReductionByref().has_value()) {
if (failed(convertIgnoredWrapper(simdOp, moduleTranslation)))
return failure();
diff --git a/mlir/test/Target/LLVMIR/openmp-parallel-do-simd.mlir b/mlir/test/Target/LLVMIR/openmp-parallel-do-simd.mlir
new file mode 100644
index 0000000000000..9ea0ff590144f
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-parallel-do-simd.mlir
@@ -0,0 +1,30 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// Check that omp.simd as a leaf of a composite construct still generates
+// the appropriate loop vectorization attribute.
+
+// CHECK-LABEL: define internal void @test_parallel_do_simd..omp_par
+// CHECK: ![[VAL:.*]] = !{!"llvm.loop.vectorize.enable", i1 true}
+
+llvm.func @test_parallel_do_simd() {
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %1 = llvm.alloca %0 x i32 {bindc_name = "i"} : (i64) -> !llvm.ptr
+ %2 = llvm.mlir.constant(1000 : i32) : i32
+ %3 = llvm.mlir.constant(1 : i32) : i32
+ %4 = llvm.mlir.constant(1 : i64) : i64
+ omp.parallel {
+ %5 = llvm.mlir.constant(1 : i64) : i64
+ %6 = llvm.alloca %5 x i32 {bindc_name = "i", pinned} : (i64) -> !llvm.ptr
+ %7 = llvm.mlir.constant(1 : i64) : i64
+ omp.wsloop {
+ omp.simd {
+ omp.loop_nest (%arg0) : i32 = (%3) to (%2) inclusive step (%3) {
+ llvm.store %arg0, %6 : i32, !llvm.ptr
+ omp.yield
+ }
+ } {omp.composite}
+ } {omp.composite}
+ omp.terminator
+ }
+ llvm.return
+}
diff --git a/mlir/test/Target/LLVMIR/openmp-teams-distribute-parallel-do-simd.mlir b/mlir/test/Target/LLVMIR/openmp-teams-distribute-parallel-do-simd.mlir
new file mode 100644
index 0000000000000..108a99cf7c40f
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-teams-distribute-parallel-do-simd.mlir
@@ -0,0 +1,33 @@
+// RUN: mlir-translate --mlir-to-llvmir %s | FileCheck %s
+
+// Check that omp.simd as a leaf of a composite construct still generates
+// the appropriate loop vectorization attribute.
+
+// CHECK-LABEL: define internal void @test_teams_distribute_parallel_do_simd..omp_par
+// CHECK: ![[VAL:.*]] = !{!"llvm.loop.vectorize.enable", i1 true}
+
+omp.private {type = private} @_QFEi_private_i32 : i32
+llvm.func @test_teams_distribute_parallel_do_simd() {
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %1 = llvm.alloca %0 x i32 {bindc_name = "i"} : (i64) -> !llvm.ptr
+ %2 = llvm.mlir.constant(1000 : i32) : i32
+ %3 = llvm.mlir.constant(1 : i32) : i32
+ %4 = llvm.mlir.constant(1 : i64) : i64
+ omp.teams {
+ omp.parallel {
+ omp.distribute {
+ omp.wsloop {
+ omp.simd private(@_QFEi_private_i32 %1 -> %arg0 : !llvm.ptr) {
+ omp.loop_nest (%arg1) : i32 = (%3) to (%2) inclusive step (%3) {
+ llvm.store %arg1, %arg0 : i32, !llvm.ptr
+ omp.yield
+ }
+ } {omp.composite}
+ } {omp.composite}
+ } {omp.composite}
+ omp.terminator
+ } {omp.composite}
+ omp.terminator
+ }
+ llvm.return
+}
diff --git a/mlir/test/Target/LLVMIR/openmp-todo.mlir b/mlir/test/Target/LLVMIR/openmp-todo.mlir
index 97608ca3b4df1..c515e6e70b264 100644
--- a/mlir/test/Target/LLVMIR/openmp-todo.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-todo.mlir
@@ -26,20 +26,6 @@ llvm.func @atomic_hint(%v : !llvm.ptr, %x : !llvm.ptr, %expr : i32) {
// -----
-llvm.func @do_simd(%lb : i32, %ub : i32, %step : i32) {
- omp.wsloop {
- // expected-warning@below {{simd information on composite construct discarded}}
- omp.simd {
- omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) {
- omp.yield
- }
- } {omp.composite}
- } {omp.composite}
- llvm.return
-}
-
-// -----
-
llvm.func @distribute_allocate(%lb : i32, %ub : i32, %step : i32, %x : !llvm.ptr) {
// expected-error@below {{not yet implemented: Unhandled clause allocate in omp.distribute operation}}
// expected-error@below {{LLVM Translation failed for operation: omp.distribute}}
@llvm/pr-subscribers-flang-openmp

Author: Kajetan Puchalski (mrkajetanp)

Changes

Despite currently being ignored with a warning, simd as a leaf in composite constructs behaves as expected when the construct does not contain a reduction. Enable it for those non-reduction constructs.

Full diff: https://github.com/llvm/llvm-project/pull/146097.diff
Question for reviewers who have worked on this before: the current behaviour is fairly inconsistent. Using "simd reduction" on its own results in a crash because it is not implemented, whereas using "simd reduction" as part of a composite construct simply ignores the simd. Is it desirable to keep this behaviour? For instance, a standalone "simd reduction" loop will crash, while the same reduction on a composite construct will only emit a warning.
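As an illustration only (this is not one of the original test programs; a minimal Fortran sketch assuming a simple integer sum reduction), the standalone shape looks like this, with the composite variant noted in the comment:

```fortran
program standalone_simd_reduction
  implicit none
  integer :: i, s
  s = 0
  ! A standalone simd construct carrying a reduction clause: the shape that,
  ! per the discussion above, currently hits the not-yet-implemented path.
  ! Writing the same loop as a composite construct instead, e.g.
  !   !$omp do simd reduction(+:s)
  ! inside a parallel region, is the case that currently only emits a warning.
  !$omp simd reduction(+:s)
  do i = 1, 1000
    s = s + i
  end do
  print *, s
end program standalone_simd_reduction
```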
There are several tests that rely on this behaviour, which is why I kept the current warning for composite constructs with reduction. Additionally, if anyone knows of scenarios I've overlooked that this change might break, please point them out. I was expecting something else to require fixing in order to do this, but I have not found anything that breaks so far.
Please could you elaborate on the testing you've done to be sure this is correctly supported?
I ran both the gfortran and Fujitsu test suites (alongside the LLVM one) to make sure this does not introduce regressions in those. For a number of Fortran test programs using composite constructs with simd, I manually inspected the IR and the final binary to make sure that, with this change, vectorisable instructions inside these constructs turn into vectorised instructions even with
I believe the reason for not applying the "not yet implemented" message to simd reduction for composite constructs is that if the reduction clause is used on the composite construct, that reduction logically applies to most of the constructs comprising it. So rejecting simd reduction would disable reduction for any composite construct containing simd. SIMD is unusual in that not implementing the reduction won't produce incorrect results (apart from some subtle effects around the representable range of numbers) - it just makes the loop less likely to vectorize. I will post a patch implementing this soon, so I don't think we need to worry.
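As a hypothetical sketch of that point (not code from this thread, assuming an integer sum): a single reduction clause on a composite construct covers several leaf constructs at once, so rejecting it because of the simd leaf would lose more than just the simd part.

```fortran
program composite_reduction_leaves
  implicit none
  integer :: i, s
  s = 0
  ! The one reduction clause below applies to the parallel, do (worksharing)
  ! and simd leaves of the composite construct. If the simd part of the
  ! reduction is not implemented and simply dropped, the do/parallel reduction
  ! still produces the correct sum; the loop is just less likely to vectorize.
  !$omp parallel do simd reduction(+:s)
  do i = 1, 1000
    s = s + i
  end do
  !$omp end parallel do simd
  print *, s
end program composite_reduction_leaves
```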
(Outdated review thread on mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp, resolved.)
I was actually planning on proposing changes to this soon, so I can explain from my side the thinking behind some of these differences. Currently, we entirely ignore simd within a composite construct, which means that we don't check whether there are unimplemented simd clauses being specified. In the case of standalone simd, though, there is some existing support, so there are checks for unimplemented features. The result of that is that standalone simd with an unimplemented clause produces a "not yet implemented" error, while the same clause on a composite construct is simply discarded with a warning. The reason for ignoring simd in composite constructs is that the spec allows it, since it is equivalent to using a SIMD length of 1. How much of an issue would it be to always emit warnings instead of errors for all unsupported clauses?
Despite currently being ignored with a warning, simd as a leaf in composite constructs behaves as expected when the construct does not contain a reduction. Enable it for those non-reduction constructs.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Thank you for that background; the reasoning seems sensible to me.
The clauses currently not supported are reduction, if, linear, nontemporal and order, based on my testing. Of those, with the change I just pushed, both
LGTM but please wait for approval from @skatrak
I think that's a reasonable middle ground. My concern is that this still results in different handling for standalone and composite simd. That should only require small tweaks to the existing checks.
The current behaviour is that if we have simd with one of those clauses in a composite construct, then simd gets ignored along with the clauses. Are you saying we should do the same for standalone simd? That would effectively make standalone simd with one of said clauses a no-op. If so, then it's very easy to implement; we can just achieve it with this:
What I'm suggesting is not making standalone simd a silent no-op, but rather discarding its unsupported clauses with a warning, the same way we already do for composite constructs. I'm fine with something else if this is not the behavior that we want. I'm just thinking that this would remove the current divergence between the handling of standalone and composite simd.
This replicates clang's implementation. Basically:
- A private copy of the reduction variable is created, initialized to the reduction-neutral value (using regions from the reduction declaration op).
- The body of the loop is lowered as usual, with accesses to the reduction variable mapped to the private copy.
- After the loop, we inline the reduction region from the declaration op to combine the privatized variable into the original variable.
- As usual with the SIMD construct, attributes are added to encourage vectorization of the loop and to assert that memory accesses in the loop don't alias across iterations.

I have verified that simple scalar examples do vectorize at -O3 and the tests I could find in the Fujitsu test suite produce correct results. I tested on top of llvm#146097 and this seemed to work for composite constructs as well.

Fixes llvm#144290
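At the source level, the lowering described above is roughly equivalent to the following hand-written transformation (an illustrative sketch only, assuming an integer sum reduction; this is not output produced by the patch):

```fortran
program simd_reduction_lowering_sketch
  implicit none
  integer :: i, s, s_priv

  s = 10
  ! 1. Create a private copy of the reduction variable, initialized to the
  !    reduction-neutral value (taken from the init region of the reduction
  !    declaration op).
  s_priv = 0

  ! 2. Lower the loop body as usual, with accesses to the reduction variable
  !    mapped to the private copy; vectorization and no-alias hints are
  !    attached to this loop.
  !$omp simd
  do i = 1, 1000
    s_priv = s_priv + i
  end do

  ! 3. After the loop, inline the combiner region to fold the private copy
  !    back into the original variable.
  s = s + s_priv

  print *, s
end program simd_reduction_lowering_sketch
```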
I think that's a reasonable approach, but could we do that in a separate PR, since it's a bit out of scope for this one? I'll be away for a week, and there's the simd reduction PR that should ideally go on top of this one.
LGTM, thanks!
Co-authored-by: Sergio Afonso <safonsof@amd.com>