[flang][OpenMP] Extend do concurrent mapping to multi-range loops #127634

Merged

@ergawy ergawy commented Feb 18, 2025

@llvmbot llvmbot added the flang and flang:fir-hlfir labels Feb 18, 2025
llvmbot commented Feb 18, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Adds support for converting multi-range loops to OpenMP (on the host only for now). The changes here "prepare" a loop nest for collapsing by sinking iteration variables to the innermost `fir.do_loop` op in the nest.
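
To make the "sinking" step concrete, here is a minimal sketch of the index arithmetic that `sinkLoopIVArgs` in this patch uses when it remaps induction variables to the innermost entry block's arguments. It is plain C++ with no MLIR dependency; `remapInductionVars` is a hypothetical name used only for illustration and does not exist in flang.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the remapping: for each loop in the nest (outermost
// first), return the index of the innermost entry-block argument that the
// loop's induction variable ends up mapped to.
std::vector<std::size_t> remapInductionVars(std::size_t numLoops) {
  assert(numLoops >= 1 && "a loop nest contains at least one loop");

  // The innermost entry block starts with one argument (its own IV) and
  // gains one extra argument per outer loop.
  const std::size_t numArgs = 1 + (numLoops - 1);

  std::vector<std::size_t> argForLoop(numLoops);
  std::size_t idx = 1;
  // Walk the nest in reverse (innermost first), mirroring the
  // `getNumArguments() - idx` computation in the patch.
  for (std::size_t loop = numLoops; loop-- > 0;) {
    argForLoop[loop] = numArgs - idx;
    ++idx;
  }
  return argForLoop;
}
```

For a three-loop nest, `remapInductionVars(3)` returns `{0, 1, 2}`: the outermost IV maps to the first block argument and the innermost IV to the last, which matches the "in order from the outermost to the innermost IV" wording in the function's doc comment.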


Full diff: https://github.com/llvm/llvm-project/pull/127634.diff

3 Files Affected:

  • (modified) flang/docs/DoConcurrentConversionToOpenMP.md (+29)
  • (modified) flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp (+91)
  • (added) flang/test/Transforms/DoConcurrent/multiple_iteration_ranges.f90 (+72)
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
index 914ace0813f0e..e7665a7751035 100644
--- a/flang/docs/DoConcurrentConversionToOpenMP.md
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -173,6 +173,35 @@ omp.parallel {
 
 <!-- TODO -->
 
+### Multi-range loops
+
+The pass currently supports multi-range loops as well. Given the following
+example:
+
+```fortran
+   do concurrent(i=1:n, j=1:m)
+       a(i,j) = i * j
+   end do
+```
+
+The generated `omp.loop_nest` operation looks like:
+
+```
+omp.loop_nest (%arg0, %arg1)
+    : index = (%17, %19) to (%18, %20)
+    inclusive step (%c1_2, %c1_4) {
+  fir.store %arg0 to %private_i#1 : !fir.ref<i32>
+  fir.store %arg1 to %private_j#1 : !fir.ref<i32>
+  ...
+  omp.yield
+}
+```
+
+It is worth noting that we have privatized versions for both iteration
+variables: `i` and `j`. These are locally allocated inside the parallel/target
+OpenMP region, similar to what the single-range example in the previous
+section shows.
+
 <!--
 More details about current status will be added along with relevant parts of the
 implementation in later upstreaming patches.
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index dc797877ac87b..d86b9f822932d 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -245,6 +245,96 @@ mlir::LogicalResult collectLoopNest(fir::DoLoopOp currentLoop,
 
   return mlir::success();
 }
+
+/// Prepares the `fir.do_loop` nest to be easily mapped to OpenMP. In
+/// particular, this function would take this input IR:
+/// ```
+/// fir.do_loop %i_iv = %i_lb to %i_ub step %i_step unordered {
+///   fir.store %i_iv to %i#1 : !fir.ref<i32>
+///   %j_lb = arith.constant 1 : i32
+///   %j_ub = arith.constant 10 : i32
+///   %j_step = arith.constant 1 : index
+///
+///   fir.do_loop %j_iv = %j_lb to %j_ub step %j_step unordered {
+///     fir.store %j_iv to %j#1 : !fir.ref<i32>
+///     ...
+///   }
+/// }
+/// ```
+///
+/// into the following form (using generic op form since the result is
+/// technically an invalid `fir.do_loop` op):
+///
+/// ```
+/// "fir.do_loop"(%i_lb, %i_ub, %i_step) <{unordered}> ({
+/// ^bb0(%i_iv: index):
+///   %j_lb = "arith.constant"() <{value = 1 : i32}> : () -> i32
+///   %j_ub = "arith.constant"() <{value = 10 : i32}> : () -> i32
+///   %j_step = "arith.constant"() <{value = 1 : index}> : () -> index
+///
+///   "fir.do_loop"(%j_lb, %j_ub, %j_step) <{unordered}> ({
+///   ^bb0(%new_i_iv: index, %new_j_iv: index):
+///     "fir.store"(%new_i_iv, %i#1) : (i32, !fir.ref<i32>) -> ()
+///     "fir.store"(%new_j_iv, %j#1) : (i32, !fir.ref<i32>) -> ()
+///     ...
+///   })
+/// ```
+///
+/// What happened to the loop nest is the following:
+///
+/// * the innermost loop's entry block was updated from having one operand to
+///   having `n` operands where `n` is the number of loops in the nest,
+///
+/// * the outer loop(s)' ops that update the IVs were sunk inside the innermost
+///   loop (see the `"fir.store"(%new_i_iv, %i#1)` op above),
+///
+/// * the innermost loop's entry block's arguments were mapped in order from the
+///   outermost to the innermost IV.
+///
+/// With this IR change, we can directly inline the innermost loop's region into
+/// the newly generated `omp.loop_nest` op.
+///
+/// Note that this function has a pre-condition that \p loopNest consists of
+/// perfectly nested loops; i.e. there are no in-between ops between 2 nested
+/// loops except for the ops to set up the inner loop's LB, UB, and step. These
+/// ops are handled/cloned by `genLoopNestClauseOps(..)`.
+void sinkLoopIVArgs(mlir::ConversionPatternRewriter &rewriter,
+                    looputils::LoopNestToIndVarMap &loopNest) {
+  if (loopNest.size() <= 1)
+    return;
+
+  fir::DoLoopOp innermostLoop = loopNest.back().first;
+  mlir::Operation &innermostFirstOp = innermostLoop.getRegion().front().front();
+
+  llvm::SmallVector<mlir::Type> argTypes;
+  llvm::SmallVector<mlir::Location> argLocs;
+
+  for (auto &[doLoop, indVarInfo] : llvm::drop_end(loopNest)) {
+    // Sink the IV update ops to the innermost loop. We need to do this for all
+    // loops except for the innermost one, hence the `drop_end` usage above.
+    for (mlir::Operation *op : indVarInfo.indVarUpdateOps)
+      op->moveBefore(&innermostFirstOp);
+
+    argTypes.push_back(doLoop.getInductionVar().getType());
+    argLocs.push_back(doLoop.getInductionVar().getLoc());
+  }
+
+  mlir::Region &innermostRegion = innermostLoop.getRegion();
+  // Extend the innermost entry block with arguments to represent the outer IVs.
+  innermostRegion.addArguments(argTypes, argLocs);
+
+  unsigned idx = 1;
+  // In reverse, remap the IVs of the loop nest from the old values to the new
+  // ones. We do that in reverse since the first argument before this loop is
+  // the old IV for the innermost loop. Therefore, we want to replace it first
+  // before the old value (1st argument in the block) is remapped to be the IV
+  // of the outermost loop in the nest.
+  for (auto &[doLoop, _] : llvm::reverse(loopNest)) {
+    doLoop.getInductionVar().replaceAllUsesWith(
+        innermostRegion.getArgument(innermostRegion.getNumArguments() - idx));
+    ++idx;
+  }
+}
 } // namespace looputils
 
 class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
@@ -267,6 +357,7 @@ class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
                         "Some `do concurent` loops are not perfectly-nested. "
                         "These will be serialzied.");
 
+    looputils::sinkLoopIVArgs(rewriter, loopNest);
     mlir::IRMapping mapper;
     genParallelOp(doLoop.getLoc(), rewriter, loopNest, mapper);
     mlir::omp::LoopNestOperands loopNestClauseOps;
diff --git a/flang/test/Transforms/DoConcurrent/multiple_iteration_ranges.f90 b/flang/test/Transforms/DoConcurrent/multiple_iteration_ranges.f90
new file mode 100644
index 0000000000000..232420fb07a75
--- /dev/null
+++ b/flang/test/Transforms/DoConcurrent/multiple_iteration_ranges.f90
@@ -0,0 +1,72 @@
+! Tests mapping of a `do concurrent` loop with multiple iteration ranges.
+
+! RUN: split-file %s %t
+
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fdo-concurrent-to-openmp=host %t/multi_range.f90 -o - \
+! RUN:   | FileCheck %s
+
+!--- multi_range.f90
+program main
+   integer, parameter :: n = 20
+   integer, parameter :: m = 40
+   integer, parameter :: l = 60
+   integer :: a(n, m, l)
+
+   do concurrent(i=3:n, j=5:m, k=7:l)
+       a(i,j,k) = i * j + k
+   end do
+end
+
+! CHECK: func.func @_QQmain
+
+! CHECK: %[[C3:.*]] = arith.constant 3 : i32
+! CHECK: %[[LB_I:.*]] = fir.convert %[[C3]] : (i32) -> index
+! CHECK: %[[C20:.*]] = arith.constant 20 : i32
+! CHECK: %[[UB_I:.*]] = fir.convert %[[C20]] : (i32) -> index
+! CHECK: %[[STEP_I:.*]] = arith.constant 1 : index
+
+! CHECK: %[[C5:.*]] = arith.constant 5 : i32
+! CHECK: %[[LB_J:.*]] = fir.convert %[[C5]] : (i32) -> index
+! CHECK: %[[C40:.*]] = arith.constant 40 : i32
+! CHECK: %[[UB_J:.*]] = fir.convert %[[C40]] : (i32) -> index
+! CHECK: %[[STEP_J:.*]] = arith.constant 1 : index
+
+! CHECK: %[[C7:.*]] = arith.constant 7 : i32
+! CHECK: %[[LB_K:.*]] = fir.convert %[[C7]] : (i32) -> index
+! CHECK: %[[C60:.*]] = arith.constant 60 : i32
+! CHECK: %[[UB_K:.*]] = fir.convert %[[C60]] : (i32) -> index
+! CHECK: %[[STEP_K:.*]] = arith.constant 1 : index
+
+! CHECK: omp.parallel {
+
+! CHECK-NEXT: %[[ITER_VAR_I:.*]] = fir.alloca i32 {bindc_name = "i"}
+! CHECK-NEXT: %[[BINDING_I:.*]]:2 = hlfir.declare %[[ITER_VAR_I]] {uniq_name = "_QFEi"}
+
+! CHECK-NEXT: %[[ITER_VAR_J:.*]] = fir.alloca i32 {bindc_name = "j"}
+! CHECK-NEXT: %[[BINDING_J:.*]]:2 = hlfir.declare %[[ITER_VAR_J]] {uniq_name = "_QFEj"}
+
+! CHECK-NEXT: %[[ITER_VAR_K:.*]] = fir.alloca i32 {bindc_name = "k"}
+! CHECK-NEXT: %[[BINDING_K:.*]]:2 = hlfir.declare %[[ITER_VAR_K]] {uniq_name = "_QFEk"}
+
+! CHECK: omp.wsloop {
+! CHECK-NEXT: omp.loop_nest
+! CHECK-SAME:   (%[[ARG0:[^[:space:]]+]], %[[ARG1:[^[:space:]]+]], %[[ARG2:[^[:space:]]+]])
+! CHECK-SAME:   : index = (%[[LB_I]], %[[LB_J]], %[[LB_K]])
+! CHECK-SAME:     to (%[[UB_I]], %[[UB_J]], %[[UB_K]]) inclusive
+! CHECK-SAME:     step (%[[STEP_I]], %[[STEP_J]], %[[STEP_K]]) {
+
+! CHECK-NEXT: %[[IV_IDX_I:.*]] = fir.convert %[[ARG0]]
+! CHECK-NEXT: fir.store %[[IV_IDX_I]] to %[[BINDING_I]]#1
+
+! CHECK-NEXT: %[[IV_IDX_J:.*]] = fir.convert %[[ARG1]]
+! CHECK-NEXT: fir.store %[[IV_IDX_J]] to %[[BINDING_J]]#1
+
+! CHECK-NEXT: %[[IV_IDX_K:.*]] = fir.convert %[[ARG2]]
+! CHECK-NEXT: fir.store %[[IV_IDX_K]] to %[[BINDING_K]]#1
+
+! CHECK:      omp.yield
+! CHECK-NEXT: }
+! CHECK-NEXT: }
+
+! CHECK-NEXT: omp.terminator
+! CHECK-NEXT: }
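
As a worked illustration of what mapping the test's `do concurrent(i=3:n, j=5:m, k=7:l)` to a single three-dimensional `omp.loop_nest` implies, the sketch below computes the size of the combined iteration space and recovers `(i, j, k)` from a linear iteration number. It is plain C++ and only an assumption-laden model: `Range`, `tripCount`, `collapsedTripCount`, and `delinearize` are hypothetical names, unit steps are assumed, and the sketch makes no claim about how the OpenMP runtime actually schedules iterations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical inclusive iteration range with an assumed step of 1.
struct Range {
  long lb;
  long ub;
};

constexpr long tripCount(Range r) { return r.ub - r.lb + 1; }

// Total number of iterations in the combined (collapsed) iteration space.
template <std::size_t N>
long collapsedTripCount(const std::array<Range, N> &ranges) {
  long total = 1;
  for (const Range &r : ranges)
    total *= tripCount(r);
  return total;
}

// Recover the per-loop iteration variables from a linear iteration number,
// with the innermost (last) range varying fastest.
template <std::size_t N>
std::array<long, N> delinearize(long linear,
                                const std::array<Range, N> &ranges) {
  assert(linear >= 0 && linear < collapsedTripCount(ranges));
  std::array<long, N> ivs{};
  for (std::size_t d = N; d-- > 0;) {
    const long count = tripCount(ranges[d]);
    ivs[d] = ranges[d].lb + linear % count;
    linear /= count;
  }
  return ivs;
}
```

With the ranges from the test (`3:20`, `5:40`, `7:60`), the trip counts are 18, 36, and 54, so the combined space has 18 * 36 * 54 = 34992 iterations; linear iteration 0 corresponds to `(3, 5, 7)` and linear iteration 54 to `(3, 6, 7)`.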

@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_3_basic_host_support branch from 0ecf2e2 to 06bf9bc Compare February 18, 2025 14:18
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch from 6d040c8 to 4c63b2a Compare February 18, 2025 14:19
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_3_basic_host_support branch from 06bf9bc to a615d77 Compare February 20, 2025 15:46
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch from 4c63b2a to 40d1415 Compare February 20, 2025 15:52
@skatrak skatrak left a comment

Thank you Kareem, some small comments from me.

Comment on lines 108 to 94
/// Collects the op(s) responsible for updating a loop's iteration variable with
/// the current iteration number. For example, for the input IR:
This function seems to do something more generic than that: it collects all of the ops that either take the loop's induction variable as argument or take a value as argument that has been calculated based on the result of another operation that directly or indirectly took the loop's induction variable as argument.

I guess that, similarly to another comment I left at a previous PR in the stack #127633 (comment), it's doing something more general than it states. If, like the other case, the idea is to just store the associated fir.convert and fir.store operations, perhaps it makes more sense to match that pattern specifically.
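
The difference between the general collection described above and the narrower `fir.convert` + `fir.store` pattern can be sketched outside MLIR. The following plain C++ model is hypothetical (the `Op` struct, the integer value ids, and `collectTransitiveUsers` are illustrative names, not flang code); it gathers every op in a block that uses the induction variable directly or through another op's result.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified op: consumes value ids and optionally produces one.
struct Op {
  std::string name;
  std::vector<int> operands; // ids of the values this op consumes
  int result;                // id of the produced value, or -1 if none
};

// Collect the names of all ops that depend, directly or indirectly, on `iv`.
// Ops are assumed to be listed in program order within one block, so every
// value is defined before it is used and a single forward pass suffices.
std::vector<std::string> collectTransitiveUsers(const std::vector<Op> &block,
                                                int iv) {
  assert(iv >= 0 && "value ids are non-negative in this model");
  std::set<int> derived{iv};
  std::vector<std::string> users;
  for (const Op &op : block) {
    bool usesDerived = false;
    for (int operand : op.operands) {
      if (derived.count(operand) != 0) {
        usesDerived = true;
        break;
      }
    }
    if (!usesDerived)
      continue;
    users.push_back(op.name);
    if (op.result >= 0)
      derived.insert(op.result);
  }
  return users;
}
```

Given a block containing `fir.convert` (of the IV), `fir.store` (of the converted value), an unrelated `arith.constant`, and an `arith.muli` that multiplies the converted value by that constant, this general collection returns `fir.convert`, `fir.store`, and `arith.muli`, whereas matching the specific pattern would stop at the first two.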

Member Author

Simplified the function to match the current flang pattern. I will mark the above comments as resolved since they don't apply anymore.

@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_3_basic_host_support branch from a615d77 to d2e3c77 Compare March 4, 2025 07:27
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_3_basic_host_support branch from 66ce019 to b50be98 Compare March 10, 2025 04:53
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch from 40d1415 to 090ea42 Compare March 10, 2025 04:54
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_3_basic_host_support branch from fdf28a2 to 70979d8 Compare March 17, 2025 07:26
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch 3 times, most recently from f7322fc to 866276c Compare March 17, 2025 07:54
@skatrak skatrak left a comment

Thank you Kareem, LGTM!

/// ```
///
/// into the following form (using generic op form since the result is
/// technically an invalid `fir.do_loop` op:
Suggested change
-/// technically an invalid `fir.do_loop` op:
+/// technically an invalid `fir.do_loop` op):

/// The operation allocating memory for iteration variable.
mlir::Operation *iterVarMemDef;
};
/// the operation(s) updating the iteration variable with the current
Suggested change
-/// the operation(s) updating the iteration variable with the current
+/// The operation(s) updating the iteration variable with the current

@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch from 866276c to 7b60c5b Compare March 21, 2025 05:46
ergawy commented Mar 21, 2025

Again sorry, GH is acting weird!!

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
…126026)

This PR starts the effort to upstream AMD's internal implementation of `do concurrent` to OpenMP mapping. This replaces llvm#77285 since we extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current status downstream, the upstreaming status, and next steps to make this pass much more useful.

In addition to this document, this PR also contains the skeleton of the pass (no useful transformations are done yet) and some testing for the added command line options.

This looks like a huge PR but a lot of the added stuff is documentation.

It is also worth noting that the downstream pass has been validated on https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved performance speed-ups that match pure OpenMP; for GPU mapping, we are still working on extending our support for implicit memory mapping and locality specifiers.

PR stack:
- llvm#126026 (this PR)
- llvm#127595
- llvm#127633
- llvm#127634
- llvm#127635
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
…27595)

Upstreams the next part of do concurrent to OpenMP mapping pass (from
AMD's ROCm implementation). See llvm#126026 for more context.

This PR adds loop nest detection logic. This enables us to discover
multi-range do concurrent loops and then map them as "collapsed" loop
nests to OpenMP.

This is a follow up for llvm#126026, only the latest commit is relevant.

This is a replacement for llvm#127478 using a `/user/<username>/<branchname>` branch.

PR stack:
- llvm#126026
- llvm#127595 (this PR)
- llvm#127633
- llvm#127634
- llvm#127635
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
…ructs (llvm#127633)

Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping pass. This PR adds support for converting simple loops to the equivalent OpenMP constructs on the host: `omp parallel do`. Towards that end, we have to collect more information about loop nests, for which we add new utils in the `looputils` namespace.

PR stack:
- llvm#126026
- llvm#127595
- llvm#127633 (this PR)
- llvm#127634
- llvm#127635
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
…lvm#127634)

Adds support for converting multi-range loops to OpenMP (on the host only for now). The changes here "prepare" a loop nest for collapsing by sinking iteration variables to the innermost `fir.do_loop` op in the nest.

PR stack:
- llvm#126026
- llvm#127595
- llvm#127633
- llvm#127634 (this PR)
- llvm#127635
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
…lvm#127635)

Extends `do concurrent` mapping to handle "loop-local values". A loop-local value is one that is used exclusively inside the loop but allocated outside of it. This usually corresponds to temporary values that are used inside the loop body, for example to initialize other variables. After collecting these values, the pass localizes them to the loop nest by moving their allocations.

PR stack:
- llvm#126026
- llvm#127595
- llvm#127633
- llvm#127634
- llvm#127635 (this PR)
ergawy added a commit that referenced this pull request Apr 2, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
…ping (#126026)

ergawy added a commit that referenced this pull request Apr 2, 2025
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_3_basic_host_support branch from 25b36c6 to 0243c4f Compare April 2, 2025 08:14
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
…on. (#127595)

ergawy added a commit that referenced this pull request Apr 2, 2025
…ructs (#127633)

Base automatically changed from users/ergawy/upstream_do_concurrent_3_basic_host_support to main April 2, 2025 09:27
ergawy added 2 commits April 2, 2025 04:28
Adds support for converting multi-range loops to OpenMP (on the host
only for now). The changes here "prepare" a loop nest for collapsing by
sinking iteration variables to the innermost `fir.do_loop` op in the
nest.
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch from 7b60c5b to 629305b Compare April 2, 2025 09:33
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
… host constructs (#127633)

ergawy commented Apr 2, 2025

Merging since the only remaining check is the Windows pre-merge check, which has been stuck for a long time (I tried restarting the check and it still gets stuck).

@ergawy ergawy merged commit ef56b53 into main Apr 2, 2025
11 of 12 checks passed
@ergawy ergawy deleted the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch April 2, 2025 10:43
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
…nge loops (#127634)

ergawy added a commit that referenced this pull request Apr 2, 2025
…127635)

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
…nt` nests (#127635)
