
[flang][OpenMP] Handle "loop-local values" in do concurrent nests #127635


Merged
merged 1 commit into from
Apr 2, 2025

Conversation

ergawy
Member

@ergawy ergawy commented Feb 18, 2025

Extends do concurrent mapping to handle "loop-local values". A loop-local value is one that is used exclusively inside the loop but allocated outside of it. This usually corresponds to temporary values that are used inside the loop body, for example to initialize other variables. After collecting these values, the pass localizes them to the loop nest by moving their allocations into the OpenMP region generated for the nest.
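The detection rule described above (allocated outside the loop, every user inside it) can be sketched without MLIR. The following is a minimal model over a toy IR, assuming a flat list of ops where each op records whether it is an allocation and whether it sits inside the loop. The `Op` struct and `collectLoopLocals` are hypothetical stand-ins for `fir::AllocaOp`, `Operation::isAncestor`, and `collectLoopLocalValues`, not flang code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy IR (hypothetical): each op has a name, whether it is an allocation,
// whether it is nested inside the loop, and the indices of the ops whose
// results it uses as operands.
struct Op {
  std::string name;
  bool isAlloca;
  bool insideLoop;
  std::vector<int> operands; // indices into the op list
};

// Returns the indices of allocations defined outside the loop whose every
// user is inside the loop ("loop-local values").
std::set<int> collectLoopLocals(const std::vector<Op> &ops) {
  std::set<int> locals;
  for (int def = 0; def < static_cast<int>(ops.size()); ++def) {
    // Only allocations defined outside the loop are candidates; values
    // defined inside the loop are already local to it.
    if (!ops[def].isAlloca || ops[def].insideLoop)
      continue;
    bool hasUser = false;
    bool allUsersInside = true;
    for (const Op &user : ops) {
      for (int operand : user.operands) {
        if (operand != def)
          continue;
        hasUser = true;
        if (!user.insideLoop)
          allUsersInside = false; // escapes the loop: not loop-local
      }
    }
    if (hasUser && allUsersInside)
      locals.insert(def);
  }
  return locals;
}
```

For instance, a function-result temporary that is only written and destroyed inside the loop qualifies, while a variable that is also read after the loop does not.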

PR stack:
- #126026
- #127595
- #127633
- #127634
- #127635 (this PR)

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Feb 18, 2025
@llvmbot
Member

llvmbot commented Feb 18, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Extends do concurrent mapping to handle "loop-local values". A loop-local value is one that is used exclusively inside the loop but allocated outside of it. This usually corresponds to temporary values that are used inside the loop body, for example to initialize other variables. After collecting these values, the pass localizes them to the loop nest by moving their allocations into the OpenMP region generated for the nest.


Full diff: https://github.com/llvm/llvm-project/pull/127635.diff

3 Files Affected:

  • (modified) flang/docs/DoConcurrentConversionToOpenMP.md (+51)
  • (modified) flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp (+67-1)
  • (added) flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90 (+62)
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
index e7665a7751035..66e12ebc021a5 100644
--- a/flang/docs/DoConcurrentConversionToOpenMP.md
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -202,6 +202,57 @@ variables: `i` and `j`. These are locally allocated inside the parallel/target
 OpenMP region similar to what the single-range example in previous section
 shows.
 
+### Data environment
+
+By default, variables that are used inside a `do concurrent` loop nest are
+either treated as `shared` in case of mapping to `host`, or mapped into the
+`target` region using a `map` clause in case of mapping to `device`. The only
+exceptions to this are:
+  1. the loop's iteration variable(s) (IV) of **perfect** loop nests. In that
+     case, for each IV, we allocate a local copy as shown by the mapping
+     examples above.
+  1. any values that are from allocations outside the loop nest and used
+     exclusively inside of it. In such cases, a local privatized
+     copy is created in the OpenMP region to prevent multiple teams of threads
+     from accessing and destroying the same memory block, which causes runtime
+     issues. For an example of such cases, see
+     `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90`.
+
+Implicit mapping detection (for mapping to the target device) is still quite
+limited and work to make it smarter is underway for both OpenMP in general 
+and `do concurrent` mapping.
+
+#### Non-perfectly-nested loops' IVs
+
+For non-perfectly-nested loops, the IVs are still treated as `shared` or
+`map` entries as pointed out above. This **might not** be consistent with what
+the Fortran specification tells us. In particular, taking the following
+snippets from the spec (version 2023) into account:
+
+> § 3.35
+> ------
+> construct entity
+> entity whose identifier has the scope of a construct
+
+> § 19.4
+> ------
+>  A variable that appears as an index-name in a FORALL or DO CONCURRENT
+>  construct [...] is a construct entity. A variable that has LOCAL or
+>  LOCAL_INIT locality in a DO CONCURRENT construct is a construct entity.
+> [...]
+> The name of a variable that appears as an index-name in a DO CONCURRENT
+> construct, FORALL statement, or FORALL construct has a scope of the statement
+> or construct. A variable that has LOCAL or LOCAL_INIT locality in a DO
+> CONCURRENT construct has the scope of that construct.
+
+From the above quotes, it seems there is an equivalence between the IV of a `do
+concurrent` loop and a variable with a `LOCAL` locality specifier (equivalent
+to OpenMP's `private` clause). This means that we should probably
+localize/privatize a `do concurrent` loop's IV even if it is not perfectly
+nested in the nest we are parallelizing. For now, however, we **do not** do
+that as pointed out previously. In the near future, we propose a middle-ground
+solution (see the Next steps section for more details).
+
 <!--
 More details about current status will be added along with relevant parts of the
 implementation in later upstreaming patches.
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index d86b9f822932d..ec39cd066796e 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -335,6 +335,64 @@ void sinkLoopIVArgs(mlir::ConversionPatternRewriter &rewriter,
     ++idx;
   }
 }
+
+/// Collects values that are local to a loop: "loop-local values". A loop-local
+/// value is one that is used exclusively inside the loop but allocated outside
+/// of it. This usually corresponds to temporary values that are used inside the
+/// loop body for initialzing other variables for example.
+///
+/// See `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90` for an
+/// example of why we need this.
+///
+/// \param [in] doLoop - the loop within which the function searches for values
+/// used exclusively inside.
+///
+/// \param [out] locals - the list of loop-local values detected for \p doLoop.
+void collectLoopLocalValues(fir::DoLoopOp doLoop,
+                            llvm::SetVector<mlir::Value> &locals) {
+  doLoop.walk([&](mlir::Operation *op) {
+    for (mlir::Value operand : op->getOperands()) {
+      if (locals.contains(operand))
+        continue;
+
+      bool isLocal = true;
+
+      if (!mlir::isa_and_present<fir::AllocaOp>(operand.getDefiningOp()))
+        continue;
+
+      // Values defined inside the loop are not interesting since they do not
+      // need to be localized.
+      if (doLoop->isAncestor(operand.getDefiningOp()))
+        continue;
+
+      for (auto *user : operand.getUsers()) {
+        if (!doLoop->isAncestor(user)) {
+          isLocal = false;
+          break;
+        }
+      }
+
+      if (isLocal)
+        locals.insert(operand);
+    }
+  });
+}
+
+/// For a "loop-local" value \p local within a loop's scope, localizes that
+/// value within the scope of the parallel region the loop maps to. Towards that
+/// end, this function moves the allocation of \p local within \p allocRegion.
+///
+/// \param local - the value used exclusively within a loop's scope (see
+/// collectLoopLocalValues).
+///
+/// \param allocRegion - the parallel region where \p local's allocation will be
+/// privatized.
+///
+/// \param rewriter - builder used for updating \p allocRegion.
+static void localizeLoopLocalValue(mlir::Value local, mlir::Region &allocRegion,
+                                   mlir::ConversionPatternRewriter &rewriter) {
+  rewriter.moveOpBefore(local.getDefiningOp(), &allocRegion.front().front());
+}
 } // namespace looputils
 
 class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
@@ -357,13 +415,21 @@ class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
                         "Some `do concurent` loops are not perfectly-nested. "
                         "These will be serialzied.");
 
+    llvm::SetVector<mlir::Value> locals;
+    looputils::collectLoopLocalValues(loopNest.back().first, locals);
     looputils::sinkLoopIVArgs(rewriter, loopNest);
+
     mlir::IRMapping mapper;
-    genParallelOp(doLoop.getLoc(), rewriter, loopNest, mapper);
+    mlir::omp::ParallelOp parallelOp =
+        genParallelOp(doLoop.getLoc(), rewriter, loopNest, mapper);
     mlir::omp::LoopNestOperands loopNestClauseOps;
     genLoopNestClauseOps(doLoop.getLoc(), rewriter, loopNest, mapper,
                          loopNestClauseOps);
 
+    for (mlir::Value local : locals)
+      looputils::localizeLoopLocalValue(local, parallelOp.getRegion(),
+                                        rewriter);
+
     mlir::omp::LoopNestOp ompLoopNest =
         genWsLoopOp(rewriter, loopNest.back().first, mapper, loopNestClauseOps,
                     /*isComposite=*/mapToDevice);
diff --git a/flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90 b/flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90
new file mode 100644
index 0000000000000..f82696669eca6
--- /dev/null
+++ b/flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90
@@ -0,0 +1,62 @@
+! Tests that "loop-local values" are properly handled by localizing them to the
+! body of the loop nest. See `collectLoopLocalValues` and `localizeLoopLocalValue`
+! for a definition of "loop-local values" and how they are handled.
+
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fdo-concurrent-to-openmp=host %s -o - \
+! RUN:   | FileCheck %s
+module struct_mod
+    type test_struct
+        integer, allocatable :: x_
+    end type
+
+    interface test_struct
+        pure module function construct_from_components(x) result(struct)
+            implicit none
+            integer, intent(in) :: x
+            type(test_struct) struct
+        end function
+    end interface
+end module
+
+submodule(struct_mod) struct_sub
+    implicit none
+
+contains
+    module procedure construct_from_components
+        struct%x_ = x
+    end procedure
+end submodule struct_sub
+
+program main
+    use struct_mod, only : test_struct
+
+    implicit none
+    type(test_struct), dimension(10) :: a
+    integer :: i
+    integer :: total = 0
+
+    do concurrent (i=1:10)
+        a(i) = test_struct(i)
+    end do
+
+    do i=1,10
+        total = total + a(i)%x_
+    end do
+
+    print *, "total =", total
+end program main
+
+! CHECK: omp.parallel {
+! CHECK:   %[[LOCAL_TEMP:.*]] = fir.alloca !fir.type<_QMstruct_modTtest_struct{x_:!fir.box<!fir.heap<i32>>}> {bindc_name = ".result"}
+! CHECK:   omp.wsloop {
+! CHECK:     omp.loop_nest {{.*}} {
+! CHECK:       %[[TEMP_VAL:.*]] = fir.call @_QMstruct_modPconstruct_from_components
+! CHECK:       fir.save_result %[[TEMP_VAL]] to %[[LOCAL_TEMP]]
+! CHECK:       %[[EMBOXED_LOCAL:.*]] = fir.embox %[[LOCAL_TEMP]]
+! CHECK:       %[[CONVERTED_LOCAL:.*]] = fir.convert %[[EMBOXED_LOCAL]]
+! CHECK:       fir.call @_FortranADestroy(%[[CONVERTED_LOCAL]])
+! CHECK:       omp.yield
+! CHECK:     }
+! CHECK:   }
+! CHECK:   omp.terminator
+! CHECK: }
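The CHECK lines above expect the `fir.alloca` for the `.result` temporary at the top of the `omp.parallel` region, so each thread executing the region owns a separate copy. A rough C++ analog follows, assuming `std::thread` stands in for the OpenMP thread team and `std::unique_ptr<int>` for the `allocatable` component `x_`. `TestStruct`, `constructFromComponents`, and `fillParallel` are hypothetical names mirroring the Fortran test, not flang APIs.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Stand-in for `type(test_struct)` with an allocatable integer component.
struct TestStruct {
  std::unique_ptr<int> x;
};

// Stand-in for the `pure` constructor `construct_from_components`.
TestStruct constructFromComponents(int x) {
  return TestStruct{std::make_unique<int>(x)};
}

// Sets a[i] to hold i + 1, splitting iterations across `nthreads` workers.
// The temporary `result` is declared inside the worker body, so each thread
// creates and destroys its own copy; this is what moving the allocation
// into the parallel region achieves. Hoisting `result` out of the lambda
// and sharing it would let threads overwrite and free the same heap block
// concurrently.
void fillParallel(std::vector<TestStruct> &a, int nthreads) {
  assert(nthreads > 0);
  const int n = static_cast<int>(a.size());
  std::vector<std::thread> workers;
  for (int t = 0; t < nthreads; ++t) {
    workers.emplace_back([&a, n, nthreads, t] {
      for (int i = t; i < n; i += nthreads) {
        TestStruct result = constructFromComponents(i + 1); // thread-private temp
        a[i].x = std::make_unique<int>(*result.x);          // copy into a(i)
      } // `result` is destroyed at the end of each iteration
    });
  }
  for (std::thread &w : workers)
    w.join();
}
```

The shared-temporary variant is described only in a comment, because actually running it would be a data race, which is undefined behavior in C++.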

@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch from 6d040c8 to 4c63b2a Compare February 18, 2025 14:19
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_5_local_values branch from 99369b7 to 3d1c2e6 Compare February 18, 2025 14:19
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch from 4c63b2a to 40d1415 Compare February 20, 2025 15:52
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_5_local_values branch from 3d1c2e6 to 2d9fb8f Compare February 20, 2025 15:53
Member

@skatrak skatrak left a comment


I have a couple of nits, but LGTM otherwise. Thank you!

/// Collects values that are local to a loop: "loop-local values". A loop-local
/// value is one that is used exclusively inside the loop but allocated outside
/// of it. This usually corresponds to temporary values that are used inside the
/// loop body for initialzing other variables for example.

Suggested change
- /// loop body for initialzing other variables for example.
+ /// loop body for initializing other variables for example.

Comment on lines +2 to +3
! body of the loop nest. See `collectLoopLocalValues` and `localizeLoopLocalValue`
! for a definition of "loop-local values" and how they are handled.

Nit: Maybe it's better to just point generally at the DoConcurrentConversion pass for more information, since otherwise this comment will easily get out of sync with the actual implementation.

Comment on lines +394 to +354
      for (auto *user : operand.getUsers()) {
        if (!doLoop->isAncestor(user)) {
          isLocal = false;
          break;
        }
      }

      if (isLocal)
        locals.insert(operand);

Nit: I think something like this might be a bit more concise, but feel free to disagree. If you keep the current form, it might be good to move the isLocal declaration closer to the loop.

Suggested change
-      for (auto *user : operand.getUsers()) {
-        if (!doLoop->isAncestor(user)) {
-          isLocal = false;
-          break;
-        }
-      }
-      if (isLocal)
-        locals.insert(operand);
+      auto users = operand.getUsers();
+      if (llvm::find_if(users, [&](mlir::Operation *user) { return !doLoop->isAncestor(user); }) == users.end())
+        locals.insert(operand);
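The reviewer's `find_if(...) == users.end()` check asks whether no user lies outside the loop, which is the same predicate as "none outside" or "all inside". Below is a small standard-library sketch of that equivalence, where a vector of booleans is an assumed stand-in for the results of `doLoop->isAncestor(user)`. LLVM's `llvm/ADT/STLExtras.h` provides range-based `llvm::none_of` and `llvm::all_of` with the same meaning as the `std::` versions used here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Each entry records whether one user of the value is nested in the loop
// (a stand-in for `doLoop->isAncestor(user)`).
using UserFlags = std::vector<bool>;

// The reviewer's formulation: no user is found outside the loop.
bool isLocalFindIf(const UserFlags &users) {
  return std::find_if(users.begin(), users.end(),
                      [](bool inside) { return !inside; }) == users.end();
}

// The same predicate phrased as "none of the users is outside".
bool isLocalNoneOf(const UserFlags &users) {
  return std::none_of(users.begin(), users.end(),
                      [](bool inside) { return !inside; });
}

// The same predicate phrased as "all of the users are inside".
bool isLocalAllOf(const UserFlags &users) {
  return std::all_of(users.begin(), users.end(),
                     [](bool inside) { return inside; });
}
```

All three return `true` for a value with no users at all. In the pass that case does not arise, because candidates are gathered from the operands of ops nested inside the loop, so each has at least one in-loop user.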

examples above.
1. any values that are from allocations outside the loop nest and used
exclusively inside of it. In such cases, a local privatized
copy is created in the OpenMP region to prevent multiple teams of threads

Nit: In the OpenMP parallel region?

Comment on lines +221 to +223
Implicit mapping detection (for mapping to the target device) is still quite
limited and work to make it smarter is underway for both OpenMP in general
and `do concurrent` mapping.

Nit: There's no mapping support at this stage, so maybe state that to avoid misleading anyone reading it.

@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch from 40d1415 to 090ea42 Compare March 10, 2025 04:54
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_5_local_values branch from 2d9fb8f to 1adeaa2 Compare March 10, 2025 04:55
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch 2 times, most recently from f7322fc to 866276c Compare March 17, 2025 07:54
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_5_local_values branch 2 times, most recently from caa2a30 to 6321731 Compare March 21, 2025 05:46
@ergawy
Member Author

ergawy commented Mar 21, 2025

I have no idea how the PR requested reviews from all these people!!! Sorry for the noise.

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
…126026)

This PR starts the effort to upstream AMD's internal implementation of `do concurrent` to OpenMP mapping. This replaces llvm#77285 since we extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current status downstream, the upstreaming status, and next steps to make this pass much more useful.

In addition to this document, this PR also contains the skeleton of the pass (no useful transformations are done yet) and some testing for the added command line options.

This looks like a huge PR but a lot of the added stuff is documentation.

It is also worth noting that the downstream pass has been validated on https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved performance speed-ups that match pure OpenMP; for GPU mapping, we are still working on extending our support for implicit memory mapping and locality specifiers.

PR stack:
- llvm#126026 (this PR)
- llvm#127595
- llvm#127633
- llvm#127634
- llvm#127635
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
…27595)

Upstreams the next part of do concurrent to OpenMP mapping pass (from
AMD's ROCm implementation). See llvm#126026 for more context.

This PR adds loop nest detection logic. This enables us to discover
multi-range do concurrent loops and then map them as "collapsed" loop
nests to OpenMP.

This is a follow-up to llvm#126026; only the latest commit is relevant.

This is a replacement for llvm#127478 using a `/user/<username>/<branchname>` branch.

PR stack:
- llvm#126026
- llvm#127595 (this PR)
- llvm#127633
- llvm#127634
- llvm#127635
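Mapping a multi-range `do concurrent` loop as a "collapsed" nest means treating its ranges as one flattened iteration space, which is what OpenMP's `collapse` clause specifies. Below is a minimal sketch of recovering each iteration variable from a flattened index for two ranges, assuming unit strides, 1-based bounds, and that the first range becomes the outermost loop. `delinearize` is a hypothetical helper, not part of the pass.

```cpp
#include <cassert>
#include <utility>

// For `do concurrent (i=1:ni, j=1:nj)` collapsed into one loop over
// k = 0 .. ni*nj-1, returns the (i, j) pair for iteration k. The second
// range varies fastest, matching an `i` outer / `j` inner loop nest.
std::pair<int, int> delinearize(int k, int nj) {
  assert(nj > 0);
  const int i = k / nj + 1; // outer IV, 1-based
  const int j = k % nj + 1; // inner IV, 1-based
  return {i, j};
}
```

Each (i, j) pair is produced exactly once as k runs over [0, ni*nj), so the iterations can be divided among threads using k alone.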
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
…ructs (llvm#127633)

Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping pass. This PR adds support for converting simple loops to the equivalent OpenMP constructs on the host: `omp parallel do`. Towards that end, we have to collect more information about loop nests, for which we add new utils in the `looputils` namespace.

PR stack:
- llvm#126026
- llvm#127595
- llvm#127633 (this PR)
- llvm#127634
- llvm#127635
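On the host, `omp parallel do` starts a team of threads and divides the loop's iterations among them. Below is a rough sketch of that behavior with `std::thread`, assuming a contiguous block split that resembles `schedule(static)` without a chunk size. `parallelFor` is a hypothetical helper and says nothing about how the OpenMP runtime is actually implemented.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Runs body(i) for every i in [lb, ub] (inclusive, like a Fortran DO loop)
// on nthreads threads, handing each thread one contiguous block.
void parallelFor(int lb, int ub, int nthreads,
                 const std::function<void(int)> &body) {
  assert(nthreads > 0);
  const int n = ub - lb + 1;
  if (n <= 0)
    return; // empty iteration space
  // The first `extra` threads take one additional iteration each.
  const int base = n / nthreads;
  const int extra = n % nthreads;
  std::vector<std::thread> team;
  for (int t = 0; t < nthreads; ++t) {
    const int begin = lb + t * base + (t < extra ? t : extra);
    const int end = begin + base + (t < extra ? 1 : 0); // exclusive
    team.emplace_back([begin, end, &body] {
      for (int i = begin; i < end; ++i)
        body(i);
    });
  }
  // Joining every thread mirrors the implicit barrier at the end of the construct.
  for (std::thread &th : team)
    th.join();
}
```

With 10 iterations and 4 threads, the blocks are 1-3, 4-6, 7-8, and 9-10, so every iteration runs exactly once.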
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
…lvm#127634)

Adds support for converting multi-range loops to OpenMP (on the host only for now). The changes here "prepare" a loop nest for collapsing by sinking iteration variables to the innermost `fir.do_loop` op in the nest.

PR stack:
- llvm#126026
- llvm#127595
- llvm#127633
- llvm#127634 (this PR)
- llvm#127635
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
…lvm#127635)

Extends `do concurrent` mapping to handle "loop-local values". A loop-local value is one that is used exclusively inside the loop but allocated outside of it. This usually corresponds to temporary values that are used inside the loop body, for example to initialize other variables. After collecting these values, the pass localizes them to the loop nest by moving their allocations.

PR stack:
- llvm#126026
- llvm#127595
- llvm#127633
- llvm#127634
- llvm#127635 (this PR)
ergawy added a commit that referenced this pull request Apr 2, 2025
This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.

This looks like a huge PR but a lot of the added stuff is documentation.

It is also worth noting that the downstream pass has been validated on
https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved
performance speed-ups that match pure OpenMP; for GPU mapping, we are
still working on extending our support for implicit memory mapping and
locality specifiers.

PR stack:
- #126026 (this PR)
- #127595
- #127633
- #127634
- #127635
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
…ping (#126026)

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.

This looks like a huge PR but a lot of the added stuff is documentation.

It is also worth noting that the downstream pass has been validated on
https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved
performance speed-ups that match pure OpenMP; for GPU mapping, we are
still working on extending our support for implicit memory mapping and
locality specifiers.

PR stack:
- llvm/llvm-project#126026 (this PR)
- llvm/llvm-project#127595
- llvm/llvm-project#127633
- llvm/llvm-project#127634
- llvm/llvm-project#127635
ergawy added a commit that referenced this pull request Apr 2, 2025
Upstreams the next part of do concurrent to OpenMP mapping pass (from
AMD's ROCm implementation). See
#126026 for more context.

This PR adds loop nest detection logic. This enables us to discover
multi-range do concurrent loops and then map them as "collapsed" loop
nests to OpenMP.

This is a follow-up to
#126026; only the latest commit
is relevant.

This is a replacement for
#127478 using a
`/user/<username>/<branchname>` branch.

PR stack:
- #126026
- #127595 (this PR)
- #127633
- #127634
- #127635
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
…on. (#127595)

Upstreams the next part of do concurrent to OpenMP mapping pass (from
AMD's ROCm implementation). See
llvm/llvm-project#126026 for more context.

This PR adds loop nest detection logic. This enables us to discover
multi-range do concurrent loops and then map them as "collapsed" loop
nests to OpenMP.

This is a follow-up to
llvm/llvm-project#126026; only the latest commit
is relevant.

This is a replacement for
llvm/llvm-project#127478 using a
`/user/<username>/<branchname>` branch.

PR stack:
- llvm/llvm-project#126026
- llvm/llvm-project#127595 (this PR)
- llvm/llvm-project#127633
- llvm/llvm-project#127634
- llvm/llvm-project#127635
ergawy added a commit that referenced this pull request Apr 2, 2025
…ructs (#127633)

Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping
pass. This PR adds support for converting simple loops to the equivalent
OpenMP constructs on the host: `omp parallel do`. Towards that end, we
have to collect more information about loop nests, for which we add new
utils in the `looputils` namespace.

PR stack:
- #126026
- #127595
- #127633 (this PR)
- #127634
- #127635
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_4_multi_range_loops branch from 7b60c5b to 629305b Compare April 2, 2025 09:33
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
… host constructs (#127633)

Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping
pass. This PR adds support for converting simple loops to the equivalent
OpenMP constructs on the host: `omp parallel do`. Towards that end, we
have to collect more information about loop nests, for which we add new
utils in the `looputils` namespace.

PR stack:
- llvm/llvm-project#126026
- llvm/llvm-project#127595
- llvm/llvm-project#127633 (this PR)
- llvm/llvm-project#127634
- llvm/llvm-project#127635
ergawy added a commit that referenced this pull request Apr 2, 2025
…127634)

Adds support for converting multi-range loops to OpenMP (on the host
only for now). The changes here "prepare" a loop nest for collapsing by
sinking iteration variables to the innermost `fir.do_loop` op in the
nest.

PR stack:
- #126026
- #127595
- #127633
- #127634 (this PR)
- #127635
Base automatically changed from users/ergawy/upstream_do_concurrent_4_multi_range_loops to main April 2, 2025 10:43
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_5_local_values branch from 6321731 to 7db8234 Compare April 2, 2025 10:48
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
…nge loops (#127634)

Adds support for converting multi-range loops to OpenMP (on the host
only for now). The changes here "prepare" a loop nest for collapsing by
sinking iteration variables to the innermost `fir.do_loop` op in the
nest.

PR stack:
- llvm/llvm-project#126026
- llvm/llvm-project#127595
- llvm/llvm-project#127633
- llvm/llvm-project#127634 (this PR)
- llvm/llvm-project#127635
Extends `do concurrent` mapping to handle "loop-local values". A loop-local
value is one that is used exclusively inside the loop but allocated outside
of it. This usually corresponds to temporary values that are used inside the
loop body for initializing other variables, for example. After collecting these
values, the pass localizes them to the loop nest by moving their allocations.
@ergawy ergawy force-pushed the users/ergawy/upstream_do_concurrent_5_local_values branch from 7db8234 to 31cfcd3 Compare April 2, 2025 12:06
@ergawy
Member Author

ergawy commented Apr 2, 2025

Premerge checks have been stuck for more than 1.5 hours. Merging, since I have already restarted them twice.

@ergawy ergawy merged commit de6c909 into main Apr 2, 2025
10 of 12 checks passed
@ergawy ergawy deleted the users/ergawy/upstream_do_concurrent_5_local_values branch April 2, 2025 13:43
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
…nt` nests (#127635)

Extends `do concurrent` mapping to handle "loop-local values". A
loop-local value is one that is used exclusively inside the loop but
allocated outside of it. This usually corresponds to temporary values
that are used inside the loop body for initializing other variables, for
example. After collecting these values, the pass localizes them to the
loop nest by moving their allocations.

PR stack:
- llvm/llvm-project#126026
- llvm/llvm-project#127595
- llvm/llvm-project#127633
- llvm/llvm-project#127634
- llvm/llvm-project#127635 (this PR)
Labels
flang:fir-hlfir flang Flang issues not falling into any other category

3 participants