Commit bb02e5f

[flang][OpenMP] Handle "loop-local values" in do concurrent nests (llvm#127635)
Extends `do concurrent` mapping to handle "loop-local values". A loop-local value is one that is used exclusively inside the loop but allocated outside of it. This usually corresponds to temporary values that are used inside the loop body, for example to initialize other variables. After collecting these values, the pass localizes them to the loop nest by moving their allocations.

PR stack:
- llvm#126026
- llvm#127595
- llvm#127633
- llvm#127634
- llvm#127635 (this PR)
1 parent e5f9817 commit bb02e5f

2 files changed: +56 -2 lines changed

flang/docs/DoConcurrentConversionToOpenMP.md

+51
@@ -202,6 +202,57 @@ variables: `i` and `j`. These are locally allocated inside the parallel/target
 OpenMP region similar to what the single-range example in previous section
 shows.

+### Data environment
+
+By default, variables that are used inside a `do concurrent` loop nest are
+either treated as `shared` in case of mapping to `host`, or mapped into the
+`target` region using a `map` clause in case of mapping to `device`. The only
+exceptions to this are:
+1. the loop's iteration variable(s) (IV) of **perfect** loop nests. In that
+   case, for each IV, we allocate a local copy as shown by the mapping
+   examples above.
+1. any values that are from allocations outside the loop nest and used
+   exclusively inside of it. In such cases, a local privatized
+   copy is created in the OpenMP region to prevent multiple teams of threads
+   from accessing and destroying the same memory block, which causes runtime
+   issues. For an example of such cases, see
+   `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90`.
+
+Implicit mapping detection (for mapping to the target device) is still quite
+limited and work to make it smarter is underway for both OpenMP in general
+and `do concurrent` mapping.
+
+#### Non-perfectly-nested loops' IVs
+
+For non-perfectly-nested loops, the IVs are still treated as `shared` or
+`map` entries as pointed out above. This **might not** be consistent with what
+the Fortran specification tells us. In particular, taking the following
+snippets from the spec (version 2023) into account:
+
+> § 3.35
+> ------
+> construct entity
+> entity whose identifier has the scope of a construct
+
+> § 19.4
+> ------
+> A variable that appears as an index-name in a FORALL or DO CONCURRENT
+> construct [...] is a construct entity. A variable that has LOCAL or
+> LOCAL_INIT locality in a DO CONCURRENT construct is a construct entity.
+> [...]
+> The name of a variable that appears as an index-name in a DO CONCURRENT
+> construct, FORALL statement, or FORALL construct has a scope of the statement
+> or construct. A variable that has LOCAL or LOCAL_INIT locality in a DO
+> CONCURRENT construct has the scope of that construct.
+
+From the above quotes, it seems there is an equivalence between the IV of a `do
+concurrent` loop and a variable with a `LOCAL` locality specifier (equivalent
+to OpenMP's `private` clause), which means that we should probably
+localize/privatize a `do concurrent` loop's IV even if it is not perfectly
+nested in the nest we are parallelizing. For now, however, we **do not** do
+that as pointed out previously. In the near future, we propose a middle-ground
+solution (see the Next steps section for more details).
+
 <!--
 More details about current status will be added along with relevant parts of the
 implementation in later upstreaming patches.
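
The default rules listed under "Data environment" in the hunk above can be summarized as a small decision function. The following C++ sketch is only an illustration: `Target`, `DataSharing`, `VarInfo`, and `classify` are hypothetical names that do not exist in flang, and the actual pass works on MLIR operations rather than on boolean flags like these.

```cpp
// Hypothetical model of the default data-environment rules described in the
// documentation diff. None of these names come from flang.
enum class Target { Host, Device };
enum class DataSharing { Shared, Map, Private };

struct VarInfo {
  // True if the variable is an iteration variable of a perfect loop nest.
  bool isPerfectNestIV;
  // True if the value is allocated outside the loop nest but used exclusively
  // inside of it (a "loop-local value").
  bool isLoopLocal;
};

// Returns the default treatment of a variable used inside the loop nest.
DataSharing classify(const VarInfo &var, Target target) {
  // Exceptions 1 and 2: these get a local copy inside the OpenMP region.
  if (var.isPerfectNestIV || var.isLoopLocal)
    return DataSharing::Private;
  // Everything else, including IVs of non-perfectly-nested loops, is shared
  // on the host or mapped into the target region on the device.
  return target == Target::Host ? DataSharing::Shared : DataSharing::Map;
}
```

Under this model, the IV of a non-perfectly-nested loop has both flags cleared, so it falls through to `Shared` or `Map`, which is the behavior the documentation flags as possibly inconsistent with the Fortran 2023 text.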

flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp

+5 -2
@@ -649,12 +649,15 @@ void sinkLoopIVArgs(mlir::ConversionPatternRewriter &rewriter,
 /// of it. This usually corresponds to temporary values that are used inside the
 /// loop body for initializing other variables for example.
 ///
+/// See `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90` for an
+/// example of why we need this.
+///
 /// \param [in] doLoop - the loop within which the function searches for values
 /// used exclusively inside.
 ///
 /// \param [out] locals - the list of loop-local values detected for \p doLoop.
-static void collectLoopLocalValues(fir::DoLoopOp doLoop,
-                                   llvm::SetVector<mlir::Value> &locals) {
+void collectLoopLocalValues(fir::DoLoopOp doLoop,
+                            llvm::SetVector<mlir::Value> &locals) {
   doLoop.walk([&](mlir::Operation *op) {
     for (mlir::Value operand : op->getOperands()) {
       if (locals.contains(operand))
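
To make the idea behind `collectLoopLocalValues` more concrete, here is a self-contained C++ sketch on a toy IR. `Value`, `Op`, and `collectLoopLocals` are hypothetical stand-ins, not MLIR or FIR types. The checks (allocated outside the loop, every user inside the loop) follow the description in the commit message and are an assumption about the rest of the function, which is only partly visible in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Op;

// Toy stand-in for an SSA value (hypothetical, not mlir::Value).
struct Value {
  std::string name;
  bool isAllocation;            // produced by an allocation
  bool definedInsideLoop;       // the allocation itself sits inside the loop
  std::vector<const Op *> users; // every operation that uses this value
};

// Toy stand-in for an operation (hypothetical, not mlir::Operation).
struct Op {
  bool insideLoop;
  std::vector<Value *> operands;
};

// Collects values that are allocated outside the loop but used only inside it.
std::vector<Value *> collectLoopLocals(const std::vector<const Op *> &loopOps) {
  std::vector<Value *> locals;
  for (const Op *op : loopOps) {
    assert(op != nullptr && "loop body must not contain null operations");
    for (Value *operand : op->operands) {
      // Skip values that were already collected.
      if (std::find(locals.begin(), locals.end(), operand) != locals.end())
        continue;
      // Only allocations made outside the loop are candidates.
      if (!operand->isAllocation || operand->definedInsideLoop)
        continue;
      // A single user outside the loop disqualifies the value.
      bool usedOnlyInside =
          std::all_of(operand->users.begin(), operand->users.end(),
                      [](const Op *user) { return user->insideLoop; });
      if (usedOnlyInside)
        locals.push_back(operand);
    }
  }
  return locals;
}
```

In this model, a temporary used by two operations inside the loop and by nothing else is collected, while a value that also has a user outside the loop is left alone. The real pass would then move the collected allocations into the loop nest, a step this sketch does not model.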
