Skip to content

[flang][OpenMP] Lower target .. private(..) to omp.private ops #94195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 7, 2024

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Jun 3, 2024

Extends delayed privatization support to taraget .. private(..). With this PR, private is support for target only is delayed privatization mode.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Jun 3, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 3, 2024

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: Kareem Ergawy (ergawy)

Changes

Extends delayed privatization support to taraget .. private(..). With this PR, private is support for target only is delayed privatization mode.


Full diff: https://github.com/llvm/llvm-project/pull/94195.diff

5 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/DataSharingProcessor.cpp (+6-12)
  • (modified) flang/lib/Lower/OpenMP/DataSharingProcessor.h (+9-9)
  • (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+59-12)
  • (added) flang/test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90 (+69)
  • (added) flang/test/Lower/OpenMP/DelayedPrivatization/target-private-simple.f90 (+38)
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 557a9685024c5..d8421b00cbd50 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -48,14 +48,13 @@ DataSharingProcessor::DataSharingProcessor(
 }
 
 void DataSharingProcessor::processStep1(
-    mlir::omp::PrivateClauseOps *clauseOps,
-    llvm::SmallVectorImpl<const semantics::Symbol *> *privateSyms) {
+    mlir::omp::PrivateClauseOps *clauseOps) {
   collectSymbolsForPrivatization();
   collectDefaultSymbols();
   collectImplicitSymbols();
   collectPreDeterminedSymbols();
 
-  privatize(clauseOps, privateSyms);
+  privatize(clauseOps);
 
   insertBarrier();
 }
@@ -416,15 +415,14 @@ void DataSharingProcessor::collectPreDeterminedSymbols() {
 }
 
 void DataSharingProcessor::privatize(
-    mlir::omp::PrivateClauseOps *clauseOps,
-    llvm::SmallVectorImpl<const semantics::Symbol *> *privateSyms) {
+    mlir::omp::PrivateClauseOps *clauseOps) {
   for (const semantics::Symbol *sym : allPrivatizedSymbols) {
     if (const auto *commonDet =
             sym->detailsIf<semantics::CommonBlockDetails>()) {
       for (const auto &mem : commonDet->objects())
-        doPrivatize(&*mem, clauseOps, privateSyms);
+        doPrivatize(&*mem, clauseOps);
     } else
-      doPrivatize(sym, clauseOps, privateSyms);
+      doPrivatize(sym, clauseOps);
   }
 }
 
@@ -442,8 +440,7 @@ void DataSharingProcessor::copyLastPrivatize(mlir::Operation *op) {
 }
 
 void DataSharingProcessor::doPrivatize(
-    const semantics::Symbol *sym, mlir::omp::PrivateClauseOps *clauseOps,
-    llvm::SmallVectorImpl<const semantics::Symbol *> *privateSyms) {
+    const semantics::Symbol *sym, mlir::omp::PrivateClauseOps *clauseOps) {
   if (!useDelayedPrivatization) {
     cloneSymbol(sym);
     copyFirstPrivateSymbol(sym);
@@ -548,9 +545,6 @@ void DataSharingProcessor::doPrivatize(
     clauseOps->privateVars.push_back(hsb.getAddr());
   }
 
-  if (privateSyms)
-    privateSyms->push_back(sym);
-
   symToPrivatizer[sym] = privatizerOp;
 }
 
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.h b/flang/lib/Lower/OpenMP/DataSharingProcessor.h
index 80a956de35ba9..fa2f87eb8d764 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.h
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.h
@@ -103,18 +103,15 @@ class DataSharingProcessor {
   void collectDefaultSymbols();
   void collectImplicitSymbols();
   void collectPreDeterminedSymbols();
-  void privatize(mlir::omp::PrivateClauseOps *clauseOps,
-                 llvm::SmallVectorImpl<const semantics::Symbol *> *privateSyms);
+  void privatize(mlir::omp::PrivateClauseOps *clauseOps);
   void defaultPrivatize(
       mlir::omp::PrivateClauseOps *clauseOps,
       llvm::SmallVectorImpl<const semantics::Symbol *> *privateSyms);
   void implicitPrivatize(
       mlir::omp::PrivateClauseOps *clauseOps,
       llvm::SmallVectorImpl<const semantics::Symbol *> *privateSyms);
-  void
-  doPrivatize(const semantics::Symbol *sym,
-              mlir::omp::PrivateClauseOps *clauseOps,
-              llvm::SmallVectorImpl<const semantics::Symbol *> *privateSyms);
+  void doPrivatize(const semantics::Symbol *sym,
+                   mlir::omp::PrivateClauseOps *clauseOps);
   void copyLastPrivatize(mlir::Operation *op);
   void insertLastPrivateCompare(mlir::Operation *op);
   void cloneSymbol(const semantics::Symbol *sym);
@@ -145,15 +142,18 @@ class DataSharingProcessor {
   // Step2 performs the copying for lastprivates and requires knowledge of the
   // MLIR operation to insert the last private update. Step2 adds
   // dealocation code as well.
-  void processStep1(
-      mlir::omp::PrivateClauseOps *clauseOps = nullptr,
-      llvm::SmallVectorImpl<const semantics::Symbol *> *privateSyms = nullptr);
+  void processStep1(mlir::omp::PrivateClauseOps *clauseOps = nullptr);
   void processStep2(mlir::Operation *op, bool isLoop);
 
   void setLoopIV(mlir::Value iv) {
     assert(!loopIV && "Loop iteration variable already set");
     loopIV = iv;
   }
+
+  const llvm::SetVector<const semantics::Symbol *> &
+  getAllSymbolsToPrivatize() const {
+    return allPrivatizedSymbols;
+  }
 };
 
 } // namespace omp
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 9598457d123cf..06ec2534119a1 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -758,15 +758,32 @@ genBodyOfTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
                   llvm::ArrayRef<const semantics::Symbol *> mapSyms,
                   llvm::ArrayRef<mlir::Location> mapSymLocs,
                   llvm::ArrayRef<mlir::Type> mapSymTypes,
+                  DataSharingProcessor &dsp,
                   const mlir::Location &currentLocation,
                   const ConstructQueue &queue, ConstructQueue::iterator item) {
   assert(mapSymTypes.size() == mapSymLocs.size());
 
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
   mlir::Region &region = targetOp.getRegion();
-
-  auto *regionBlock =
-      firOpBuilder.createBlock(&region, {}, mapSymTypes, mapSymLocs);
+  mlir::OperandRange privateVars = targetOp.getPrivateVars();
+
+  llvm::SmallVector<mlir::Type> allRegionArgTypes;
+  allRegionArgTypes.reserve(mapSymTypes.size() + targetOp.getPrivateVars().size());
+  llvm::transform(mapSymTypes, std::back_inserter(allRegionArgTypes),
+                  [](mlir::Type t) { return t; });
+  llvm::transform(privateVars, std::back_inserter(allRegionArgTypes),
+                  [](mlir::Value v) { return v.getType(); });
+
+  llvm::SmallVector<mlir::Location> allRegionArgLocs;
+  allRegionArgTypes.reserve(mapSymTypes.size() +
+                            targetOp.getPrivateVars().size());
+  llvm::transform(mapSymLocs, std::back_inserter(allRegionArgLocs),
+                  [](mlir::Location l) { return l; });
+  llvm::transform(privateVars, std::back_inserter(allRegionArgLocs),
+                  [](mlir::Value v) { return v.getLoc(); });
+
+  auto *regionBlock = firOpBuilder.createBlock(&region, {}, allRegionArgTypes,
+                                               allRegionArgLocs);
 
   // Clones the `bounds` placing them inside the target region and returns them.
   auto cloneBound = [&](mlir::Value bound) {
@@ -830,6 +847,20 @@ genBodyOfTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
         });
   }
 
+  for (auto [argIndex, argSymbol] :
+       llvm::enumerate(dsp.getAllSymbolsToPrivatize())) {
+    argIndex = mapSyms.size() + argIndex;
+
+    const mlir::BlockArgument &arg = region.getArgument(argIndex);
+    converter.bindSymbol(*argSymbol,
+                         hlfir::translateToExtendedValue(
+                             currentLocation, firOpBuilder, hlfir::Entity{arg},
+                             /*contiguousHint=*/
+                             evaluate::IsSimplyContiguous(
+                                 *argSymbol, converter.getFoldingContext()))
+                             .first);
+  }
+
   // Check if cloning the bounds introduced any dependency on the outer region.
   // If so, then either clone them as well if they are MemoryEffectFree, or else
   // copy them to a new temporary and add them to the map and block_argument
@@ -907,6 +938,8 @@ genBodyOfTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
   } else {
     genNestedEvaluations(converter, eval);
   }
+
+  dsp.processStep2(targetOp, /*isLoop=*/false);
 }
 
 template <typename OpTy, typename... Args>
@@ -1048,15 +1081,18 @@ static void genTargetClauses(
                         devicePtrSyms);
   cp.processMap(loc, stmtCtx, clauseOps, &mapSyms, &mapLocs, &mapTypes);
   cp.processThreadLimit(stmtCtx, clauseOps);
-  // TODO Support delayed privatization.
 
   if (processHostOnlyClauses)
     cp.processNowait(clauseOps);
 
   cp.processTODO<clause::Allocate, clause::Defaultmap, clause::Firstprivate,
-                 clause::InReduction, clause::Private, clause::Reduction,
+                 clause::InReduction, clause::Reduction,
                  clause::UsesAllocators>(loc,
                                          llvm::omp::Directive::OMPD_target);
+
+  // `target private(..)` is only supported in delayed privatization mode.
+  if(!enableDelayedPrivatization)
+    cp.processTODO<clause::Private>(loc, llvm::omp::Directive::OMPD_target);
 }
 
 static void genTargetDataClauses(
@@ -1289,7 +1325,6 @@ genParallelOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
   lower::StatementContext stmtCtx;
   mlir::omp::ParallelClauseOps clauseOps;
-  llvm::SmallVector<const semantics::Symbol *> privateSyms;
   llvm::SmallVector<mlir::Type> reductionTypes;
   llvm::SmallVector<const semantics::Symbol *> reductionSyms;
   genParallelClauses(converter, semaCtx, stmtCtx, item->clauses, loc,
@@ -1319,7 +1354,7 @@ genParallelOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
                            /*useDelayedPrivatization=*/true, &symTable);
 
   if (privatize)
-    dsp.processStep1(&clauseOps, &privateSyms);
+    dsp.processStep1(&clauseOps);
 
   auto genRegionEntryCB = [&](mlir::Operation *op) {
     auto parallelOp = llvm::cast<mlir::omp::ParallelOp>(op);
@@ -1344,9 +1379,10 @@ genParallelOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
                              privateVarLocs);
 
     llvm::SmallVector<const semantics::Symbol *> allSymbols = reductionSyms;
-    allSymbols.append(privateSyms);
+    allSymbols.append(dsp.getAllSymbolsToPrivatize().begin(),
+                      dsp.getAllSymbolsToPrivatize().end());
+
     for (auto [arg, prv] : llvm::zip_equal(allSymbols, region.getArguments())) {
-      fir::ExtendedValue hostExV = converter.getSymbolExtendedValue(*arg);
       converter.bindSymbol(*arg, hlfir::translateToExtendedValue(
                                      loc, firOpBuilder, hlfir::Entity{prv},
                                      /*contiguousHint=*/
@@ -1541,11 +1577,22 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
                    deviceAddrLocs, deviceAddrTypes, devicePtrSyms,
                    devicePtrLocs, devicePtrTypes);
 
+  llvm::SmallVector<const semantics::Symbol *> privateSyms;
+  DataSharingProcessor dsp(converter, semaCtx, item->clauses, eval,
+                           /*shouldCollectPreDeterminedSymbols=*/
+                           lower::omp::isLastItemInQueue(item, queue),
+                           /*useDelayedPrivatization=*/true, &symTable);
+  dsp.processStep1(&clauseOps);
+
   // 5.8.1 Implicit Data-Mapping Attribute Rules
   // The following code follows the implicit data-mapping rules to map all the
-  // symbols used inside the region that have not been explicitly mapped using
-  // the map clause.
+  // symbols used inside the region that do not have explicit data-environment
+  // attribute clauses (neither data-sharing; e.g. `private`, nor `map`
+  // clauses).
   auto captureImplicitMap = [&](const semantics::Symbol &sym) {
+    if (dsp.getAllSymbolsToPrivatize().contains(&sym))
+      return;
+
     if (llvm::find(mapSyms, &sym) == mapSyms.end()) {
       mlir::Value baseOp = converter.getSymbolAddress(sym);
       if (!baseOp)
@@ -1632,7 +1679,7 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
 
   auto targetOp = firOpBuilder.create<mlir::omp::TargetOp>(loc, clauseOps);
   genBodyOfTargetOp(converter, symTable, semaCtx, eval, targetOp, mapSyms,
-                    mapLocs, mapTypes, loc, queue, item);
+                    mapLocs, mapTypes, dsp, loc, queue, item);
   return targetOp;
 }
 
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90
new file mode 100644
index 0000000000000..5131aed29e9de
--- /dev/null
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90
@@ -0,0 +1,69 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -mmlir --openmp-enable-delayed-privatization \
+! RUN:   -o - %s 2>&1 | FileCheck %s
+! RUN: bbc -emit-hlfir -fopenmp --openmp-enable-delayed-privatization -o - %s 2>&1 \
+! RUN:   | FileCheck %s
+
+subroutine target_allocatable
+  implicit none
+  integer, allocatable :: alloc_var
+
+  !$omp target private(alloc_var)
+    alloc_var = 10
+  !$omp end target
+end subroutine target_allocatable
+
+! CHECK-LABEL: omp.private {type = private}
+! CHECK-SAME:    @[[VAR_PRIVATIZER_SYM:.*]] :
+! CHECK-SAME:      [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {
+! CHECK:  ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
+! CHECK:    %[[PRIV_ALLOC:.*]] = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "alloc_var", {{.*}}}
+
+! CHECK-NEXT:   %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT:   %[[PRIV_ARG_BOX:.*]] = fir.box_addr %[[PRIV_ARG_VAL]] : (!fir.box<!fir.heap<i32>>) -> !fir.heap<i32>
+! CHECK-NEXT:   %[[PRIV_ARG_ADDR:.*]] = fir.convert %[[PRIV_ARG_BOX]] : (!fir.heap<i32>) -> i64
+! CHECK-NEXT:   %[[C0:.*]] = arith.constant 0 : i64
+! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi ne, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
+
+! CHECK-NEXT:   fir.if %[[ALLOC_COND]] {
+! CHECK:          %[[PRIV_ALLOCMEM:.*]] = fir.allocmem i32 {fir.must_be_heap = true, {{.*}}}
+! CHECK-NEXT:     %[[PRIV_ALLOCMEM_BOX:.*]] = fir.embox %[[PRIV_ALLOCMEM]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK-NEXT:     fir.store %[[PRIV_ALLOCMEM_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT:   } else {
+! CHECK-NEXT:     %[[ZERO_BITS:.*]] = fir.zero_bits !fir.heap<i32>
+! CHECK-NEXT:     %[[ZERO_BOX:.*]] = fir.embox %[[ZERO_BITS]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK-NEXT:     fir.store %[[ZERO_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT:   }
+
+! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]]
+! CHECK-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : [[TYPE]])
+
+! CHECK-NEXT: } dealloc {
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
+
+! CHECK-NEXT:   %[[PRIV_VAL:.*]] = fir.load %[[PRIV_ARG]]
+! CHECK-NEXT:   %[[PRIV_ADDR:.*]] = fir.box_addr %[[PRIV_VAL]]
+! CHECK-NEXT:   %[[PRIV_ADDR_I64:.*]] = fir.convert %[[PRIV_ADDR]]
+! CHECK-NEXT:   %[[C0:.*]] = arith.constant 0 : i64
+! CHECK-NEXT:   %[[PRIV_NULL_COND:.*]] = arith.cmpi ne, %[[PRIV_ADDR_I64]], %[[C0]] : i64
+
+! CHECK-NEXT:   fir.if %[[PRIV_NULL_COND]] {
+! CHECK:          %[[PRIV_VAL_2:.*]] = fir.load %[[PRIV_ARG]]
+! CHECK-NEXT:     %[[PRIV_ADDR_2:.*]] = fir.box_addr %[[PRIV_VAL_2]]
+! CHECK-NEXT:     fir.freemem %[[PRIV_ADDR_2]]
+! CHECK-NEXT:     %[[ZEROS:.*]] = fir.zero_bits
+! CHECK-NEXT:     %[[ZEROS_BOX:.*]]  = fir.embox %[[ZEROS]]
+! CHECK-NEXT:     fir.store %[[ZEROS_BOX]] to %[[PRIV_ARG]]
+! CHECK-NEXT:   }
+
+! CHECK-NEXT:   omp.yield
+! CHECK-NEXT: }
+
+
+! CHECK-LABEL: func.func @_QPtarget_allocatable() {
+
+! CHECK:  %[[VAR_ALLOC:.*]] = fir.alloca !fir.box<!fir.heap<i32>>
+! CHECK-SAME: {bindc_name = "alloc_var", {{.*}}}
+! CHECK:  %[[VAR_DECL:.*]]:2 = hlfir.declare %[[VAR_ALLOC]]
+
+! CHECK:  omp.target private(
+! CHECK-SAME: @[[VAR_PRIVATIZER_SYM]] %[[VAR_DECL]]#0 -> %{{.*}} : [[TYPE]]) {
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-simple.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-simple.f90
new file mode 100644
index 0000000000000..951fdcb810e5c
--- /dev/null
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-simple.f90
@@ -0,0 +1,38 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -mmlir --openmp-enable-delayed-privatization \
+! RUN:   -o - %s 2>&1 | FileCheck %s
+! RUN: bbc -emit-hlfir -fopenmp --openmp-enable-delayed-privatization -o - %s 2>&1 \
+! RUN:   | FileCheck %s
+
+subroutine target_simple
+  implicit none
+  integer :: simple_var
+
+  !$omp target private(simple_var)
+    simple_var = 10
+  !$omp end target
+end subroutine target_simple
+
+! CHECK-LABEL: omp.private {type = private}
+! CHECK-SAME:              @[[VAR_PRIVATIZER_SYM:.*]] : !fir.ref<i32> alloc {
+! CHECK:  ^bb0(%[[PRIV_ARG:.*]]: !fir.ref<i32>):
+! CHECK:    %[[PRIV_ALLOC:.*]] = fir.alloca i32 {bindc_name = "simple_var", {{.*}}}
+! CHECK:    %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]]
+! CHECK:    omp.yield(%[[PRIV_DECL]]#0 : !fir.ref<i32>)
+! CHECK: }
+
+! CHECK-LABEL: func.func @_QPtarget_simple() {
+! CHECK:  %[[VAR_ALLOC:.*]] = fir.alloca i32 {bindc_name = "simple_var", {{.*}}}
+! CHECK:  %[[VAR_DECL:.*]]:2 = hlfir.declare %[[VAR_ALLOC]]
+
+! CHECK:  omp.target private(
+! CHECK-SAME: @[[VAR_PRIVATIZER_SYM]] %[[VAR_DECL]]#0 -> %{{.*}} : !fir.ref<i32>) {
+! CHECK:    ^bb0(%[[REG_ARG:.*]]: !fir.ref<i32>):
+! CHECK:      %[[REG_DECL:.*]]:2 = hlfir.declare %[[REG_ARG]]
+! CHECK:      %[[C10:.*]] = arith.constant 10
+! CHECK:      hlfir.assign %[[C10]] to %[[REG_DECL]]#0
+! CHECK:      omp.terminator
+! CHECK:    }
+
+! CHECK:    return
+! CHECK:  }
+

Copy link

github-actions bot commented Jun 3, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Extends delayed privatization support to `taraget .. private(..)`. With
this PR, `private` is support for `target` **only** is delayed
privatization mode.
@ergawy ergawy force-pushed the target_private_delayed branch from cd6cce5 to 75291f1 Compare June 3, 2024 09:04
@ergawy ergawy requested review from skatrak, kparzysz, kiranchandramohan, tblah and bhandarkar-pranav and removed request for skatrak and kparzysz June 3, 2024 12:29
@ergawy
Copy link
Member Author

ergawy commented Jun 4, 2024

Ping 🔔. Please try to take a look on this PR!

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a few more tests?
-> With a default private.
-> With three different types of privates.

Did you get a chance to discuss with @mjklemm about how privatisation works for tasks and since target has some task like properties whether we need something more than inlining?

Comment on lines 768 to 787
mlir::OperandRange privateVars = targetOp.getPrivateVars();

llvm::SmallVector<mlir::Type> allRegionArgTypes;
allRegionArgTypes.reserve(mapSymTypes.size() +
targetOp.getPrivateVars().size());
llvm::transform(mapSymTypes, std::back_inserter(allRegionArgTypes),
[](mlir::Type t) { return t; });
llvm::transform(privateVars, std::back_inserter(allRegionArgTypes),
[](mlir::Value v) { return v.getType(); });

llvm::SmallVector<mlir::Location> allRegionArgLocs;
allRegionArgTypes.reserve(mapSymTypes.size() +
targetOp.getPrivateVars().size());
llvm::transform(mapSymLocs, std::back_inserter(allRegionArgLocs),
[](mlir::Location l) { return l; });
llvm::transform(privateVars, std::back_inserter(allRegionArgLocs),
[](mlir::Value v) { return v.getLoc(); });

auto *regionBlock = firOpBuilder.createBlock(&region, {}, allRegionArgTypes,
allRegionArgLocs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment for this block of code. Is it possible to share any of it with the parallel operation? Or is it not possible due to the isolated from above nature of target?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstracted the common pieces into one template and commented it. Let me know if you don't like how I handled this or if it is too abstract and/or noisy.

@ergawy
Copy link
Member Author

ergawy commented Jun 5, 2024

Thanks for taking a look @kiranchandramohan.

Could you add a few more tests?
-> With a default private.

I might have missed something, but target does not support the default clause, right?

-> With three different types of privates.

You mean different types of variables; e.g. primitive, allocatable, pointer, array, etc? Or something else?

Did you get a chance to discuss with @mjklemm about how privatisation works for tasks and since target has some task like properties whether we need something more than inlining?

Not yet. Will do that. (cc @mjklemm & @pranavb-ca since we discussed target support in MLIR -> LLVM translation layer)

@bhandarkar-pranav
Copy link
Contributor

Could you add a few more tests? -> With a default private. -> With three different types of privates.

Did you get a chance to discuss with @mjklemm about how privatisation works for tasks and since target has some task like properties whether we need something more than inlining?

Yes, we'll need a way to send the privates via the target task to the target kernel so that firstprivate variables can be initialized. The way clang does this is by tacking on the private variables at the end of the task_t structure in the call to kmpc_omp_task_alloc. For now, this is a TODO item for both, Tasks and Target in the OpenMPIRBuilder. These values can then be picked up by the inlined code (generated out of the privatizers by way of a PrivCB callback) in the target kernel.

@kiranchandramohan
Copy link
Contributor

I might have missed something, but target does not support the default clause, right?

Sorry, i completely missed that.

You mean different types of variables; e.g. primitive, allocatable, pointer, array, etc? Or something else?

I specifically meant types here (integer, real, complex, character). But you can include arrays, allocatable etc. Just to ensure we are correctly handling the case when more than one type is present. I suggest also trying out multiple private clauses.

Could you add a few more tests? -> With a default private. -> With three different types of privates.
Did you get a chance to discuss with @mjklemm about how privatisation works for tasks and since target has some task like properties whether we need something more than inlining?

Yes, we'll need a way to send the privates via the target task to the target kernel so that firstprivate variables can be initialized. The way clang does this is by tacking on the private variables at the end of the task_t structure in the call to kmpc_omp_task_alloc. For now, this is a TODO item for both, Tasks and Target in the OpenMPIRBuilder. These values can then be picked up by the inlined code (generated out of the privatizers by way of a PrivCB callback) in the target kernel.

What I am worried about is whether we want a different style entry in the PrivateClause Operation so that the firstprivate variable can be laid out properly in the privatisation part of the task_t structure.
Would it make sense to add a TODO/Not Yet Implemented Error for all target directives for firstprivate clauses that can be deferred?

Separately, you might need something like @luporl's patch for default firstprivate behaviour. #85989

@bhandarkar-pranav
Copy link
Contributor

bhandarkar-pranav commented Jun 5, 2024

What I am worried about is whether we want a different style entry in the PrivateClause Operation so that the firstprivate variable can be laid out properly in the privatisation part of the task_t structure.

I am afraid I dont really understands what you mean here, specifically when you say "different style entry in PrivateClause Operation". Could you please elaborate?

Would it make sense to add a TODO/Not Yet Implemented Error for all target directives for firstprivate clauses that can be deferred?

I think there's no harm in separating the translation of private from firstprivate on target constructs by deferring the firstprivate work. (Implementing a TODO)

Separately, you might need something like @luporl's patch for default firstprivate behaviour. #85989

Thank you for bringing to my notice.

@kiranchandramohan
Copy link
Contributor

What I am worried about is whether we want a different style entry in the PrivateClause Operation so that the firstprivate variable can be laid out properly in the privatisation part of the task_t structure.

I am afraid I dont really understands what you mean here, specifically when you say "different style entry in PrivateClause Operation". Could you please elaborate?

To layout a copy of the original variable in the task datastructure, I am assuming you will need some kind of size, offset and alignment information. With a PrivateClause operation we have a sequence of MLIR operations. It is not clear whether we can deduce the necessary information for laying out a firstprivate variable in the task datastructure.

@ergawy
Copy link
Member Author

ergawy commented Jun 7, 2024

Would it make sense to add a TODO/Not Yet Implemented Error for all target directives for firstprivate clauses that can be deferred?

firstprivate is currrently still a TODO for target:

  cp.processTODO<clause::Allocate, clause::Defaultmap, clause::Firstprivate,
                 clause::InReduction, clause::Reduction,
                 clause::UsesAllocators>(loc,
                                         llvm::omp::Directive::OMPD_target);

This PR is only for private. Let me know if I misunderstood what you mean.

I also added a more extensive test.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG.

@ergawy ergawy merged commit 913a824 into llvm:main Jun 7, 2024
7 checks passed
@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants