Skip to content

[flang][mlir][OpenMP] Add support for COPYPRIVATE #73128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

luporl
Copy link
Contributor

@luporl luporl commented Nov 22, 2023

Add initial handling of OpenMP COPYPRIVATE clause in Flang.

MLIR's omp.single operation was modified to support an optional
CopyPrivateVarList. It consists of pairs of variables and
functions. When present, each thread variable is updated with the
variable value of the thread that executed the single region,
using the specified functions to perform the copy.

When lowering COPYPRIVATE, Flang then generates the copy function
needed by each variable and builds the appropriate
CopyPrivateVarList. The translation to LLVM IR is done in
OMPIRBuilder, by calling createCopyPrivate() for each variable in
the list, which generates calls to __kmpc_copyprivate.

Fixes #63933

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp flang:semantics labels Nov 22, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 22, 2023

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-flang-semantics

@llvm/pr-subscribers-flang-fir-hlfir

Author: Leandro Lupori (luporl)

Changes

Add initial handling of COPYPRIVATE clause.

It was implemented using a temporary stack variable that can be
accessed by all threads, a sync variable. A single thread writes the
value of its private variable to this temporary, at the end of the
single region. After the single operation, all threads then read
from the sync variable and write its value to their private copies.

Fixes #63933


Patch is 42.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73128.diff

6 Files Affected:

  • (modified) flang/include/flang/Lower/AbstractConverter.h (+23)
  • (modified) flang/lib/Lower/Bridge.cpp (+257-173)
  • (modified) flang/lib/Lower/OpenMP.cpp (+70-5)
  • (modified) flang/lib/Semantics/resolve-directives.cpp (+2-1)
  • (removed) flang/test/Lower/OpenMP/Todo/copyprivate.f90 (-13)
  • (added) flang/test/Lower/OpenMP/copyprivate.f90 (+244)
diff --git a/flang/include/flang/Lower/AbstractConverter.h b/flang/include/flang/Lower/AbstractConverter.h
index 980fde881373249..bb182812c54132f 100644
--- a/flang/include/flang/Lower/AbstractConverter.h
+++ b/flang/include/flang/Lower/AbstractConverter.h
@@ -51,6 +51,7 @@ class DerivedTypeSpec;
 } // namespace semantics
 
 namespace lower {
+struct SymbolBox;
 class SymMap;
 namespace pft {
 struct Variable;
@@ -111,13 +112,35 @@ class AbstractConverter {
   virtual bool
   createHostAssociateVarClone(const Fortran::semantics::Symbol &sym) = 0;
 
+  /// For a given symbol which may not be host-associated, create a clone using
+  /// parameters from the symbol or from the host-associated symbol, if any.
+  /// This member function does not insert the clone in the symbol table and
+  /// does not initialize it.
+  virtual Fortran::lower::SymbolBox
+  createVarClone(const Fortran::semantics::Symbol &sym) = 0;
+
+  /// Initialize a previously created clone.
+  virtual void initVarClone(const Fortran::semantics::Symbol &sym,
+                            const Fortran::lower::SymbolBox &clone) = 0;
+
   virtual void
   createHostAssociateVarCloneDealloc(const Fortran::semantics::Symbol &sym) = 0;
 
+  virtual void createVarCloneDealloc(const Fortran::semantics::Symbol &sym,
+                                     Fortran::lower::SymbolBox &sb) = 0;
+
   virtual void copyHostAssociateVar(
       const Fortran::semantics::Symbol &sym,
       mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr) = 0;
 
+  virtual void copyVar(const Fortran::semantics::Symbol &dst,
+                       const Fortran::lower::SymbolBox &src,
+                       bool needBarrier = false) = 0;
+
+  virtual void copyVar(const Fortran::lower::SymbolBox &dst,
+                       const Fortran::semantics::Symbol &src,
+                       bool needBarrier = false) = 0;
+
   /// For a given symbol, check if it is present in the inner-most
   /// level of the symbol map.
   virtual bool isPresentShallowLookup(Fortran::semantics::Symbol &sym) = 0;
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 872bf6bc729ecd0..0cb43bb67a2a964 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -53,6 +53,7 @@
 #include "flang/Semantics/symbol.h"
 #include "flang/Semantics/tools.h"
 #include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/IR/PatternMatch.h"
 #include "mlir/Parser/Parser.h"
 #include "mlir/Transforms/RegionUtils.h"
@@ -609,125 +610,41 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 
   bool createHostAssociateVarClone(
       const Fortran::semantics::Symbol &sym) override final {
-    mlir::Location loc = genLocation(sym.name());
-    mlir::Type symType = genType(sym);
-    const auto *details = sym.detailsIf<Fortran::semantics::HostAssocDetails>();
-    assert(details && "No host-association found");
-    const Fortran::semantics::Symbol &hsym = details->symbol();
-    mlir::Type hSymType = genType(hsym);
-    Fortran::lower::SymbolBox hsb = lookupSymbol(hsym);
-
-    auto allocate = [&](llvm::ArrayRef<mlir::Value> shape,
-                        llvm::ArrayRef<mlir::Value> typeParams) -> mlir::Value {
-      mlir::Value allocVal = builder->allocateLocal(
-          loc,
-          Fortran::semantics::IsAllocatableOrObjectPointer(&hsym.GetUltimate())
-              ? hSymType
-              : symType,
-          mangleName(sym), toStringRef(sym.GetUltimate().name()),
-          /*pinned=*/true, shape, typeParams,
-          sym.GetUltimate().attrs().test(Fortran::semantics::Attr::TARGET));
-      return allocVal;
-    };
-
-    fir::ExtendedValue hexv = symBoxToExtendedValue(hsb);
-    fir::ExtendedValue exv = hexv.match(
-        [&](const fir::BoxValue &box) -> fir::ExtendedValue {
-          const Fortran::semantics::DeclTypeSpec *type = sym.GetType();
-          if (type && type->IsPolymorphic())
-            TODO(loc, "create polymorphic host associated copy");
-          // Create a contiguous temp with the same shape and length as
-          // the original variable described by a fir.box.
-          llvm::SmallVector<mlir::Value> extents =
-              fir::factory::getExtents(loc, *builder, hexv);
-          if (box.isDerivedWithLenParameters())
-            TODO(loc, "get length parameters from derived type BoxValue");
-          if (box.isCharacter()) {
-            mlir::Value len = fir::factory::readCharLen(*builder, loc, box);
-            mlir::Value temp = allocate(extents, {len});
-            return fir::CharArrayBoxValue{temp, len, extents};
-          }
-          return fir::ArrayBoxValue{allocate(extents, {}), extents};
-        },
-        [&](const fir::MutableBoxValue &box) -> fir::ExtendedValue {
-          // Allocate storage for a pointer/allocatble descriptor.
-          // No shape/lengths to be passed to the alloca.
-          return fir::MutableBoxValue(allocate({}, {}), {}, {});
-        },
-        [&](const auto &) -> fir::ExtendedValue {
-          mlir::Value temp =
-              allocate(fir::factory::getExtents(loc, *builder, hexv),
-                       fir::factory::getTypeParams(loc, *builder, hexv));
-          return fir::substBase(hexv, temp);
-        });
-
-    // Initialise cloned allocatable
-    hexv.match(
-        [&](const fir::MutableBoxValue &box) -> void {
-          // Do not process pointers
-          if (Fortran::semantics::IsPointer(sym.GetUltimate())) {
-            return;
-          }
-          // Allocate storage for a pointer/allocatble descriptor.
-          // No shape/lengths to be passed to the alloca.
-          const auto new_box = exv.getBoxOf<fir::MutableBoxValue>();
+    assert(sym.detailsIf<Fortran::semantics::HostAssocDetails>() &&
+           "No host-association found");
+    fir::ExtendedValue exv = cloneSymbolValue(sym);
+    fir::ExtendedValue oexv = symBoxToExtendedValue(getOriginalSymbolBox(sym));
+    initClonedValue(sym, exv, oexv);
+    return bindIfNewSymbol(sym, exv);
+  }
 
-          // allocate if allocated
-          mlir::Value isAllocated =
-              fir::factory::genIsAllocatedOrAssociatedTest(*builder, loc, box);
-          auto if_builder = builder->genIfThenElse(loc, isAllocated);
-          if_builder.genThen([&]() {
-            std::string name = mangleName(sym) + ".alloc";
-            if (auto seqTy = symType.dyn_cast<fir::SequenceType>()) {
-              fir::ExtendedValue read = fir::factory::genMutableBoxRead(
-                  *builder, loc, box, /*mayBePolymorphic=*/false);
-              if (auto read_arr_box = read.getBoxOf<fir::ArrayBoxValue>()) {
-                fir::factory::genInlinedAllocation(
-                    *builder, loc, *new_box, read_arr_box->getLBounds(),
-                    read_arr_box->getExtents(),
-                    /*lenParams=*/std::nullopt, name,
-                    /*mustBeHeap=*/true);
-              } else if (auto read_char_arr_box =
-                             read.getBoxOf<fir::CharArrayBoxValue>()) {
-                fir::factory::genInlinedAllocation(
-                    *builder, loc, *new_box, read_char_arr_box->getLBounds(),
-                    read_char_arr_box->getExtents(),
-                    read_char_arr_box->getLen(), name,
-                    /*mustBeHeap=*/true);
-              } else {
-                TODO(loc, "Unhandled allocatable box type");
-              }
-            } else {
-              fir::factory::genInlinedAllocation(
-                  *builder, loc, *new_box, box.getMutableProperties().lbounds,
-                  box.getMutableProperties().extents,
-                  box.nonDeferredLenParams(), name,
-                  /*mustBeHeap=*/true);
-            }
-          });
-          if_builder.genElse([&]() {
-            // nullify box
-            auto empty = fir::factory::createUnallocatedBox(
-                *builder, loc, new_box->getBoxTy(),
-                new_box->nonDeferredLenParams(), {});
-            builder->create<fir::StoreOp>(loc, empty, new_box->getAddr());
-          });
-          if_builder.end();
-        },
-        [&](const auto &) -> void {
-          // Do nothing
-        });
+  Fortran::lower::SymbolBox
+  createVarClone(const Fortran::semantics::Symbol &sym) override final {
+    fir::ExtendedValue exv = cloneSymbolValue(sym);
+    Fortran::lower::SymMap symMap;
+    addSymbol(sym, exv, /*forced=*/true, symMap);
+    return symMap.shallowLookupSymbol(sym);
+  }
 
-    return bindIfNewSymbol(sym, exv);
+  void initVarClone(const Fortran::semantics::Symbol &sym,
+                    const Fortran::lower::SymbolBox &clone) override final {
+    fir::ExtendedValue exv = symBoxToExtendedValue(clone);
+    fir::ExtendedValue oexv = symBoxToExtendedValue(getOriginalSymbolBox(sym));
+    initClonedValue(sym, exv, oexv);
   }
 
   void createHostAssociateVarCloneDealloc(
       const Fortran::semantics::Symbol &sym) override final {
-    mlir::Location loc = genLocation(sym.name());
     Fortran::lower::SymbolBox hsb = lookupSymbol(sym);
+    createVarCloneDealloc(sym, hsb);
+  }
+
+  void createVarCloneDealloc(const Fortran::semantics::Symbol &sym,
+                             Fortran::lower::SymbolBox &sb) override final {
+    mlir::Location loc = genLocation(sym.name());
 
-    fir::ExtendedValue hexv = symBoxToExtendedValue(hsb);
-    hexv.match(
+    fir::ExtendedValue exv = symBoxToExtendedValue(sb);
+    exv.match(
         [&](const fir::MutableBoxValue &new_box) -> void {
           // Do not process pointers
           if (Fortran::semantics::IsPointer(sym.GetUltimate())) {
@@ -741,6 +658,20 @@ class FirConverter : public Fortran::lower::AbstractConverter {
         });
   }
 
+  void copyVar(const Fortran::semantics::Symbol &dst,
+               const Fortran::lower::SymbolBox &src,
+               bool needBarrier = false) override final {
+    Fortran::lower::SymbolBox dst_sb = lookupSymbol(dst);
+    copyVar(dst, dst_sb, src, needBarrier);
+  }
+
+  void copyVar(const Fortran::lower::SymbolBox &dst,
+               const Fortran::semantics::Symbol &src,
+               bool needBarrier = false) override final {
+    Fortran::lower::SymbolBox src_sb = lookupSymbol(src);
+    copyVar(src, dst, src_sb, needBarrier);
+  }
+
   void copyHostAssociateVar(
       const Fortran::semantics::Symbol &sym,
       mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr) override final {
@@ -775,64 +706,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
       rhs_sb = &hsb;
     }
 
-    mlir::Location loc = genLocation(sym.name());
-
-    if (lowerToHighLevelFIR()) {
-      hlfir::Entity lhs{lhs_sb->getAddr()};
-      hlfir::Entity rhs{rhs_sb->getAddr()};
-      // Temporary_lhs is set to true in hlfir.assign below to avoid user
-      // assignment to be used and finalization to be called on the LHS.
-      // This may or may not be correct but mimics the current behaviour
-      // without HLFIR.
-      auto copyData = [&](hlfir::Entity l, hlfir::Entity r) {
-        // Dereference RHS and load it if trivial scalar.
-        r = hlfir::loadTrivialScalar(loc, *builder, r);
-        builder->create<hlfir::AssignOp>(
-            loc, r, l,
-            /*isWholeAllocatableAssignment=*/false,
-            /*keepLhsLengthInAllocatableAssignment=*/false,
-            /*temporary_lhs=*/true);
-      };
-      if (lhs.isAllocatable()) {
-        // Deep copy allocatable if it is allocated.
-        // Note that when allocated, the RHS is already allocated with the LHS
-        // shape for copy on entry in createHostAssociateVarClone.
-        // For lastprivate, this assumes that the RHS was not reallocated in
-        // the OpenMP region.
-        lhs = hlfir::derefPointersAndAllocatables(loc, *builder, lhs);
-        mlir::Value addr = hlfir::genVariableRawAddress(loc, *builder, lhs);
-        mlir::Value isAllocated = builder->genIsNotNullAddr(loc, addr);
-        builder->genIfThen(loc, isAllocated)
-            .genThen([&]() {
-              // Copy the DATA, not the descriptors.
-              copyData(lhs, rhs);
-            })
-            .end();
-      } else if (lhs.isPointer()) {
-        // Set LHS target to the target of RHS (do not copy the RHS
-        // target data into the LHS target storage).
-        auto loadVal = builder->create<fir::LoadOp>(loc, rhs);
-        builder->create<fir::StoreOp>(loc, loadVal, lhs);
-      } else {
-        // Non ALLOCATABLE/POINTER variable. Simple DATA copy.
-        copyData(lhs, rhs);
-      }
-    } else {
-      fir::ExtendedValue lhs = symBoxToExtendedValue(*lhs_sb);
-      fir::ExtendedValue rhs = symBoxToExtendedValue(*rhs_sb);
-      mlir::Type symType = genType(sym);
-      if (auto seqTy = symType.dyn_cast<fir::SequenceType>()) {
-        Fortran::lower::StatementContext stmtCtx;
-        Fortran::lower::createSomeArrayAssignment(*this, lhs, rhs, localSymbols,
-                                                  stmtCtx);
-        stmtCtx.finalizeAndReset();
-      } else if (lhs.getBoxOf<fir::CharBoxValue>()) {
-        fir::factory::CharacterExprHelper{*builder, loc}.createAssign(lhs, rhs);
-      } else {
-        auto loadVal = builder->create<fir::LoadOp>(loc, fir::getBase(rhs));
-        builder->create<fir::StoreOp>(loc, loadVal, fir::getBase(lhs));
-      }
-    }
+    copyVar(sym, *lhs_sb, *rhs_sb);
 
     if (copyAssignIP && copyAssignIP->isSet() &&
         sym.test(Fortran::semantics::Symbol::Flag::OmpLastPrivate)) {
@@ -1075,16 +949,226 @@ class FirConverter : public Fortran::lower::AbstractConverter {
                  fir::ExtendedValue val, bool forced = false) {
     if (!forced && lookupSymbol(sym))
       return false;
+    return addSymbol(sym, val, forced, localSymbols);
+  }
+
+  /// Add the symbol to \p symMap.
+  /// Always returns `true`.
+  bool addSymbol(const Fortran::semantics::SymbolRef sym,
+                 fir::ExtendedValue val, bool forced,
+                 Fortran::lower::SymMap &symMap) {
     if (lowerToHighLevelFIR()) {
-      Fortran::lower::genDeclareSymbol(*this, localSymbols, sym, val,
-                                       fir::FortranVariableFlagsEnum::None,
-                                       forced);
+      Fortran::lower::genDeclareSymbol(
+          *this, symMap, sym, val, fir::FortranVariableFlagsEnum::None, forced);
     } else {
-      localSymbols.addSymbol(sym, val, forced);
+      symMap.addSymbol(sym, val, forced);
     }
     return true;
   }
 
+  void initClonedValue(const Fortran::semantics::Symbol &sym,
+                       const fir::ExtendedValue &clone,
+                       const fir::ExtendedValue &orig) {
+    mlir::Location loc = genLocation(sym.name());
+    mlir::Type symType = genType(sym);
+    // The type of a non host associated symbol may be wrapped inside a box.
+    if (!sym.detailsIf<Fortran::semantics::HostAssocDetails>()) {
+      if (mlir::Type seqType = fir::unwrapUntilSeqType(symType))
+        symType = seqType;
+    }
+
+    // Initialise cloned allocatable
+    orig.match(
+        [&](const fir::MutableBoxValue &box) -> void {
+          // Do not process pointers
+          if (Fortran::semantics::IsPointer(sym.GetUltimate())) {
+            return;
+          }
+          // Allocate storage for a pointer/allocatble descriptor.
+          // No shape/lengths to be passed to the alloca.
+          const auto new_box = clone.getBoxOf<fir::MutableBoxValue>();
+
+          // allocate if allocated
+          mlir::Value isAllocated =
+              fir::factory::genIsAllocatedOrAssociatedTest(*builder, loc, box);
+          auto if_builder = builder->genIfThenElse(loc, isAllocated);
+          if_builder.genThen([&]() {
+            std::string name = mangleName(sym) + ".alloc";
+            if (auto seqTy = symType.dyn_cast<fir::SequenceType>()) {
+              fir::ExtendedValue read = fir::factory::genMutableBoxRead(
+                  *builder, loc, box, /*mayBePolymorphic=*/false);
+              if (auto read_arr_box = read.getBoxOf<fir::ArrayBoxValue>()) {
+                fir::factory::genInlinedAllocation(
+                    *builder, loc, *new_box, read_arr_box->getLBounds(),
+                    read_arr_box->getExtents(),
+                    /*lenParams=*/std::nullopt, name,
+                    /*mustBeHeap=*/true);
+              } else if (auto read_char_arr_box =
+                             read.getBoxOf<fir::CharArrayBoxValue>()) {
+                fir::factory::genInlinedAllocation(
+                    *builder, loc, *new_box, read_char_arr_box->getLBounds(),
+                    read_char_arr_box->getExtents(),
+                    read_char_arr_box->getLen(), name,
+                    /*mustBeHeap=*/true);
+              } else {
+                TODO(loc, "Unhandled allocatable box type");
+              }
+            } else {
+              fir::factory::genInlinedAllocation(
+                  *builder, loc, *new_box, box.getMutableProperties().lbounds,
+                  box.getMutableProperties().extents,
+                  box.nonDeferredLenParams(), name,
+                  /*mustBeHeap=*/true);
+            }
+          });
+          if_builder.genElse([&]() {
+            // nullify box
+            auto empty = fir::factory::createUnallocatedBox(
+                *builder, loc, new_box->getBoxTy(),
+                new_box->nonDeferredLenParams(), {});
+            builder->create<fir::StoreOp>(loc, empty, new_box->getAddr());
+          });
+          if_builder.end();
+        },
+        [&](const auto &) -> void {
+          // Do nothing
+        });
+  }
+
+  Fortran::lower::SymbolBox
+  getOriginalSymbolBox(const Fortran::semantics::Symbol &sym) {
+    const auto *details = sym.detailsIf<Fortran::semantics::HostAssocDetails>();
+    if (details) {
+      const Fortran::semantics::Symbol &hsym = details->symbol();
+      return lookupSymbol(hsym);
+    }
+    return lookupSymbol(sym);
+  }
+
+  fir::ExtendedValue cloneSymbolValue(const Fortran::semantics::Symbol &sym) {
+    mlir::Location loc = genLocation(sym.name());
+    mlir::Type symType = genType(sym);
+    mlir::Type allocType = symType;
+    const auto *details = sym.detailsIf<Fortran::semantics::HostAssocDetails>();
+    if (details) {
+      const Fortran::semantics::Symbol &hsym = details->symbol();
+      if (Fortran::semantics::IsAllocatableOrObjectPointer(&hsym.GetUltimate()))
+        allocType = genType(hsym);
+    }
+    Fortran::lower::SymbolBox sb = getOriginalSymbolBox(sym);
+
+    auto allocate = [&](llvm::ArrayRef<mlir::Value> shape,
+                        llvm::ArrayRef<mlir::Value> typeParams) -> mlir::Value {
+      mlir::Value allocVal = builder->allocateLocal(
+          loc, allocType, mangleName(sym),
+          toStringRef(sym.GetUltimate().name()),
+          /*pinned=*/true, shape, typeParams,
+          sym.GetUltimate().attrs().test(Fortran::semantics::Attr::TARGET));
+      return allocVal;
+    };
+
+    fir::ExtendedValue oexv = symBoxToExtendedValue(sb);
+    fir::ExtendedValue exv = oexv.match(
+        [&](const fir::BoxValue &box) -> fir::ExtendedValue {
+          const Fortran::semantics::DeclTypeSpec *type = sym.GetType();
+          if (type && type->IsPolymorphic())
+            TODO(loc, "create polymorphic copy");
+          // Create a contiguous temp with the same shape and length as
+          // the original variable described by a fir.box.
+          llvm::SmallVector<mlir::Value> extents =
+              fir::factory::getExtents(loc, *builder, oexv);
+          if (box.isDerivedWithLenParameters())
+            TODO(loc, "get length parameters from derived type BoxValue");
+          if (box.isCharacter()) {
+            mlir::Value len = fir::factory::readCharLen(*builder, loc, box);
+            mlir::Value temp = allocate(extents, {len});
+            return fir::CharArrayBoxValue{temp, len, extents};
+          }
+          return fir::ArrayBoxValue{allocate(extents, {}), extents};
+        },
+        [&](const fir::MutableBoxValue &box) -> fir::ExtendedValue {
+          // Allocate storage for a pointer/allocatble descriptor.
+          // No shape/lengths to be passed to the alloca.
+          return fir::MutableBoxValue(allocate({}, {}), {}, {});
+        },
+        [&](const auto &) -> fir::ExtendedValue {
+          mlir::...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Nov 22, 2023

@llvm/pr-subscribers-flang-openmp

Author: Leandro Lupori (luporl)

Changes

Add initial handling of COPYPRIVATE clause.

It was implemented using a temporary stack variable that can be
accessed by all threads, a sync variable. A single thread writes the
value of its private variable to this temporary, at the end of the
single region. After the single operation, all threads then read
from the sync variable and write its value to their private copies.

Fixes #63933


Patch is 42.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73128.diff

6 Files Affected:

  • (modified) flang/include/flang/Lower/AbstractConverter.h (+23)
  • (modified) flang/lib/Lower/Bridge.cpp (+257-173)
  • (modified) flang/lib/Lower/OpenMP.cpp (+70-5)
  • (modified) flang/lib/Semantics/resolve-directives.cpp (+2-1)
  • (removed) flang/test/Lower/OpenMP/Todo/copyprivate.f90 (-13)
  • (added) flang/test/Lower/OpenMP/copyprivate.f90 (+244)
diff --git a/flang/include/flang/Lower/AbstractConverter.h b/flang/include/flang/Lower/AbstractConverter.h
index 980fde881373249..bb182812c54132f 100644
--- a/flang/include/flang/Lower/AbstractConverter.h
+++ b/flang/include/flang/Lower/AbstractConverter.h
@@ -51,6 +51,7 @@ class DerivedTypeSpec;
 } // namespace semantics
 
 namespace lower {
+struct SymbolBox;
 class SymMap;
 namespace pft {
 struct Variable;
@@ -111,13 +112,35 @@ class AbstractConverter {
   virtual bool
   createHostAssociateVarClone(const Fortran::semantics::Symbol &sym) = 0;
 
+  /// For a given symbol which may not be host-associated, create a clone using
+  /// parameters from the symbol or from the host-associated symbol, if any.
+  /// This member function does not insert the clone in the symbol table and
+  /// does not initialize it.
+  virtual Fortran::lower::SymbolBox
+  createVarClone(const Fortran::semantics::Symbol &sym) = 0;
+
+  /// Initialize a previously created clone.
+  virtual void initVarClone(const Fortran::semantics::Symbol &sym,
+                            const Fortran::lower::SymbolBox &clone) = 0;
+
   virtual void
   createHostAssociateVarCloneDealloc(const Fortran::semantics::Symbol &sym) = 0;
 
+  virtual void createVarCloneDealloc(const Fortran::semantics::Symbol &sym,
+                                     Fortran::lower::SymbolBox &sb) = 0;
+
   virtual void copyHostAssociateVar(
       const Fortran::semantics::Symbol &sym,
       mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr) = 0;
 
+  virtual void copyVar(const Fortran::semantics::Symbol &dst,
+                       const Fortran::lower::SymbolBox &src,
+                       bool needBarrier = false) = 0;
+
+  virtual void copyVar(const Fortran::lower::SymbolBox &dst,
+                       const Fortran::semantics::Symbol &src,
+                       bool needBarrier = false) = 0;
+
   /// For a given symbol, check if it is present in the inner-most
   /// level of the symbol map.
   virtual bool isPresentShallowLookup(Fortran::semantics::Symbol &sym) = 0;
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 872bf6bc729ecd0..0cb43bb67a2a964 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -53,6 +53,7 @@
 #include "flang/Semantics/symbol.h"
 #include "flang/Semantics/tools.h"
 #include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/IR/PatternMatch.h"
 #include "mlir/Parser/Parser.h"
 #include "mlir/Transforms/RegionUtils.h"
@@ -609,125 +610,41 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 
   bool createHostAssociateVarClone(
       const Fortran::semantics::Symbol &sym) override final {
-    mlir::Location loc = genLocation(sym.name());
-    mlir::Type symType = genType(sym);
-    const auto *details = sym.detailsIf<Fortran::semantics::HostAssocDetails>();
-    assert(details && "No host-association found");
-    const Fortran::semantics::Symbol &hsym = details->symbol();
-    mlir::Type hSymType = genType(hsym);
-    Fortran::lower::SymbolBox hsb = lookupSymbol(hsym);
-
-    auto allocate = [&](llvm::ArrayRef<mlir::Value> shape,
-                        llvm::ArrayRef<mlir::Value> typeParams) -> mlir::Value {
-      mlir::Value allocVal = builder->allocateLocal(
-          loc,
-          Fortran::semantics::IsAllocatableOrObjectPointer(&hsym.GetUltimate())
-              ? hSymType
-              : symType,
-          mangleName(sym), toStringRef(sym.GetUltimate().name()),
-          /*pinned=*/true, shape, typeParams,
-          sym.GetUltimate().attrs().test(Fortran::semantics::Attr::TARGET));
-      return allocVal;
-    };
-
-    fir::ExtendedValue hexv = symBoxToExtendedValue(hsb);
-    fir::ExtendedValue exv = hexv.match(
-        [&](const fir::BoxValue &box) -> fir::ExtendedValue {
-          const Fortran::semantics::DeclTypeSpec *type = sym.GetType();
-          if (type && type->IsPolymorphic())
-            TODO(loc, "create polymorphic host associated copy");
-          // Create a contiguous temp with the same shape and length as
-          // the original variable described by a fir.box.
-          llvm::SmallVector<mlir::Value> extents =
-              fir::factory::getExtents(loc, *builder, hexv);
-          if (box.isDerivedWithLenParameters())
-            TODO(loc, "get length parameters from derived type BoxValue");
-          if (box.isCharacter()) {
-            mlir::Value len = fir::factory::readCharLen(*builder, loc, box);
-            mlir::Value temp = allocate(extents, {len});
-            return fir::CharArrayBoxValue{temp, len, extents};
-          }
-          return fir::ArrayBoxValue{allocate(extents, {}), extents};
-        },
-        [&](const fir::MutableBoxValue &box) -> fir::ExtendedValue {
-          // Allocate storage for a pointer/allocatble descriptor.
-          // No shape/lengths to be passed to the alloca.
-          return fir::MutableBoxValue(allocate({}, {}), {}, {});
-        },
-        [&](const auto &) -> fir::ExtendedValue {
-          mlir::Value temp =
-              allocate(fir::factory::getExtents(loc, *builder, hexv),
-                       fir::factory::getTypeParams(loc, *builder, hexv));
-          return fir::substBase(hexv, temp);
-        });
-
-    // Initialise cloned allocatable
-    hexv.match(
-        [&](const fir::MutableBoxValue &box) -> void {
-          // Do not process pointers
-          if (Fortran::semantics::IsPointer(sym.GetUltimate())) {
-            return;
-          }
-          // Allocate storage for a pointer/allocatble descriptor.
-          // No shape/lengths to be passed to the alloca.
-          const auto new_box = exv.getBoxOf<fir::MutableBoxValue>();
+    assert(sym.detailsIf<Fortran::semantics::HostAssocDetails>() &&
+           "No host-association found");
+    fir::ExtendedValue exv = cloneSymbolValue(sym);
+    fir::ExtendedValue oexv = symBoxToExtendedValue(getOriginalSymbolBox(sym));
+    initClonedValue(sym, exv, oexv);
+    return bindIfNewSymbol(sym, exv);
+  }
 
-          // allocate if allocated
-          mlir::Value isAllocated =
-              fir::factory::genIsAllocatedOrAssociatedTest(*builder, loc, box);
-          auto if_builder = builder->genIfThenElse(loc, isAllocated);
-          if_builder.genThen([&]() {
-            std::string name = mangleName(sym) + ".alloc";
-            if (auto seqTy = symType.dyn_cast<fir::SequenceType>()) {
-              fir::ExtendedValue read = fir::factory::genMutableBoxRead(
-                  *builder, loc, box, /*mayBePolymorphic=*/false);
-              if (auto read_arr_box = read.getBoxOf<fir::ArrayBoxValue>()) {
-                fir::factory::genInlinedAllocation(
-                    *builder, loc, *new_box, read_arr_box->getLBounds(),
-                    read_arr_box->getExtents(),
-                    /*lenParams=*/std::nullopt, name,
-                    /*mustBeHeap=*/true);
-              } else if (auto read_char_arr_box =
-                             read.getBoxOf<fir::CharArrayBoxValue>()) {
-                fir::factory::genInlinedAllocation(
-                    *builder, loc, *new_box, read_char_arr_box->getLBounds(),
-                    read_char_arr_box->getExtents(),
-                    read_char_arr_box->getLen(), name,
-                    /*mustBeHeap=*/true);
-              } else {
-                TODO(loc, "Unhandled allocatable box type");
-              }
-            } else {
-              fir::factory::genInlinedAllocation(
-                  *builder, loc, *new_box, box.getMutableProperties().lbounds,
-                  box.getMutableProperties().extents,
-                  box.nonDeferredLenParams(), name,
-                  /*mustBeHeap=*/true);
-            }
-          });
-          if_builder.genElse([&]() {
-            // nullify box
-            auto empty = fir::factory::createUnallocatedBox(
-                *builder, loc, new_box->getBoxTy(),
-                new_box->nonDeferredLenParams(), {});
-            builder->create<fir::StoreOp>(loc, empty, new_box->getAddr());
-          });
-          if_builder.end();
-        },
-        [&](const auto &) -> void {
-          // Do nothing
-        });
+  Fortran::lower::SymbolBox
+  createVarClone(const Fortran::semantics::Symbol &sym) override final {
+    fir::ExtendedValue exv = cloneSymbolValue(sym);
+    Fortran::lower::SymMap symMap;
+    addSymbol(sym, exv, /*forced=*/true, symMap);
+    return symMap.shallowLookupSymbol(sym);
+  }
 
-    return bindIfNewSymbol(sym, exv);
+  void initVarClone(const Fortran::semantics::Symbol &sym,
+                    const Fortran::lower::SymbolBox &clone) override final {
+    fir::ExtendedValue exv = symBoxToExtendedValue(clone);
+    fir::ExtendedValue oexv = symBoxToExtendedValue(getOriginalSymbolBox(sym));
+    initClonedValue(sym, exv, oexv);
   }
 
   void createHostAssociateVarCloneDealloc(
       const Fortran::semantics::Symbol &sym) override final {
-    mlir::Location loc = genLocation(sym.name());
     Fortran::lower::SymbolBox hsb = lookupSymbol(sym);
+    createVarCloneDealloc(sym, hsb);
+  }
+
+  void createVarCloneDealloc(const Fortran::semantics::Symbol &sym,
+                             Fortran::lower::SymbolBox &sb) override final {
+    mlir::Location loc = genLocation(sym.name());
 
-    fir::ExtendedValue hexv = symBoxToExtendedValue(hsb);
-    hexv.match(
+    fir::ExtendedValue exv = symBoxToExtendedValue(sb);
+    exv.match(
         [&](const fir::MutableBoxValue &new_box) -> void {
           // Do not process pointers
           if (Fortran::semantics::IsPointer(sym.GetUltimate())) {
@@ -741,6 +658,20 @@ class FirConverter : public Fortran::lower::AbstractConverter {
         });
   }
 
+  void copyVar(const Fortran::semantics::Symbol &dst,
+               const Fortran::lower::SymbolBox &src,
+               bool needBarrier = false) override final {
+    Fortran::lower::SymbolBox dst_sb = lookupSymbol(dst);
+    copyVar(dst, dst_sb, src, needBarrier);
+  }
+
+  void copyVar(const Fortran::lower::SymbolBox &dst,
+               const Fortran::semantics::Symbol &src,
+               bool needBarrier = false) override final {
+    Fortran::lower::SymbolBox src_sb = lookupSymbol(src);
+    copyVar(src, dst, src_sb, needBarrier);
+  }
+
   void copyHostAssociateVar(
       const Fortran::semantics::Symbol &sym,
       mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr) override final {
@@ -775,64 +706,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
       rhs_sb = &hsb;
     }
 
-    mlir::Location loc = genLocation(sym.name());
-
-    if (lowerToHighLevelFIR()) {
-      hlfir::Entity lhs{lhs_sb->getAddr()};
-      hlfir::Entity rhs{rhs_sb->getAddr()};
-      // Temporary_lhs is set to true in hlfir.assign below to avoid user
-      // assignment to be used and finalization to be called on the LHS.
-      // This may or may not be correct but mimics the current behaviour
-      // without HLFIR.
-      auto copyData = [&](hlfir::Entity l, hlfir::Entity r) {
-        // Dereference RHS and load it if trivial scalar.
-        r = hlfir::loadTrivialScalar(loc, *builder, r);
-        builder->create<hlfir::AssignOp>(
-            loc, r, l,
-            /*isWholeAllocatableAssignment=*/false,
-            /*keepLhsLengthInAllocatableAssignment=*/false,
-            /*temporary_lhs=*/true);
-      };
-      if (lhs.isAllocatable()) {
-        // Deep copy allocatable if it is allocated.
-        // Note that when allocated, the RHS is already allocated with the LHS
-        // shape for copy on entry in createHostAssociateVarClone.
-        // For lastprivate, this assumes that the RHS was not reallocated in
-        // the OpenMP region.
-        lhs = hlfir::derefPointersAndAllocatables(loc, *builder, lhs);
-        mlir::Value addr = hlfir::genVariableRawAddress(loc, *builder, lhs);
-        mlir::Value isAllocated = builder->genIsNotNullAddr(loc, addr);
-        builder->genIfThen(loc, isAllocated)
-            .genThen([&]() {
-              // Copy the DATA, not the descriptors.
-              copyData(lhs, rhs);
-            })
-            .end();
-      } else if (lhs.isPointer()) {
-        // Set LHS target to the target of RHS (do not copy the RHS
-        // target data into the LHS target storage).
-        auto loadVal = builder->create<fir::LoadOp>(loc, rhs);
-        builder->create<fir::StoreOp>(loc, loadVal, lhs);
-      } else {
-        // Non ALLOCATABLE/POINTER variable. Simple DATA copy.
-        copyData(lhs, rhs);
-      }
-    } else {
-      fir::ExtendedValue lhs = symBoxToExtendedValue(*lhs_sb);
-      fir::ExtendedValue rhs = symBoxToExtendedValue(*rhs_sb);
-      mlir::Type symType = genType(sym);
-      if (auto seqTy = symType.dyn_cast<fir::SequenceType>()) {
-        Fortran::lower::StatementContext stmtCtx;
-        Fortran::lower::createSomeArrayAssignment(*this, lhs, rhs, localSymbols,
-                                                  stmtCtx);
-        stmtCtx.finalizeAndReset();
-      } else if (lhs.getBoxOf<fir::CharBoxValue>()) {
-        fir::factory::CharacterExprHelper{*builder, loc}.createAssign(lhs, rhs);
-      } else {
-        auto loadVal = builder->create<fir::LoadOp>(loc, fir::getBase(rhs));
-        builder->create<fir::StoreOp>(loc, loadVal, fir::getBase(lhs));
-      }
-    }
+    copyVar(sym, *lhs_sb, *rhs_sb);
 
     if (copyAssignIP && copyAssignIP->isSet() &&
         sym.test(Fortran::semantics::Symbol::Flag::OmpLastPrivate)) {
@@ -1075,16 +949,226 @@ class FirConverter : public Fortran::lower::AbstractConverter {
                  fir::ExtendedValue val, bool forced = false) {
     if (!forced && lookupSymbol(sym))
       return false;
+    return addSymbol(sym, val, forced, localSymbols);
+  }
+
+  /// Add the symbol to \p symMap.
+  /// Always returns `true`.
+  bool addSymbol(const Fortran::semantics::SymbolRef sym,
+                 fir::ExtendedValue val, bool forced,
+                 Fortran::lower::SymMap &symMap) {
     if (lowerToHighLevelFIR()) {
-      Fortran::lower::genDeclareSymbol(*this, localSymbols, sym, val,
-                                       fir::FortranVariableFlagsEnum::None,
-                                       forced);
+      Fortran::lower::genDeclareSymbol(
+          *this, symMap, sym, val, fir::FortranVariableFlagsEnum::None, forced);
     } else {
-      localSymbols.addSymbol(sym, val, forced);
+      symMap.addSymbol(sym, val, forced);
     }
     return true;
   }
 
+  void initClonedValue(const Fortran::semantics::Symbol &sym,
+                       const fir::ExtendedValue &clone,
+                       const fir::ExtendedValue &orig) {
+    mlir::Location loc = genLocation(sym.name());
+    mlir::Type symType = genType(sym);
+    // The type of a non host associated symbol may be wrapped inside a box.
+    if (!sym.detailsIf<Fortran::semantics::HostAssocDetails>()) {
+      if (mlir::Type seqType = fir::unwrapUntilSeqType(symType))
+        symType = seqType;
+    }
+
+    // Initialise cloned allocatable
+    orig.match(
+        [&](const fir::MutableBoxValue &box) -> void {
+          // Do not process pointers
+          if (Fortran::semantics::IsPointer(sym.GetUltimate())) {
+            return;
+          }
+          // Allocate storage for a pointer/allocatble descriptor.
+          // No shape/lengths to be passed to the alloca.
+          const auto new_box = clone.getBoxOf<fir::MutableBoxValue>();
+
+          // allocate if allocated
+          mlir::Value isAllocated =
+              fir::factory::genIsAllocatedOrAssociatedTest(*builder, loc, box);
+          auto if_builder = builder->genIfThenElse(loc, isAllocated);
+          if_builder.genThen([&]() {
+            std::string name = mangleName(sym) + ".alloc";
+            if (auto seqTy = symType.dyn_cast<fir::SequenceType>()) {
+              fir::ExtendedValue read = fir::factory::genMutableBoxRead(
+                  *builder, loc, box, /*mayBePolymorphic=*/false);
+              if (auto read_arr_box = read.getBoxOf<fir::ArrayBoxValue>()) {
+                fir::factory::genInlinedAllocation(
+                    *builder, loc, *new_box, read_arr_box->getLBounds(),
+                    read_arr_box->getExtents(),
+                    /*lenParams=*/std::nullopt, name,
+                    /*mustBeHeap=*/true);
+              } else if (auto read_char_arr_box =
+                             read.getBoxOf<fir::CharArrayBoxValue>()) {
+                fir::factory::genInlinedAllocation(
+                    *builder, loc, *new_box, read_char_arr_box->getLBounds(),
+                    read_char_arr_box->getExtents(),
+                    read_char_arr_box->getLen(), name,
+                    /*mustBeHeap=*/true);
+              } else {
+                TODO(loc, "Unhandled allocatable box type");
+              }
+            } else {
+              fir::factory::genInlinedAllocation(
+                  *builder, loc, *new_box, box.getMutableProperties().lbounds,
+                  box.getMutableProperties().extents,
+                  box.nonDeferredLenParams(), name,
+                  /*mustBeHeap=*/true);
+            }
+          });
+          if_builder.genElse([&]() {
+            // nullify box
+            auto empty = fir::factory::createUnallocatedBox(
+                *builder, loc, new_box->getBoxTy(),
+                new_box->nonDeferredLenParams(), {});
+            builder->create<fir::StoreOp>(loc, empty, new_box->getAddr());
+          });
+          if_builder.end();
+        },
+        [&](const auto &) -> void {
+          // Do nothing
+        });
+  }
+
+  Fortran::lower::SymbolBox
+  getOriginalSymbolBox(const Fortran::semantics::Symbol &sym) {
+    const auto *details = sym.detailsIf<Fortran::semantics::HostAssocDetails>();
+    if (details) {
+      const Fortran::semantics::Symbol &hsym = details->symbol();
+      return lookupSymbol(hsym);
+    }
+    return lookupSymbol(sym);
+  }
+
+  fir::ExtendedValue cloneSymbolValue(const Fortran::semantics::Symbol &sym) {
+    mlir::Location loc = genLocation(sym.name());
+    mlir::Type symType = genType(sym);
+    mlir::Type allocType = symType;
+    const auto *details = sym.detailsIf<Fortran::semantics::HostAssocDetails>();
+    if (details) {
+      const Fortran::semantics::Symbol &hsym = details->symbol();
+      if (Fortran::semantics::IsAllocatableOrObjectPointer(&hsym.GetUltimate()))
+        allocType = genType(hsym);
+    }
+    Fortran::lower::SymbolBox sb = getOriginalSymbolBox(sym);
+
+    auto allocate = [&](llvm::ArrayRef<mlir::Value> shape,
+                        llvm::ArrayRef<mlir::Value> typeParams) -> mlir::Value {
+      mlir::Value allocVal = builder->allocateLocal(
+          loc, allocType, mangleName(sym),
+          toStringRef(sym.GetUltimate().name()),
+          /*pinned=*/true, shape, typeParams,
+          sym.GetUltimate().attrs().test(Fortran::semantics::Attr::TARGET));
+      return allocVal;
+    };
+
+    fir::ExtendedValue oexv = symBoxToExtendedValue(sb);
+    fir::ExtendedValue exv = oexv.match(
+        [&](const fir::BoxValue &box) -> fir::ExtendedValue {
+          const Fortran::semantics::DeclTypeSpec *type = sym.GetType();
+          if (type && type->IsPolymorphic())
+            TODO(loc, "create polymorphic copy");
+          // Create a contiguous temp with the same shape and length as
+          // the original variable described by a fir.box.
+          llvm::SmallVector<mlir::Value> extents =
+              fir::factory::getExtents(loc, *builder, oexv);
+          if (box.isDerivedWithLenParameters())
+            TODO(loc, "get length parameters from derived type BoxValue");
+          if (box.isCharacter()) {
+            mlir::Value len = fir::factory::readCharLen(*builder, loc, box);
+            mlir::Value temp = allocate(extents, {len});
+            return fir::CharArrayBoxValue{temp, len, extents};
+          }
+          return fir::ArrayBoxValue{allocate(extents, {}), extents};
+        },
+        [&](const fir::MutableBoxValue &box) -> fir::ExtendedValue {
+          // Allocate storage for a pointer/allocatble descriptor.
+          // No shape/lengths to be passed to the alloca.
+          return fir::MutableBoxValue(allocate({}, {}), {}, {});
+        },
+        [&](const auto &) -> fir::ExtendedValue {
+          mlir::...
[truncated]

@luporl
Copy link
Contributor Author

luporl commented Nov 22, 2023

Some refactoring was needed in Bridge.cpp, especially in createHostAssociateVarClone()
and copyHostAssociateVar(). It was needed to be able to clone and copy variables that may
not be host associated. But I guess the refactoring was not done in the best way, so any
suggestion of how to improve it is welcome.

Also, this implementation considers only single operations inside (or not) a parallel region.
If there are other OpenMP directives that provide parallelism without using parallel, where
single can be used, these will need to be handled later.

@kiranchandramohan
Copy link
Contributor

kiranchandramohan commented Nov 24, 2023

@luporl I see that clang uses a runtime call to implement copyprivate. And that runtime call internally has a barrier.
https://openmp.llvm.org/doxygen/group__THREADPRIVATE.html#ga1453eca6136fd77e5de88ea0e78cc7a4

Have you explored using the runtime call?

In general, there is a desire to capture the privatisation and data-copying closes in the OpenMP dialect and then lower them later when we interface with the OpenMP IRBuilder. We did not do this initially to minimize any changes from the default FIR flow, lack of experience, and also the fact that privatisation in Parallel construct did not need a runtime call or any special construction. This will change with tasks (which need all the firstprivate variables to be allocated in the task data-structure). But copyprivate also could be a good candidate to initiate capturing this information in the OpenMP dialect and then lowering it later using the OpenMPIRBuilder since it can make use of the kmpc_copyprivate runtime call and also because it is only applicable to one construct.

Let me know what your thoughts are on this.

@kiranchandramohan
Copy link
Contributor

Also support for copyprivate was added to the OpenMPIRBuilder in the following patch.
https://reviews.llvm.org/D85617

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit rusty with OpenMP, but how do you ensure that the sync variable is shared by all the threads when this is a single in a subprogram without a parallel?

I would expect the alloca for the buffer to be "private" to each threads that entered the subprogram from a call in some parallel region (so unusable to broadcast the value to all threads).

e.g: do all threads print 1 in the program below?

subroutine foo(i)
 !$omp single
 i = 1
 !$omp end single copyprivate(i)
end subroutine

  integer, save :: i = 0
  !$omp threadprivate(i)
  !$omp parallel
  call foo(i)
  !$omp critical
  print *, i
  !$omp end critical 
  !$omp end parallel
end

From a design point of view, would there be advantages to keep the copyprivate implementation abstract in the IR (e.g, to not make the buffer/barriers/copies explicit yet)?

@clementval
Copy link
Contributor

From a design point of view, would there be advantages to keep the copyprivate implementation abstract in the IR (e.g, to not make the buffer/barriers/copies explicit yet)?

+1
I think introducing abstraction in the OpenMP dialect and then have passes to lower that further down would be nice.

@kiranchandramohan
Copy link
Contributor

I am a bit rusty with OpenMP, but how do you ensure that the sync variable is shared by all the threads when this is a single in a subprogram without a parallel?

I would expect the alloca for the buffer to be "private" to each threads that entered the subprogram from a call in some parallel region (so unusable to broadcast the value to all threads).

I think @jeanPerier is right here, storing the sync variable on the stack is probably not sufficient. I guess a global or support from runtime is required. This is probably the reason that the kmpc_copyprivate runtime exists.

__kmpc_copyprivate implements the interface for the private data broadcast

would there be advantages to keep the copyprivate implementation abstract in the IR (e.g, to not make the buffer/barriers/copies explicit yet)?

One advantage is that it will help make it possible to handle copyprivate with a call to the runtime.

I think two options for this are :

  1. Since the copyprivate clause is only applicable to the single construct we could consider making the omp.single operation an operation with two regions. The semantics being that the first region is executed only by the single thread and all other threads execute the copy region after the single region is executed. The copy region can later become the copy function used in the kmpc_copyprivate runtime call.
omp.single copyprivate(x1,x2) {
 x1 = bval1
 x2 = bval2
} copy  {
bb0(tmp1,tmp2): !tmp1 and tmp2 are block arguments corresponding to the sync variables of x1 and x2
   x1 = tmp1
   x2 = tmp2
}
  1. Model it like the OpenMP reduction/OpenACC reduction + privatisation with a declare/recipe function that can later be used to generate the copy function that is needed in the kmpc_copyprivate runtime call.
omp.single copyprivate(x1:copy_x1,x2:copy_x2) {
 x1 = bval1
 x2 = bval2
}

@luporl
Copy link
Contributor Author

luporl commented Nov 27, 2023

I am a bit rusty with OpenMP, but how do you ensure that the sync variable is shared by all the threads when this is a single in a subprogram without a parallel?

I would expect the alloca for the buffer to be "private" to each threads that entered the subprogram from a call in some parallel region (so unusable to broadcast the value to all threads).

e.g: do all threads print 1 in the program below?

subroutine foo(i)
 !$omp single
 i = 1
 !$omp end single copyprivate(i)
end subroutine

  integer, save :: i = 0
  !$omp threadprivate(i)
  !$omp parallel
  call foo(i)
  !$omp critical
  print *, i
  !$omp end critical 
  !$omp end parallel
end

This program doesn't compile:

test.f90:4:31: error: COPYPRIVATE variable 'i' is not PRIVATE or THREADPRIVATE in outer context
   !$omp end single copyprivate(i)

Making i threadprivate results in:

test.f90:2:22: error: A variable that appears in a THREADPRIVATE directive must be declared in the scope of a module or have the SAVE attribute, either explicitly or implicitly
   !$omp threadprivate(i)

And it is not possible to add the SAVE attribute to a dummy argument. But the following program shows the issue you are mentioning:

program main
  implicit none
  integer, save :: i = 0
  !$omp threadprivate(i)

  !$omp parallel
  call foo
  !$omp critical
  print *, i
  !$omp end critical
  !$omp end parallel

contains
  subroutine foo()
    !$omp single
    i = 1
    !$omp end single copyprivate(i)
  end subroutine
end program

Only one thread prints 1, all the others print 0. In this case the sync variable will indeed end up in a "private" thread stack, instead of the "shared" stack. So with this approach a global sync variable would be needed instead. Thanks for catching this.

@luporl
Copy link
Contributor Author

luporl commented Nov 27, 2023

2. Model it like the OpenMP reduction/OpenACC reduction + privatisation with a declare/recipe function that can later be used to generate the copy function that is needed in the kmpc_copyprivate runtime call.
omp.single copyprivate(x1:copy_x1,x2:copy_x2) {
 x1 = bval1
 x2 = bval2
}

Would copy_x1 and copy_x2 be internal generated functions that perform a copy between 2 memory locations, as required by kmpc_copyprivate's cpy_func? I'm not familiar with OpenMP reduction implementation.

In this case, would it be possible, or desirable, to have a runtime function that could perform a copy between any 2 variables? This could eliminate the need of having several generated copy functions.

If Assign can do this already, then it would be a matter creating a wrapper that would obtain the missing arguments somehow and call it. An idea would be to pass the data to be copied (cpy_data) as a pointer inside a struct, that would then hold the arguments in the other fields, but this could introduce some inefficiencies. If kmpc_copyprivate's cpy_func could be modified to accept an extra "context" argument, it would be better.

@kiranchandramohan
Copy link
Contributor

Would copy_x1 and copy_x2 be internal generated functions that perform a copy between 2 memory locations, as required by kmpc_copyprivate's cpy_func? I'm not familiar with OpenMP reduction implementation.

Yes, they would be generated function like operations that perform copies. There will probably be as many versions as the various types, kinds etc. This will have to be modified to whatever cpy_func expects.

copy_integer_declare {
  ^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
    %0 = fir.load %arg0 : !fir.ref<i32>
    fir.store %0 to %arg1 : !fir.ref<i32>
    omp.terminator
  }

In this case, would it be possible, or desirable, to have a runtime function that could perform a copy between any 2 variables?

There already a runtime Assign function but I think this operates on descriptors (

RT_API_ATTRS static void Assign(
). So I guess the disadvantage here might be that we will always need descriptors.

This could eliminate the need to have several generated copy functions.
We would either be generating code that inlines the copy or creating function-like operations that does them. If there are a lot of copy-private members of the same type then they could all just point to the same copy declare operation.

@luporl
Copy link
Contributor Author

luporl commented Nov 28, 2023

There already a runtime Assign function but I think this operates on descriptors (

RT_API_ATTRS static void Assign(

). So I guess the disadvantage here might be that we will always need descriptors.

Ok, then I guess it's better to worry about generating multiple copy functions later, if this really becomes an issue.
I'm thinking about trying to implement option 2 and see if everything works properly. What do you think?

@kiranchandramohan
Copy link
Contributor

There already a runtime Assign function but I think this operates on descriptors (

RT_API_ATTRS static void Assign(

). So I guess the disadvantage here might be that we will always need descriptors.

Ok, then I guess it's better to worry about generating multiple copy functions later, if this really becomes an issue. I'm thinking about trying to implement option 2 and see if everything works properly. What do you think?

That would be great. Thanks in advance for trying this.

luporl added a commit to luporl/llvm-project that referenced this pull request Feb 2, 2024
This adds a new custom CopyPrivateVarList to the single operation.
Each list item is formed by a reference to the variable to be
updated, its type and the function to be used to perform the copy.

It will be translated to LLVM IR using OpenMP builder, that will
use the information in the copyprivate list to call
__kmpc_copyprivate.

This is patch 2 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 2, 2024
This is patch 1 of 4, to add support for COPYPRIVATE.
Original PR: llvm#73128
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 2, 2024
This adds a new custom CopyPrivateVarList to the single operation.
Each list item is formed by a reference to the variable to be
updated, its type and the function to be used to perform the copy.

It will be translated to LLVM IR using OpenMP builder, that will
use the information in the copyprivate list to call
__kmpc_copyprivate.

This is patch 2 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 2, 2024
Add initial handling of OpenMP copyprivate clause in Flang.

When lowering copyprivate, Flang generates the copy function
needed by each variable and builds the appropriate
omp.single's CopyPrivateVarList.

This is patch 3 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 2, 2024
Use the new copyprivate list from omp.single to emit calls to
__kmpc_copyprivate, during the creation of the single operation
in OMPIRBuilder.

This is patch 4 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 2, 2024
This patch must land only after the 4 other patches that add
support for copyprivate in Flang land. Added as a separate PR to
prevent the OpenMPIRBuilder patch from depending on the Flang
patch.
Original PR: llvm#73128
luporl added a commit that referenced this pull request Feb 6, 2024
This is patch 1 of 4, to add support for COPYPRIVATE.
Original PR: #73128
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 6, 2024
This adds a new custom CopyPrivateVarList to the single operation.
Each list item is formed by a reference to the variable to be
updated, its type and the function to be used to perform the copy.

It will be translated to LLVM IR using OpenMP builder, that will
use the information in the copyprivate list to call
__kmpc_copyprivate.

This is patch 2 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 7, 2024
This adds a new custom CopyPrivateVarList to the single operation.
Each list item is formed by a reference to the variable to be
updated, its type and the function to be used to perform the copy.

It will be translated to LLVM IR using OpenMP builder, that will
use the information in the copyprivate list to call
__kmpc_copyprivate.

This is patch 2 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
luporl added a commit that referenced this pull request Feb 15, 2024
This adds a new custom CopyPrivateVarList to the single operation.
Each list item is formed by a reference to the variable to be
updated, its type and the function to be used to perform the copy.

It will be translated to LLVM IR using OpenMP builder, that will
use the information in the copyprivate list to call
__kmpc_copyprivate.

This is patch 2 of 4, to add support for COPYPRIVATE in Flang.
Original PR: #73128
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 16, 2024
Use the new copyprivate list from omp.single to emit calls to
__kmpc_copyprivate, during the creation of the single operation
in OMPIRBuilder.

This is patch 4 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 16, 2024
Add initial handling of OpenMP copyprivate clause in Flang.

When lowering copyprivate, Flang generates the copy function
needed by each variable and builds the appropriate
omp.single's CopyPrivateVarList.

This is patch 3 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 21, 2024
Add initial handling of OpenMP copyprivate clause in Flang.

When lowering copyprivate, Flang generates the copy function
needed by each variable and builds the appropriate
omp.single's CopyPrivateVarList.

This is patch 3 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
luporl added a commit that referenced this pull request Feb 21, 2024
Add initial handling of OpenMP copyprivate clause in Flang.

When lowering copyprivate, Flang generates the copy function
needed by each variable and builds the appropriate
omp.single's CopyPrivateVarList.

This is patch 3 of 4, to add support for COPYPRIVATE in Flang.
Original PR: #73128
@luporl
Copy link
Contributor Author

luporl commented Feb 22, 2024

Closing this PR now, as it was splitted into separate PRs that reference this one.

@luporl luporl closed this Feb 22, 2024
@luporl luporl deleted the luporl-copypriv branch February 22, 2024 20:32
luporl added a commit to luporl/llvm-project that referenced this pull request Feb 22, 2024
Use the new copyprivate list from omp.single to emit calls to
__kmpc_copyprivate, during the creation of the single operation
in OMPIRBuilder.

This is patch 4 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
luporl added a commit that referenced this pull request Feb 28, 2024
Use the new copyprivate list from omp.single to emit calls to
__kmpc_copyprivate, during the creation of the single operation
in OMPIRBuilder.

This is patch 4 of 4, to add support for COPYPRIVATE in Flang.
Original PR: #73128
@harishch4
Copy link
Contributor

Only one thread prints 1, all the others print 0. In this case the sync variable will indeed end up in a "private" thread stack, instead of the "shared" stack. So with this approach a global sync variable would be needed instead. Thanks for catching this.

Hi @luporl Is this issue resolved? I see it is still giving the wrong output. could you please confirm?

@luporl
Copy link
Contributor Author

luporl commented Mar 8, 2024

Only one thread prints 1, all the others print 0. In this case the sync variable will indeed end up in a "private" thread stack, instead of the "shared" stack. So with this approach a global sync variable would be needed instead. Thanks for catching this.

Hi @luporl Is this issue resolved? I see it is still giving the wrong output. could you please confirm?

@harishch4 It should be resolved, since now we're using __kmpc_copyprivate instead, but I'm also getting the wrong output with a flang build from yesterday. I'll investigate this issue, thanks for reporting it.

@luporl
Copy link
Contributor Author

luporl commented Mar 8, 2024

It seems to me the problem is the threadprivate variable not being lowered as such inside subroutine foo, but the original i variable is being used instead.

I was able to make copyprivate work with the test program by manually adding omp.threadprivate to foo's generated FIR.

And the following test program, that doesn't use copyprivate nor single, also shows the same behavior:

program main
  implicit none
  integer, save :: i = 0
  !$omp threadprivate(i)

  !$omp parallel
  call foo
  !$omp critical
  print *, i
  !$omp end critical
  !$omp end parallel

contains
  subroutine foo()
    i = 1
  end subroutine
end program

The issue also happens if I wrap the i = 1 inside a parallel directive.
I'll create a new issue for this.

@luporl
Copy link
Contributor Author

luporl commented Mar 8, 2024

#84561

mylai-mtk pushed a commit to mylai-mtk/llvm-project that referenced this pull request Jul 12, 2024
…0488)

Use the new copyprivate list from omp.single to emit calls to
__kmpc_copyprivate, during the creation of the single operation
in OMPIRBuilder.

This is patch 4 of 4, to add support for COPYPRIVATE in Flang.
Original PR: llvm#73128
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:openmp OpenMP related changes to Clang flang:fir-hlfir flang:openmp flang:semantics flang Flang issues not falling into any other category mlir:llvm mlir:openmp mlir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Flang][OpenMP]Incorrect execution result of COPYPRIVATE clause specified in SINGLE construct
7 participants