[Flang][OpenMP] Minimize host ops remaining in device compilation #137200
base: users/skatrak/map-rework-02-mark-host-device
This patch updates the OpenMP function filtering pass, which removes host functions from the MLIR module created by Flang lowering when compiling for an OpenMP target device.

Host functions that contain target regions must be kept so that those regions can be translated for the device. However, the non-target operations inside these functions cannot all be discarded, because some of them hold information that target device codegen also needs. Specifically, mapping information resides outside of `omp.target` regions.

Previously, all host operations in such functions were preserved. This patch instead removes every host operation that target device codegen does not actually need. In practice, this means keeping only target regions and the mapping information needed by the device. Arguments for some of the remaining operations are replaced by placeholder allocations and `fir.undefined`, since they are only actually defined inside of the target regions themselves.

As a result, this set of changes makes it possible to later simplify target device codegen, as it is no longer necessary to handle host operations differently to avoid issues.
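The filtering idea described above can be sketched outside of MLIR. The following is a minimal toy model in standard C++, not the pass itself: `Op`, `Kind`, and `filterForDevice` are hypothetical names introduced only for illustration, and operands are plain indices rather than MLIR values. It keeps the map infos reachable from target ops, orders each one after the map infos it depends on, and drops every other host operation. The real pass additionally substitutes placeholders for dropped operands, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for MLIR operations; these types are hypothetical and are
// not part of MLIR or Flang.
enum class Kind { HostOnly, MapInfo, Target };

struct Op {
  Kind kind;
  std::string name;
  std::vector<std::size_t> operands; // indices of the ops defining each operand
};

// Returns the indices of the ops kept for device compilation: the map infos
// reachable from target ops (each after the map infos it depends on),
// followed by the target ops themselves. Everything else is dropped.
std::vector<std::size_t> filterForDevice(const std::vector<Op> &ops) {
  std::vector<bool> seen(ops.size(), false);
  std::vector<std::size_t> keptMaps, keptTargets;

  // Post-order visit: dependencies are appended before the op that uses them.
  auto visitMap = [&](auto &&self, std::size_t idx) -> void {
    assert(idx < ops.size() && "operand index out of range");
    if (ops[idx].kind != Kind::MapInfo || seen[idx])
      return; // host-only operands are not kept in this toy model
    seen[idx] = true;
    for (std::size_t dep : ops[idx].operands)
      self(self, dep);
    keptMaps.push_back(idx);
  };

  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (ops[i].kind != Kind::Target)
      continue;
    for (std::size_t operand : ops[i].operands)
      visitMap(visitMap, operand);
    keptTargets.push_back(i);
  }

  // Map infos first, then the target ops that consume them.
  keptMaps.insert(keptMaps.end(), keptTargets.begin(), keptTargets.end());
  return keptMaps;
}
```

Calling `filterForDevice` on a function body that mixes allocations, bounds computations, map infos, and a target op returns only the indices of the map infos and the target op, with member map infos placed before the parent map info that refers to them.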
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-flang-openmp
Author: Sergio Afonso (skatrak)
Patch is 50.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137200.diff
7 Files Affected:
diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td b/flang/include/flang/Optimizer/OpenMP/Passes.td
index fcc7a4ca31fef..dcc97122efdf7 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -46,7 +46,8 @@ def FunctionFilteringPass : Pass<"omp-function-filtering"> {
"for the target device.";
let dependentDialects = [
"mlir::func::FuncDialect",
- "fir::FIROpsDialect"
+ "fir::FIROpsDialect",
+ "mlir::omp::OpenMPDialect"
];
}
diff --git a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
index 9554808824ac3..9e11df77506d6 100644
--- a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
+++ b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
@@ -13,12 +13,14 @@
#include "flang/Optimizer/Dialect/FIRDialect.h"
#include "flang/Optimizer/Dialect/FIROpsSupport.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
#include "flang/Optimizer/OpenMP/Passes.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
#include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
#include "mlir/IR/BuiltinOps.h"
+#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"
namespace flangomp {
@@ -94,6 +96,12 @@ class FunctionFilteringPass
funcOp.erase();
return WalkResult::skip();
}
+
+ if (failed(rewriteHostRegion(funcOp.getRegion()))) {
+ funcOp.emitOpError() << "could not be rewritten for target device";
+ return WalkResult::interrupt();
+ }
+
if (declareTargetOp)
declareTargetOp.setDeclareTarget(declareType,
omp::DeclareTargetCaptureClause::to);
@@ -101,5 +109,285 @@ class FunctionFilteringPass
return WalkResult::advance();
});
}
+
+private:
+ /// Add the given \c omp.map.info to a sorted set while taking into account
+ /// its dependencies.
+ static void collectMapInfos(omp::MapInfoOp mapOp, Region &region,
+ llvm::SetVector<omp::MapInfoOp> &mapInfos) {
+ for (Value member : mapOp.getMembers())
+ collectMapInfos(cast<omp::MapInfoOp>(member.getDefiningOp()), region,
+ mapInfos);
+
+ if (region.isAncestor(mapOp->getParentRegion()))
+ mapInfos.insert(mapOp);
+ }
+
+ /// Add the given value to a sorted set if it should be replaced by a
+ /// placeholder when used as a pointer-like argument to an operation
+ /// participating in the initialization of an \c omp.map.info.
+ static void markPtrOperandForRewrite(Value value,
+ llvm::SetVector<Value> &rewriteValues) {
+ // We don't need to rewrite operands if they are defined by block arguments
+ // of operations that will still remain after the region is rewritten.
+ if (isa<BlockArgument>(value) &&
+ isa<func::FuncOp, omp::TargetDataOp>(
+ cast<BlockArgument>(value).getOwner()->getParentOp()))
+ return;
+
+ rewriteValues.insert(value);
+ }
+
+ /// Rewrite the given host device region belonging to a function that contains
+ /// \c omp.target operations, to remove host-only operations that are not used
+ /// by device codegen.
+ ///
+ /// It is based on the expected form of the MLIR module as produced by Flang
+ /// lowering and it performs the following mutations:
+ /// - Replace all values returned by the function with \c fir.undefined.
+ /// - Operations taking map-like clauses (e.g. \c omp.target,
+ /// \c omp.target_data, etc) are moved to the end of the function. If they
+ /// are nested inside of any other operations, they are hoisted out of
+ /// them. If the region belongs to \c omp.target_data, these operations
+ /// are hoisted to its top level, rather than to the parent function.
+ /// - Only \c omp.map.info operations associated to these target regions are
+ /// preserved. These are moved above all \c omp.target and sorted to
+ /// satisfy dependencies among them.
+ /// - \c bounds arguments are removed from \c omp.map.info operations.
+ /// - \c var_ptr and \c var_ptr_ptr arguments of \c omp.map.info are
+ /// handled as follows:
+ /// - \c var_ptr_ptr is expected to be defined by a \c fir.box_offset
+ /// operation which is preserved. Otherwise, the pass will fail.
+ /// - \c var_ptr can be defined by an \c hlfir.declare which is also
+ /// preserved. If the \c var_ptr or \c hlfir.declare \c memref argument
+ /// is a \c fir.address_of operation, that operation is also maintained.
+ /// Otherwise, it is replaced by a placeholder \c fir.alloca and a
+ /// \c fir.convert or kept unmodified when it is defined by an entry
+ /// block argument. If it has \c shape or \c typeparams arguments, they
+ /// are also replaced by applicable constants. \c dummy_scope arguments
+ /// are discarded.
+ /// - Every other operation not located inside of an \c omp.target is
+ /// removed.
+ LogicalResult rewriteHostRegion(Region &region) {
+ // Extract parent op information.
+ auto [funcOp, targetDataOp] = [&region]() {
+ Operation *parent = region.getParentOp();
+ return std::make_tuple(dyn_cast<func::FuncOp>(parent),
+ dyn_cast<omp::TargetDataOp>(parent));
+ }();
+ assert((bool)funcOp != (bool)targetDataOp &&
+ "region must be defined by either func.func or omp.target_data");
+
+ // Collect operations that have mapping information associated to them.
+ llvm::SmallVector<
+ std::variant<omp::TargetOp, omp::TargetDataOp, omp::TargetEnterDataOp,
+ omp::TargetExitDataOp, omp::TargetUpdateOp>>
+ targetOps;
+
+ WalkResult result = region.walk<WalkOrder::PreOrder>([&](Operation *op) {
+ // Skip the inside of omp.target regions, since these contain device code.
+ if (auto targetOp = dyn_cast<omp::TargetOp>(op)) {
+ targetOps.push_back(targetOp);
+ return WalkResult::skip();
+ }
+
+ if (auto targetOp = dyn_cast<omp::TargetDataOp>(op)) {
+ // Recursively rewrite omp.target_data regions as well.
+ if (failed(rewriteHostRegion(targetOp.getRegion()))) {
+ targetOp.emitOpError() << "rewrite for target device failed";
+ return WalkResult::interrupt();
+ }
+
+ targetOps.push_back(targetOp);
+ return WalkResult::skip();
+ }
+
+ if (auto targetOp = dyn_cast<omp::TargetEnterDataOp>(op))
+ targetOps.push_back(targetOp);
+ if (auto targetOp = dyn_cast<omp::TargetExitDataOp>(op))
+ targetOps.push_back(targetOp);
+ if (auto targetOp = dyn_cast<omp::TargetUpdateOp>(op))
+ targetOps.push_back(targetOp);
+
+ return WalkResult::advance();
+ });
+
+ if (result.wasInterrupted())
+ return failure();
+
+ // Make a temporary clone of the parent operation with an empty region,
+ // and update all references to entry block arguments to those of the new
+ // region. Users will later either be moved to the new region or deleted
+ // when the original region is replaced by the new.
+ OpBuilder builder(&getContext());
+ builder.setInsertionPointAfter(region.getParentOp());
+ Operation *newOp = builder.cloneWithoutRegions(*region.getParentOp());
+ Block &block = newOp->getRegion(0).emplaceBlock();
+
+ llvm::SmallVector<Location> locs;
+ locs.reserve(region.getNumArguments());
+ llvm::transform(region.getArguments(), std::back_inserter(locs),
+ [](const BlockArgument &arg) { return arg.getLoc(); });
+ block.addArguments(region.getArgumentTypes(), locs);
+
+ for (auto [oldArg, newArg] :
+ llvm::zip_equal(region.getArguments(), block.getArguments()))
+ oldArg.replaceAllUsesWith(newArg);
+
+ // Collect omp.map.info ops while satisfying interdependencies. This must be
+ // updated whenever new map-like clauses are introduced or they are attached
+ // to other operations.
+ llvm::SetVector<omp::MapInfoOp> mapInfos;
+ for (auto targetOp : targetOps) {
+ std::visit(
+ [&region, &mapInfos](auto op) {
+ for (Value mapVar : op.getMapVars())
+ collectMapInfos(cast<omp::MapInfoOp>(mapVar.getDefiningOp()),
+ region, mapInfos);
+
+ if constexpr (std::is_same_v<decltype(op), omp::TargetOp>) {
+ for (Value mapVar : op.getHasDeviceAddrVars())
+ collectMapInfos(cast<omp::MapInfoOp>(mapVar.getDefiningOp()),
+ region, mapInfos);
+ } else if constexpr (std::is_same_v<decltype(op),
+ omp::TargetDataOp>) {
+ for (Value mapVar : op.getUseDeviceAddrVars())
+ collectMapInfos(cast<omp::MapInfoOp>(mapVar.getDefiningOp()),
+ region, mapInfos);
+ for (Value mapVar : op.getUseDevicePtrVars())
+ collectMapInfos(cast<omp::MapInfoOp>(mapVar.getDefiningOp()),
+ region, mapInfos);
+ }
+ },
+ targetOp);
+ }
+
+ // Move omp.map.info ops to the new block and collect dependencies.
+ llvm::SetVector<hlfir::DeclareOp> declareOps;
+ llvm::SetVector<fir::BoxOffsetOp> boxOffsets;
+ llvm::SetVector<Value> rewriteValues;
+ for (omp::MapInfoOp mapOp : mapInfos) {
+ // Handle var_ptr: hlfir.declare.
+ if (auto declareOp = dyn_cast_if_present<hlfir::DeclareOp>(
+ mapOp.getVarPtr().getDefiningOp())) {
+ if (region.isAncestor(declareOp->getParentRegion()))
+ declareOps.insert(declareOp);
+ } else {
+ markPtrOperandForRewrite(mapOp.getVarPtr(), rewriteValues);
+ }
+
+ // Handle var_ptr_ptr: fir.box_offset.
+ if (Value varPtrPtr = mapOp.getVarPtrPtr()) {
+ if (auto boxOffset = llvm::dyn_cast_if_present<fir::BoxOffsetOp>(
+ varPtrPtr.getDefiningOp())) {
+ if (region.isAncestor(boxOffset->getParentRegion()))
+ boxOffsets.insert(boxOffset);
+ } else {
+ return mapOp->emitOpError() << "var_ptr_ptr rewrite only supported "
+ "if defined by fir.box_offset";
+ }
+ }
+
+ // Bounds are not used during target device codegen.
+ mapOp.getBoundsMutable().clear();
+ mapOp->moveBefore(&block, block.end());
+ }
+
+ // Create a temporary marker to simplify the op moving process below.
+ builder.setInsertionPointToStart(&block);
+ auto marker = builder.create<fir::UndefOp>(builder.getUnknownLoc(),
+ builder.getNoneType());
+ builder.setInsertionPoint(marker);
+
+ // Move dependencies of hlfir.declare ops.
+ for (hlfir::DeclareOp declareOp : declareOps) {
+ Value memref = declareOp.getMemref();
+
+ // If it's defined by fir.address_of, then we need to keep that op as well
+ // because it might be pointing to a 'declare target' global.
+ if (auto addressOf =
+ dyn_cast_if_present<fir::AddrOfOp>(memref.getDefiningOp()))
+ addressOf->moveBefore(marker);
+ else
+ markPtrOperandForRewrite(memref, rewriteValues);
+
+ // Shape and typeparams aren't needed for target device codegen, but
+ // removing them would break verifiers.
+ Value zero;
+ if (declareOp.getShape() || !declareOp.getTypeparams().empty())
+ zero = builder.create<arith::ConstantOp>(declareOp.getLoc(),
+ builder.getI64IntegerAttr(0));
+
+ if (auto shape = declareOp.getShape()) {
+ Operation *shapeOp = shape.getDefiningOp();
+ unsigned numArgs = shapeOp->getNumOperands();
+ if (isa<fir::ShapeShiftOp>(shapeOp))
+ numArgs /= 2;
+
+ // Since the pre-cg rewrite pass requires the shape to be defined by one
+ // of fir.shape, fir.shapeshift or fir.shift, we need to create one of
+ // these.
+ llvm::SmallVector<Value> extents(numArgs, zero);
+ auto newShape = builder.create<fir::ShapeOp>(shape.getLoc(), extents);
+ declareOp.getShapeMutable().assign(newShape);
+ }
+
+ for (OpOperand &typeParam : declareOp.getTypeparamsMutable())
+ typeParam.assign(zero);
+
+ declareOp.getDummyScopeMutable().clear();
+ }
+
+ // We don't actually need the proper local allocations, but rather maintain
+ // the basic form of map operands. We create 1-bit placeholder allocas
+ // that we "typecast" to the expected pointer type and replace all uses.
+ // Using fir.undefined here instead is not possible because these variables
+ // cannot be constants, as that would trigger different codegen for target
+ // regions.
+ for (Value value : rewriteValues) {
+ Location loc = value.getLoc();
+ Value placeholder =
+ builder.create<fir::AllocaOp>(loc, builder.getI1Type());
+ value.replaceAllUsesWith(
+ builder.create<fir::ConvertOp>(loc, value.getType(), placeholder));
+ }
+
+ // Move omp.map.info dependencies.
+ for (hlfir::DeclareOp declareOp : declareOps)
+ declareOp->moveBefore(marker);
+
+ // The box_ref argument of fir.box_offset is expected to be the same value
+ // that was passed as var_ptr to the corresponding omp.map.info, so we don't
+ // need to move its defining op here.
+ for (fir::BoxOffsetOp boxOffset : boxOffsets)
+ boxOffset->moveBefore(marker);
+
+ marker->erase();
+
+ // Move mapping information users to the end of the new block.
+ for (auto targetOp : targetOps)
+ std::visit([&block](auto op) { op->moveBefore(&block, block.end()); },
+ targetOp);
+
+ // Add terminator to the new block.
+ builder.setInsertionPointToEnd(&block);
+ if (funcOp) {
+ llvm::SmallVector<Value> returnValues;
+ returnValues.reserve(funcOp.getNumResults());
+ for (auto type : funcOp.getResultTypes())
+ returnValues.push_back(
+ builder.create<fir::UndefOp>(funcOp.getLoc(), type));
+
+ builder.create<func::ReturnOp>(funcOp.getLoc(), returnValues);
+ } else {
+ builder.create<omp::TerminatorOp>(targetDataOp.getLoc());
+ }
+
+ // Replace old (now missing ops) region with the new one and remove the
+ // temporary clone.
+ region.takeBody(newOp->getRegion(0));
+ newOp->erase();
+ return success();
+ }
};
} // namespace
diff --git a/flang/test/Lower/OpenMP/declare-target-link-tarop-cap.f90 b/flang/test/Lower/OpenMP/declare-target-link-tarop-cap.f90
index cfdcd9eda82d1..8f4d1bdd600d7 100644
--- a/flang/test/Lower/OpenMP/declare-target-link-tarop-cap.f90
+++ b/flang/test/Lower/OpenMP/declare-target-link-tarop-cap.f90
@@ -1,7 +1,7 @@
-!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s
-!RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-is-device %s -o - | FileCheck %s
-!RUN: bbc -emit-hlfir -fopenmp %s -o - | FileCheck %s
-!RUN: bbc -emit-hlfir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s --check-prefixes=BOTH,HOST
+!RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-is-device %s -o - | FileCheck %s --check-prefixes=BOTH,DEVICE
+!RUN: bbc -emit-hlfir -fopenmp %s -o - | FileCheck %s --check-prefixes=BOTH,HOST
+!RUN: bbc -emit-hlfir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s --check-prefixes=BOTH,DEVICE
program test_link
@@ -20,13 +20,14 @@ program test_link
integer, pointer :: test_ptr2
!$omp declare target link(test_ptr2)
- !CHECK-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<i32>, i32) map_clauses(implicit, tofrom) capture(ByRef) -> !fir.ref<i32> {name = "test_int"}
+ !BOTH-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<i32>, i32) map_clauses(implicit, tofrom) capture(ByRef) -> !fir.ref<i32> {name = "test_int"}
!$omp target
test_int = test_int + 1
!$omp end target
- !CHECK-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<!fir.array<3xi32>>, !fir.array<3xi32>) map_clauses(implicit, tofrom) capture(ByRef) bounds({{%.*}}) -> !fir.ref<!fir.array<3xi32>> {name = "test_array_1d"}
+ !HOST-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<!fir.array<3xi32>>, !fir.array<3xi32>) map_clauses(implicit, tofrom) capture(ByRef) bounds({{%.*}}) -> !fir.ref<!fir.array<3xi32>> {name = "test_array_1d"}
+ !DEVICE-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<!fir.array<3xi32>>, !fir.array<3xi32>) map_clauses(implicit, tofrom) capture(ByRef) -> !fir.ref<!fir.array<3xi32>> {name = "test_array_1d"}
!$omp target
do i = 1,3
test_array_1d(i) = i * 2
@@ -35,18 +36,18 @@ program test_link
allocate(test_ptr1)
test_ptr1 = 1
- !CHECK-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<!fir.box<!fir.ptr<i32>>>, !fir.box<!fir.ptr<i32>>) map_clauses(implicit, to) capture(ByRef) members({{%.*}} : !fir.llvm_ptr<!fir.ref<i32>>) -> !fir.ref<!fir.box<!fir.ptr<i32>>> {name = "test_ptr1"}
+ !BOTH-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<!fir.box<!fir.ptr<i32>>>, !fir.box<!fir.ptr<i32>>) map_clauses(implicit, to) capture(ByRef) members({{%.*}} : !fir.llvm_ptr<!fir.ref<i32>>) -> !fir.ref<!fir.box<!fir.ptr<i32>>> {name = "test_ptr1"}
!$omp target
test_ptr1 = test_ptr1 + 1
!$omp end target
- !CHECK-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<i32>, i32) map_clauses(implicit, tofrom) capture(ByRef) -> !fir.ref<i32> {name = "test_target"}
+ !BOTH-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<i32>, i32) map_clauses(implicit, tofrom) capture(ByRef) -> !fir.ref<i32> {name = "test_target"}
!$omp target
test_target = test_target + 1
!$omp end target
- !CHECK-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<!fir.box<!fir.ptr<i32>>>, !fir.box<!fir.ptr<i32>>) map_clauses(implicit, to) capture(ByRef) members({{%.*}} : !fir.llvm_ptr<!fir.ref<i32>>) -> !fir.ref<!fir.box<!fir.ptr<i32>>> {name = "test_ptr2"}
+ !BOTH-DAG: {{%.*}} = omp.map.info var_ptr({{%.*}} : !fir.ref<!fir.box<!fir.ptr<i32>>>, !fir.box<!fir.ptr<i32>>) map_clauses(implicit, to) capture(ByRef) members({{%.*}} : !fir.llvm_ptr<!fir.ref<i32>>) -> !fir.ref<!fir.box<!fir.ptr<i32>>> {name = "test_ptr2"}
test_ptr2 => test_target
!$omp target
test_ptr2 = test_ptr2 + 1
diff --git a/flang/test/Lower/OpenMP/host-eval.f90 b/flang/test/Lower/OpenMP/host-eval.f90
index fe5b9597f8620..c059f7338b26d 100644
--- a/flang/test/Lower/OpenMP/host-eval.f90
+++ b/flang/test/Lower/OpenMP/host-eval.f90
@@ -22,8 +22,10 @@ subroutine teams()
!$omp end target
- ! BOTH: omp.teams
- ! BOTH-SAME: num_teams({{.*}}) thread_limit({{.*}}) {
+ ! HOST: omp.teams
+ ! HOST-SAME: num_teams({{.*}}) thread_limit({{.*}}) {
+
+ ! DEVICE-NOT: omp.teams
!$omp teams num_teams(1) thread_limit(2)
call foo()
!$omp end teams
@@ -76,13 +78,18 @@ subroutine distribute_parallel_do()
!$omp end distribute parallel do
!$omp end target teams
- ! BOTH: omp.teams
+ ! HOST: omp.teams
+ ! DEVICE-NOT: omp.teams
!$omp teams
- ! BOTH: omp.parallel
- ! BOTH-SAME: num_threads({{.*}})
- ! BOTH: omp.distribute
- ! BOTH-NEXT: omp.wsloop
+ ! HOST: omp.parallel
+ ! HOST-SAME: num_threads({{.*}})
+ ! HOST: omp.distribute
+ ! HOST-NEXT: omp.wsloop
+
+ ! DEVICE-NOT: omp.parallel
+ ! DEVICE-NOT: omp.distribute
+ ! DEVICE-NOT: omp.wsloop
!$omp distribute parallel do num_threads(1)
do i=1,10
call foo()
@@ -140,14 +147,20 @@ subroutine distribute_parallel_do_simd()
!$omp end distribute parallel do simd
!$omp end target teams
- ! BOTH: omp.teams
+ ! HOST: omp.teams
+ ! DEVICE-NOT: omp.teams
!$omp teams
- ! BOTH: omp.parallel
- ! BOTH-SAME: num_threads({{.*}})
- ! BOTH: omp.distribute
- ! BOTH-NEXT: omp.wsloop
- ! BOTH-NEXT: omp.simd
+ ! HOST: omp.parallel
+ ! HOST-SAME: num_threads({{.*}})
+ ! HOST: omp.distribute
+ ! HOST-NEXT: omp.wsloop
+ ! HOST-NEXT: omp.simd
+
+ ! DEVICE-NOT: omp.parallel
+ ! DEVICE-NOT: omp.distribute
+ ! DEVICE-NOT: omp.wsloop
+ ! DEVICE-NOT: omp.simd
!$omp distribute parallel do simd num_threads(1)
do i=1,10
call foo()
@@ -194,10 +207,12 @@ subroutine distribute()
!$omp end distribute
!$omp end targ...
[truncated]
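The `std::variant` plus `std::visit` dispatch that `rewriteHostRegion` uses to gather map operands from different kinds of target operations can be illustrated in isolation. This is a sketch under stated assumptions: the structs below are hypothetical stand-ins for the OpenMP dialect ops, and operands are modelled as strings instead of MLIR values. It shows one generic lambda handling the operand list every op has, with `if constexpr` branches for the lists that only some ops carry.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the OpenMP dialect ops; each exposes only the
// operand lists relevant to this sketch.
struct TargetOp {
  std::vector<std::string> mapVars, hasDeviceAddrVars;
};
struct TargetDataOp {
  std::vector<std::string> mapVars, useDeviceAddrVars, useDevicePtrVars;
};
struct TargetUpdateOp {
  std::vector<std::string> mapVars;
};

using AnyTargetOp = std::variant<TargetOp, TargetDataOp, TargetUpdateOp>;

// Gathers every map-like operand, in op order, from a mixed list of ops.
std::vector<std::string>
collectMapOperands(const std::vector<AnyTargetOp> &ops) {
  std::vector<std::string> out;
  auto append = [&out](const std::vector<std::string> &vars) {
    out.insert(out.end(), vars.begin(), vars.end());
  };

  for (const AnyTargetOp &anyOp : ops) {
    assert(!anyOp.valueless_by_exception());
    std::visit(
        [&append](const auto &op) {
          // The parameter is a const reference here, so strip qualifiers
          // before comparing types.
          using OpT = std::decay_t<decltype(op)>;
          append(op.mapVars); // common to every alternative
          if constexpr (std::is_same_v<OpT, TargetOp>) {
            append(op.hasDeviceAddrVars);
          } else if constexpr (std::is_same_v<OpT, TargetDataOp>) {
            append(op.useDeviceAddrVars);
            append(op.useDevicePtrVars);
          }
        },
        anyOp);
  }
  return out;
}
```

The patch takes the lambda parameter by value (`auto op`), so `decltype(op)` is already the exact op type. This sketch takes it by const reference to avoid copying the toy structs, which is why it goes through `std::decay_t`. As the comment in the patch notes, a dispatch of this shape has to be updated whenever a new map-like clause is introduced or attached to another operation.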