[MLIR][OpenMP] LLVM IR translation of host_eval #116052

Merged
merged 1 commit into from
Jan 14, 2025

skatrak
Member

@skatrak skatrak commented Nov 13, 2024

This patch adds support for processing the host_eval clause of omp.target to populate default and runtime kernel launch attributes. Specifically, these are related to the num_teams, thread_limit and num_threads clauses attached to operations nested inside of omp.target. As a result, the thread_limit clause of omp.target is also supported.
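
To make the mapping from host_eval entries to launch attributes concrete, here is a minimal standalone C++ sketch. It is not the MLIR implementation: it assumes a simplified model in which each host_eval entry records the value computed outside the target region together with the clause uses of its block argument inside the region. The names `Use`, `HostEvalEntry`, `LaunchValues` and `collectLaunchValues` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Which clause of a nested operation consumes a host_eval block argument.
enum class Use { NumTeamsLower, NumTeamsUpper, TeamsThreadLimit, NumThreads };

// One host_eval entry (hypothetical model): the value evaluated outside the
// target region, plus the clause uses of its block argument inside the region.
struct HostEvalEntry {
  std::string hostValue;
  std::vector<Use> uses;
};

// Host values selected for each launch-related clause, if any.
struct LaunchValues {
  std::optional<std::string> numTeamsLower;
  std::optional<std::string> numTeamsUpper;
  std::optional<std::string> teamsThreadLimit;
  std::optional<std::string> numThreads;
};

// Follow the uses of every entry and record the host value for each clause.
LaunchValues collectLaunchValues(const std::vector<HostEvalEntry> &entries) {
  LaunchValues out;
  for (const HostEvalEntry &entry : entries) {
    assert(!entry.hostValue.empty() && "every entry needs a host value");
    for (Use use : entry.uses) {
      switch (use) {
      case Use::NumTeamsLower:
        out.numTeamsLower = entry.hostValue;
        break;
      case Use::NumTeamsUpper:
        out.numTeamsUpper = entry.hostValue;
        break;
      case Use::TeamsThreadLimit:
        out.teamsThreadLimit = entry.hostValue;
        break;
      case Use::NumThreads:
        out.numThreads = entry.hostValue;
        break;
      }
    }
  }
  return out;
}
```

Clauses that no host_eval argument feeds stay empty in the result, which corresponds to the null values left untouched in the patch.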

The implementation of initTargetDefaultAttrs() is intended to reflect clang's own processing of multiple constructs and clauses in order to define a default number of teams and threads to be used as kernel attributes and to populate global variables in the target device module.
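
The selection rules can be sketched in standalone C++ as follows. This is a simplified reading of the patch, not the code itself: `ClauseValue`, `encodeMaxClause`, `combineMaxThreads`, `TeamsBounds` and `defaultTeams` are hypothetical names, and the boolean parameters stand in for the checks on the innermost captured operation. The encoding follows the comments in the diff: for max values, a negative number means unset and zero means set but not constant.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a clause operand. `present` tells whether the
// clause appears at all; `constant` holds its value when it is a compile-time
// constant.
struct ClauseValue {
  bool present = false;
  std::optional<std::int64_t> constant;
};

// Encode a "max" clause: < 0 means unset, == 0 means set but not constant,
// > 0 is the constant value.
std::int32_t encodeMaxClause(const ClauseValue &clause) {
  assert((clause.present || !clause.constant) && "constant implies present");
  if (!clause.present)
    return -1;
  std::int32_t result = -1;
  if (clause.constant)
    result = static_cast<std::int32_t>(*clause.constant);
  return result < 0 ? 0 : result;
}

// Select the minimum among the values that were set. A "set but unknown"
// value (0) compares as the smallest, so it overrides any constant.
std::int32_t combineMaxThreads(std::int32_t targetThreadLimit,
                               std::int32_t teamsThreadLimit,
                               std::int32_t numThreads) {
  std::int32_t combined = targetThreadLimit;
  if (combined < 0 || (teamsThreadLimit >= 0 && teamsThreadLimit < combined))
    combined = teamsThreadLimit;
  if (combined < 0 || (numThreads >= 0 && numThreads < combined))
    combined = numThreads;
  return combined;
}

struct TeamsBounds {
  std::int32_t min;
  std::int32_t max;
};

// Default team bounds, keyed on the innermost captured construct.
TeamsBounds defaultTeams(bool insideTeams, bool directlyInParallelOrSimd,
                         const ClauseValue &numTeamsUpper) {
  if (insideTeams) {
    if (!numTeamsUpper.present)
      return {0, 0}; // teams without a host-evaluated num_teams
    if (numTeamsUpper.constant) {
      auto value = static_cast<std::int32_t>(*numTeamsUpper.constant);
      return {value, value}; // min mirrors max, as in the patch
    }
    return {1, -1}; // non-constant num_teams keeps the initial values
  }
  if (directlyInParallelOrSimd)
    return {1, 1};
  return {-1, -1};
}
```

For instance, a constant `thread_limit(128)` on the target combined with a constant `thread_limit(64)` on teams yields 64, whereas any non-constant clause forces the combined value to 0.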

One side effect of this change is that it is no longer possible to translate target device MLIR modules to LLVM IR unless they have a supported target triple. This is because the local getGridValue() function in the OpenMPIRBuilder only works for certain architectures, and it is called whenever the maximum number of threads has not been explicitly defined. This limitation also matches clang.
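
As a rough illustration of what "supported target triple" means in practice, the standalone C++ sketch below checks the architecture component of a triple string. The helper names `archOf` and `hasDefaultGridValues` are hypothetical, and the accepted architecture list is an assumption: the updated tests in this patch use `amdgcn-amd-amdhsa`, and the NVPTX entries reflect my understanding of the OpenMPIRBuilder rather than anything stated in this description.

```cpp
#include <cassert>
#include <string_view>

// Return the architecture component of a triple, i.e. the text before the
// first '-'. A string without '-' is returned unchanged.
std::string_view archOf(std::string_view triple) {
  return triple.substr(0, triple.find('-'));
}

// Assumption: default grid values are only available for these GPU
// architectures. This list is illustrative, not taken from the patch.
bool hasDefaultGridValues(std::string_view triple) {
  std::string_view arch = archOf(triple);
  return arch == "amdgcn" || arch == "nvptx" || arch == "nvptx64";
}
```

A quick check such as `assert(hasDefaultGridValues("amdgcn-amd-amdhsa"));` passes, while a module with no triple at all would be rejected, which is why the tests in this patch gain a `-triple` flag or an `llvm.target_triple` module attribute.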

Evaluating the collapsed loop trip count of SPMD and Generic-SPMD kernels remains unsupported.

@llvmbot
Member

llvmbot commented Nov 13, 2024

@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-flang-openmp

Author: Sergio Afonso (skatrak)

Changes

This patch adds support for processing the host_eval clause of omp.target to populate default and runtime kernel launch attributes. Specifically, these are related to the num_teams, thread_limit and num_threads clauses attached to operations nested inside of omp.target. As a result, the thread_limit clause of omp.target is also supported.

The implementation of initTargetDefaultAttrs() is intended to reflect clang's own processing of multiple constructs and clauses in order to define a default number of teams and threads to be used as kernel attributes and to populate global variables in the target device module.

One side effect of this change is that it is no longer possible to translate target device MLIR modules to LLVM IR unless they have a supported target triple. This is because the local getGridValue() function in the OpenMPIRBuilder only works for certain architectures, and it is called whenever the maximum number of threads has not been explicitly defined. This limitation also matches clang.

Evaluating the collapsed loop trip count of target SPMD kernels remains unsupported.


Patch is 37.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/116052.diff

18 Files Affected:

  • (modified) flang/test/Integration/OpenMP/target-filtering.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/function-filtering-2.f90 (+3-3)
  • (modified) flang/test/Lower/OpenMP/function-filtering-3.f90 (+3-3)
  • (modified) flang/test/Lower/OpenMP/function-filtering.f90 (+3-3)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+229-16)
  • (modified) mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir (+2-2)
  • (modified) mlir/test/Target/LLVMIR/omptarget-constant-alloca-raise.mlir (+2-2)
  • (modified) mlir/test/Target/LLVMIR/omptarget-constant-indexing-device-region.mlir (+2-2)
  • (modified) mlir/test/Target/LLVMIR/omptarget-debug.mlir (+1-1)
  • (modified) mlir/test/Target/LLVMIR/omptarget-declare-target-llvm-device.mlir (+1-1)
  • (modified) mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir (+2-2)
  • (modified) mlir/test/Target/LLVMIR/omptarget-region-device-llvm.mlir (+3-3)
  • (modified) mlir/test/Target/LLVMIR/omptarget-target-inside-task.mlir (+2-2)
  • (added) mlir/test/Target/LLVMIR/openmp-target-launch-device.mlir (+43)
  • (added) mlir/test/Target/LLVMIR/openmp-target-launch-host.mlir (+31)
  • (modified) mlir/test/Target/LLVMIR/openmp-target-use-device-nested.mlir (+2-2)
  • (modified) mlir/test/Target/LLVMIR/openmp-task-target-device.mlir (+1-1)
  • (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (+13-14)
diff --git a/flang/test/Integration/OpenMP/target-filtering.f90 b/flang/test/Integration/OpenMP/target-filtering.f90
index d1ab1b47e580d4..699c1040d91f9c 100644
--- a/flang/test/Integration/OpenMP/target-filtering.f90
+++ b/flang/test/Integration/OpenMP/target-filtering.f90
@@ -7,7 +7,7 @@
 !===----------------------------------------------------------------------===!
 
 !RUN: %flang_fc1 -emit-llvm -fopenmp %s -o - | FileCheck %s --check-prefixes HOST,ALL
-!RUN: %flang_fc1 -emit-llvm -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s --check-prefixes DEVICE,ALL
+!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s --check-prefixes DEVICE,ALL
 
 !HOST: define {{.*}}@{{.*}}before{{.*}}(
 !DEVICE-NOT: define {{.*}}@before{{.*}}(
diff --git a/flang/test/Lower/OpenMP/function-filtering-2.f90 b/flang/test/Lower/OpenMP/function-filtering-2.f90
index 0c02aa223820e7..a2c5e29cfdcbf6 100644
--- a/flang/test/Lower/OpenMP/function-filtering-2.f90
+++ b/flang/test/Lower/OpenMP/function-filtering-2.f90
@@ -1,9 +1,9 @@
 ! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM,LLVM-HOST %s
 ! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -emit-hlfir %s -o - | FileCheck --check-prefix=MLIR %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM,LLVM-DEVICE %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefix=MLIR %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM,LLVM-DEVICE %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefix=MLIR %s
 ! RUN: bbc -fopenmp -fopenmp-version=52 -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-HOST,MLIR-ALL %s
-! RUN: bbc -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
+! RUN: bbc -target amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
 
 ! MLIR: func.func @{{.*}}implicit_invocation() attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>}
 ! MLIR: return
diff --git a/flang/test/Lower/OpenMP/function-filtering-3.f90 b/flang/test/Lower/OpenMP/function-filtering-3.f90
index a277c06d620669..4754da85de3aa7 100644
--- a/flang/test/Lower/OpenMP/function-filtering-3.f90
+++ b/flang/test/Lower/OpenMP/function-filtering-3.f90
@@ -1,9 +1,9 @@
 ! RUN: %flang_fc1 -fopenmp -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-HOST,LLVM-ALL %s
 ! RUN: %flang_fc1 -fopenmp -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-HOST,MLIR-ALL %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-DEVICE,LLVM-ALL %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-DEVICE,LLVM-ALL %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
 ! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-HOST,MLIR-ALL %s
-! RUN: bbc -fopenmp -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
+! RUN: bbc -target amdgcn-amd-amdhsa -fopenmp -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
 
 ! Check that the correct LLVM IR functions are kept for the host and device
 ! after running the whole set of translation and transformation passes from
diff --git a/flang/test/Lower/OpenMP/function-filtering.f90 b/flang/test/Lower/OpenMP/function-filtering.f90
index a72822503c3739..9f37e6f028d7a9 100644
--- a/flang/test/Lower/OpenMP/function-filtering.f90
+++ b/flang/test/Lower/OpenMP/function-filtering.f90
@@ -1,9 +1,9 @@
 ! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-HOST,LLVM-ALL %s
 ! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-HOST,MLIR-ALL %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-DEVICE,LLVM-ALL %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-DEVICE,LLVM-ALL %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
 ! RUN: bbc -fopenmp -fopenmp-version=52 -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-HOST,MLIR-ALL %s
-! RUN: bbc -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
+! RUN: bbc -target amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
 
 ! Check that the correct LLVM IR functions are kept for the host and device
 ! after running the whole set of translation and transformation passes from
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 9bdf3e11496f3a..9de4609ba67486 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -174,10 +174,6 @@ static LogicalResult checkImplementationStatus(Operation &op) {
     if (op.getHint())
       op.emitWarning("hint clause discarded");
   };
-  auto checkHostEval = [&todo](auto op, LogicalResult &result) {
-    if (!op.getHostEvalVars().empty())
-      result = todo("host_eval");
-  };
   auto checkIf = [&todo](auto op, LogicalResult &result) {
     if (op.getIfExpr())
       result = todo("if");
@@ -228,10 +224,6 @@ static LogicalResult checkImplementationStatus(Operation &op) {
         op.getReductionSyms())
       result = todo("reduction");
   };
-  auto checkThreadLimit = [&todo](auto op, LogicalResult &result) {
-    if (op.getThreadLimit())
-      result = todo("thread_limit");
-  };
   auto checkTaskReduction = [&todo](auto op, LogicalResult &result) {
     if (!op.getTaskReductionVars().empty() || op.getTaskReductionByref() ||
         op.getTaskReductionSyms())
@@ -295,7 +287,16 @@ static LogicalResult checkImplementationStatus(Operation &op) {
         checkAllocate(op, result);
         checkDevice(op, result);
         checkHasDeviceAddr(op, result);
-        checkHostEval(op, result);
+
+        // Host evaluated clauses are supported, except for target SPMD loop
+        // bounds.
+        for (BlockArgument arg :
+             cast<omp::BlockArgOpenMPOpInterface>(*op).getHostEvalBlockArgs())
+          for (Operation *user : arg.getUsers())
+            if (isa<omp::LoopNestOp>(user))
+              result = op.emitError("not yet implemented: host evaluation of "
+                                    "loop bounds in omp.target operation");
+
         checkIf(op, result);
         checkInReduction(op, result);
         checkIsDevicePtr(op, result);
@@ -316,7 +317,6 @@ static LogicalResult checkImplementationStatus(Operation &op) {
                                     "structures in omp.target operation");
           }
         }
-        checkThreadLimit(op, result);
       })
       .Default([](Operation &) {
         // Assume all clauses for an operation can be translated unless they are
@@ -3800,6 +3800,201 @@ createDeviceArgumentAccessor(MapInfoData &mapData, llvm::Argument &arg,
   return builder.saveIP();
 }
 
+/// Follow uses of `host_eval`-defined block arguments of the given `omp.target`
+/// operation and populate output variables with their corresponding host value
+/// (i.e. operand evaluated outside of the target region), based on their uses
+/// inside of the target region.
+///
+/// Loop bounds and steps are only optionally populated, if output vectors are
+/// provided.
+static void extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads,
+                                   Value &numTeamsLower, Value &numTeamsUpper,
+                                   Value &threadLimit) {
+  auto blockArgIface = llvm::cast<omp::BlockArgOpenMPOpInterface>(*targetOp);
+  for (auto item : llvm::zip_equal(targetOp.getHostEvalVars(),
+                                   blockArgIface.getHostEvalBlockArgs())) {
+    Value hostEvalVar = std::get<0>(item), blockArg = std::get<1>(item);
+
+    for (Operation *user : blockArg.getUsers()) {
+      llvm::TypeSwitch<Operation *>(user)
+          .Case([&](omp::TeamsOp teamsOp) {
+            if (teamsOp.getNumTeamsLower() == blockArg)
+              numTeamsLower = hostEvalVar;
+            else if (teamsOp.getNumTeamsUpper() == blockArg)
+              numTeamsUpper = hostEvalVar;
+            else if (teamsOp.getThreadLimit() == blockArg)
+              threadLimit = hostEvalVar;
+            else
+              llvm_unreachable("unsupported host_eval use");
+          })
+          .Case([&](omp::ParallelOp parallelOp) {
+            if (parallelOp.getNumThreads() == blockArg)
+              numThreads = hostEvalVar;
+            else
+              llvm_unreachable("unsupported host_eval use");
+          })
+          .Case([&](omp::LoopNestOp loopOp) {
+            // TODO: Extract bounds and step values.
+          })
+          .Default([](Operation *) {
+            llvm_unreachable("unsupported host_eval use");
+          });
+    }
+  }
+}
+
+/// If \p op is of the given type parameter, return it casted to that type.
+/// Otherwise, if its immediate parent operation (or some other higher-level
+/// parent, if \p immediateParent is false) is of that type, return that parent
+/// casted to the given type.
+///
+/// If \p op is \c null or neither it nor its parent(s) are of the specified
+/// type, return a \c null operation.
+template <typename OpTy>
+static OpTy castOrGetParentOfType(Operation *op, bool immediateParent = false) {
+  if (!op)
+    return OpTy();
+
+  if (OpTy casted = dyn_cast<OpTy>(op))
+    return casted;
+
+  if (immediateParent)
+    return dyn_cast_if_present<OpTy>(op->getParentOp());
+
+  return op->getParentOfType<OpTy>();
+}
+
+/// Populate default `MinTeams`, `MaxTeams` and `MaxThreads` to their default
+/// values as stated by the corresponding clauses, if constant.
+///
+/// These default values must be set before the creation of the outlined LLVM
+/// function for the target region, so that they can be used to initialize the
+/// corresponding global `ConfigurationEnvironmentTy` structure.
+static void
+initTargetDefaultAttrs(omp::TargetOp targetOp,
+                       llvm::OpenMPIRBuilder::TargetKernelDefaultAttrs &attrs,
+                       bool isTargetDevice) {
+  Value hostNumThreads, hostNumTeamsLower, hostNumTeamsUpper, hostThreadLimit;
+  extractHostEvalClauses(targetOp, hostNumThreads, hostNumTeamsLower,
+                         hostNumTeamsUpper, hostThreadLimit);
+
+  // TODO: Handle constant 'if' clauses.
+  Operation *capturedOp = targetOp.getInnermostCapturedOmpOp();
+
+  auto extractConstInteger = [](Value value) -> std::optional<int64_t> {
+    if (auto constOp =
+            dyn_cast_if_present<LLVM::ConstantOp>(value.getDefiningOp()))
+      if (auto constAttr = dyn_cast<IntegerAttr>(constOp.getValue()))
+        return constAttr.getInt();
+
+    return std::nullopt;
+  };
+
+  // Handle clauses impacting the number of teams.
+
+  int32_t minTeamsVal = 1, maxTeamsVal = -1;
+  if (castOrGetParentOfType<omp::TeamsOp>(capturedOp)) {
+    // TODO: Use `hostNumTeamsLower` to initialize `minTeamsVal`. For now, match
+    // clang and set min and max to the same value.
+    if (hostNumTeamsUpper) {
+      if (auto val = extractConstInteger(hostNumTeamsUpper))
+        minTeamsVal = maxTeamsVal = *val;
+    } else {
+      minTeamsVal = maxTeamsVal = 0;
+    }
+  } else if (castOrGetParentOfType<omp::ParallelOp>(capturedOp,
+                                                    /*immediateParent=*/true) ||
+             castOrGetParentOfType<omp::SimdOp>(capturedOp,
+                                                /*immediateParent=*/true)) {
+    minTeamsVal = maxTeamsVal = 1;
+  } else {
+    minTeamsVal = maxTeamsVal = -1;
+  }
+
+  // Handle clauses impacting the number of threads.
+
+  auto setMaxValueFromClause = [&extractConstInteger](Value clauseValue,
+                                                      int32_t &result) {
+    if (!clauseValue)
+      return;
+
+    if (auto val = extractConstInteger(clauseValue))
+      result = *val;
+
+    // Found an applicable clause, so it's not undefined. Mark as unknown
+    // because it's not constant.
+    if (result < 0)
+      result = 0;
+  };
+
+  // Extract 'thread_limit' clause from 'target' and 'teams' directives.
+  int32_t targetThreadLimitVal = -1, teamsThreadLimitVal = -1;
+  setMaxValueFromClause(targetOp.getThreadLimit(), targetThreadLimitVal);
+  setMaxValueFromClause(hostThreadLimit, teamsThreadLimitVal);
+
+  // Extract 'num_threads' clause from 'parallel' or set to 1 if it's SIMD.
+  int32_t maxThreadsVal = -1;
+  if (auto parallelOp = castOrGetParentOfType<omp::ParallelOp>(capturedOp))
+    setMaxValueFromClause(hostNumThreads, maxThreadsVal);
+  else if (castOrGetParentOfType<omp::SimdOp>(capturedOp,
+                                              /*immediateParent=*/true))
+    maxThreadsVal = 1;
+
+  // For max values, < 0 means unset, == 0 means set but unknown. Select the
+  // minimum value between 'num_threads' and 'thread_limit' clauses that were
+  // set.
+  int32_t combinedMaxThreadsVal = targetThreadLimitVal;
+  if (combinedMaxThreadsVal < 0 ||
+      (teamsThreadLimitVal >= 0 && teamsThreadLimitVal < combinedMaxThreadsVal))
+    combinedMaxThreadsVal = teamsThreadLimitVal;
+
+  if (combinedMaxThreadsVal < 0 ||
+      (maxThreadsVal >= 0 && maxThreadsVal < combinedMaxThreadsVal))
+    combinedMaxThreadsVal = maxThreadsVal;
+
+  // Update kernel bounds structure for the `OpenMPIRBuilder` to use.
+  attrs.MinTeams = minTeamsVal;
+  attrs.MaxTeams.front() = maxTeamsVal;
+  attrs.MinThreads = 1;
+  attrs.MaxThreads.front() = combinedMaxThreadsVal;
+}
+
+/// Gather LLVM runtime values for all clauses evaluated in the host that are
+/// passed to the kernel invocation.
+///
+/// This function must be called only when compiling for the host. Also, it will
+/// only provide correct results if it's called after the body of \c targetOp
+/// has been fully generated.
+static void
+initTargetRuntimeAttrs(llvm::IRBuilderBase &builder,
+                       LLVM::ModuleTranslation &moduleTranslation,
+                       omp::TargetOp targetOp,
+                       llvm::OpenMPIRBuilder::TargetKernelRuntimeAttrs &attrs) {
+  Value numThreads, numTeamsLower, numTeamsUpper, teamsThreadLimit;
+  extractHostEvalClauses(targetOp, numThreads, numTeamsLower, numTeamsUpper,
+                         teamsThreadLimit);
+
+  // TODO: Handle constant 'if' clauses.
+  if (Value targetThreadLimit = targetOp.getThreadLimit())
+    attrs.TargetThreadLimit.front() =
+        moduleTranslation.lookupValue(targetThreadLimit);
+
+  if (numTeamsLower)
+    attrs.MinTeams = moduleTranslation.lookupValue(numTeamsLower);
+
+  if (numTeamsUpper)
+    attrs.MaxTeams.front() = moduleTranslation.lookupValue(numTeamsUpper);
+
+  if (teamsThreadLimit)
+    attrs.TeamsThreadLimit.front() =
+        moduleTranslation.lookupValue(teamsThreadLimit);
+
+  if (numThreads)
+    attrs.MaxThreads = moduleTranslation.lookupValue(numThreads);
+
+  // TODO: Populate attrs.LoopTripCount if it is target SPMD.
+}
+
 static LogicalResult
 convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
                  LLVM::ModuleTranslation &moduleTranslation) {
@@ -3809,12 +4004,13 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
   bool isTargetDevice = ompBuilder->Config.isTargetDevice();
+
   auto parentFn = opInst.getParentOfType<LLVM::LLVMFuncOp>();
+  auto blockIface = cast<omp::BlockArgOpenMPOpInterface>(opInst);
   auto &targetRegion = targetOp.getRegion();
   DataLayout dl = DataLayout(opInst.getParentOfType<ModuleOp>());
   SmallVector<Value> mapVars = targetOp.getMapVars();
-  ArrayRef<BlockArgument> mapBlockArgs =
-      cast<omp::BlockArgOpenMPOpInterface>(opInst).getMapBlockArgs();
+  ArrayRef<BlockArgument> mapBlockArgs = blockIface.getMapBlockArgs();
   llvm::Function *llvmOutlinedFn = nullptr;
 
   // TODO: It can also be false if a compile-time constant `false` IF clause is
@@ -3857,7 +4053,7 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
       OperandRange privateVars = targetOp.getPrivateVars();
       std::optional<ArrayAttr> privateSyms = targetOp.getPrivateSyms();
       MutableArrayRef<BlockArgument> privateBlockArgs =
-          cast<omp::BlockArgOpenMPOpInterface>(opInst).getPrivateBlockArgs();
+          blockIface.getPrivateBlockArgs();
 
       for (auto [privVar, privatizerNameAttr, privBlockArg] :
            llvm::zip_equal(privateVars, *privateSyms, privateBlockArgs)) {
@@ -3936,13 +4132,30 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
                                         allocaIP, codeGenIP);
   };
 
-  // TODO: Populate default and runtime attributes based on the construct and
-  // clauses.
+  llvm::SmallVector<llvm::Value *, 4> kernelInput;
   llvm::OpenMPIRBuilder::TargetKernelDefaultAttrs defaultAttrs = {
       /*MaxTeams=*/{-1}, /*MinTeams=*/0, /*MaxThreads=*/{0}, /*MinThreads=*/0};
+  initTargetDefaultAttrs(targetOp, defaultAttrs, isTargetDevice);
+
+  // Collect host-evaluated values needed to properly launch the kernel from the
+  // host.
   llvm::OpenMPIRBuilder::TargetKernelRuntimeAttrs runtimeAttrs;
+  if (!isTargetDevice)
+    initTargetRuntimeAttrs(builder, moduleTranslation, targetOp, runtimeAttrs);
+
+  // Pass host-evaluated values as parameters to the kernel / host fallback,
+  // except if they are constants. In any case, map the MLIR block argument to
+  // the corresponding LLVM values.
+  SmallVector<Value> hostEvalVars = targetOp.getHostEvalVars();
+  ArrayRef<BlockArgument> hostEvalBlockArgs = blockIface.getHostEvalBlockArgs();
+  for (auto [arg, var] : llvm::zip_equal(hostEvalBlockArgs, hostEvalVars)) {
+    llvm::Value *value = moduleTranslation.lookupValue(var);
+    moduleTranslation.mapValue(arg, value);
+
+    if (!llvm::isa<llvm::Constant>(value))
+      kernelInput.push_back(value);
+  }
 
-  llvm::SmallVector<llvm::Value *, 4> kernelInput;
   for (size_t i = 0; i < mapVars.size(); ++i) {
     // declare target arguments are not passed to kernels as arguments
     // TODO: We currently do not handle cases where a member is explicitly
diff --git a/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir b/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir
index 9549de1258efc5..89d8db5439bff6 100644
--- a/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir
@@ -1,6 +1,6 @@
 // RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
 
-module attributes {omp.is_target_device = true} {
+module attributes {llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_target_device = true} {
   llvm.func @_QQmain() attributes {fir.bindc_name = "main"} {
     %0 = llvm.mlir.addressof ...
[truncated]

@llvmbot
Member

llvmbot commented Nov 13, 2024

@llvm/pr-subscribers-mlir-llvm

@@ -174,10 +174,6 @@ static LogicalResult checkImplementationStatus(Operation &op) {
     if (op.getHint())
       op.emitWarning("hint clause discarded");
   };
-  auto checkHostEval = [&todo](auto op, LogicalResult &result) {
-    if (!op.getHostEvalVars().empty())
-      result = todo("host_eval");
-  };
   auto checkIf = [&todo](auto op, LogicalResult &result) {
     if (op.getIfExpr())
       result = todo("if");
@@ -228,10 +224,6 @@ static LogicalResult checkImplementationStatus(Operation &op) {
         op.getReductionSyms())
       result = todo("reduction");
   };
-  auto checkThreadLimit = [&todo](auto op, LogicalResult &result) {
-    if (op.getThreadLimit())
-      result = todo("thread_limit");
-  };
   auto checkTaskReduction = [&todo](auto op, LogicalResult &result) {
     if (!op.getTaskReductionVars().empty() || op.getTaskReductionByref() ||
         op.getTaskReductionSyms())
@@ -295,7 +287,16 @@ static LogicalResult checkImplementationStatus(Operation &op) {
         checkAllocate(op, result);
         checkDevice(op, result);
         checkHasDeviceAddr(op, result);
-        checkHostEval(op, result);
+
+        // Host evaluated clauses are supported, except for target SPMD loop
+        // bounds.
+        for (BlockArgument arg :
+             cast<omp::BlockArgOpenMPOpInterface>(*op).getHostEvalBlockArgs())
+          for (Operation *user : arg.getUsers())
+            if (isa<omp::LoopNestOp>(user))
+              result = op.emitError("not yet implemented: host evaluation of "
+                                    "loop bounds in omp.target operation");
+
         checkIf(op, result);
         checkInReduction(op, result);
         checkIsDevicePtr(op, result);
@@ -316,7 +317,6 @@ static LogicalResult checkImplementationStatus(Operation &op) {
                                     "structures in omp.target operation");
           }
         }
-        checkThreadLimit(op, result);
       })
       .Default([](Operation &) {
         // Assume all clauses for an operation can be translated unless they are
@@ -3800,6 +3800,201 @@ createDeviceArgumentAccessor(MapInfoData &mapData, llvm::Argument &arg,
   return builder.saveIP();
 }
 
+/// Follow uses of `host_eval`-defined block arguments of the given `omp.target`
+/// operation and populate output variables with their corresponding host value
+/// (i.e. operand evaluated outside of the target region), based on their uses
+/// inside of the target region.
+///
+/// Loop bounds and steps are only optionally populated, if output vectors are
+/// provided.
+static void extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads,
+                                   Value &numTeamsLower, Value &numTeamsUpper,
+                                   Value &threadLimit) {
+  auto blockArgIface = llvm::cast<omp::BlockArgOpenMPOpInterface>(*targetOp);
+  for (auto item : llvm::zip_equal(targetOp.getHostEvalVars(),
+                                   blockArgIface.getHostEvalBlockArgs())) {
+    Value hostEvalVar = std::get<0>(item), blockArg = std::get<1>(item);
+
+    for (Operation *user : blockArg.getUsers()) {
+      llvm::TypeSwitch<Operation *>(user)
+          .Case([&](omp::TeamsOp teamsOp) {
+            if (teamsOp.getNumTeamsLower() == blockArg)
+              numTeamsLower = hostEvalVar;
+            else if (teamsOp.getNumTeamsUpper() == blockArg)
+              numTeamsUpper = hostEvalVar;
+            else if (teamsOp.getThreadLimit() == blockArg)
+              threadLimit = hostEvalVar;
+            else
+              llvm_unreachable("unsupported host_eval use");
+          })
+          .Case([&](omp::ParallelOp parallelOp) {
+            if (parallelOp.getNumThreads() == blockArg)
+              numThreads = hostEvalVar;
+            else
+              llvm_unreachable("unsupported host_eval use");
+          })
+          .Case([&](omp::LoopNestOp loopOp) {
+            // TODO: Extract bounds and step values.
+          })
+          .Default([](Operation *) {
+            llvm_unreachable("unsupported host_eval use");
+          });
+    }
+  }
+}
+
+/// If \p op is of the given type parameter, return it casted to that type.
+/// Otherwise, if its immediate parent operation (or some other higher-level
+/// parent, if \p immediateParent is false) is of that type, return that parent
+/// casted to the given type.
+///
+/// If \p op is \c null or neither it or its parent(s) are of the specified
+/// type, return a \c null operation.
+template <typename OpTy>
+static OpTy castOrGetParentOfType(Operation *op, bool immediateParent = false) {
+  if (!op)
+    return OpTy();
+
+  if (OpTy casted = dyn_cast<OpTy>(op))
+    return casted;
+
+  if (immediateParent)
+    return dyn_cast_if_present<OpTy>(op->getParentOp());
+
+  return op->getParentOfType<OpTy>();
+}
+
+/// Populate default `MinTeams`, `MaxTeams` and `MaxThreads` to their default
+/// values as stated by the corresponding clauses, if constant.
+///
+/// These default values must be set before the creation of the outlined LLVM
+/// function for the target region, so that they can be used to initialize the
+/// corresponding global `ConfigurationEnvironmentTy` structure.
+static void
+initTargetDefaultAttrs(omp::TargetOp targetOp,
+                       llvm::OpenMPIRBuilder::TargetKernelDefaultAttrs &attrs,
+                       bool isTargetDevice) {
+  Value hostNumThreads, hostNumTeamsLower, hostNumTeamsUpper, hostThreadLimit;
+  extractHostEvalClauses(targetOp, hostNumThreads, hostNumTeamsLower,
+                         hostNumTeamsUpper, hostThreadLimit);
+
+  // TODO: Handle constant 'if' clauses.
+  Operation *capturedOp = targetOp.getInnermostCapturedOmpOp();
+
+  auto extractConstInteger = [](Value value) -> std::optional<int64_t> {
+    if (auto constOp =
+            dyn_cast_if_present<LLVM::ConstantOp>(value.getDefiningOp()))
+      if (auto constAttr = dyn_cast<IntegerAttr>(constOp.getValue()))
+        return constAttr.getInt();
+
+    return std::nullopt;
+  };
+
+  // Handle clauses impacting the number of teams.
+
+  int32_t minTeamsVal = 1, maxTeamsVal = -1;
+  if (castOrGetParentOfType<omp::TeamsOp>(capturedOp)) {
+    // TODO: Use `hostNumTeamsLower` to initialize `minTeamsVal`. For now, match
+    // clang and set min and max to the same value.
+    if (hostNumTeamsUpper) {
+      if (auto val = extractConstInteger(hostNumTeamsUpper))
+        minTeamsVal = maxTeamsVal = *val;
+    } else {
+      minTeamsVal = maxTeamsVal = 0;
+    }
+  } else if (castOrGetParentOfType<omp::ParallelOp>(capturedOp,
+                                                    /*immediateParent=*/true) ||
+             castOrGetParentOfType<omp::SimdOp>(capturedOp,
+                                                /*immediateParent=*/true)) {
+    minTeamsVal = maxTeamsVal = 1;
+  } else {
+    minTeamsVal = maxTeamsVal = -1;
+  }
+
+  // Handle clauses impacting the number of threads.
+
+  auto setMaxValueFromClause = [&extractConstInteger](Value clauseValue,
+                                                      int32_t &result) {
+    if (!clauseValue)
+      return;
+
+    if (auto val = extractConstInteger(clauseValue))
+      result = *val;
+
+    // Found an applicable clause, so it's not undefined. Mark as unknown
+    // because it's not constant.
+    if (result < 0)
+      result = 0;
+  };
+
+  // Extract 'thread_limit' clause from 'target' and 'teams' directives.
+  int32_t targetThreadLimitVal = -1, teamsThreadLimitVal = -1;
+  setMaxValueFromClause(targetOp.getThreadLimit(), targetThreadLimitVal);
+  setMaxValueFromClause(hostThreadLimit, teamsThreadLimitVal);
+
+  // Extract 'max_threads' clause from 'parallel' or set to 1 if it's SIMD.
+  int32_t maxThreadsVal = -1;
+  if (auto parallelOp = castOrGetParentOfType<omp::ParallelOp>(capturedOp))
+    setMaxValueFromClause(hostNumThreads, maxThreadsVal);
+  else if (castOrGetParentOfType<omp::SimdOp>(capturedOp,
+                                              /*immediateParent=*/true))
+    maxThreadsVal = 1;
+
+  // For max values, < 0 means unset, == 0 means set but unknown. Select the
+  // minimum value between 'max_threads' and 'thread_limit' clauses that were
+  // set.
+  int32_t combinedMaxThreadsVal = targetThreadLimitVal;
+  if (combinedMaxThreadsVal < 0 ||
+      (teamsThreadLimitVal >= 0 && teamsThreadLimitVal < combinedMaxThreadsVal))
+    combinedMaxThreadsVal = teamsThreadLimitVal;
+
+  if (combinedMaxThreadsVal < 0 ||
+      (maxThreadsVal >= 0 && maxThreadsVal < combinedMaxThreadsVal))
+    combinedMaxThreadsVal = maxThreadsVal;
+
+  // Update kernel bounds structure for the `OpenMPIRBuilder` to use.
+  attrs.MinTeams = minTeamsVal;
+  attrs.MaxTeams.front() = maxTeamsVal;
+  attrs.MinThreads = 1;
+  attrs.MaxThreads.front() = combinedMaxThreadsVal;
+}
+
+/// Gather LLVM runtime values for all clauses evaluated in the host that are
+/// passed to the kernel invocation.
+///
+/// This function must be called only when compiling for the host. Also, it will
+/// only provide correct results if it's called after the body of \c targetOp
+/// has been fully generated.
+static void
+initTargetRuntimeAttrs(llvm::IRBuilderBase &builder,
+                       LLVM::ModuleTranslation &moduleTranslation,
+                       omp::TargetOp targetOp,
+                       llvm::OpenMPIRBuilder::TargetKernelRuntimeAttrs &attrs) {
+  Value numThreads, numTeamsLower, numTeamsUpper, teamsThreadLimit;
+  extractHostEvalClauses(targetOp, numThreads, numTeamsLower, numTeamsUpper,
+                         teamsThreadLimit);
+
+  // TODO: Handle constant 'if' clauses.
+  if (Value targetThreadLimit = targetOp.getThreadLimit())
+    attrs.TargetThreadLimit.front() =
+        moduleTranslation.lookupValue(targetThreadLimit);
+
+  if (numTeamsLower)
+    attrs.MinTeams = moduleTranslation.lookupValue(numTeamsLower);
+
+  if (numTeamsUpper)
+    attrs.MaxTeams.front() = moduleTranslation.lookupValue(numTeamsUpper);
+
+  if (teamsThreadLimit)
+    attrs.TeamsThreadLimit.front() =
+        moduleTranslation.lookupValue(teamsThreadLimit);
+
+  if (numThreads)
+    attrs.MaxThreads = moduleTranslation.lookupValue(numThreads);
+
+  // TODO: Populate attrs.LoopTripCount if it is target SPMD.
+}
+
 static LogicalResult
 convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
                  LLVM::ModuleTranslation &moduleTranslation) {
@@ -3809,12 +4004,13 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
   bool isTargetDevice = ompBuilder->Config.isTargetDevice();
+
   auto parentFn = opInst.getParentOfType<LLVM::LLVMFuncOp>();
+  auto blockIface = cast<omp::BlockArgOpenMPOpInterface>(opInst);
   auto &targetRegion = targetOp.getRegion();
   DataLayout dl = DataLayout(opInst.getParentOfType<ModuleOp>());
   SmallVector<Value> mapVars = targetOp.getMapVars();
-  ArrayRef<BlockArgument> mapBlockArgs =
-      cast<omp::BlockArgOpenMPOpInterface>(opInst).getMapBlockArgs();
+  ArrayRef<BlockArgument> mapBlockArgs = blockIface.getMapBlockArgs();
   llvm::Function *llvmOutlinedFn = nullptr;
 
   // TODO: It can also be false if a compile-time constant `false` IF clause is
@@ -3857,7 +4053,7 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
       OperandRange privateVars = targetOp.getPrivateVars();
       std::optional<ArrayAttr> privateSyms = targetOp.getPrivateSyms();
       MutableArrayRef<BlockArgument> privateBlockArgs =
-          cast<omp::BlockArgOpenMPOpInterface>(opInst).getPrivateBlockArgs();
+          blockIface.getPrivateBlockArgs();
 
       for (auto [privVar, privatizerNameAttr, privBlockArg] :
            llvm::zip_equal(privateVars, *privateSyms, privateBlockArgs)) {
@@ -3936,13 +4132,30 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
                                         allocaIP, codeGenIP);
   };
 
-  // TODO: Populate default and runtime attributes based on the construct and
-  // clauses.
+  llvm::SmallVector<llvm::Value *, 4> kernelInput;
   llvm::OpenMPIRBuilder::TargetKernelDefaultAttrs defaultAttrs = {
       /*MaxTeams=*/{-1}, /*MinTeams=*/0, /*MaxThreads=*/{0}, /*MinThreads=*/0};
+  initTargetDefaultAttrs(targetOp, defaultAttrs, isTargetDevice);
+
+  // Collect host-evaluated values needed to properly launch the kernel from the
+  // host.
   llvm::OpenMPIRBuilder::TargetKernelRuntimeAttrs runtimeAttrs;
+  if (!isTargetDevice)
+    initTargetRuntimeAttrs(builder, moduleTranslation, targetOp, runtimeAttrs);
+
+  // Pass host-evaluated values as parameters to the kernel / host fallback,
+  // except if they are constants. In any case, map the MLIR block argument to
+  // the corresponding LLVM values.
+  SmallVector<Value> hostEvalVars = targetOp.getHostEvalVars();
+  ArrayRef<BlockArgument> hostEvalBlockArgs = blockIface.getHostEvalBlockArgs();
+  for (auto [arg, var] : llvm::zip_equal(hostEvalBlockArgs, hostEvalVars)) {
+    llvm::Value *value = moduleTranslation.lookupValue(var);
+    moduleTranslation.mapValue(arg, value);
+
+    if (!llvm::isa<llvm::Constant>(value))
+      kernelInput.push_back(value);
+  }
 
-  llvm::SmallVector<llvm::Value *, 4> kernelInput;
   for (size_t i = 0; i < mapVars.size(); ++i) {
     // declare target arguments are not passed to kernels as arguments
     // TODO: We currently do not handle cases where a member is explicitly
diff --git a/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir b/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir
index 9549de1258efc5..89d8db5439bff6 100644
--- a/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir
@@ -1,6 +1,6 @@
 // RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
 
-module attributes {omp.is_target_device = true} {
+module attributes {llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_target_device = true} {
   llvm.func @_QQmain() attributes {fir.bindc_name = "main"} {
     %0 = llvm.mlir.addressof ...
[truncated]

@llvmbot

llvmbot commented Nov 13, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: Sergio Afonso (skatrak)

Changes

This patch adds support for processing the host_eval clause of omp.target to populate default and runtime kernel launch attributes. Specifically, these relate to the num_teams, thread_limit and num_threads clauses attached to operations nested inside of omp.target. As a result, the thread_limit clause of omp.target is also supported.
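
As a rough illustration of how a single clause value becomes a default kernel attribute, the following self-contained C++ sketch mirrors the `setMaxValueFromClause` lambda in the diff. The function name `encodeMaxValue` and its two parameters are hypothetical stand-ins for an MLIR `Value` and its optional constant; only the encoding itself (negative for an absent clause, zero for a clause that is present but not a compile-time constant, otherwise the constant) is taken from the patch.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of a clause operand:
//   present  - the clause appears on the construct
//   constant - its value, only if it is a compile-time integer constant
constexpr int32_t encodeMaxValue(bool present,
                                 std::optional<int64_t> constant) {
  int32_t result = -1; // -1: clause not set
  if (!present)
    return result;

  if (constant)
    result = static_cast<int32_t>(*constant);

  // The clause exists, so the value is not "unset". If no usable constant
  // was found, mark it as "set but unknown".
  if (result < 0)
    result = 0;
  return result;
}

static_assert(encodeMaxValue(false, std::nullopt) == -1); // no clause
static_assert(encodeMaxValue(true, std::nullopt) == 0);   // runtime value
static_assert(encodeMaxValue(true, 64) == 64);            // constant
```

In this model a clause whose value is only known at run time still influences the default attributes, because it is reported as 0 rather than -1.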

The implementation of initTargetDefaultAttrs() is intended to reflect clang's own processing of multiple constructs and clauses in order to define a default number of teams and threads to be used as kernel attributes and to populate global variables in the target device module.
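
The thread-count part of `initTargetDefaultAttrs()` can be sketched in isolation as follows. This is a minimal standalone model, not the code from the patch: the helper name `combineMaxThreads` is hypothetical, and the inputs are assumed to be already encoded as in the diff (negative means the clause is unset, zero means set but not constant). The selection logic itself follows the two `if` statements that compute `combinedMaxThreadsVal`.

```cpp
#include <cstdint>

// Pick the smallest value among the clauses that were set.
// Inputs use the encoding from the patch: < 0 unset, == 0 set but unknown.
constexpr int32_t combineMaxThreads(int32_t targetThreadLimit,
                                    int32_t teamsThreadLimit,
                                    int32_t numThreads) {
  int32_t combined = targetThreadLimit;

  // Take the 'teams' thread_limit if nothing was set so far, or if it is
  // set and smaller than the current candidate.
  if (combined < 0 ||
      (teamsThreadLimit >= 0 && teamsThreadLimit < combined))
    combined = teamsThreadLimit;

  // Same rule for the num_threads value of 'parallel'.
  if (combined < 0 || (numThreads >= 0 && numThreads < combined))
    combined = numThreads;

  return combined;
}

static_assert(combineMaxThreads(-1, -1, -1) == -1);  // nothing set
static_assert(combineMaxThreads(128, 64, -1) == 64); // smaller limit wins
static_assert(combineMaxThreads(256, -1, 32) == 32); // num_threads is lower
static_assert(combineMaxThreads(128, 0, -1) == 0);   // unknown beats constant
```

Note that, because "set but unknown" is encoded as 0, a non-constant clause takes precedence over any constant one in this comparison, so the combined default also becomes unknown.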

One side effect of this change is that target device MLIR modules can no longer be translated to LLVM IR unless they have a supported target triple. This is because the local getGridValue() function in the OpenMPIRBuilder only works for certain architectures, and it is called whenever the maximum number of threads has not been explicitly defined. This limitation also matches clang.

Evaluating the collapsed loop trip count of target SPMD kernels remains unsupported.


Patch is 37.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/116052.diff

18 Files Affected:

  • (modified) flang/test/Integration/OpenMP/target-filtering.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/function-filtering-2.f90 (+3-3)
  • (modified) flang/test/Lower/OpenMP/function-filtering-3.f90 (+3-3)
  • (modified) flang/test/Lower/OpenMP/function-filtering.f90 (+3-3)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+229-16)
  • (modified) mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir (+2-2)
  • (modified) mlir/test/Target/LLVMIR/omptarget-constant-alloca-raise.mlir (+2-2)
  • (modified) mlir/test/Target/LLVMIR/omptarget-constant-indexing-device-region.mlir (+2-2)
  • (modified) mlir/test/Target/LLVMIR/omptarget-debug.mlir (+1-1)
  • (modified) mlir/test/Target/LLVMIR/omptarget-declare-target-llvm-device.mlir (+1-1)
  • (modified) mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir (+2-2)
  • (modified) mlir/test/Target/LLVMIR/omptarget-region-device-llvm.mlir (+3-3)
  • (modified) mlir/test/Target/LLVMIR/omptarget-target-inside-task.mlir (+2-2)
  • (added) mlir/test/Target/LLVMIR/openmp-target-launch-device.mlir (+43)
  • (added) mlir/test/Target/LLVMIR/openmp-target-launch-host.mlir (+31)
  • (modified) mlir/test/Target/LLVMIR/openmp-target-use-device-nested.mlir (+2-2)
  • (modified) mlir/test/Target/LLVMIR/openmp-task-target-device.mlir (+1-1)
  • (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (+13-14)
diff --git a/flang/test/Integration/OpenMP/target-filtering.f90 b/flang/test/Integration/OpenMP/target-filtering.f90
index d1ab1b47e580d4..699c1040d91f9c 100644
--- a/flang/test/Integration/OpenMP/target-filtering.f90
+++ b/flang/test/Integration/OpenMP/target-filtering.f90
@@ -7,7 +7,7 @@
 !===----------------------------------------------------------------------===!
 
 !RUN: %flang_fc1 -emit-llvm -fopenmp %s -o - | FileCheck %s --check-prefixes HOST,ALL
-!RUN: %flang_fc1 -emit-llvm -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s --check-prefixes DEVICE,ALL
+!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s --check-prefixes DEVICE,ALL
 
 !HOST: define {{.*}}@{{.*}}before{{.*}}(
 !DEVICE-NOT: define {{.*}}@before{{.*}}(
diff --git a/flang/test/Lower/OpenMP/function-filtering-2.f90 b/flang/test/Lower/OpenMP/function-filtering-2.f90
index 0c02aa223820e7..a2c5e29cfdcbf6 100644
--- a/flang/test/Lower/OpenMP/function-filtering-2.f90
+++ b/flang/test/Lower/OpenMP/function-filtering-2.f90
@@ -1,9 +1,9 @@
 ! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM,LLVM-HOST %s
 ! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -emit-hlfir %s -o - | FileCheck --check-prefix=MLIR %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM,LLVM-DEVICE %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefix=MLIR %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM,LLVM-DEVICE %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefix=MLIR %s
 ! RUN: bbc -fopenmp -fopenmp-version=52 -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-HOST,MLIR-ALL %s
-! RUN: bbc -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
+! RUN: bbc -target amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
 
 ! MLIR: func.func @{{.*}}implicit_invocation() attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>}
 ! MLIR: return
diff --git a/flang/test/Lower/OpenMP/function-filtering-3.f90 b/flang/test/Lower/OpenMP/function-filtering-3.f90
index a277c06d620669..4754da85de3aa7 100644
--- a/flang/test/Lower/OpenMP/function-filtering-3.f90
+++ b/flang/test/Lower/OpenMP/function-filtering-3.f90
@@ -1,9 +1,9 @@
 ! RUN: %flang_fc1 -fopenmp -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-HOST,LLVM-ALL %s
 ! RUN: %flang_fc1 -fopenmp -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-HOST,MLIR-ALL %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-DEVICE,LLVM-ALL %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-DEVICE,LLVM-ALL %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
 ! RUN: bbc -fopenmp -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-HOST,MLIR-ALL %s
-! RUN: bbc -fopenmp -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
+! RUN: bbc -target amdgcn-amd-amdhsa -fopenmp -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
 
 ! Check that the correct LLVM IR functions are kept for the host and device
 ! after running the whole set of translation and transformation passes from
diff --git a/flang/test/Lower/OpenMP/function-filtering.f90 b/flang/test/Lower/OpenMP/function-filtering.f90
index a72822503c3739..9f37e6f028d7a9 100644
--- a/flang/test/Lower/OpenMP/function-filtering.f90
+++ b/flang/test/Lower/OpenMP/function-filtering.f90
@@ -1,9 +1,9 @@
 ! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-HOST,LLVM-ALL %s
 ! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-HOST,MLIR-ALL %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-DEVICE,LLVM-ALL %s
-! RUN: %flang_fc1 -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -flang-experimental-hlfir -emit-llvm %s -o - | FileCheck --check-prefixes=LLVM-DEVICE,LLVM-ALL %s
+! RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
 ! RUN: bbc -fopenmp -fopenmp-version=52 -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-HOST,MLIR-ALL %s
-! RUN: bbc -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
+! RUN: bbc -target amdgcn-amd-amdhsa -fopenmp -fopenmp-version=52 -fopenmp-is-target-device -emit-hlfir %s -o - | FileCheck --check-prefixes=MLIR-DEVICE,MLIR-ALL %s
 
 ! Check that the correct LLVM IR functions are kept for the host and device
 ! after running the whole set of translation and transformation passes from
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 9bdf3e11496f3a..9de4609ba67486 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -174,10 +174,6 @@ static LogicalResult checkImplementationStatus(Operation &op) {
     if (op.getHint())
       op.emitWarning("hint clause discarded");
   };
-  auto checkHostEval = [&todo](auto op, LogicalResult &result) {
-    if (!op.getHostEvalVars().empty())
-      result = todo("host_eval");
-  };
   auto checkIf = [&todo](auto op, LogicalResult &result) {
     if (op.getIfExpr())
       result = todo("if");
@@ -228,10 +224,6 @@ static LogicalResult checkImplementationStatus(Operation &op) {
         op.getReductionSyms())
       result = todo("reduction");
   };
-  auto checkThreadLimit = [&todo](auto op, LogicalResult &result) {
-    if (op.getThreadLimit())
-      result = todo("thread_limit");
-  };
   auto checkTaskReduction = [&todo](auto op, LogicalResult &result) {
     if (!op.getTaskReductionVars().empty() || op.getTaskReductionByref() ||
         op.getTaskReductionSyms())
@@ -295,7 +287,16 @@ static LogicalResult checkImplementationStatus(Operation &op) {
         checkAllocate(op, result);
         checkDevice(op, result);
         checkHasDeviceAddr(op, result);
-        checkHostEval(op, result);
+
+        // Host evaluated clauses are supported, except for target SPMD loop
+        // bounds.
+        for (BlockArgument arg :
+             cast<omp::BlockArgOpenMPOpInterface>(*op).getHostEvalBlockArgs())
+          for (Operation *user : arg.getUsers())
+            if (isa<omp::LoopNestOp>(user))
+              result = op.emitError("not yet implemented: host evaluation of "
+                                    "loop bounds in omp.target operation");
+
         checkIf(op, result);
         checkInReduction(op, result);
         checkIsDevicePtr(op, result);
@@ -316,7 +317,6 @@ static LogicalResult checkImplementationStatus(Operation &op) {
                                     "structures in omp.target operation");
           }
         }
-        checkThreadLimit(op, result);
       })
       .Default([](Operation &) {
         // Assume all clauses for an operation can be translated unless they are
@@ -3800,6 +3800,201 @@ createDeviceArgumentAccessor(MapInfoData &mapData, llvm::Argument &arg,
   return builder.saveIP();
 }
 
+/// Follow uses of `host_eval`-defined block arguments of the given `omp.target`
+/// operation and populate output variables with their corresponding host value
+/// (i.e. operand evaluated outside of the target region), based on their uses
+/// inside of the target region.
+///
+/// Loop bounds and steps are only optionally populated, if output vectors are
+/// provided.
+static void extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads,
+                                   Value &numTeamsLower, Value &numTeamsUpper,
+                                   Value &threadLimit) {
+  auto blockArgIface = llvm::cast<omp::BlockArgOpenMPOpInterface>(*targetOp);
+  for (auto item : llvm::zip_equal(targetOp.getHostEvalVars(),
+                                   blockArgIface.getHostEvalBlockArgs())) {
+    Value hostEvalVar = std::get<0>(item), blockArg = std::get<1>(item);
+
+    for (Operation *user : blockArg.getUsers()) {
+      llvm::TypeSwitch<Operation *>(user)
+          .Case([&](omp::TeamsOp teamsOp) {
+            if (teamsOp.getNumTeamsLower() == blockArg)
+              numTeamsLower = hostEvalVar;
+            else if (teamsOp.getNumTeamsUpper() == blockArg)
+              numTeamsUpper = hostEvalVar;
+            else if (teamsOp.getThreadLimit() == blockArg)
+              threadLimit = hostEvalVar;
+            else
+              llvm_unreachable("unsupported host_eval use");
+          })
+          .Case([&](omp::ParallelOp parallelOp) {
+            if (parallelOp.getNumThreads() == blockArg)
+              numThreads = hostEvalVar;
+            else
+              llvm_unreachable("unsupported host_eval use");
+          })
+          .Case([&](omp::LoopNestOp loopOp) {
+            // TODO: Extract bounds and step values.
+          })
+          .Default([](Operation *) {
+            llvm_unreachable("unsupported host_eval use");
+          });
+    }
+  }
+}
+
+/// If \p op is of the given type parameter, return it casted to that type.
+/// Otherwise, if its immediate parent operation (or some other higher-level
+/// parent, if \p immediateParent is false) is of that type, return that parent
+/// casted to the given type.
+///
+/// If \p op is \c null or neither it or its parent(s) are of the specified
+/// type, return a \c null operation.
+template <typename OpTy>
+static OpTy castOrGetParentOfType(Operation *op, bool immediateParent = false) {
+  if (!op)
+    return OpTy();
+
+  if (OpTy casted = dyn_cast<OpTy>(op))
+    return casted;
+
+  if (immediateParent)
+    return dyn_cast_if_present<OpTy>(op->getParentOp());
+
+  return op->getParentOfType<OpTy>();
+}
+
+/// Populate default `MinTeams`, `MaxTeams` and `MaxThreads` to their default
+/// values as stated by the corresponding clauses, if constant.
+///
+/// These default values must be set before the creation of the outlined LLVM
+/// function for the target region, so that they can be used to initialize the
+/// corresponding global `ConfigurationEnvironmentTy` structure.
+static void
+initTargetDefaultAttrs(omp::TargetOp targetOp,
+                       llvm::OpenMPIRBuilder::TargetKernelDefaultAttrs &attrs,
+                       bool isTargetDevice) {
+  Value hostNumThreads, hostNumTeamsLower, hostNumTeamsUpper, hostThreadLimit;
+  extractHostEvalClauses(targetOp, hostNumThreads, hostNumTeamsLower,
+                         hostNumTeamsUpper, hostThreadLimit);
+
+  // TODO: Handle constant 'if' clauses.
+  Operation *capturedOp = targetOp.getInnermostCapturedOmpOp();
+
+  auto extractConstInteger = [](Value value) -> std::optional<int64_t> {
+    if (auto constOp =
+            dyn_cast_if_present<LLVM::ConstantOp>(value.getDefiningOp()))
+      if (auto constAttr = dyn_cast<IntegerAttr>(constOp.getValue()))
+        return constAttr.getInt();
+
+    return std::nullopt;
+  };
+
+  // Handle clauses impacting the number of teams.
+
+  int32_t minTeamsVal = 1, maxTeamsVal = -1;
+  if (castOrGetParentOfType<omp::TeamsOp>(capturedOp)) {
+    // TODO: Use `hostNumTeamsLower` to initialize `minTeamsVal`. For now, match
+    // clang and set min and max to the same value.
+    if (hostNumTeamsUpper) {
+      if (auto val = extractConstInteger(hostNumTeamsUpper))
+        minTeamsVal = maxTeamsVal = *val;
+    } else {
+      minTeamsVal = maxTeamsVal = 0;
+    }
+  } else if (castOrGetParentOfType<omp::ParallelOp>(capturedOp,
+                                                    /*immediateParent=*/true) ||
+             castOrGetParentOfType<omp::SimdOp>(capturedOp,
+                                                /*immediateParent=*/true)) {
+    minTeamsVal = maxTeamsVal = 1;
+  } else {
+    minTeamsVal = maxTeamsVal = -1;
+  }
+
+  // Handle clauses impacting the number of threads.
+
+  auto setMaxValueFromClause = [&extractConstInteger](Value clauseValue,
+                                                      int32_t &result) {
+    if (!clauseValue)
+      return;
+
+    if (auto val = extractConstInteger(clauseValue))
+      result = *val;
+
+    // Found an applicable clause, so it's not undefined. Mark as unknown
+    // because it's not constant.
+    if (result < 0)
+      result = 0;
+  };
+
+  // Extract 'thread_limit' clause from 'target' and 'teams' directives.
+  int32_t targetThreadLimitVal = -1, teamsThreadLimitVal = -1;
+  setMaxValueFromClause(targetOp.getThreadLimit(), targetThreadLimitVal);
+  setMaxValueFromClause(hostThreadLimit, teamsThreadLimitVal);
+
+  // Extract 'num_threads' clause from 'parallel' or set to 1 if it's SIMD.
+  int32_t maxThreadsVal = -1;
+  if (auto parallelOp = castOrGetParentOfType<omp::ParallelOp>(capturedOp))
+    setMaxValueFromClause(hostNumThreads, maxThreadsVal);
+  else if (castOrGetParentOfType<omp::SimdOp>(capturedOp,
+                                              /*immediateParent=*/true))
+    maxThreadsVal = 1;
+
+  // For max values, < 0 means unset, == 0 means set but unknown. Select the
+  // minimum value between 'num_threads' and 'thread_limit' clauses that were
+  // set.
+  int32_t combinedMaxThreadsVal = targetThreadLimitVal;
+  if (combinedMaxThreadsVal < 0 ||
+      (teamsThreadLimitVal >= 0 && teamsThreadLimitVal < combinedMaxThreadsVal))
+    combinedMaxThreadsVal = teamsThreadLimitVal;
+
+  if (combinedMaxThreadsVal < 0 ||
+      (maxThreadsVal >= 0 && maxThreadsVal < combinedMaxThreadsVal))
+    combinedMaxThreadsVal = maxThreadsVal;
+
+  // Update kernel bounds structure for the `OpenMPIRBuilder` to use.
+  attrs.MinTeams = minTeamsVal;
+  attrs.MaxTeams.front() = maxTeamsVal;
+  attrs.MinThreads = 1;
+  attrs.MaxThreads.front() = combinedMaxThreadsVal;
+}
+
+/// Gather LLVM runtime values for all clauses evaluated in the host that are
+/// passed to the kernel invocation.
+///
+/// This function must be called only when compiling for the host. Also, it will
+/// only provide correct results if it's called after the body of \c targetOp
+/// has been fully generated.
+static void
+initTargetRuntimeAttrs(llvm::IRBuilderBase &builder,
+                       LLVM::ModuleTranslation &moduleTranslation,
+                       omp::TargetOp targetOp,
+                       llvm::OpenMPIRBuilder::TargetKernelRuntimeAttrs &attrs) {
+  Value numThreads, numTeamsLower, numTeamsUpper, teamsThreadLimit;
+  extractHostEvalClauses(targetOp, numThreads, numTeamsLower, numTeamsUpper,
+                         teamsThreadLimit);
+
+  // TODO: Handle constant 'if' clauses.
+  if (Value targetThreadLimit = targetOp.getThreadLimit())
+    attrs.TargetThreadLimit.front() =
+        moduleTranslation.lookupValue(targetThreadLimit);
+
+  if (numTeamsLower)
+    attrs.MinTeams = moduleTranslation.lookupValue(numTeamsLower);
+
+  if (numTeamsUpper)
+    attrs.MaxTeams.front() = moduleTranslation.lookupValue(numTeamsUpper);
+
+  if (teamsThreadLimit)
+    attrs.TeamsThreadLimit.front() =
+        moduleTranslation.lookupValue(teamsThreadLimit);
+
+  if (numThreads)
+    attrs.MaxThreads = moduleTranslation.lookupValue(numThreads);
+
+  // TODO: Populate attrs.LoopTripCount if it is target SPMD.
+}
+
 static LogicalResult
 convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
                  LLVM::ModuleTranslation &moduleTranslation) {
@@ -3809,12 +4004,13 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
   bool isTargetDevice = ompBuilder->Config.isTargetDevice();
+
   auto parentFn = opInst.getParentOfType<LLVM::LLVMFuncOp>();
+  auto blockIface = cast<omp::BlockArgOpenMPOpInterface>(opInst);
   auto &targetRegion = targetOp.getRegion();
   DataLayout dl = DataLayout(opInst.getParentOfType<ModuleOp>());
   SmallVector<Value> mapVars = targetOp.getMapVars();
-  ArrayRef<BlockArgument> mapBlockArgs =
-      cast<omp::BlockArgOpenMPOpInterface>(opInst).getMapBlockArgs();
+  ArrayRef<BlockArgument> mapBlockArgs = blockIface.getMapBlockArgs();
   llvm::Function *llvmOutlinedFn = nullptr;
 
   // TODO: It can also be false if a compile-time constant `false` IF clause is
@@ -3857,7 +4053,7 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
       OperandRange privateVars = targetOp.getPrivateVars();
       std::optional<ArrayAttr> privateSyms = targetOp.getPrivateSyms();
       MutableArrayRef<BlockArgument> privateBlockArgs =
-          cast<omp::BlockArgOpenMPOpInterface>(opInst).getPrivateBlockArgs();
+          blockIface.getPrivateBlockArgs();
 
       for (auto [privVar, privatizerNameAttr, privBlockArg] :
            llvm::zip_equal(privateVars, *privateSyms, privateBlockArgs)) {
@@ -3936,13 +4132,30 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
                                         allocaIP, codeGenIP);
   };
 
-  // TODO: Populate default and runtime attributes based on the construct and
-  // clauses.
+  llvm::SmallVector<llvm::Value *, 4> kernelInput;
   llvm::OpenMPIRBuilder::TargetKernelDefaultAttrs defaultAttrs = {
       /*MaxTeams=*/{-1}, /*MinTeams=*/0, /*MaxThreads=*/{0}, /*MinThreads=*/0};
+  initTargetDefaultAttrs(targetOp, defaultAttrs, isTargetDevice);
+
+  // Collect host-evaluated values needed to properly launch the kernel from the
+  // host.
   llvm::OpenMPIRBuilder::TargetKernelRuntimeAttrs runtimeAttrs;
+  if (!isTargetDevice)
+    initTargetRuntimeAttrs(builder, moduleTranslation, targetOp, runtimeAttrs);
+
+  // Pass host-evaluated values as parameters to the kernel / host fallback,
+  // except if they are constants. In any case, map the MLIR block argument to
+  // the corresponding LLVM values.
+  SmallVector<Value> hostEvalVars = targetOp.getHostEvalVars();
+  ArrayRef<BlockArgument> hostEvalBlockArgs = blockIface.getHostEvalBlockArgs();
+  for (auto [arg, var] : llvm::zip_equal(hostEvalBlockArgs, hostEvalVars)) {
+    llvm::Value *value = moduleTranslation.lookupValue(var);
+    moduleTranslation.mapValue(arg, value);
+
+    if (!llvm::isa<llvm::Constant>(value))
+      kernelInput.push_back(value);
+  }
 
-  llvm::SmallVector<llvm::Value *, 4> kernelInput;
   for (size_t i = 0; i < mapVars.size(); ++i) {
     // declare target arguments are not passed to kernels as arguments
     // TODO: We currently do not handle cases where a member is explicitly
diff --git a/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir b/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir
index 9549de1258efc5..89d8db5439bff6 100644
--- a/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-byref-bycopy-generation-device.mlir
@@ -1,6 +1,6 @@
 // RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
 
-module attributes {omp.is_target_device = true} {
+module attributes {llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_target_device = true} {
   llvm.func @_QQmain() attributes {fir.bindc_name = "main"} {
     %0 = llvm.mlir.addressof ...
[truncated]
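
The way `initTargetDefaultAttrs()` in the diff above merges the `thread_limit` and `num_threads` information can be sketched in isolation. This is only a standalone model of that logic, not the code from the patch: `Clause`, `valueFromClause`, `combineMax` and `combinedMaxThreads` are hypothetical names introduced here, and the special case that sets the value to 1 for SIMD is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Sentinel convention for "max" values, as described in the patch comments:
//   < 0  : no applicable clause was found (unset)
//   == 0 : a clause is present, but its value is only known at runtime
//   > 0  : a clause is present with a compile-time constant value

// Hypothetical stand-in for an MLIR clause operand.
struct Clause {
  bool present = false;
  std::optional<int32_t> constant; // Set only for compile-time constants.
};

// Models the `setMaxValueFromClause` lambda from the diff.
int32_t valueFromClause(const Clause &clause) {
  // A constant value only makes sense if the clause exists.
  assert(!clause.constant || clause.present);
  int32_t result = -1;
  if (!clause.present)
    return result;
  if (clause.constant)
    result = *clause.constant;
  // The clause exists, so the value is not "unset"; mark it as unknown.
  if (result < 0)
    result = 0;
  return result;
}

// Keeps the smaller of two values, ignoring candidates that are unset.
int32_t combineMax(int32_t current, int32_t candidate) {
  if (current < 0 || (candidate >= 0 && candidate < current))
    return candidate;
  return current;
}

// Models the computation of `combinedMaxThreadsVal`.
int32_t combinedMaxThreads(const Clause &targetThreadLimit,
                           const Clause &teamsThreadLimit,
                           const Clause &numThreads) {
  int32_t result = valueFromClause(targetThreadLimit);
  result = combineMax(result, valueFromClause(teamsThreadLimit));
  result = combineMax(result, valueFromClause(numThreads));
  return result;
}
```

One consequence of this encoding is that a runtime-only clause (0) compares smaller than any constant, so a single non-constant clause makes the combined default "set but unknown".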

@skatrak skatrak requested a review from ivanradanov November 13, 2024 17:05
@skatrak skatrak force-pushed the users/skatrak/host-eval-05-mlir-llvmir-generic branch from 8ff0d3b to b5571be Compare November 15, 2024 16:27
@skatrak skatrak force-pushed the users/skatrak/host-eval-04-ompirbuilder-teams-threads branch from cc5c5cc to e2b3ac4 Compare November 27, 2024 12:26
@skatrak skatrak force-pushed the users/skatrak/host-eval-05-mlir-llvmir-generic branch from b5571be to 58bd5ff Compare November 27, 2024 12:31
@llvmbot llvmbot added the clang:openmp OpenMP related changes to Clang label Nov 27, 2024
@skatrak skatrak changed the base branch from users/skatrak/host-eval-04-ompirbuilder-teams-threads to users/skatrak/host-eval-04-2-ompirbuilder-attrs November 27, 2024 12:46
@skatrak skatrak force-pushed the users/skatrak/host-eval-04-2-ompirbuilder-attrs branch from c7ca41f to 7de5d58 Compare December 4, 2024 14:19
@skatrak skatrak force-pushed the users/skatrak/host-eval-05-mlir-llvmir-generic branch from 58bd5ff to a323719 Compare December 4, 2024 14:35
@skatrak
Member Author

skatrak commented Dec 4, 2024

Ping for reviews!

@skatrak skatrak force-pushed the users/skatrak/host-eval-04-2-ompirbuilder-attrs branch from 7de5d58 to ab3e0d2 Compare January 9, 2025 12:27
@skatrak skatrak force-pushed the users/skatrak/host-eval-05-mlir-llvmir-generic branch 2 times, most recently from 9035b53 to 9cf7001 Compare January 9, 2025 13:18
Contributor

@agozillon agozillon left a comment

Thank you for the patch @skatrak, LGTM. I left a few nits, but feel free to ignore any you think make no sense!

@@ -289,7 +281,16 @@ static LogicalResult checkImplementationStatus(Operation &op) {
checkBare(op, result);
checkDevice(op, result);
checkHasDeviceAddr(op, result);
checkHostEval(op, result);

// Host evaluated clauses are supported, except for target SPMD loop
Contributor

Nit: Would it make sense to keep this segment inside of the checkHostEval function just to keep things "tidy"? I'll leave it up to you, not overly fussed personally!

Member Author

Actually, I initially wasn't sure whether to do what you suggest or what I ended up doing. Other check<ClauseName> helpers only test that the clause exists at all, and special cases so far were handled within the switch (e.g. firstprivate in target) so that these functions could still be reused by other operations. So I decided to do the latter, even though that clause cannot appear in any other operation.

But I agree that it makes things simpler, so I'm following your suggestion. In fact, I'm moving type-specific checks to the associated lambda as well to get things straightened out.

Value &numTeamsLower, Value &numTeamsUpper,
Value &threadLimit) {
auto blockArgIface = llvm::cast<omp::BlockArgOpenMPOpInterface>(*targetOp);
for (auto item : llvm::zip_equal(targetOp.getHostEvalVars(),
Contributor

Small nit: Not sure if it makes sense to do something like: for (auto [hostEvalVar, blockArg] : llvm::zip_equal.... here, but it might save you the std::get's below.

Member Author

Using a structured binding declaration for hostEvalVar and blockArg, and then capturing those names in the TypeSwitch lambdas below, hits a corner of the C++ standard: capturing structured bindings in a lambda was not allowed before C++20. So, apparently we can only do this when compiling for C++20 or later.
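
For readers unfamiliar with this restriction, here is a minimal sketch of two ways to stay C++17-compatible. It does not use the MLIR types from the patch; `makeDescribers` and `sumWithGet` are hypothetical helpers that only illustrate the language rule.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Option 1: keep the structured binding, but copy each name into the closure
// with an init-capture. `[name = name]` declares a new closure member rather
// than capturing the binding itself, so it is valid in C++17.
std::vector<std::function<std::string()>>
makeDescribers(const std::vector<std::pair<std::string, int>> &items) {
  std::vector<std::function<std::string()>> describers;
  for (const auto &[name, value] : items) {
    // A plain `[&]`, `[=]` or `[name, value]` here would capture structured
    // bindings, which is ill-formed in C++17.
    describers.push_back([name = name, value = value] {
      return name + "=" + std::to_string(value);
    });
  }
  assert(describers.size() == items.size());
  return describers;
}

// Option 2: avoid the structured binding and use std::get on the whole item,
// which is an ordinary variable and can be captured as usual.
int sumWithGet(const std::vector<std::pair<std::string, int>> &items) {
  int total = 0;
  for (const auto &item : items) {
    auto add = [&] { total += std::get<1>(item); };
    add();
  }
  return total;
}
```

The second option is closest to what the review comment above describes as the existing `std::get` calls.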

llvm_unreachable("unsupported host_eval use");
})
.Case([&](omp::LoopNestOp loopOp) {
// TODO: Extract bounds and step values.
Contributor

Probably not worth the effort, and maybe it doesn't make sense, but I suppose we could add a TODO/llvm_unreachable here when we check (excuse the pseudo code) if (loopOp.step() == blockArg). That way, if we ever happen to apply this in lowering, we'd emit a TODO reminding people that it has no effect. It might be more trouble than it's worth, though, and silently failing may be preferable; I'll leave the decision to you! Shame there isn't a TODO warning as opposed to hard failure for cases where it won't generate wrong code, but unoptimized/ignored requests.

Member Author

This case must have already been checked and reported by the checkImplementationStatus of the omp.target operation, stopping the translation process. So, an llvm_unreachable makes sense to add here, thanks for the suggestion. I also added that explanation to the comment above.

Member Author

Shame there isn't a TODO warning as opposed to hard failure for cases where it won't generate wrong code, but unoptimized/ignored requests

Whenever we need to do that, we can just emitWarning instead of emitError and just avoid returning a failure value. For example, we do that in convertIgnoredWrapper. We just don't want to do that in this case because it's actually a compilation-stopping error, it's just reported elsewhere.

// clang and set min and max to the same value.
if (numTeamsUpper) {
if (auto val = extractConstInteger(numTeamsUpper))
minTeamsVal = maxTeamsVal = *val;
Contributor

Likely a dumb question: would it make sense to have an else here, to set minTeamsVal/maxTeamsVal to 0, as is done if numTeamsUpper is not set? Perhaps it's fine to leave it as the default we set above, though, and it's something Clang does! If so, feel free to disregard :-)

Member Author

It's not a dumb question. This code tries to replicate the logic in CGOpenMPRuntime::getNumTeamsExprForTargetDirective, and even I don't completely follow the logic there 😅...

I know that "max" and "upper bound" default values are meant to be set to < 0 if they weren't set by the user, to 0 if they are set to a runtime-evaluated value, and to > 0 if they are set to a constant. However, that doesn't seem to be followed entirely here when initializing these values based on what constructs are present, so I'm not sure.
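
To make the case raised in this thread concrete, the teams-related branch of `initTargetDefaultAttrs()` from the diff can be modeled as a standalone function. This is a sketch under assumptions: `TargetRegionInfo` and `defaultTeams` are hypothetical names, and the struct fields summarize checks that the patch performs on MLIR operations.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical summary of what the patch inspects in the target region.
struct TargetRegionInfo {
  // The innermost captured op is, or is nested inside of, omp.teams.
  bool hasTeams = false;
  // The innermost captured op is, or is directly nested in, omp.parallel or
  // omp.simd.
  bool parallelOrSimdIsImmediate = false;
  // A host-evaluated num_teams upper bound exists.
  bool hasNumTeamsUpper = false;
  // Its value, if it is a compile-time constant.
  std::optional<int32_t> numTeamsUpperConst;
};

// Returns {minTeams, maxTeams} following the branches in the diff.
std::pair<int32_t, int32_t> defaultTeams(const TargetRegionInfo &info) {
  // A constant upper bound implies that the clause is present.
  assert(!info.numTeamsUpperConst || info.hasNumTeamsUpper);

  int32_t minTeams = 1, maxTeams = -1;
  if (info.hasTeams) {
    if (info.hasNumTeamsUpper) {
      if (info.numTeamsUpperConst)
        minTeams = maxTeams = *info.numTeamsUpperConst;
      // Otherwise the initial values (1, -1) are kept. This is the case the
      // review question is about.
    } else {
      minTeams = maxTeams = 0;
    }
  } else if (info.parallelOrSimdIsImmediate) {
    minTeams = maxTeams = 1;
  } else {
    minTeams = maxTeams = -1;
  }
  return {minTeams, maxTeams};
}
```

In this model, a `teams` construct with a non-constant `num_teams` yields (1, -1), while a `teams` construct without `num_teams` yields (0, 0), which is the asymmetry discussed above.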

numThreads = parallelOp.getNumThreads();
}

auto extractConstInteger = [](Value value) -> std::optional<int64_t> {
Contributor

might honestly be worth just making this a function as it's possibly useful elsewhere in the file, but I'll leave that up to you as usual! :-)

Member Author

Good idea, done.

@skatrak skatrak force-pushed the users/skatrak/host-eval-04-2-ompirbuilder-attrs branch from ab3e0d2 to 0f17338 Compare January 14, 2025 12:35
Base automatically changed from users/skatrak/host-eval-04-2-ompirbuilder-attrs to main January 14, 2025 12:35
This patch adds support for processing the `host_eval` clause of `omp.target`
to populate default and runtime kernel launch attributes. Specifically, these
are related to the `num_teams`, `thread_limit` and `num_threads` clauses
attached to operations nested inside of `omp.target`. As a result, the
`thread_limit` clause of `omp.target` is also supported.

The implementation of `initTargetDefaultAttrs()` is intended to reflect clang's
own processing of multiple constructs and clauses in order to define a default
number of teams and threads to be used as kernel attributes and to populate
global variables in the target device module.

One side effect of this change is that target device MLIR modules can no longer
be translated to LLVM IR unless they have a supported target triple. This is
because the local `getGridValue()` function in the `OpenMPIRBuilder` only works
for certain architectures, and it is called whenever the maximum number of
threads has not been explicitly defined. This limitation also matches clang.

Evaluating the collapsed loop trip count of target SPMD kernels remains
unsupported.
@skatrak skatrak force-pushed the users/skatrak/host-eval-05-mlir-llvmir-generic branch from 64adf93 to 352b7df Compare January 14, 2025 12:38
@skatrak skatrak merged commit d2d4c3b into main Jan 14, 2025
6 of 7 checks passed
@skatrak skatrak deleted the users/skatrak/host-eval-05-mlir-llvmir-generic branch January 14, 2025 13:07
@llvm-ci
Collaborator

llvm-ci commented Jan 14, 2025

LLVM Buildbot has detected a new failure on builder flang-aarch64-libcxx running on linaro-flang-aarch64-libcxx while building flang,mlir at step 6 "test-build-unified-tree-check-flang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/89/builds/14319

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-flang) failure: test (failure)
******************** TEST 'Flang :: Integration/OpenMP/target-filtering.f90' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 9: /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/bin/flang -fc1 -emit-llvm -fopenmp /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 -o - | /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/bin/FileCheck /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 --check-prefixes HOST,ALL
+ /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/bin/flang -fc1 -emit-llvm -fopenmp /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 -o -
+ /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/bin/FileCheck /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 --check-prefixes HOST,ALL
RUN: at line 10: /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/bin/flang -fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp -fopenmp-is-target-device /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 -o - | /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/bin/FileCheck /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 --check-prefixes DEVICE,ALL
+ /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/bin/FileCheck /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 --check-prefixes DEVICE,ALL
+ /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/bin/flang -fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp -fopenmp-is-target-device /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 -o -
error: unable to create target: 'No available targets are compatible with triple "amdgcn-amd-amdhsa"'
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/bin/FileCheck /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 --check-prefixes DEVICE,ALL

--

********************


@llvm-ci
Collaborator

llvm-ci commented Jan 14, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-flang-rhel-clang running on ppc64le-flang-rhel-test while building flang,mlir at step 6 "test-build-unified-tree-check-flang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/157/builds/17148

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-flang) failure: test (failure)
******************** TEST 'Flang :: Integration/OpenMP/target-filtering.f90' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 9: /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/flang -fc1 -emit-llvm -fopenmp /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 -o - | /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 --check-prefixes HOST,ALL
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/flang -fc1 -emit-llvm -fopenmp /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 -o -
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 --check-prefixes HOST,ALL
RUN: at line 10: /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/flang -fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp -fopenmp-is-target-device /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 -o - | /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 --check-prefixes DEVICE,ALL
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/flang -fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp -fopenmp-is-target-device /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 -o -
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 --check-prefixes DEVICE,ALL
error: unable to create target: 'No available targets are compatible with triple "amdgcn-amd-amdhsa"'
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Integration/OpenMP/target-filtering.f90 --check-prefixes DEVICE,ALL

--

********************


@skatrak
Member Author

skatrak commented Jan 14, 2025

Created #122898 to address post-merge build failures.

skatrak added a commit that referenced this pull request Jan 14, 2025
Fix Fortran test failures caused by the introduction of the
amdgcn-amd-amdhsa target triple in #116052.
Labels
clang:openmp OpenMP related changes to Clang flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category mlir:llvm mlir:openmp mlir

4 participants