-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[flang] Inline hlfir.copy_in for trivial types #138718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach used in your draft looks good to me. Just style nits from me.
Please add lit tests before taking this out of draft.
0c9ef1c
to
cc62dcd
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
24bbcbd
to
36a6257
Compare
@llvm/pr-subscribers-flang-fir-hlfir Author: Kajetan Puchalski (mrkajetanp) Changeshlfir.copy_in implements copying non-contiguous array slices for functions that take in arrays required to be contiguous through flang-rt. For large arrays of trivial types, this can incur overhead compared to a plain, inlined copy loop. To address that, add a new InlineHLFIRCopyIn optimisation pass to inline hlfir.copy_in operations for trivial types. For the time being, the pattern is only applied in cases where the copy-in does not require a corresponding copy-out, such as when the function being called declares the array parameter as intent(in), and when the input array is not behind a pointer. Applying this optimisation reduces the runtime of thornado-mini's DeleptonizationProblem by about 10%. Accompanying implementation for the RFC which can be found here. Patch is 21.97 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138718.diff 5 Files Affected:
diff --git a/flang/include/flang/Optimizer/HLFIR/Passes.td b/flang/include/flang/Optimizer/HLFIR/Passes.td
index d445140118073..04d7aec5fe489 100644
--- a/flang/include/flang/Optimizer/HLFIR/Passes.td
+++ b/flang/include/flang/Optimizer/HLFIR/Passes.td
@@ -69,6 +69,10 @@ def InlineHLFIRAssign : Pass<"inline-hlfir-assign"> {
let summary = "Inline hlfir.assign operations";
}
+def InlineHLFIRCopyIn : Pass<"inline-hlfir-copy-in"> {
+ let summary = "Inline hlfir.copy_in operations";
+}
+
def PropagateFortranVariableAttributes : Pass<"propagate-fortran-attrs"> {
let summary = "Propagate FortranVariableFlagsAttr attributes through HLFIR";
}
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt b/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
index d959428ebd203..cc74273d9c5d9 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
@@ -5,6 +5,7 @@ add_flang_library(HLFIRTransforms
ConvertToFIR.cpp
InlineElementals.cpp
InlineHLFIRAssign.cpp
+ InlineHLFIRCopyIn.cpp
LowerHLFIRIntrinsics.cpp
LowerHLFIROrderedAssignments.cpp
ScheduleOrderedAssignments.cpp
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp b/flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp
new file mode 100644
index 0000000000000..1e2aecaf535a0
--- /dev/null
+++ b/flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp
@@ -0,0 +1,180 @@
+//===- InlineHLFIRCopyIn.cpp - Inline hlfir.copy_in ops -------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Transform hlfir.copy_in array operations into loop nests performing element
+// per element assignments. For simplicity, the inlining is done for trivial
+// data types when the copy_in does not require a corresponding copy_out and
+// when the input array is not behind a pointer. This may change in the future.
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Builder/HLFIRTools.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
+#include "mlir/IR/PatternMatch.h"
+#include "mlir/Support/LLVM.h"
+#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
+
+namespace hlfir {
+#define GEN_PASS_DEF_INLINEHLFIRCOPYIN
+#include "flang/Optimizer/HLFIR/Passes.h.inc"
+} // namespace hlfir
+
+#define DEBUG_TYPE "inline-hlfir-copy-in"
+
+static llvm::cl::opt<bool> noInlineHLFIRCopyIn(
+ "no-inline-hlfir-copy-in",
+ llvm::cl::desc("Do not inline hlfir.copy_in operations"),
+ llvm::cl::init(false));
+
+namespace {
+class InlineCopyInConversion : public mlir::OpRewritePattern<hlfir::CopyInOp> {
+public:
+ using mlir::OpRewritePattern<hlfir::CopyInOp>::OpRewritePattern;
+
+ llvm::LogicalResult
+ matchAndRewrite(hlfir::CopyInOp copyIn,
+ mlir::PatternRewriter &rewriter) const override;
+};
+
+llvm::LogicalResult
+InlineCopyInConversion::matchAndRewrite(hlfir::CopyInOp copyIn,
+ mlir::PatternRewriter &rewriter) const {
+ fir::FirOpBuilder builder(rewriter, copyIn.getOperation());
+ mlir::Location loc = copyIn.getLoc();
+ hlfir::Entity inputVariable{copyIn.getVar()};
+ if (!fir::isa_trivial(inputVariable.getFortranElementType()))
+ return rewriter.notifyMatchFailure(copyIn,
+ "CopyInOp's data type is not trivial");
+
+ if (fir::isPointerType(inputVariable.getType()))
+ return rewriter.notifyMatchFailure(
+ copyIn, "CopyInOp's input variable is a pointer");
+
+ // There should be exactly one user of WasCopied - the corresponding
+ // CopyOutOp.
+ if (copyIn.getWasCopied().getUses().empty())
+ return rewriter.notifyMatchFailure(copyIn,
+ "CopyInOp's WasCopied has no uses");
+ // The copy out should always be present, either to actually copy or just
+ // deallocate memory.
+ auto copyOut = mlir::dyn_cast<hlfir::CopyOutOp>(
+ copyIn.getWasCopied().getUsers().begin().getCurrent().getUser());
+
+ if (!copyOut)
+ return rewriter.notifyMatchFailure(copyIn,
+ "CopyInOp has no direct CopyOut");
+
+ // Only inline the copy_in when copy_out does not need to be done, i.e. in
+ // case of intent(in).
+ if (copyOut.getVar())
+ return rewriter.notifyMatchFailure(copyIn, "CopyIn needs a copy-out");
+
+ inputVariable =
+ hlfir::derefPointersAndAllocatables(loc, builder, inputVariable);
+ mlir::Type resultAddrType = copyIn.getCopiedIn().getType();
+ mlir::Value isContiguous =
+ builder.create<fir::IsContiguousBoxOp>(loc, inputVariable);
+ mlir::Operation::result_range results =
+ builder
+ .genIfOp(loc, {resultAddrType, builder.getI1Type()}, isContiguous,
+ /*withElseRegion=*/true)
+ .genThen([&]() {
+ mlir::Value falseVal = builder.create<mlir::arith::ConstantOp>(
+ loc, builder.getI1Type(), builder.getBoolAttr(false));
+ builder.create<fir::ResultOp>(
+ loc, mlir::ValueRange{inputVariable, falseVal});
+ })
+ .genElse([&] {
+ auto [temp, cleanup] =
+ hlfir::createTempFromMold(loc, builder, inputVariable);
+ mlir::Value shape = hlfir::genShape(loc, builder, inputVariable);
+ llvm::SmallVector<mlir::Value> extents =
+ hlfir::getIndexExtents(loc, builder, shape);
+ hlfir::LoopNest loopNest = hlfir::genLoopNest(
+ loc, builder, extents, /*isUnordered=*/true,
+ flangomp::shouldUseWorkshareLowering(copyIn));
+ builder.setInsertionPointToStart(loopNest.body);
+ hlfir::Entity elem = hlfir::getElementAt(
+ loc, builder, inputVariable, loopNest.oneBasedIndices);
+ elem = hlfir::loadTrivialScalar(loc, builder, elem);
+ hlfir::Entity tempElem = hlfir::getElementAt(
+ loc, builder, temp, loopNest.oneBasedIndices);
+ builder.create<hlfir::AssignOp>(loc, elem, tempElem);
+ builder.setInsertionPointAfter(loopNest.outerOp);
+
+ mlir::Value result;
+ // Make sure the result is always a boxed array by boxing it
+ // ourselves if need be.
+ if (mlir::isa<fir::BaseBoxType>(temp.getType())) {
+ result = temp;
+ } else {
+ fir::ReferenceType refTy =
+ fir::ReferenceType::get(temp.getElementOrSequenceType());
+ mlir::Value refVal = builder.createConvert(loc, refTy, temp);
+ result =
+ builder.create<fir::EmboxOp>(loc, resultAddrType, refVal);
+ }
+
+ builder.create<fir::ResultOp>(loc,
+ mlir::ValueRange{result, cleanup});
+ })
+ .getResults();
+
+ mlir::OpResult addr = results[0];
+ mlir::OpResult needsCleanup = results[1];
+
+ builder.setInsertionPoint(copyOut);
+ builder.genIfOp(loc, {}, needsCleanup, /*withElseRegion=*/false).genThen([&] {
+ auto boxAddr = builder.create<fir::BoxAddrOp>(loc, addr);
+ fir::HeapType heapType =
+ fir::HeapType::get(fir::BoxValue(addr).getBaseTy());
+ mlir::Value heapVal =
+ builder.createConvert(loc, heapType, boxAddr.getResult());
+ builder.create<fir::FreeMemOp>(loc, heapVal);
+ });
+ rewriter.eraseOp(copyOut);
+
+ mlir::Value tempBox = copyIn.getTempBox();
+
+ rewriter.replaceOp(copyIn, {addr, builder.genNot(loc, isContiguous)});
+
+ // The TempBox is only needed for flang-rt calls which we're no longer
+ // generating. It should have no uses left at this stage.
+ if (!tempBox.getUses().empty())
+ return mlir::failure();
+ rewriter.eraseOp(tempBox.getDefiningOp());
+
+ return mlir::success();
+}
+
+class InlineHLFIRCopyInPass
+ : public hlfir::impl::InlineHLFIRCopyInBase<InlineHLFIRCopyInPass> {
+public:
+ void runOnOperation() override {
+ mlir::MLIRContext *context = &getContext();
+
+ mlir::GreedyRewriteConfig config;
+ // Prevent the pattern driver from merging blocks.
+ config.setRegionSimplificationLevel(
+ mlir::GreedySimplifyRegionLevel::Disabled);
+
+ mlir::RewritePatternSet patterns(context);
+ if (!noInlineHLFIRCopyIn) {
+ patterns.insert<InlineCopyInConversion>(context);
+ }
+
+ if (mlir::failed(mlir::applyPatternsGreedily(
+ getOperation(), std::move(patterns), config))) {
+ mlir::emitError(getOperation()->getLoc(),
+ "failure in hlfir.copy_in inlining");
+ signalPassFailure();
+ }
+ }
+};
+} // namespace
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index 77751908e35be..1779623fddc5a 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -255,6 +255,11 @@ void createHLFIRToFIRPassPipeline(mlir::PassManager &pm, bool enableOpenMP,
pm, hlfir::createOptimizedBufferization);
addNestedPassToAllTopLevelOperations<PassConstructor>(
pm, hlfir::createInlineHLFIRAssign);
+
+ if (optLevel == llvm::OptimizationLevel::O3) {
+ addNestedPassToAllTopLevelOperations<PassConstructor>(
+ pm, hlfir::createInlineHLFIRCopyIn);
+ }
}
pm.addPass(hlfir::createLowerHLFIROrderedAssignments());
pm.addPass(hlfir::createLowerHLFIRIntrinsics());
diff --git a/flang/test/HLFIR/inline-hlfir-copy-in.fir b/flang/test/HLFIR/inline-hlfir-copy-in.fir
new file mode 100644
index 0000000000000..7140e93f19979
--- /dev/null
+++ b/flang/test/HLFIR/inline-hlfir-copy-in.fir
@@ -0,0 +1,146 @@
+// Test inlining of hlfir.copy_in
+// RUN: fir-opt --inline-hlfir-copy-in %s | FileCheck %s
+
+// Test inlining of hlfir.copy_in that does not require the array to be copied out
+func.func private @_test_inline_copy_in(%arg0: !fir.box<!fir.array<?x?x?xf64>> {fir.bindc_name = "x"}, %arg1: !fir.ref<i32> {fir.bindc_name = "i"}, %arg2: !fir.ref<i32> {fir.bindc_name = "j"}) {
+ %0 = fir.alloca !fir.box<!fir.heap<!fir.array<?xf64>>>
+ %1 = fir.dummy_scope : !fir.dscope
+ %2:2 = hlfir.declare %arg1 dummy_scope %1 {uniq_name = "_QFFsb2Ei"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+ %3:2 = hlfir.declare %arg2 dummy_scope %1 {uniq_name = "_QFFsb2Ej"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+ %4:2 = hlfir.declare %arg0 dummy_scope %1 {uniq_name = "_QFFsb2Ex"} : (!fir.box<!fir.array<?x?x?xf64>>, !fir.dscope) -> (!fir.box<!fir.array<?x?x?xf64>>, !fir.box<!fir.array<?x?x?xf64>>)
+ %5 = fir.load %2#0 : !fir.ref<i32>
+ %6 = fir.convert %5 : (i32) -> i64
+ %c1 = arith.constant 1 : index
+ %c1_0 = arith.constant 1 : index
+ %7:3 = fir.box_dims %4#1, %c1_0 : (!fir.box<!fir.array<?x?x?xf64>>, index) -> (index, index, index)
+ %c1_1 = arith.constant 1 : index
+ %c0 = arith.constant 0 : index
+ %8 = arith.subi %7#1, %c1 : index
+ %9 = arith.addi %8, %c1_1 : index
+ %10 = arith.divsi %9, %c1_1 : index
+ %11 = arith.cmpi sgt, %10, %c0 : index
+ %12 = arith.select %11, %10, %c0 : index
+ %13 = fir.load %3#0 : !fir.ref<i32>
+ %14 = fir.convert %13 : (i32) -> i64
+ %15 = fir.shape %12 : (index) -> !fir.shape<1>
+ %16 = hlfir.designate %4#0 (%6, %c1:%7#1:%c1_1, %14) shape %15 : (!fir.box<!fir.array<?x?x?xf64>>, i64, index, index, index, i64, !fir.shape<1>) -> !fir.box<!fir.array<?xf64>>
+ %c100_i32 = arith.constant 100 : i32
+ %17:2 = hlfir.copy_in %16 to %0 : (!fir.box<!fir.array<?xf64>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>) -> (!fir.box<!fir.array<?xf64>>, i1)
+ %18 = fir.box_addr %17#0 : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
+ %19:3 = hlfir.associate %c100_i32 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1)
+ fir.call @_QFPsb(%18, %19#0) fastmath<contract> : (!fir.ref<!fir.array<?xf64>>, !fir.ref<i32>) -> ()
+ hlfir.copy_out %0, %17#1 : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>, i1) -> ()
+ hlfir.end_associate %19#1, %19#2 : !fir.ref<i32>, i1
+ return
+}
+
+// CHECK-LABEL: func.func private @_test_inline_copy_in(
+// CHECK-SAME: %[[VAL_0:.*]]: !fir.box<!fir.array<?x?x?xf64>> {fir.bindc_name = "x"},
+// CHECK-SAME: %[[VAL_1:.*]]: !fir.ref<i32> {fir.bindc_name = "i"},
+// CHECK-SAME: %[[VAL_2:.*]]: !fir.ref<i32> {fir.bindc_name = "j"}) {
+// CHECK: %[[VAL_3:.*]] = arith.constant true
+// CHECK: %[[VAL_4:.*]] = arith.constant false
+// CHECK: %[[VAL_5:.*]] = arith.constant 100 : i32
+// CHECK: %[[VAL_6:.*]] = arith.constant 0 : index
+// CHECK: %[[VAL_7:.*]] = arith.constant 1 : index
+// CHECK: %[[VAL_8:.*]] = fir.dummy_scope : !fir.dscope
+// CHECK: %[[VAL_22:.*]]:2 = hlfir.declare %[[VAL_1:.*]] dummy_scope %[[VAL_8:.*]] {uniq_name = "_QFFsb2Ei"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+// CHECK: %[[VAL_9:.*]]:2 = hlfir.declare %[[VAL_2:.*]] dummy_scope %[[VAL_8:.*]] {uniq_name = "_QFFsb2Ej"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+// CHECK: %[[VAL_10:.*]]:2 = hlfir.declare %[[VAL_0:.*]] dummy_scope %[[VAL_8:.*]] {uniq_name = "_QFFsb2Ex"} : (!fir.box<!fir.array<?x?x?xf64>>, !fir.dscope) -> (!fir.box<!fir.array<?x?x?xf64>>, !fir.box<!fir.array<?x?x?xf64>>)
+// CHECK: %[[VAL_11:.*]] = fir.load %[[VAL_22:.*]]#0 : !fir.ref<i32>
+// CHECK: %[[VAL_12:.*]] = fir.convert %[[VAL_11:.*]] : (i32) -> i64
+// CHECK: %[[VAL_13:.*]]:3 = fir.box_dims %[[VAL_10:.*]]#1, %[[VAL_7:.*]] : (!fir.box<!fir.array<?x?x?xf64>>, index) -> (index, index, index)
+// CHECK: %[[VAL_14:.*]] = arith.cmpi sgt, %[[VAL_13:.*]]#1, %[[VAL_6:.*]] : index
+// CHECK: %[[VAL_15:.*]] = arith.select %[[VAL_14:.*]], %[[VAL_13:.*]]#1, %[[VAL_6:.*]] : index
+// CHECK: %[[VAL_16:.*]] = fir.load %[[VAL_9:.*]]#0 : !fir.ref<i32>
+// CHECK: %[[VAL_17:.*]] = fir.convert %[[VAL_16:.*]] : (i32) -> i64
+// CHECK: %[[VAL_18:.*]] = fir.shape %[[VAL_15:.*]] : (index) -> !fir.shape<1>
+// CHECK: %[[VAL_19:.*]] = hlfir.designate %[[VAL_10:.*]]#0 (%[[VAL_12:.*]], %[[VAL_7:.*]]:%[[VAL_13:.*]]#1:%[[VAL_7:.*]], %[[VAL_17:.*]]) shape %[[VAL_18:.*]] : (!fir.box<!fir.array<?x?x?xf64>>, i64, index, index, index, i64, !fir.shape<1>) -> !fir.box<!fir.array<?xf64>>
+// CHECK: %[[VAL_20:.*]] = fir.is_contiguous_box %[[VAL_19:.*]] whole : (!fir.box<!fir.array<?xf64>>) -> i1
+// CHECK: %[[VAL_21:.*]]:2 = fir.if %[[VAL_20:.*]] -> (!fir.box<!fir.array<?xf64>>, i1) {
+// CHECK: fir.result %[[VAL_19:.*]], %[[VAL_4:.*]] : !fir.box<!fir.array<?xf64>>, i1
+// CHECK: } else {
+// CHECK: %[[VAL_24:.*]] = fir.allocmem !fir.array<?xf64>, %[[VAL_15:.*]] {bindc_name = ".tmp", uniq_name = ""}
+// CHECK: %[[VAL_25:.*]]:2 = hlfir.declare %[[VAL_24:.*]](%[[VAL_18:.*]]) {uniq_name = ".tmp"} : (!fir.heap<!fir.array<?xf64>>, !fir.shape<1>) -> (!fir.box<!fir.array<?xf64>>, !fir.heap<!fir.array<?xf64>>)
+// CHECK: fir.do_loop %arg3 = %[[VAL_7:.*]] to %[[VAL_15:.*]] step %[[VAL_7:.*]] unordered {
+// CHECK: %[[VAL_26:.*]] = hlfir.designate %[[VAL_19:.*]] (%arg3) : (!fir.box<!fir.array<?xf64>>, index) -> !fir.ref<f64>
+// CHECK: %[[VAL_27:.*]] = fir.load %[[VAL_26:.*]] : !fir.ref<f64>
+// CHECK: %[[VAL_28:.*]] = hlfir.designate %[[VAL_25:.*]]#0 (%arg3) : (!fir.box<!fir.array<?xf64>>, index) -> !fir.ref<f64>
+// CHECK: hlfir.assign %[[VAL_27:.*]] to %[[VAL_28:.*]] : f64, !fir.ref<f64>
+// CHECK: }
+// CHECK: fir.result %[[VAL_25:.*]]#0, %[[VAL_3:.*]] : !fir.box<!fir.array<?xf64>>, i1
+// CHECK: }
+// CHECK: %[[VAL_22:.*]] = fir.box_addr %[[VAL_21:.*]]#0 : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
+// CHECK: %[[VAL_23:.*]]:3 = hlfir.associate %[[VAL_5:.*]] {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1)
+// CHECK: fir.call @_QFPsb(%[[VAL_22:.*]], %[[VAL_23:.*]]#0) fastmath<contract> : (!fir.ref<!fir.array<?xf64>>, !fir.ref<i32>) -> ()
+// CHECK: fir.if %[[VAL_21:.*]]#1 {
+// CHECK: %[[VAL_24:.*]] = fir.box_addr %[[VAL_21:.*]]#0 : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
+// CHECK: %[[VAL_25:.*]] = fir.convert %[[VAL_24:.*]] : (!fir.ref<!fir.array<?xf64>>) -> !fir.heap<!fir.array<?xf64>>
+// CHECK: fir.freemem %[[VAL_25:.*]] : !fir.heap<!fir.array<?xf64>>
+// CHECK: }
+// CHECK: hlfir.end_associate %[[VAL_23:.*]]#1, %[[VAL_23:.*]]#2 : !fir.ref<i32>, i1
+// CHECK: return
+// CHECK: }
+
+// Test not inlining of hlfir.copy_in that requires the array to be copied out
+func.func private @_test_no_inline_copy_in(%arg0: !fir.box<!fir.array<?x?x?xf64>> {fir.bindc_name = "x"}, %arg1: !fir.ref<i32> {fir.bindc_name = "i"}, %arg2: !fir.ref<i32> {fir.bindc_name = "j"}) {
+ %0 = fir.alloca !fir.box<!fir.heap<!fir.array<?xf64>>>
+ %1 = fir.dummy_scope : !fir.dscope
+ %2:2 = hlfir.declare %arg1 dummy_scope %1 {uniq_name = "_QFFsb2Ei"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+ %3:2 = hlfir.declare %arg2 dummy_scope %1 {uniq_name = "_QFFsb2Ej"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+ %4:2 = hlfir.declare %arg0 dummy_scope %1 {uniq_name = "_QFFsb2Ex"} : (!fir.box<!fir.array<?x?x?xf64>>, !fir.dscope) -> (!fir.box<!fir.array<?x?x?xf64>>, !fir.box<!fir.array<?x?x?xf64>>)
+ %5 = fir.load %2#0 : !fir.ref<i32>
+ %6 = fir.convert %5 : (i32) -> i64
+ %c1 = arith.constant 1 : index
+ %c1_0 = arith.constant 1 : index
+ %7:3 = fir.box_dims %4#1, %c1_0 : (!fir.box<!fir.array<?x?x?xf64>>, index) -> (index, index, index)
+ %c1_1 = arith.constant 1 : index
+ %c0 = arith.constant 0 : index
+ %8 = arith.subi %7#1, %c1 : index
+ %9 = arith.addi %8, %c1_1 : index
+ %10 = arith.divsi %9, %c1_1 : index
+ %11 = arith.cmpi sgt, %10, %c0 : index
+ %12 = arith.select %11, %10, %c0 : index
+ %13 = fir.load %3#0 : !fir.ref<i32>
+ %14 = fir.convert %13 : (i32) -> i64
+ %15 = fir.shape %12 : (index) -> !fir.shape<1>
+ %16 = hlfir.designate %4#0 (%6, %c1:%7#1:%c1_1, %14) shape %15 : (!fir.box<!fir.array<?x?x?xf64>>, i64, index, index, index, i64, !fir.shape<1>) -> !fir.box<!fir.array<?xf64>>
+ %c100_i32 = arith.constant 100 : i32
+ %17:2 = hlfir.copy_in %16 to %0 : (!fir.box<!fir.array<?xf64>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>) -> (!fir.box<!fir.array<?xf64>>, i1)
+ %18 = fir.box_addr %17#0 : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
+ %19:3 = hlfir.associate %c100_i32 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1)
+ fir.call @_QFPsb(%18, %19#1) fastmath<contract> : (!fir.ref<!fir.array<?xf64>>, !fir.ref<i32>) -> ()
+ hlfir.copy_out %0, %17#1 to %16 : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>, i1, !fir.box<!fir.array<?xf64>>) -> ()
+ hlfir.end_associate %19#1, %19#2 : !fir.ref<i32>, i1
+ return
+}
+
+// CHECK-LABEL: func.func private @_test_no_inline_copy_in(
+// CHECK-SAME: %[[VAL_0:.*]]: !fir.box<!fir.array<?x?x?xf64>> {fir.bindc_name = "x"},
+// CHECK-SAME: %[[VAL_1:.*]]: !fir.ref<i32> {fir.bindc_name = "i"},
+// CHECK-SAME: %[[VAL_2:.*]]: !fir.ref<i32> {fir.bindc_name = "j"}) {
+// CHECK: %[[VAL_3:.*]] = arith.constant 100 : i32
+// CHECK: %[[VAL_4:.*]] = arith.constant 0 : index
+// CHECK: %[[VAL_5:.*]] = arith.constant 1 : index
+// CHECK: %[[VAL_6:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xf64>>>
+// CHECK: %[[VAL_7:.*]] = fir.dummy_scope : !fir.dscope
+// CHECK: %[[VAL_8:.*]]:2 = hlfir.declare %[[VAL_1:.*]] dummy_scope %[[VAL_7:.*]] {uniq_name = "_QFFsb2Ei"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+// CHECK: %[[VAL_9:.*]]:2 = hlfir.declare %[[VAL_2:.*]] dummy_scope %[[VAL_7:.*]] {uniq_name = "_QFFsb2Ej"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+// CHECK: %[[VAL_10:.*]]:2 = hlfir.declare %[[VAL_0:.*]] dummy_scope %[[V...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the patch!
36a6257
to
5d27cf2
Compare
5d27cf2
to
bed8a6a
Compare
hlfir.copy_in implements copying non-contiguous array slices for functions that take in arrays required to be contiguous through a flang-rt function that calls memcpy/memmove separately on each element. For large arrays of trivial types, this can incur considerable overhead compared to a plain copy loop that is better able to take advantage of hardware pipelines. To address that, extend the InlineHLFIRAssign optimisation pass with a new pattern for inlining hlfir.copy_in operations for trivial types. For the time being, the pattern is only applied in cases where the copy-in does not require a corresponding copy-out, such as when the function being called declares the array parameter as intent(in). Applying this optimisation reduces the runtime of thornado-mini's DeleptonizationProblem by a factor of about 1/3rd. Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
bed8a6a
to
837d45b
Compare
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment. This looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
hlfir.copy_in implements copying non-contiguous array slices for functions that take in arrays required to be contiguous through flang-rt.
For large arrays of trivial types, this can incur overhead compared to a plain, inlined copy loop.
To address that, add a new InlineHLFIRCopyIn optimisation pass to inline hlfir.copy_in operations for trivial types.
For the time being, the pattern is only applied in cases where the copy-in does not require a corresponding copy-out, such as when the function being called declares the array parameter as intent(in).
Applying this optimisation reduces the runtime of thornado-mini's DeleptonizationProblem by about 10%.
Accompanying implementation for the RFC which can be found here.