[flang] Inline hlfir.copy_in for trivial types #138718

mrkajetanp · 2025-05-06T16:23:46Z

hlfir.copy_in implements copying non-contiguous array slices for functions that take in arrays required to be contiguous through flang-rt.

For large arrays of trivial types, this can incur overhead compared to a plain, inlined copy loop.

To address that, add a new InlineHLFIRCopyIn optimisation pass to inline hlfir.copy_in operations for trivial types.

For the time being, the pattern is only applied in cases where the copy-in does not require a corresponding copy-out, such as when the function being called declares the array parameter as intent(in).

Applying this optimisation reduces the runtime of thornado-mini's DeleptonizationProblem by about 10%.

Accompanying implementation for the RFC which can be found here.

tblah

The approach used in your draft looks good to me. Just style nits from me.

Please add lit tests before taking this out of draft.

flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRAssign.cpp

github-actions · 2025-05-22T14:24:08Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot · 2025-05-22T14:57:05Z

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kajetan Puchalski (mrkajetanp)

Changes

hlfir.copy_in implements copying non-contiguous array slices for functions that take in arrays required to be contiguous through flang-rt.

For large arrays of trivial types, this can incur overhead compared to a plain, inlined copy loop.

To address that, add a new InlineHLFIRCopyIn optimisation pass to inline hlfir.copy_in operations for trivial types.

For the time being, the pattern is only applied in cases where the copy-in does not require a corresponding copy-out, such as when the function being called declares the array parameter as intent(in), and when the input array is not behind a pointer.

Applying this optimisation reduces the runtime of thornado-mini's DeleptonizationProblem by about 10%.

Accompanying implementation for the RFC which can be found here.

Patch is 21.97 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138718.diff

5 Files Affected:

(modified) flang/include/flang/Optimizer/HLFIR/Passes.td (+4)
(modified) flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt (+1)
(added) flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp (+180)
(modified) flang/lib/Optimizer/Passes/Pipelines.cpp (+5)
(added) flang/test/HLFIR/inline-hlfir-copy-in.fir (+146)

diff --git a/flang/include/flang/Optimizer/HLFIR/Passes.td b/flang/include/flang/Optimizer/HLFIR/Passes.td
index d445140118073..04d7aec5fe489 100644
--- a/flang/include/flang/Optimizer/HLFIR/Passes.td
+++ b/flang/include/flang/Optimizer/HLFIR/Passes.td
@@ -69,6 +69,10 @@ def InlineHLFIRAssign : Pass<"inline-hlfir-assign"> {
   let summary = "Inline hlfir.assign operations";
 }
 
+def InlineHLFIRCopyIn : Pass<"inline-hlfir-copy-in"> {
+  let summary = "Inline hlfir.copy_in operations";
+}
+
 def PropagateFortranVariableAttributes : Pass<"propagate-fortran-attrs"> {
   let summary = "Propagate FortranVariableFlagsAttr attributes through HLFIR";
 }
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt b/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
index d959428ebd203..cc74273d9c5d9 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
@@ -5,6 +5,7 @@ add_flang_library(HLFIRTransforms
   ConvertToFIR.cpp
   InlineElementals.cpp
   InlineHLFIRAssign.cpp
+  InlineHLFIRCopyIn.cpp
   LowerHLFIRIntrinsics.cpp
   LowerHLFIROrderedAssignments.cpp
   ScheduleOrderedAssignments.cpp
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp b/flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp
new file mode 100644
index 0000000000000..1e2aecaf535a0
--- /dev/null
+++ b/flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp
@@ -0,0 +1,180 @@
+//===- InlineHLFIRCopyIn.cpp - Inline hlfir.copy_in ops -------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Transform hlfir.copy_in array operations into loop nests performing element
+// per element assignments. For simplicity, the inlining is done for trivial
+// data types when the copy_in does not require a corresponding copy_out and
+// when the input array is not behind a pointer. This may change in the future.
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Builder/HLFIRTools.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
+#include "mlir/IR/PatternMatch.h"
+#include "mlir/Support/LLVM.h"
+#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
+
+namespace hlfir {
+#define GEN_PASS_DEF_INLINEHLFIRCOPYIN
+#include "flang/Optimizer/HLFIR/Passes.h.inc"
+} // namespace hlfir
+
+#define DEBUG_TYPE "inline-hlfir-copy-in"
+
+static llvm::cl::opt<bool> noInlineHLFIRCopyIn(
+    "no-inline-hlfir-copy-in",
+    llvm::cl::desc("Do not inline hlfir.copy_in operations"),
+    llvm::cl::init(false));
+
+namespace {
+class InlineCopyInConversion : public mlir::OpRewritePattern<hlfir::CopyInOp> {
+public:
+  using mlir::OpRewritePattern<hlfir::CopyInOp>::OpRewritePattern;
+
+  llvm::LogicalResult
+  matchAndRewrite(hlfir::CopyInOp copyIn,
+                  mlir::PatternRewriter &rewriter) const override;
+};
+
+llvm::LogicalResult
+InlineCopyInConversion::matchAndRewrite(hlfir::CopyInOp copyIn,
+                                        mlir::PatternRewriter &rewriter) const {
+  fir::FirOpBuilder builder(rewriter, copyIn.getOperation());
+  mlir::Location loc = copyIn.getLoc();
+  hlfir::Entity inputVariable{copyIn.getVar()};
+  if (!fir::isa_trivial(inputVariable.getFortranElementType()))
+    return rewriter.notifyMatchFailure(copyIn,
+                                       "CopyInOp's data type is not trivial");
+
+  if (fir::isPointerType(inputVariable.getType()))
+    return rewriter.notifyMatchFailure(
+        copyIn, "CopyInOp's input variable is a pointer");
+
+  // There should be exactly one user of WasCopied - the corresponding
+  // CopyOutOp.
+  if (copyIn.getWasCopied().getUses().empty())
+    return rewriter.notifyMatchFailure(copyIn,
+                                       "CopyInOp's WasCopied has no uses");
+  // The copy out should always be present, either to actually copy or just
+  // deallocate memory.
+  auto copyOut = mlir::dyn_cast<hlfir::CopyOutOp>(
+      copyIn.getWasCopied().getUsers().begin().getCurrent().getUser());
+
+  if (!copyOut)
+    return rewriter.notifyMatchFailure(copyIn,
+                                       "CopyInOp has no direct CopyOut");
+
+  // Only inline the copy_in when copy_out does not need to be done, i.e. in
+  // case of intent(in).
+  if (copyOut.getVar())
+    return rewriter.notifyMatchFailure(copyIn, "CopyIn needs a copy-out");
+
+  inputVariable =
+      hlfir::derefPointersAndAllocatables(loc, builder, inputVariable);
+  mlir::Type resultAddrType = copyIn.getCopiedIn().getType();
+  mlir::Value isContiguous =
+      builder.create<fir::IsContiguousBoxOp>(loc, inputVariable);
+  mlir::Operation::result_range results =
+      builder
+          .genIfOp(loc, {resultAddrType, builder.getI1Type()}, isContiguous,
+                   /*withElseRegion=*/true)
+          .genThen([&]() {
+            mlir::Value falseVal = builder.create<mlir::arith::ConstantOp>(
+                loc, builder.getI1Type(), builder.getBoolAttr(false));
+            builder.create<fir::ResultOp>(
+                loc, mlir::ValueRange{inputVariable, falseVal});
+          })
+          .genElse([&] {
+            auto [temp, cleanup] =
+                hlfir::createTempFromMold(loc, builder, inputVariable);
+            mlir::Value shape = hlfir::genShape(loc, builder, inputVariable);
+            llvm::SmallVector<mlir::Value> extents =
+                hlfir::getIndexExtents(loc, builder, shape);
+            hlfir::LoopNest loopNest = hlfir::genLoopNest(
+                loc, builder, extents, /*isUnordered=*/true,
+                flangomp::shouldUseWorkshareLowering(copyIn));
+            builder.setInsertionPointToStart(loopNest.body);
+            hlfir::Entity elem = hlfir::getElementAt(
+                loc, builder, inputVariable, loopNest.oneBasedIndices);
+            elem = hlfir::loadTrivialScalar(loc, builder, elem);
+            hlfir::Entity tempElem = hlfir::getElementAt(
+                loc, builder, temp, loopNest.oneBasedIndices);
+            builder.create<hlfir::AssignOp>(loc, elem, tempElem);
+            builder.setInsertionPointAfter(loopNest.outerOp);
+
+            mlir::Value result;
+            // Make sure the result is always a boxed array by boxing it
+            // ourselves if need be.
+            if (mlir::isa<fir::BaseBoxType>(temp.getType())) {
+              result = temp;
+            } else {
+              fir::ReferenceType refTy =
+                  fir::ReferenceType::get(temp.getElementOrSequenceType());
+              mlir::Value refVal = builder.createConvert(loc, refTy, temp);
+              result =
+                  builder.create<fir::EmboxOp>(loc, resultAddrType, refVal);
+            }
+
+            builder.create<fir::ResultOp>(loc,
+                                          mlir::ValueRange{result, cleanup});
+          })
+          .getResults();
+
+  mlir::OpResult addr = results[0];
+  mlir::OpResult needsCleanup = results[1];
+
+  builder.setInsertionPoint(copyOut);
+  builder.genIfOp(loc, {}, needsCleanup, /*withElseRegion=*/false).genThen([&] {
+    auto boxAddr = builder.create<fir::BoxAddrOp>(loc, addr);
+    fir::HeapType heapType =
+        fir::HeapType::get(fir::BoxValue(addr).getBaseTy());
+    mlir::Value heapVal =
+        builder.createConvert(loc, heapType, boxAddr.getResult());
+    builder.create<fir::FreeMemOp>(loc, heapVal);
+  });
+  rewriter.eraseOp(copyOut);
+
+  mlir::Value tempBox = copyIn.getTempBox();
+
+  rewriter.replaceOp(copyIn, {addr, builder.genNot(loc, isContiguous)});
+
+  // The TempBox is only needed for flang-rt calls which we're no longer
+  // generating. It should have no uses left at this stage.
+  if (!tempBox.getUses().empty())
+    return mlir::failure();
+  rewriter.eraseOp(tempBox.getDefiningOp());
+
+  return mlir::success();
+}
+
+class InlineHLFIRCopyInPass
+    : public hlfir::impl::InlineHLFIRCopyInBase<InlineHLFIRCopyInPass> {
+public:
+  void runOnOperation() override {
+    mlir::MLIRContext *context = &getContext();
+
+    mlir::GreedyRewriteConfig config;
+    // Prevent the pattern driver from merging blocks.
+    config.setRegionSimplificationLevel(
+        mlir::GreedySimplifyRegionLevel::Disabled);
+
+    mlir::RewritePatternSet patterns(context);
+    if (!noInlineHLFIRCopyIn) {
+      patterns.insert<InlineCopyInConversion>(context);
+    }
+
+    if (mlir::failed(mlir::applyPatternsGreedily(
+            getOperation(), std::move(patterns), config))) {
+      mlir::emitError(getOperation()->getLoc(),
+                      "failure in hlfir.copy_in inlining");
+      signalPassFailure();
+    }
+  }
+};
+} // namespace
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index 77751908e35be..1779623fddc5a 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -255,6 +255,11 @@ void createHLFIRToFIRPassPipeline(mlir::PassManager &pm, bool enableOpenMP,
         pm, hlfir::createOptimizedBufferization);
     addNestedPassToAllTopLevelOperations<PassConstructor>(
         pm, hlfir::createInlineHLFIRAssign);
+
+    if (optLevel == llvm::OptimizationLevel::O3) {
+      addNestedPassToAllTopLevelOperations<PassConstructor>(
+          pm, hlfir::createInlineHLFIRCopyIn);
+    }
   }
   pm.addPass(hlfir::createLowerHLFIROrderedAssignments());
   pm.addPass(hlfir::createLowerHLFIRIntrinsics());
diff --git a/flang/test/HLFIR/inline-hlfir-copy-in.fir b/flang/test/HLFIR/inline-hlfir-copy-in.fir
new file mode 100644
index 0000000000000..7140e93f19979
--- /dev/null
+++ b/flang/test/HLFIR/inline-hlfir-copy-in.fir
@@ -0,0 +1,146 @@
+// Test inlining of hlfir.copy_in
+// RUN: fir-opt --inline-hlfir-copy-in %s | FileCheck %s
+
+// Test inlining of hlfir.copy_in that does not require the array to be copied out
+func.func private @_test_inline_copy_in(%arg0: !fir.box<!fir.array<?x?x?xf64>> {fir.bindc_name = "x"}, %arg1: !fir.ref<i32> {fir.bindc_name = "i"}, %arg2: !fir.ref<i32> {fir.bindc_name = "j"}) {
+  %0 = fir.alloca !fir.box<!fir.heap<!fir.array<?xf64>>>
+  %1 = fir.dummy_scope : !fir.dscope
+  %2:2 = hlfir.declare %arg1 dummy_scope %1 {uniq_name = "_QFFsb2Ei"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+  %3:2 = hlfir.declare %arg2 dummy_scope %1 {uniq_name = "_QFFsb2Ej"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+  %4:2 = hlfir.declare %arg0 dummy_scope %1 {uniq_name = "_QFFsb2Ex"} : (!fir.box<!fir.array<?x?x?xf64>>, !fir.dscope) -> (!fir.box<!fir.array<?x?x?xf64>>, !fir.box<!fir.array<?x?x?xf64>>)
+  %5 = fir.load %2#0 : !fir.ref<i32>
+  %6 = fir.convert %5 : (i32) -> i64
+  %c1 = arith.constant 1 : index
+  %c1_0 = arith.constant 1 : index
+  %7:3 = fir.box_dims %4#1, %c1_0 : (!fir.box<!fir.array<?x?x?xf64>>, index) -> (index, index, index)
+  %c1_1 = arith.constant 1 : index
+  %c0 = arith.constant 0 : index
+  %8 = arith.subi %7#1, %c1 : index
+  %9 = arith.addi %8, %c1_1 : index
+  %10 = arith.divsi %9, %c1_1 : index
+  %11 = arith.cmpi sgt, %10, %c0 : index
+  %12 = arith.select %11, %10, %c0 : index
+  %13 = fir.load %3#0 : !fir.ref<i32>
+  %14 = fir.convert %13 : (i32) -> i64
+  %15 = fir.shape %12 : (index) -> !fir.shape<1>
+  %16 = hlfir.designate %4#0 (%6, %c1:%7#1:%c1_1, %14)  shape %15 : (!fir.box<!fir.array<?x?x?xf64>>, i64, index, index, index, i64, !fir.shape<1>) -> !fir.box<!fir.array<?xf64>>
+  %c100_i32 = arith.constant 100 : i32
+  %17:2 = hlfir.copy_in %16 to %0 : (!fir.box<!fir.array<?xf64>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>) -> (!fir.box<!fir.array<?xf64>>, i1)
+  %18 = fir.box_addr %17#0 : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
+  %19:3 = hlfir.associate %c100_i32 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1)
+  fir.call @_QFPsb(%18, %19#0) fastmath<contract> : (!fir.ref<!fir.array<?xf64>>, !fir.ref<i32>) -> ()
+  hlfir.copy_out %0, %17#1 : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>, i1) -> ()
+  hlfir.end_associate %19#1, %19#2 : !fir.ref<i32>, i1
+  return
+}
+
+// CHECK-LABEL:   func.func private @_test_inline_copy_in(
+// CHECK-SAME:                                          %[[VAL_0:.*]]: !fir.box<!fir.array<?x?x?xf64>> {fir.bindc_name = "x"},
+// CHECK-SAME:                                          %[[VAL_1:.*]]: !fir.ref<i32> {fir.bindc_name = "i"},
+// CHECK-SAME:                                          %[[VAL_2:.*]]: !fir.ref<i32> {fir.bindc_name = "j"}) {
+// CHECK:    %[[VAL_3:.*]] = arith.constant true
+// CHECK:    %[[VAL_4:.*]] = arith.constant false
+// CHECK:    %[[VAL_5:.*]] = arith.constant 100 : i32
+// CHECK:    %[[VAL_6:.*]] = arith.constant 0 : index
+// CHECK:    %[[VAL_7:.*]] = arith.constant 1 : index
+// CHECK:    %[[VAL_8:.*]] = fir.dummy_scope : !fir.dscope
+// CHECK:    %[[VAL_22:.*]]:2 = hlfir.declare %[[VAL_1:.*]] dummy_scope %[[VAL_8:.*]] {uniq_name = "_QFFsb2Ei"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+// CHECK:    %[[VAL_9:.*]]:2 = hlfir.declare %[[VAL_2:.*]] dummy_scope %[[VAL_8:.*]] {uniq_name = "_QFFsb2Ej"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+// CHECK:    %[[VAL_10:.*]]:2 = hlfir.declare %[[VAL_0:.*]] dummy_scope %[[VAL_8:.*]] {uniq_name = "_QFFsb2Ex"} : (!fir.box<!fir.array<?x?x?xf64>>, !fir.dscope) -> (!fir.box<!fir.array<?x?x?xf64>>, !fir.box<!fir.array<?x?x?xf64>>)
+// CHECK:    %[[VAL_11:.*]] = fir.load %[[VAL_22:.*]]#0 : !fir.ref<i32>
+// CHECK:    %[[VAL_12:.*]] = fir.convert %[[VAL_11:.*]] : (i32) -> i64
+// CHECK:    %[[VAL_13:.*]]:3 = fir.box_dims %[[VAL_10:.*]]#1, %[[VAL_7:.*]] : (!fir.box<!fir.array<?x?x?xf64>>, index) -> (index, index, index)
+// CHECK:    %[[VAL_14:.*]] = arith.cmpi sgt, %[[VAL_13:.*]]#1, %[[VAL_6:.*]] : index
+// CHECK:    %[[VAL_15:.*]] = arith.select %[[VAL_14:.*]], %[[VAL_13:.*]]#1, %[[VAL_6:.*]] : index
+// CHECK:    %[[VAL_16:.*]] = fir.load %[[VAL_9:.*]]#0 : !fir.ref<i32>
+// CHECK:    %[[VAL_17:.*]] = fir.convert %[[VAL_16:.*]] : (i32) -> i64
+// CHECK:    %[[VAL_18:.*]] = fir.shape %[[VAL_15:.*]] : (index) -> !fir.shape<1>
+// CHECK:    %[[VAL_19:.*]] = hlfir.designate %[[VAL_10:.*]]#0 (%[[VAL_12:.*]], %[[VAL_7:.*]]:%[[VAL_13:.*]]#1:%[[VAL_7:.*]], %[[VAL_17:.*]])  shape %[[VAL_18:.*]] : (!fir.box<!fir.array<?x?x?xf64>>, i64, index, index, index, i64, !fir.shape<1>) -> !fir.box<!fir.array<?xf64>>
+// CHECK:    %[[VAL_20:.*]] = fir.is_contiguous_box %[[VAL_19:.*]] whole : (!fir.box<!fir.array<?xf64>>) -> i1
+// CHECK:    %[[VAL_21:.*]]:2 = fir.if %[[VAL_20:.*]] -> (!fir.box<!fir.array<?xf64>>, i1) {
+// CHECK:      fir.result %[[VAL_19:.*]], %[[VAL_4:.*]] : !fir.box<!fir.array<?xf64>>, i1
+// CHECK:    } else {
+// CHECK:      %[[VAL_24:.*]] = fir.allocmem !fir.array<?xf64>, %[[VAL_15:.*]] {bindc_name = ".tmp", uniq_name = ""}
+// CHECK:      %[[VAL_25:.*]]:2 = hlfir.declare %[[VAL_24:.*]](%[[VAL_18:.*]]) {uniq_name = ".tmp"} : (!fir.heap<!fir.array<?xf64>>, !fir.shape<1>) -> (!fir.box<!fir.array<?xf64>>, !fir.heap<!fir.array<?xf64>>)
+// CHECK:      fir.do_loop %arg3 = %[[VAL_7:.*]] to %[[VAL_15:.*]] step %[[VAL_7:.*]] unordered {
+// CHECK:        %[[VAL_26:.*]] = hlfir.designate %[[VAL_19:.*]] (%arg3)  : (!fir.box<!fir.array<?xf64>>, index) -> !fir.ref<f64>
+// CHECK:        %[[VAL_27:.*]] = fir.load %[[VAL_26:.*]] : !fir.ref<f64>
+// CHECK:        %[[VAL_28:.*]] = hlfir.designate %[[VAL_25:.*]]#0 (%arg3)  : (!fir.box<!fir.array<?xf64>>, index) -> !fir.ref<f64>
+// CHECK:        hlfir.assign %[[VAL_27:.*]] to %[[VAL_28:.*]] : f64, !fir.ref<f64>
+// CHECK:      }
+// CHECK:      fir.result %[[VAL_25:.*]]#0, %[[VAL_3:.*]] : !fir.box<!fir.array<?xf64>>, i1
+// CHECK:    }
+// CHECK:    %[[VAL_22:.*]] = fir.box_addr %[[VAL_21:.*]]#0 : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
+// CHECK:    %[[VAL_23:.*]]:3 = hlfir.associate %[[VAL_5:.*]] {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1)
+// CHECK:    fir.call @_QFPsb(%[[VAL_22:.*]], %[[VAL_23:.*]]#0) fastmath<contract> : (!fir.ref<!fir.array<?xf64>>, !fir.ref<i32>) -> ()
+// CHECK:    fir.if %[[VAL_21:.*]]#1 {
+// CHECK:      %[[VAL_24:.*]] = fir.box_addr %[[VAL_21:.*]]#0 : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
+// CHECK:      %[[VAL_25:.*]] = fir.convert %[[VAL_24:.*]] : (!fir.ref<!fir.array<?xf64>>) -> !fir.heap<!fir.array<?xf64>>
+// CHECK:      fir.freemem %[[VAL_25:.*]] : !fir.heap<!fir.array<?xf64>>
+// CHECK:    }
+// CHECK:    hlfir.end_associate %[[VAL_23:.*]]#1, %[[VAL_23:.*]]#2 : !fir.ref<i32>, i1
+// CHECK:    return
+// CHECK:  }
+
+// Test not inlining of hlfir.copy_in that requires the array to be copied out
+func.func private @_test_no_inline_copy_in(%arg0: !fir.box<!fir.array<?x?x?xf64>> {fir.bindc_name = "x"}, %arg1: !fir.ref<i32> {fir.bindc_name = "i"}, %arg2: !fir.ref<i32> {fir.bindc_name = "j"}) {
+  %0 = fir.alloca !fir.box<!fir.heap<!fir.array<?xf64>>>
+  %1 = fir.dummy_scope : !fir.dscope
+  %2:2 = hlfir.declare %arg1 dummy_scope %1 {uniq_name = "_QFFsb2Ei"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+  %3:2 = hlfir.declare %arg2 dummy_scope %1 {uniq_name = "_QFFsb2Ej"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+  %4:2 = hlfir.declare %arg0 dummy_scope %1 {uniq_name = "_QFFsb2Ex"} : (!fir.box<!fir.array<?x?x?xf64>>, !fir.dscope) -> (!fir.box<!fir.array<?x?x?xf64>>, !fir.box<!fir.array<?x?x?xf64>>)
+  %5 = fir.load %2#0 : !fir.ref<i32>
+  %6 = fir.convert %5 : (i32) -> i64
+  %c1 = arith.constant 1 : index
+  %c1_0 = arith.constant 1 : index
+  %7:3 = fir.box_dims %4#1, %c1_0 : (!fir.box<!fir.array<?x?x?xf64>>, index) -> (index, index, index)
+  %c1_1 = arith.constant 1 : index
+  %c0 = arith.constant 0 : index
+  %8 = arith.subi %7#1, %c1 : index
+  %9 = arith.addi %8, %c1_1 : index
+  %10 = arith.divsi %9, %c1_1 : index
+  %11 = arith.cmpi sgt, %10, %c0 : index
+  %12 = arith.select %11, %10, %c0 : index
+  %13 = fir.load %3#0 : !fir.ref<i32>
+  %14 = fir.convert %13 : (i32) -> i64
+  %15 = fir.shape %12 : (index) -> !fir.shape<1>
+  %16 = hlfir.designate %4#0 (%6, %c1:%7#1:%c1_1, %14)  shape %15 : (!fir.box<!fir.array<?x?x?xf64>>, i64, index, index, index, i64, !fir.shape<1>) -> !fir.box<!fir.array<?xf64>>
+  %c100_i32 = arith.constant 100 : i32
+  %17:2 = hlfir.copy_in %16 to %0 : (!fir.box<!fir.array<?xf64>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>) -> (!fir.box<!fir.array<?xf64>>, i1)
+  %18 = fir.box_addr %17#0 : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
+  %19:3 = hlfir.associate %c100_i32 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1)
+  fir.call @_QFPsb(%18, %19#1) fastmath<contract> : (!fir.ref<!fir.array<?xf64>>, !fir.ref<i32>) -> ()
+  hlfir.copy_out %0, %17#1 to %16 : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>, i1, !fir.box<!fir.array<?xf64>>) -> ()
+  hlfir.end_associate %19#1, %19#2 : !fir.ref<i32>, i1
+  return
+}
+
+// CHECK-LABEL:  func.func private @_test_no_inline_copy_in(
+// CHECK-SAME:                                             %[[VAL_0:.*]]: !fir.box<!fir.array<?x?x?xf64>> {fir.bindc_name = "x"},
+// CHECK-SAME:                                             %[[VAL_1:.*]]: !fir.ref<i32> {fir.bindc_name = "i"},
+// CHECK-SAME:                                             %[[VAL_2:.*]]: !fir.ref<i32> {fir.bindc_name = "j"}) {
+// CHECK:    %[[VAL_3:.*]] = arith.constant 100 : i32
+// CHECK:    %[[VAL_4:.*]] = arith.constant 0 : index
+// CHECK:    %[[VAL_5:.*]] = arith.constant 1 : index
+// CHECK:    %[[VAL_6:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xf64>>>
+// CHECK:    %[[VAL_7:.*]] = fir.dummy_scope : !fir.dscope
+// CHECK:    %[[VAL_8:.*]]:2 = hlfir.declare %[[VAL_1:.*]] dummy_scope %[[VAL_7:.*]] {uniq_name = "_QFFsb2Ei"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+// CHECK:    %[[VAL_9:.*]]:2 = hlfir.declare %[[VAL_2:.*]] dummy_scope %[[VAL_7:.*]] {uniq_name = "_QFFsb2Ej"} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
+// CHECK:    %[[VAL_10:.*]]:2 = hlfir.declare %[[VAL_0:.*]] dummy_scope %[[V...
[truncated]

vzakhari

Thank you for the patch!

flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp

hlfir.copy_in implements copying non-contiguous array slices for functions that take in arrays required to be contiguous through a flang-rt function that calls memcpy/memmove separately on each element. For large arrays of trivial types, this can incur considerable overhead compared to a plain copy loop that is better able to take advantage of hardware pipelines. To address that, extend the InlineHLFIRAssign optimisation pass with a new pattern for inlining hlfir.copy_in operations for trivial types. For the time being, the pattern is only applied in cases where the copy-in does not require a corresponding copy-out, such as when the function being called declares the array parameter as intent(in). Applying this optimisation reduces the runtime of thornado-mini's DeleptonizationProblem by a factor of about 1/3rd. Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp

flang/lib/Optimizer/Builder/HLFIRTools.cpp

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp

tblah

One minor comment. This looks great!

flang/lib/Optimizer/Builder/HLFIRTools.cpp

tblah

LGTM

tblah reviewed May 6, 2025

View reviewed changes

mrkajetanp force-pushed the hlfir-inline-copy-in branch from 0c9ef1c to cc62dcd Compare May 22, 2025 14:21

mrkajetanp force-pushed the hlfir-inline-copy-in branch 3 times, most recently from 24bbcbd to 36a6257 Compare May 22, 2025 14:52

mrkajetanp marked this pull request as ready for review May 22, 2025 14:56

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels May 22, 2025

mrkajetanp requested review from jeanPerier, vzakhari and yus3710-fj May 22, 2025 14:56

vzakhari reviewed May 27, 2025

View reviewed changes

mrkajetanp force-pushed the hlfir-inline-copy-in branch from 36a6257 to 5d27cf2 Compare May 28, 2025 14:32

mrkajetanp requested review from tblah and vzakhari May 28, 2025 14:34

mrkajetanp force-pushed the hlfir-inline-copy-in branch from 5d27cf2 to bed8a6a Compare May 28, 2025 19:53

mrkajetanp added 6 commits May 29, 2025 11:14

Add tests

154b758

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

Address Tom's review comments

6d334d7

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

Separate copy_in inlining into its own pass, add flag

6a9d0fd

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

Support arrays behind a pointer, add metadata to disable vectorizing

63f66ae

Keep the copy-out to deallocate the temporary

837d45b

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

mrkajetanp force-pushed the hlfir-inline-copy-in branch from bed8a6a to 837d45b Compare May 29, 2025 14:13

vzakhari reviewed May 29, 2025

View reviewed changes

Use rebox, assumed-rank handling, expand tests

4c0e8e8

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

mrkajetanp requested a review from vzakhari June 4, 2025 12:59

vzakhari approved these changes Jun 5, 2025

View reviewed changes

flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRCopyIn.cpp Show resolved Hide resolved

tblah reviewed Jun 6, 2025

View reviewed changes

flang/lib/Optimizer/Builder/HLFIRTools.cpp Show resolved Hide resolved

tblah approved these changes Jun 6, 2025

View reviewed changes

Merge branch 'main' into hlfir-inline-copy-in

e649cfe

mrkajetanp merged commit 0d40574 into llvm:main Jun 6, 2025
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[flang] Inline hlfir.copy_in for trivial types #138718

[flang] Inline hlfir.copy_in for trivial types #138718

mrkajetanp commented May 6, 2025 •

edited

Loading

Uh oh!

tblah left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented May 22, 2025 •

edited

Loading

Uh oh!

llvmbot commented May 22, 2025

Uh oh!

vzakhari left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tblah left a comment

Uh oh!

Uh oh!

tblah left a comment

Uh oh!

Uh oh!

Uh oh!

[flang] Inline hlfir.copy_in for trivial types #138718

[flang] Inline hlfir.copy_in for trivial types #138718

Conversation

mrkajetanp commented May 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tblah left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented May 22, 2025

Uh oh!

vzakhari left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tblah left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tblah left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mrkajetanp commented May 6, 2025 •

edited

Loading

github-actions bot commented May 22, 2025 •

edited

Loading