-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[flang] Disable noalias by default #142128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-fir-hlfir Author: Tom Eccles (tblah) Changes…. (#140803)"" This reverts commit a0d699a. With this enabled we see a 70% performance regression for exchange2_r on neoverse-v1 (aws graviton 3) using This appears to be because function specialization is no longer kicking in during LTO for digits_2. This can be seen in the output executable: previously it contained specialized copies of the function with names like The bug is not in this patch as such, instead in the function specialization pass but due to the size of the regression I would like to request that this is reverted until function specialization has been fixed. Patch is 52.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142128.diff 30 Files Affected:
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index 71493535db8ba..8e7f12505c59c 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -431,12 +431,6 @@ def FunctionAttr : Pass<"function-attr", "mlir::func::FuncOp"> {
"Set the unsafe-fp-math attribute on functions in the module.">,
Option<"tuneCPU", "tune-cpu", "std::string", /*default=*/"",
"Set the tune-cpu attribute on functions in the module.">,
- Option<"setNoCapture", "set-nocapture", "bool", /*default=*/"false",
- "Set LLVM nocapture attribute on function arguments, "
- "if possible">,
- Option<"setNoAlias", "set-noalias", "bool", /*default=*/"false",
- "Set LLVM noalias attribute on function arguments, "
- "if possible">,
];
}
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index 378913fcb1329..77751908e35be 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -350,15 +350,11 @@ void createDefaultFIRCodeGenPassPipeline(mlir::PassManager &pm,
else
framePointerKind = mlir::LLVM::framePointerKind::FramePointerKind::None;
- bool setNoCapture = false, setNoAlias = false;
- if (config.OptLevel.isOptimizingForSpeed())
- setNoCapture = setNoAlias = true;
-
pm.addPass(fir::createFunctionAttr(
{framePointerKind, config.InstrumentFunctionEntry,
config.InstrumentFunctionExit, config.NoInfsFPMath, config.NoNaNsFPMath,
config.ApproxFuncFPMath, config.NoSignedZerosFPMath, config.UnsafeFPMath,
- /*tuneCPU=*/"", setNoCapture, setNoAlias}));
+ ""}));
if (config.EnableOpenMP) {
pm.addNestedPass<mlir::func::FuncOp>(
diff --git a/flang/lib/Optimizer/Transforms/FunctionAttr.cpp b/flang/lib/Optimizer/Transforms/FunctionAttr.cpp
index c8cdba0d6f9c4..43e4c1a7af3cd 100644
--- a/flang/lib/Optimizer/Transforms/FunctionAttr.cpp
+++ b/flang/lib/Optimizer/Transforms/FunctionAttr.cpp
@@ -27,8 +27,17 @@ namespace {
class FunctionAttrPass : public fir::impl::FunctionAttrBase<FunctionAttrPass> {
public:
- FunctionAttrPass(const fir::FunctionAttrOptions &options) : Base{options} {}
- FunctionAttrPass() = default;
+ FunctionAttrPass(const fir::FunctionAttrOptions &options) {
+ instrumentFunctionEntry = options.instrumentFunctionEntry;
+ instrumentFunctionExit = options.instrumentFunctionExit;
+ framePointerKind = options.framePointerKind;
+ noInfsFPMath = options.noInfsFPMath;
+ noNaNsFPMath = options.noNaNsFPMath;
+ approxFuncFPMath = options.approxFuncFPMath;
+ noSignedZerosFPMath = options.noSignedZerosFPMath;
+ unsafeFPMath = options.unsafeFPMath;
+ }
+ FunctionAttrPass() {}
void runOnOperation() override;
};
@@ -47,28 +56,14 @@ void FunctionAttrPass::runOnOperation() {
if ((isFromModule || !func.isDeclaration()) &&
!fir::hasBindcAttr(func.getOperation())) {
llvm::StringRef nocapture = mlir::LLVM::LLVMDialect::getNoCaptureAttrName();
- llvm::StringRef noalias = mlir::LLVM::LLVMDialect::getNoAliasAttrName();
mlir::UnitAttr unitAttr = mlir::UnitAttr::get(func.getContext());
for (auto [index, argType] : llvm::enumerate(func.getArgumentTypes())) {
- bool isNoCapture = false;
- bool isNoAlias = false;
if (mlir::isa<fir::ReferenceType>(argType) &&
!func.getArgAttr(index, fir::getTargetAttrName()) &&
!func.getArgAttr(index, fir::getAsynchronousAttrName()) &&
- !func.getArgAttr(index, fir::getVolatileAttrName())) {
- isNoCapture = true;
- isNoAlias = !fir::isPointerType(argType);
- } else if (mlir::isa<fir::BaseBoxType>(argType)) {
- // !fir.box arguments will be passed as descriptor pointers
- // at LLVM IR dialect level - they cannot be captured,
- // and cannot alias with anything within the function.
- isNoCapture = isNoAlias = true;
- }
- if (isNoCapture && setNoCapture)
+ !func.getArgAttr(index, fir::getVolatileAttrName()))
func.setArgAttr(index, nocapture, unitAttr);
- if (isNoAlias && setNoAlias)
- func.setArgAttr(index, noalias, unitAttr);
}
}
diff --git a/flang/test/Fir/array-coor.fir b/flang/test/Fir/array-coor.fir
index 2caa727a10c50..a765670d20b28 100644
--- a/flang/test/Fir/array-coor.fir
+++ b/flang/test/Fir/array-coor.fir
@@ -33,7 +33,7 @@ func.func @test_array_coor_box_component_slice(%arg0: !fir.box<!fir.array<2x!fir
func.func private @take_int(%arg0: !fir.ref<i32>) -> ()
// CHECK-LABEL: define void @test_array_coor_box_component_slice(
-// CHECK-SAME: ptr {{[^%]*}}%[[VAL_0:.*]])
+// CHECK-SAME: ptr %[[VAL_0:.*]])
// CHECK: %[[VAL_1:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, ptr %[[VAL_0]], i32 0, i32 7, i32 0, i32 2
// CHECK: %[[VAL_2:.*]] = load i64, ptr %[[VAL_1]]
// CHECK: %[[VAL_3:.*]] = mul nsw i64 1, %[[VAL_2]]
diff --git a/flang/test/Fir/arrayset.fir b/flang/test/Fir/arrayset.fir
index cb26971cb962d..dab939aba1702 100644
--- a/flang/test/Fir/arrayset.fir
+++ b/flang/test/Fir/arrayset.fir
@@ -1,7 +1,7 @@
// RUN: tco %s | FileCheck %s
// RUN: %flang_fc1 -emit-llvm %s -o - | FileCheck %s
-// CHECK-LABEL: define void @x(
+// CHECK-LABEL: define void @x(ptr captures(none) %0)
func.func @x(%arr : !fir.ref<!fir.array<10xf32>>) {
%1 = arith.constant 0 : index
%2 = arith.constant 9 : index
diff --git a/flang/test/Fir/arrexp.fir b/flang/test/Fir/arrexp.fir
index 6c7f71f6f1f9c..924c1fab8d84b 100644
--- a/flang/test/Fir/arrexp.fir
+++ b/flang/test/Fir/arrexp.fir
@@ -1,7 +1,7 @@
// RUN: tco %s | FileCheck %s
// CHECK-LABEL: define void @f1
-// CHECK: (ptr {{[^%]*}}%[[A:[^,]*]], {{.*}}, float %[[F:.*]])
+// CHECK: (ptr captures(none) %[[A:[^,]*]], {{.*}}, float %[[F:.*]])
func.func @f1(%a : !fir.ref<!fir.array<?x?xf32>>, %n : index, %m : index, %o : index, %p : index, %f : f32) {
%c1 = arith.constant 1 : index
%s = fir.shape_shift %o, %n, %p, %m : (index, index, index, index) -> !fir.shapeshift<2>
@@ -23,7 +23,7 @@ func.func @f1(%a : !fir.ref<!fir.array<?x?xf32>>, %n : index, %m : index, %o : i
}
// CHECK-LABEL: define void @f2
-// CHECK: (ptr {{[^%]*}}%[[A:[^,]*]], {{.*}}, float %[[F:.*]])
+// CHECK: (ptr captures(none) %[[A:[^,]*]], {{.*}}, float %[[F:.*]])
func.func @f2(%a : !fir.ref<!fir.array<?x?xf32>>, %b : !fir.ref<!fir.array<?x?xf32>>, %n : index, %m : index, %o : index, %p : index, %f : f32) {
%c1 = arith.constant 1 : index
%s = fir.shape_shift %o, %n, %p, %m : (index, index, index, index) -> !fir.shapeshift<2>
@@ -47,7 +47,7 @@ func.func @f2(%a : !fir.ref<!fir.array<?x?xf32>>, %b : !fir.ref<!fir.array<?x?xf
}
// CHECK-LABEL: define void @f3
-// CHECK: (ptr {{[^%]*}}%[[A:[^,]*]], {{.*}}, float %[[F:.*]])
+// CHECK: (ptr captures(none) %[[A:[^,]*]], {{.*}}, float %[[F:.*]])
func.func @f3(%a : !fir.ref<!fir.array<?x?xf32>>, %b : !fir.ref<!fir.array<?x?xf32>>, %n : index, %m : index, %o : index, %p : index, %f : f32) {
%c1 = arith.constant 1 : index
%s = fir.shape_shift %o, %n, %p, %m : (index, index, index, index) -> !fir.shapeshift<2>
@@ -72,7 +72,7 @@ func.func @f3(%a : !fir.ref<!fir.array<?x?xf32>>, %b : !fir.ref<!fir.array<?x?xf
}
// CHECK-LABEL: define void @f4
-// CHECK: (ptr {{[^%]*}}%[[A:[^,]*]], {{.*}}, float %[[F:.*]])
+// CHECK: (ptr captures(none) %[[A:[^,]*]], {{.*}}, float %[[F:.*]])
func.func @f4(%a : !fir.ref<!fir.array<?x?xf32>>, %b : !fir.ref<!fir.array<?x?xf32>>, %n : index, %m : index, %o : index, %p : index, %f : f32) {
%c1 = arith.constant 1 : index
%s = fir.shape_shift %o, %n, %p, %m : (index, index, index, index) -> !fir.shapeshift<2>
@@ -102,7 +102,7 @@ func.func @f4(%a : !fir.ref<!fir.array<?x?xf32>>, %b : !fir.ref<!fir.array<?x?xf
// `a = b + f`, with and v assumed shapes.
// Tests that the stride from the descriptor is used.
// CHECK-LABEL: define void @f5
-// CHECK: (ptr {{[^%]*}}%[[A:.*]], ptr {{[^%]*}}%[[B:.*]], float %[[F:.*]])
+// CHECK: (ptr %[[A:.*]], ptr %[[B:.*]], float %[[F:.*]])
func.func @f5(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: f32) {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
@@ -135,7 +135,7 @@ func.func @f5(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf
// contiguous array (e.g. `a(2:10:1) = a(1:9:1) + f`, with a assumed shape).
// Test that a temp is created.
// CHECK-LABEL: define void @f6
-// CHECK: (ptr {{[^%]*}}%[[A:[^,]*]], float %[[F:.*]])
+// CHECK: (ptr %[[A:[^,]*]], float %[[F:.*]])
func.func @f6(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: f32) {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
@@ -165,7 +165,7 @@ func.func @f6(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: f32) {
// Non contiguous array with lower bounds (x = y(100), with y(4:))
// Test array_coor offset computation.
// CHECK-LABEL: define void @f7(
-// CHECK: ptr {{[^%]*}}%[[X:[^,]*]], ptr {{[^%]*}}%[[Y:.*]])
+// CHECK: ptr captures(none) %[[X:[^,]*]], ptr %[[Y:.*]])
func.func @f7(%arg0: !fir.ref<f32>, %arg1: !fir.box<!fir.array<?xf32>>) {
%c4 = arith.constant 4 : index
%c100 = arith.constant 100 : index
@@ -181,7 +181,7 @@ func.func @f7(%arg0: !fir.ref<f32>, %arg1: !fir.box<!fir.array<?xf32>>) {
// Test A(:, :)%x reference codegen with A constant shape.
// CHECK-LABEL: define void @f8(
-// CHECK-SAME: ptr {{[^%]*}}%[[A:.*]], i32 %[[I:.*]])
+// CHECK-SAME: ptr captures(none) %[[A:.*]], i32 %[[I:.*]])
func.func @f8(%a : !fir.ref<!fir.array<2x2x!fir.type<t{i:i32}>>>, %i : i32) {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
@@ -198,7 +198,7 @@ func.func @f8(%a : !fir.ref<!fir.array<2x2x!fir.type<t{i:i32}>>>, %i : i32) {
// Test casts in in array_coor offset computation when type parameters are not i64
// CHECK-LABEL: define ptr @f9(
-// CHECK-SAME: i32 %[[I:.*]], i64 %{{.*}}, i64 %{{.*}}, ptr {{[^%]*}}%[[C:.*]])
+// CHECK-SAME: i32 %[[I:.*]], i64 %{{.*}}, i64 %{{.*}}, ptr captures(none) %[[C:.*]])
func.func @f9(%i: i32, %e : i64, %j: i64, %c: !fir.ref<!fir.array<?x?x!fir.char<1,?>>>) -> !fir.ref<!fir.char<1,?>> {
%s = fir.shape %e, %e : (i64, i64) -> !fir.shape<2>
// CHECK: %[[CAST:.*]] = sext i32 %[[I]] to i64
diff --git a/flang/test/Fir/box-offset-codegen.fir b/flang/test/Fir/box-offset-codegen.fir
index 11d5750ffc385..15c9a11e5aefe 100644
--- a/flang/test/Fir/box-offset-codegen.fir
+++ b/flang/test/Fir/box-offset-codegen.fir
@@ -7,7 +7,7 @@ func.func @scalar_addr(%scalar : !fir.ref<!fir.box<!fir.type<t>>>) -> !fir.llvm_
return %addr : !fir.llvm_ptr<!fir.ref<!fir.type<t>>>
}
// CHECK-LABEL: define ptr @scalar_addr(
-// CHECK-SAME: ptr {{[^%]*}}%[[BOX:.*]]){{.*}}{
+// CHECK-SAME: ptr captures(none) %[[BOX:.*]]){{.*}}{
// CHECK: %[[VAL_0:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, ptr %[[BOX]], i32 0, i32 0
// CHECK: ret ptr %[[VAL_0]]
@@ -16,7 +16,7 @@ func.func @scalar_tdesc(%scalar : !fir.ref<!fir.box<!fir.type<t>>>) -> !fir.llvm
return %tdesc : !fir.llvm_ptr<!fir.tdesc<!fir.type<t>>>
}
// CHECK-LABEL: define ptr @scalar_tdesc(
-// CHECK-SAME: ptr {{[^%]*}}%[[BOX:.*]]){{.*}}{
+// CHECK-SAME: ptr captures(none) %[[BOX:.*]]){{.*}}{
// CHECK: %[[VAL_0:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, ptr %[[BOX]], i32 0, i32 7
// CHECK: ret ptr %[[VAL_0]]
@@ -25,7 +25,7 @@ func.func @array_addr(%array : !fir.ref<!fir.class<!fir.ptr<!fir.array<?x!fir.ty
return %addr : !fir.llvm_ptr<!fir.ptr<!fir.array<?x!fir.type<t>>>>
}
// CHECK-LABEL: define ptr @array_addr(
-// CHECK-SAME: ptr {{[^%]*}}%[[BOX:.*]]){{.*}}{
+// CHECK-SAME: ptr captures(none) %[[BOX:.*]]){{.*}}{
// CHECK: %[[VAL_0:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, ptr %[[BOX]], i32 0, i32 0
// CHECK: ret ptr %[[VAL_0]]
@@ -34,6 +34,6 @@ func.func @array_tdesc(%array : !fir.ref<!fir.class<!fir.ptr<!fir.array<?x!fir.t
return %tdesc : !fir.llvm_ptr<!fir.tdesc<!fir.type<t>>>
}
// CHECK-LABEL: define ptr @array_tdesc(
-// CHECK-SAME: ptr {{[^%]*}}%[[BOX:.*]]){{.*}}{
+// CHECK-SAME: ptr captures(none) %[[BOX:.*]]){{.*}}{
// CHECK: %[[VAL_0:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, ptr %[[BOX]], i32 0, i32 8
// CHECK: ret ptr %[[VAL_0]]
diff --git a/flang/test/Fir/box-typecode.fir b/flang/test/Fir/box-typecode.fir
index a8d43eba39889..766c5165b947c 100644
--- a/flang/test/Fir/box-typecode.fir
+++ b/flang/test/Fir/box-typecode.fir
@@ -6,7 +6,7 @@ func.func @test_box_typecode(%a: !fir.class<none>) -> i32 {
}
// CHECK-LABEL: @test_box_typecode(
-// CHECK-SAME: ptr {{[^%]*}}%[[BOX:.*]])
+// CHECK-SAME: ptr %[[BOX:.*]])
// CHECK: %[[GEP:.*]] = getelementptr { ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}} }, ptr %[[BOX]], i32 0, i32 4
// CHECK: %[[TYPE_CODE:.*]] = load i8, ptr %[[GEP]]
// CHECK: %[[TYPE_CODE_CONV:.*]] = sext i8 %[[TYPE_CODE]] to i32
diff --git a/flang/test/Fir/box.fir b/flang/test/Fir/box.fir
index c0cf3d8375983..5e931a2e0d9aa 100644
--- a/flang/test/Fir/box.fir
+++ b/flang/test/Fir/box.fir
@@ -24,7 +24,7 @@ func.func private @g(%b : !fir.box<f32>)
func.func private @ga(%b : !fir.box<!fir.array<?xf32>>)
// CHECK-LABEL: define void @f
-// CHECK: (ptr {{[^%]*}}%[[ARG:.*]])
+// CHECK: (ptr captures(none) %[[ARG:.*]])
func.func @f(%a : !fir.ref<f32>) {
// CHECK: %[[DESC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
// CHECK: %[[INS0:.*]] = insertvalue {{.*}} { ptr undef, i64 4, i32 20240719, i8 0, i8 27, i8 0, i8 0 }, ptr %[[ARG]], 0
@@ -38,7 +38,7 @@ func.func @f(%a : !fir.ref<f32>) {
}
// CHECK-LABEL: define void @fa
-// CHECK: (ptr {{[^%]*}}%[[ARG:.*]])
+// CHECK: (ptr captures(none) %[[ARG:.*]])
func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
%c = fir.convert %a : (!fir.ref<!fir.array<100xf32>>) -> !fir.ref<!fir.array<?xf32>>
%c1 = arith.constant 1 : index
@@ -54,7 +54,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
// Boxing of a scalar character of dynamic length
// CHECK-LABEL: define void @b1(
-// CHECK-SAME: ptr {{[^%]*}}%[[res:.*]], ptr {{[^%]*}}%[[arg0:.*]], i64 %[[arg1:.*]])
+// CHECK-SAME: ptr captures(none) %[[res:.*]], ptr captures(none) %[[arg0:.*]], i64 %[[arg1:.*]])
func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.char<1,?>> {
// CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
// CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
@@ -69,8 +69,8 @@ func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.
// Boxing of a dynamic array of character with static length (5)
// CHECK-LABEL: define void @b2(
-// CHECK-SAME: ptr {{[^%]*}}%[[res]],
-// CHECK-SAME: ptr {{[^%]*}}%[[arg0:.*]], i64 %[[arg1:.*]])
+// CHECK-SAME: ptr captures(none) %[[res]],
+// CHECK-SAME: ptr captures(none) %[[arg0:.*]], i64 %[[arg1:.*]])
func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) -> !fir.box<!fir.array<?x!fir.char<1,5>>> {
%1 = fir.shape %arg1 : (index) -> !fir.shape<1>
// CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
@@ -85,7 +85,7 @@ func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) ->
// Boxing of a dynamic array of character of dynamic length
// CHECK-LABEL: define void @b3(
-// CHECK-SAME: ptr {{[^%]*}}%[[res:.*]], ptr {{[^%]*}}%[[arg0:.*]], i64 %[[arg1:.*]], i64 %[[arg2:.*]])
+// CHECK-SAME: ptr captures(none) %[[res:.*]], ptr captures(none) %[[arg0:.*]], i64 %[[arg1:.*]], i64 %[[arg2:.*]])
func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %arg2 : index) -> !fir.box<!fir.array<?x!fir.char<1,?>>> {
%1 = fir.shape %arg2 : (index) -> !fir.shape<1>
// CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
@@ -103,7 +103,7 @@ func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %ar
// Boxing of a static array of character of dynamic length
// CHECK-LABEL: define void @b4(
-// CHECK-SAME: ptr {{[^%]*}}%[[res:.*]], ptr {{[^%]*}}%[[arg0:.*]], i64 %[[arg1:.*]])
+// CHECK-SAME: ptr captures(none) %[[res:.*]], ptr captures(none) %[[arg0:.*]], i64 %[[arg1:.*]])
func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) -> !fir.box<!fir.array<7x!fir.char<1,?>>> {
%c_7 = arith.constant 7 : index
%1 = fir.shape %c_7 : (index) -> !fir.shape<1>
@@ -122,7 +122,7 @@ func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) ->
// Storing a fir.box into a fir.ref<fir.box> (modifying descriptors).
// CHECK-LABEL: define void @b5(
-// CHECK-SAME: ptr {{[^%]*}}%[[arg0:.*]], ptr {{[^%]*}}%[[arg1:.*]])
+// CHECK-SAME: ptr captures(none) %[[arg0:.*]], ptr %[[arg1:.*]])
func.func @b5(%arg0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xf32>>>>, %arg1 : !fir.box<!fir.heap<!fir.array<?x?xf32>>>) {
fir.store %arg1 to %arg0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xf32>>>>
// CHECK: call void @llvm.memcpy.p0.p0.i32(ptr %0, ptr %1, i32 72, i1 false)
@@ -132,7 +132,7 @@ func.func @b5(%arg0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xf32>>>>, %arg1
func.func private @callee6(!fir.box<none>) -> i32
// CHECK-LABEL: define i32 @box6(
-// CHECK-SAME: ptr {{[^%]*}}%[[ARG0:.*]], i64 %[[ARG1:.*]], i64 %[[ARG2:.*]])
+// CHECK-SAME: ptr captures(none) %[[ARG0:.*]], i64 %[[ARG1:.*]], i64 %[[ARG2:.*]])
func.func @box6(%0 : !fir.ref<!fir.array<?x?x?x?xf32>>, %1 : index, %2 : index) -> i32 {
%c100 = arith.constant 100 : index
%c50 = arith.constant 50 : index
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 5d82522055adc..e99dfd0b92afd 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -16,7 +16,7 @@
// CHECK: call void @_QPtest_proc_dummy_other(ptr %[[VAL_6]])
// CHECK-LABEL: define void @_QFtest_proc_dummyPtest_proc_dummy_a(ptr
-// CHECK-SAME: {{[^%]*}}%[[VAL_0:.*]], ptr nest {{[^%]*}}%[[VAL_1:.*]])
+// CHECK-SAME: captures(none) %[[VAL_0:.*]], ptr nest captures(none) %[[VAL_1:.*]])
// CHECK-LABEL: define void @_QPtest_proc_dummy_other(ptr
// CHECK-SAME: %[[VAL_0:.*]])
@@ -92,7 +92,7 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
// CHECK: call void @llvm.stackrestore.p0(ptr %[[VAL_27]])
// CHECK-LABEL: define { ptr, i64 } @_QFtest_proc_dummy_charPgen_message(ptr
-// CHECK-SAME: {{[^%]*}}%[[VAL_0:.*]], i64 %[[VAL_1:.*]], ptr nest {{[^%]*}}%[[VAL_2:.*]])
+// CHECK-SAME: captures(none) %[[VAL_0:.*]], i64 %[[VAL_1:.*]], ptr nest captures(none) %[[VAL_2:.*]])
// CHECK: %[[VAL_3:.*]] = getelementptr { { ptr, i64 } }, ptr %[[VAL_2]], i32 0, i32 0
// CHECK: %[[VAL_4:.*]] = load { ptr, i64 }, ptr %[[VAL_3]], align 8
// CHECK: %[[VAL_5:.*]] = extractvalue { ptr, i64 } %[[VAL_4]], 0
diff --git a/flang/test/Fir/commute.fir b/flang/test/Fir/commute.fir
index 8713c8ff24e7f..a857ba55b00c5 100644
--- a/flang/test/Fir/commute.fir
+++ b/flang/test/Fir/commute.fir
@@ -11,7 +11,7 @@ func.func @f1(%a : i32, %b : i32) -> i32 {
return %3 : i32
}
-// CHECK-LABEL: define i32 @f2(ptr {{[^%]*}}%0)
+// CHECK-LABEL: define i32 @f2(ptr captures(none) %0)
func.func @f2(%a : !fir.ref<i32>) -> i32 {
%1 = fir.load %a : !fir.ref<i32>
// CHECK: %[[r2:.*]] = load
diff --git a/flang/test/Fir/coordinateof.fir b/flang/test/Fir/coordinateof.fir
index a01e9e9d1fc40..693bdf716ba1d 100644
--- a/flang/test/Fir/coordinateof.fir
+++ b/flang/test/Fir/coordinateof.fir
@@ -62,7 +62,7 @@ func.func @foo5(%box : !fir.box<!fir.ptr<!fir.array<?xi32>>>, %i : index) -> i32
}
// CHECK-LABEL: @foo6
-// CHECK-SAME: (ptr {{[^%]*}}%[[box:.*]], i64 %{{.*}}, ptr {{[^%]*}}%{{.*}})
+// CHECK-SAME: (ptr %[[box:.*]], i64 %{{.*}}, ptr captures(none) %{{.*}})
func.func @foo6(%box : !fir.box<!fir.ptr<!fir.array<?x!fir.char<1>>>>, %i : i64 , %res : !fir.ref<!fir.char<1>>) {
// CHECK: %[[addr_gep:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[box]], i32 0, i32 0
// CHECK: %[[addr:.*]] = load ptr, ptr %[[addr_gep]]
diff --git a/flang/test/Fir/embox.fir b/flang/test/Fir...
[truncated]
|
@@ -350,15 +350,11 @@ void createDefaultFIRCodeGenPassPipeline(mlir::PassManager &pm, | |||
else | |||
framePointerKind = mlir::LLVM::framePointerKind::FramePointerKind::None; | |||
|
|||
bool setNoCapture = false, setNoAlias = false; | |||
if (config.OptLevel.isOptimizingForSpeed()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tom, will you be okay if I just remove this if
? The LIT tests should still work, and it will be easier to re-enable the code, when function speciailization is fixed.
This is due to a 70% regression in exchange2_r on neoverse-v2 due to function specialization no longer triggering in the LTO pipline.
61914ae
to
645f6c3
Compare
// TODO: re-enable setNoAlias by default (when optimizing for speed) once | ||
// function specialization is fixed. | ||
bool setNoAlias = forceNoAlias; | ||
bool setNoCapture = forceNoCapture; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think this will revert more than I did. Nocapture was set before my change, but only for ref types (not for boxes).
Can you given me some time today to disable it correctly to resolve the exchange issue and avoid other regressions? Or is it crucial to resolve the exchange issue immediately? Please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad I was rushing and missed this. It turns out to work fine with nocapture still enabled. I've updated the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you, Tom!
With these enabled we see a 70% performance regression for exchange2_r on neoverse-v1 (aws graviton 3) using
-mcpu=native -Ofast -flto
. There is also a smaller regression on neoverse-v2.This appears to be because function specialization is no longer kicking in during LTO for digits_2. This can be seen in the output executable: previously it contained specialized copies of the function with names like
_QMbrute_forcePdigits_2.specialized.4
. Now there are no names like this.The bug is not in flang - instead in the function specialization pass - but due to the size of the regression I would like to request that this is disabled until function specialization has been fixed.