Skip to content

[OpenMP] [Debug] Debug support for work sharing iterator variable #122047

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alokkrsharma
Copy link
Contributor

@alokkrsharma alokkrsharma commented Jan 8, 2025

This fixes #110700

Work-sharing loop iterator variable had incorrect debug info attached to it.

It was due to redundant/incorrect debug info overwrites the correct debug info appearing later in LLVM IR.

define internal void @main.omp_outlined_debug__( )
  %i = alloca i32, align 4   ;;------------------ (Incorrect)
  %i4 = alloca i32, align 4  ;;------------------ (Correct)
    #dbg_declare(ptr %i, !82, !DIExpression(), !67) ;;--- (Incorrect)
  store i32 0, ptr %i, align 4, !dbg !83 ;;--- (Incorrect)

omp.precond.then:                                 ; preds = %entry
    #dbg_declare(ptr %i4, !82, !DIExpression(), !67) ;;--- (Correct)

omp.inner.for.body:                               ; preds = %omp.inner.for.cond
  %18 = load i32, ptr %.omp.iv, align 4, !dbg !85
  %mul = mul nsw i32 %18, 1, !dbg !83
  %add = add nsw i32 0, %mul, !dbg !83
  store i32 %add, ptr %i4, align 4, !dbg !83 ;;--- (Correct)

Suppressing the incorrect IR, enables emission of correct debug info.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang labels Jan 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang-codegen

Author: Alok Kumar Sharma (alokkrsharma)

Changes

This fixes #110700

Work-sharing loop iterator variable had incorrect debug info attached to it.

It was due to redundant/incorrect debug info overwrites the correct debug info appearing later in LLVM IR.

define internal void @main.omp_outlined_debug__( )
%i = alloca i32, align 4 ;;------------------ (Incorrect)
%i4 = alloca i32, align 4 ;;------------------ (Correct)
#dbg_declare(ptr %i, !82, !DIExpression(), !67) ;;--- (Incorrect)
store i32 0, ptr %i, align 4, !dbg !83 ;;--- (Incorrect)

omp.precond.then: ; preds = %entry
#dbg_declare(ptr %i4, !82, !DIExpression(), !67) ;;--- (Correct)

omp.inner.for.body: ; preds = %omp.inner.for.cond
%18 = load i32, ptr %.omp.iv, align 4, !dbg !85
%mul = mul nsw i32 %18, 1, !dbg !83
%add = add nsw i32 0, %mul, !dbg !83
store i32 %add, ptr %i4, align 4, !dbg !83 ;;--- (Correct)

Suppressing the incorrect IR, enables emission of correct debug info.


Patch is 1.32 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122047.diff

77 Files Affected:

  • (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+2-9)
  • (modified) clang/test/OpenMP/amdgcn_target_device_vla.cpp (+11-29)
  • (modified) clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c (+3-9)
  • (modified) clang/test/OpenMP/bug60602.cpp (-6)
  • (modified) clang/test/OpenMP/cancel_codegen.cpp (+8-16)
  • (modified) clang/test/OpenMP/cancellation_point_codegen.cpp (+17-21)
  • (modified) clang/test/OpenMP/debug-info-openmp-array.cpp (+2-5)
  • (added) clang/test/OpenMP/debug_loopsharing_iter_var.c (+25)
  • (modified) clang/test/OpenMP/distribute_codegen.cpp (+4-16)
  • (modified) clang/test/OpenMP/distribute_parallel_for_codegen.cpp (+18-186)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_codegen.cpp (+186-410)
  • (modified) clang/test/OpenMP/distribute_simd_codegen.cpp (+42-78)
  • (modified) clang/test/OpenMP/for_codegen.cpp (+1-3)
  • (modified) clang/test/OpenMP/for_non_rectangular_codegen.c (+3-10)
  • (modified) clang/test/OpenMP/generic_loop_codegen.cpp (-8)
  • (modified) clang/test/OpenMP/interchange_codegen.cpp (+12-72)
  • (modified) clang/test/OpenMP/irbuilder_unroll_partial_factor_for_collapse.c (-4)
  • (modified) clang/test/OpenMP/irbuilder_unroll_partial_heuristic_for_collapse.c (-4)
  • (modified) clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp (+6-14)
  • (modified) clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp (-10)
  • (modified) clang/test/OpenMP/nvptx_target_simd_codegen.cpp (-24)
  • (modified) clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp (+36-96)
  • (modified) clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp (+6-14)
  • (modified) clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp (+12-28)
  • (modified) clang/test/OpenMP/nvptx_target_teams_distribute_simd_codegen.cpp (-24)
  • (modified) clang/test/OpenMP/nvptx_target_teams_generic_loop_codegen.cpp (+27-75)
  • (modified) clang/test/OpenMP/nvptx_target_teams_generic_loop_generic_mode_codegen.cpp (-4)
  • (modified) clang/test/OpenMP/ordered_codegen.cpp (+42-92)
  • (modified) clang/test/OpenMP/parallel_for_codegen.cpp (+6-43)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_codegen.cpp (-10)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_simd_codegen.cpp (+33-103)
  • (modified) clang/test/OpenMP/reduction_compound_op.cpp (-32)
  • (modified) clang/test/OpenMP/reduction_implicit_map.cpp (+8-16)
  • (modified) clang/test/OpenMP/reverse_codegen.cpp (-12)
  • (modified) clang/test/OpenMP/target_ompx_dyn_cgroup_mem_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/target_teams_distribute_codegen.cpp (+1-13)
  • (modified) clang/test/OpenMP/target_teams_distribute_collapse_codegen.cpp (-8)
  • (modified) clang/test/OpenMP/target_teams_distribute_dist_schedule_codegen.cpp (-12)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_codegen.cpp (-40)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_collapse_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_dist_schedule_codegen.cpp (+6-30)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_schedule_codegen.cpp (+12-92)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_codegen.cpp (+14-54)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_collapse_codegen.cpp (-24)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_dist_schedule_codegen.cpp (+68-104)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_schedule_codegen.cpp (+142-242)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_codegen.cpp (-36)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_collapse_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_dist_schedule_codegen.cpp (+68-92)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_codegen-1.cpp (-40)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_codegen_as_parallel_for.cpp (+12-96)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_collapse_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/teams_distribute_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/teams_distribute_collapse_codegen.cpp (-8)
  • (modified) clang/test/OpenMP/teams_distribute_dist_schedule_codegen.cpp (-12)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_codegen.cpp (-32)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_collapse_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_dist_schedule_codegen.cpp (+18-42)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_schedule_codegen.cpp (+12-92)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_codegen.cpp (+14-62)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_collapse_codegen.cpp (-24)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_dist_schedule_codegen.cpp (+86-122)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_schedule_codegen.cpp (+68-168)
  • (modified) clang/test/OpenMP/teams_distribute_simd_codegen.cpp (+34-74)
  • (modified) clang/test/OpenMP/teams_distribute_simd_collapse_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/teams_distribute_simd_dist_schedule_codegen.cpp (+28-52)
  • (modified) clang/test/OpenMP/teams_generic_loop_codegen-1.cpp (-16)
  • (modified) clang/test/OpenMP/teams_generic_loop_collapse_codegen.cpp (-8)
  • (modified) clang/test/OpenMP/tile_codegen.cpp (+18-32)
  • (modified) clang/test/OpenMP/tile_codegen_for_dependent.cpp (-2)
  • (modified) clang/test/OpenMP/tile_codegen_tile_for.cpp (-2)
  • (modified) clang/test/OpenMP/unroll_codegen_for_collapse_outer.cpp (+1-6)
  • (modified) clang/test/OpenMP/unroll_codegen_for_partial.cpp (-2)
  • (modified) clang/test/OpenMP/unroll_codegen_parallel_for_factor.cpp (-2)
  • (modified) clang/test/OpenMP/unroll_codegen_tile_for.cpp (-2)
  • (modified) clang/test/OpenMP/unroll_codegen_unroll_for.cpp (-2)
  • (modified) clang/test/OpenMP/unroll_codegen_unroll_for_attr.cpp (+1-3)
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 6cb37b20b7aeee..0dcba56698c800 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -2367,15 +2367,8 @@ static void emitPreCond(CodeGenFunction &CGF, const OMPLoopDirective &S,
                         llvm::BasicBlock *FalseBlock, uint64_t TrueCount) {
   if (!CGF.HaveInsertPoint())
     return;
-  {
-    CodeGenFunction::OMPPrivateScope PreCondScope(CGF);
-    CGF.EmitOMPPrivateLoopCounters(S, PreCondScope);
-    (void)PreCondScope.Privatize();
-    // Get initial values of real counters.
-    for (const Expr *I : S.inits()) {
-      CGF.EmitIgnoredExpr(I);
-    }
-  }
+  // The Private counters are not needed to be emitted here as these
+  // are emitted later.
   // Create temp loop control variables with their init values to support
   // non-rectangular loops.
   CodeGenFunction::OMPMapVars PreCondVars;
diff --git a/clang/test/OpenMP/amdgcn_target_device_vla.cpp b/clang/test/OpenMP/amdgcn_target_device_vla.cpp
index 58fef517a9e72d..72804fd49ed9fa 100644
--- a/clang/test/OpenMP/amdgcn_target_device_vla.cpp
+++ b/clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -151,12 +151,12 @@ int main() {
 // CHECK:       for.end:
 // CHECK-NEXT:    store i32 0, ptr [[I1_ASCAST]], align 4
 // CHECK-NEXT:    br label [[FOR_COND2:%.*]]
-// CHECK:       for.cond2:
+// CHECK:       for.cond
 // CHECK-NEXT:    [[TMP13:%.*]] = load i32, ptr [[I1_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP14:%.*]] = load i32, ptr [[N_ASCAST]], align 4
 // CHECK-NEXT:    [[CMP3:%.*]] = icmp slt i32 [[TMP13]], [[TMP14]]
 // CHECK-NEXT:    br i1 [[CMP3]], label [[FOR_BODY4:%.*]], label [[FOR_END9:%.*]]
-// CHECK:       for.body4:
+// CHECK:       for.body
 // CHECK-NEXT:    [[TMP15:%.*]] = load i32, ptr [[I1_ASCAST]], align 4
 // CHECK-NEXT:    [[IDXPROM5:%.*]] = sext i32 [[TMP15]] to i64
 // CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[IDXPROM5]]
@@ -165,12 +165,12 @@ int main() {
 // CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP17]], [[TMP16]]
 // CHECK-NEXT:    store i32 [[ADD]], ptr [[TMP0]], align 4
 // CHECK-NEXT:    br label [[FOR_INC7:%.*]]
-// CHECK:       for.inc7:
+// CHECK:       for.inc
 // CHECK-NEXT:    [[TMP18:%.*]] = load i32, ptr [[I1_ASCAST]], align 4
 // CHECK-NEXT:    [[INC8:%.*]] = add nsw i32 [[TMP18]], 1
 // CHECK-NEXT:    store i32 [[INC8]], ptr [[I1_ASCAST]], align 4
 // CHECK-NEXT:    br label [[FOR_COND2]], !llvm.loop [[LOOP15:![0-9]+]]
-// CHECK:       for.end9:
+// CHECK:       for.end
 // CHECK-NEXT:    call void @__kmpc_free_shared(ptr [[A]], i64 [[TMP7]])
 // CHECK-NEXT:    call void @__kmpc_target_deinit()
 // CHECK-NEXT:    ret void
@@ -228,7 +228,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_COMB_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_COMB_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -245,7 +244,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-NEXT:    [[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
 // CHECK-NEXT:    [[DOTOMP_COMB_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_COMB_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_COMB_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_COMB_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -267,7 +265,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[I_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP4]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -346,13 +343,13 @@ int main() {
 // CHECK-NEXT:    [[TMP39:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
 // CHECK-NEXT:    [[CMP9:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
 // CHECK-NEXT:    br i1 [[CMP9]], label [[COND_TRUE10:%.*]], label [[COND_FALSE11:%.*]]
-// CHECK:       cond.true10:
+// CHECK:       cond.true
 // CHECK-NEXT:    [[TMP40:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
 // CHECK-NEXT:    br label [[COND_END12:%.*]]
-// CHECK:       cond.false11:
+// CHECK:       cond.false
 // CHECK-NEXT:    [[TMP41:%.*]] = load i32, ptr [[DOTOMP_COMB_UB_ASCAST]], align 4
 // CHECK-NEXT:    br label [[COND_END12]]
-// CHECK:       cond.end12:
+// CHECK:       cond.end
 // CHECK-NEXT:    [[COND13:%.*]] = phi i32 [ [[TMP40]], [[COND_TRUE10]] ], [ [[TMP41]], [[COND_FALSE11]] ]
 // CHECK-NEXT:    store i32 [[COND13]], ptr [[DOTOMP_COMB_UB_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP42:%.*]] = load i32, ptr [[DOTOMP_COMB_LB_ASCAST]], align 4
@@ -383,7 +380,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -405,7 +401,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-NEXT:    [[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
 // CHECK-NEXT:    [[DOTOMP_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -432,7 +427,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[I_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP4]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -500,12 +494,12 @@ int main() {
 // CHECK:       for.end:
 // CHECK-NEXT:    store i32 0, ptr [[J11_ASCAST]], align 4
 // CHECK-NEXT:    br label [[FOR_COND12:%.*]]
-// CHECK:       for.cond12:
+// CHECK:       for.cond
 // CHECK-NEXT:    [[TMP24:%.*]] = load i32, ptr [[J11_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP25:%.*]] = load i32, ptr [[N_ASCAST]], align 4
 // CHECK-NEXT:    [[CMP13:%.*]] = icmp slt i32 [[TMP24]], [[TMP25]]
 // CHECK-NEXT:    br i1 [[CMP13]], label [[FOR_BODY14:%.*]], label [[FOR_END22:%.*]]
-// CHECK:       for.body14:
+// CHECK:       for.body
 // CHECK-NEXT:    [[TMP26:%.*]] = load i32, ptr [[J11_ASCAST]], align 4
 // CHECK-NEXT:    [[IDXPROM15:%.*]] = sext i32 [[TMP26]] to i64
 // CHECK-NEXT:    [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, ptr [[VLA7_ASCAST]], i64 [[IDXPROM15]]
@@ -517,12 +511,12 @@ int main() {
 // CHECK-NEXT:    [[ADD19:%.*]] = add nsw i32 [[TMP29]], [[TMP27]]
 // CHECK-NEXT:    store i32 [[ADD19]], ptr [[ARRAYIDX18]], align 4
 // CHECK-NEXT:    br label [[FOR_INC20:%.*]]
-// CHECK:       for.inc20:
+// CHECK:       for.inc
 // CHECK-NEXT:    [[TMP30:%.*]] = load i32, ptr [[J11_ASCAST]], align 4
 // CHECK-NEXT:    [[INC21:%.*]] = add nsw i32 [[TMP30]], 1
 // CHECK-NEXT:    store i32 [[INC21]], ptr [[J11_ASCAST]], align 4
 // CHECK-NEXT:    br label [[FOR_COND12]], !llvm.loop [[LOOP17:![0-9]+]]
-// CHECK:       for.end22:
+// CHECK:       for.end
 // CHECK-NEXT:    [[TMP31:%.*]] = load ptr addrspace(5), ptr [[SAVED_STACK_ASCAST]], align 4
 // CHECK-NEXT:    call void @llvm.stackrestore.p5(ptr addrspace(5) [[TMP31]])
 // CHECK-NEXT:    br label [[OMP_BODY_CONTINUE:%.*]]
@@ -597,7 +591,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -615,7 +608,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-NEXT:    [[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
 // CHECK-NEXT:    [[DOTOMP_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -639,7 +631,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[I_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP4]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -760,7 +751,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[J:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -775,7 +765,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-NEXT:    [[J_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[J]] to ptr
 // CHECK-NEXT:    [[DOTOMP_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -796,7 +785,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[J_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP5]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -963,7 +951,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_2:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -982,7 +969,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_2_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_2]] to ptr
-// CHECK-NEXT:    [[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
 // CHECK-NEXT:    [[DOTOMP_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -1009,7 +995,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB3:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB3]], ptr [[DOTCAPTURE_EXPR_2_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[I_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP5]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -1129,7 +1114,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[J:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -1144,7 +1128,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-NEXT:    [[J_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[J]] to ptr
 // CHECK-NEXT:    [[DOTOMP_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -1165,7 +1148,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[J_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP5]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
diff --git a/clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c b/clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c
index cc0cc0def48b81..2e2689da692592 100644
--- a/clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c
+++ b/clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c
@@ -65,7 +65,6 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-AMD-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_COMB_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_COMB_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -81,7 +80,6 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-AMD-NEXT:    [[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
 // CHECK-AMD-NEXT:    [[DOTOMP_COMB_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_COMB_LB]] to ptr
 // CHECK-AMD-NEXT:    [[DOTOMP_COMB_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_COMB_UB]] to ptr
 // CHECK-AMD-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -100,7 +98,6 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-AMD-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-AMD-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-AMD-NEXT:    store i32 0, ptr [[I_ASCAST]], align 4
 // CHECK-AMD-NEXT:    [[TMP2:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-AMD-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP2]]
 // CHECK-AMD-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -177,13 +174,13 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[TMP36:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
 // CHECK-AMD-NEXT:    [[CMP9:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
 // CHECK-AMD-NEXT:    br i1 [[CMP9]], label [[COND_TRUE10:%.*]], label [[COND_FALSE11:%.*]]
-// CHECK-AMD:       cond.true10:
+// CHECK-AMD:       cond.true
 // CHECK-AMD-NEXT:    [[TMP37:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
 // CHECK-AMD-NEXT:    br label [[COND_END12:%.*]]
-// CHECK-AMD:       cond.false11:
+// CHECK-AMD:       cond.false
 // CHECK-AMD-NEXT:    [[TMP38:%.*]] = load i32, ptr [[DOTOMP_COMB_UB_ASCAST]], align 4
 // CHECK-AMD-NEXT:    br label [[COND_END12]]
-// CHECK-AMD:       cond.end12:
+// CHECK-AMD:       cond.end
 // CHECK-AMD-NEXT:    [[COND13:%.*]] = phi i32 [ [[TMP37]], [[COND_TRUE10]] ], [ [[TMP38]], [[COND_FALSE11]] ]
 // CHECK-AMD-NEXT:    store i32 [[COND13]], ptr [[DOTOMP_COMB_UB_ASCAST]], align 4
 // CHECK-AMD-NEXT:    [[TMP39:%.*]] = load i32, ptr [[DOTOMP_COMB_LB_ASCAST]], align 4
@@ -213,7 +210,6 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-AMD-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -229,7 +225,6 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-AMD...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2025

@llvm/pr-subscribers-clang

Author: Alok Kumar Sharma (alokkrsharma)

Changes

This fixes #110700

Work-sharing loop iterator variable had incorrect debug info attached to it.

It was due to redundant/incorrect debug info overwrites the correct debug info appearing later in LLVM IR.

define internal void @main.omp_outlined_debug__( )
%i = alloca i32, align 4 ;;------------------ (Incorrect)
%i4 = alloca i32, align 4 ;;------------------ (Correct)
#dbg_declare(ptr %i, !82, !DIExpression(), !67) ;;--- (Incorrect)
store i32 0, ptr %i, align 4, !dbg !83 ;;--- (Incorrect)

omp.precond.then: ; preds = %entry
#dbg_declare(ptr %i4, !82, !DIExpression(), !67) ;;--- (Correct)

omp.inner.for.body: ; preds = %omp.inner.for.cond
%18 = load i32, ptr %.omp.iv, align 4, !dbg !85
%mul = mul nsw i32 %18, 1, !dbg !83
%add = add nsw i32 0, %mul, !dbg !83
store i32 %add, ptr %i4, align 4, !dbg !83 ;;--- (Correct)

Suppressing the incorrect IR, enables emission of correct debug info.


Patch is 1.32 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122047.diff

77 Files Affected:

  • (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+2-9)
  • (modified) clang/test/OpenMP/amdgcn_target_device_vla.cpp (+11-29)
  • (modified) clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c (+3-9)
  • (modified) clang/test/OpenMP/bug60602.cpp (-6)
  • (modified) clang/test/OpenMP/cancel_codegen.cpp (+8-16)
  • (modified) clang/test/OpenMP/cancellation_point_codegen.cpp (+17-21)
  • (modified) clang/test/OpenMP/debug-info-openmp-array.cpp (+2-5)
  • (added) clang/test/OpenMP/debug_loopsharing_iter_var.c (+25)
  • (modified) clang/test/OpenMP/distribute_codegen.cpp (+4-16)
  • (modified) clang/test/OpenMP/distribute_parallel_for_codegen.cpp (+18-186)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_codegen.cpp (+186-410)
  • (modified) clang/test/OpenMP/distribute_simd_codegen.cpp (+42-78)
  • (modified) clang/test/OpenMP/for_codegen.cpp (+1-3)
  • (modified) clang/test/OpenMP/for_non_rectangular_codegen.c (+3-10)
  • (modified) clang/test/OpenMP/generic_loop_codegen.cpp (-8)
  • (modified) clang/test/OpenMP/interchange_codegen.cpp (+12-72)
  • (modified) clang/test/OpenMP/irbuilder_unroll_partial_factor_for_collapse.c (-4)
  • (modified) clang/test/OpenMP/irbuilder_unroll_partial_heuristic_for_collapse.c (-4)
  • (modified) clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp (+6-14)
  • (modified) clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp (-10)
  • (modified) clang/test/OpenMP/nvptx_target_simd_codegen.cpp (-24)
  • (modified) clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp (+36-96)
  • (modified) clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp (+6-14)
  • (modified) clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp (+12-28)
  • (modified) clang/test/OpenMP/nvptx_target_teams_distribute_simd_codegen.cpp (-24)
  • (modified) clang/test/OpenMP/nvptx_target_teams_generic_loop_codegen.cpp (+27-75)
  • (modified) clang/test/OpenMP/nvptx_target_teams_generic_loop_generic_mode_codegen.cpp (-4)
  • (modified) clang/test/OpenMP/ordered_codegen.cpp (+42-92)
  • (modified) clang/test/OpenMP/parallel_for_codegen.cpp (+6-43)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_codegen.cpp (-10)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_simd_codegen.cpp (+33-103)
  • (modified) clang/test/OpenMP/reduction_compound_op.cpp (-32)
  • (modified) clang/test/OpenMP/reduction_implicit_map.cpp (+8-16)
  • (modified) clang/test/OpenMP/reverse_codegen.cpp (-12)
  • (modified) clang/test/OpenMP/target_ompx_dyn_cgroup_mem_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/target_teams_distribute_codegen.cpp (+1-13)
  • (modified) clang/test/OpenMP/target_teams_distribute_collapse_codegen.cpp (-8)
  • (modified) clang/test/OpenMP/target_teams_distribute_dist_schedule_codegen.cpp (-12)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_codegen.cpp (-40)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_collapse_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_dist_schedule_codegen.cpp (+6-30)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_schedule_codegen.cpp (+12-92)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_codegen.cpp (+14-54)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_collapse_codegen.cpp (-24)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_dist_schedule_codegen.cpp (+68-104)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_schedule_codegen.cpp (+142-242)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_codegen.cpp (-36)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_collapse_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_dist_schedule_codegen.cpp (+68-92)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_codegen-1.cpp (-40)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_codegen_as_parallel_for.cpp (+12-96)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_collapse_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/teams_distribute_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/teams_distribute_collapse_codegen.cpp (-8)
  • (modified) clang/test/OpenMP/teams_distribute_dist_schedule_codegen.cpp (-12)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_codegen.cpp (-32)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_collapse_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_dist_schedule_codegen.cpp (+18-42)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_schedule_codegen.cpp (+12-92)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_codegen.cpp (+14-62)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_collapse_codegen.cpp (-24)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_dist_schedule_codegen.cpp (+86-122)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_schedule_codegen.cpp (+68-168)
  • (modified) clang/test/OpenMP/teams_distribute_simd_codegen.cpp (+34-74)
  • (modified) clang/test/OpenMP/teams_distribute_simd_collapse_codegen.cpp (-16)
  • (modified) clang/test/OpenMP/teams_distribute_simd_dist_schedule_codegen.cpp (+28-52)
  • (modified) clang/test/OpenMP/teams_generic_loop_codegen-1.cpp (-16)
  • (modified) clang/test/OpenMP/teams_generic_loop_collapse_codegen.cpp (-8)
  • (modified) clang/test/OpenMP/tile_codegen.cpp (+18-32)
  • (modified) clang/test/OpenMP/tile_codegen_for_dependent.cpp (-2)
  • (modified) clang/test/OpenMP/tile_codegen_tile_for.cpp (-2)
  • (modified) clang/test/OpenMP/unroll_codegen_for_collapse_outer.cpp (+1-6)
  • (modified) clang/test/OpenMP/unroll_codegen_for_partial.cpp (-2)
  • (modified) clang/test/OpenMP/unroll_codegen_parallel_for_factor.cpp (-2)
  • (modified) clang/test/OpenMP/unroll_codegen_tile_for.cpp (-2)
  • (modified) clang/test/OpenMP/unroll_codegen_unroll_for.cpp (-2)
  • (modified) clang/test/OpenMP/unroll_codegen_unroll_for_attr.cpp (+1-3)
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 6cb37b20b7aeee..0dcba56698c800 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -2367,15 +2367,8 @@ static void emitPreCond(CodeGenFunction &CGF, const OMPLoopDirective &S,
                         llvm::BasicBlock *FalseBlock, uint64_t TrueCount) {
   if (!CGF.HaveInsertPoint())
     return;
-  {
-    CodeGenFunction::OMPPrivateScope PreCondScope(CGF);
-    CGF.EmitOMPPrivateLoopCounters(S, PreCondScope);
-    (void)PreCondScope.Privatize();
-    // Get initial values of real counters.
-    for (const Expr *I : S.inits()) {
-      CGF.EmitIgnoredExpr(I);
-    }
-  }
+  // The Private counters are not needed to be emitted here as these
+  // are emitted later.
   // Create temp loop control variables with their init values to support
   // non-rectangular loops.
   CodeGenFunction::OMPMapVars PreCondVars;
diff --git a/clang/test/OpenMP/amdgcn_target_device_vla.cpp b/clang/test/OpenMP/amdgcn_target_device_vla.cpp
index 58fef517a9e72d..72804fd49ed9fa 100644
--- a/clang/test/OpenMP/amdgcn_target_device_vla.cpp
+++ b/clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -151,12 +151,12 @@ int main() {
 // CHECK:       for.end:
 // CHECK-NEXT:    store i32 0, ptr [[I1_ASCAST]], align 4
 // CHECK-NEXT:    br label [[FOR_COND2:%.*]]
-// CHECK:       for.cond2:
+// CHECK:       for.cond
 // CHECK-NEXT:    [[TMP13:%.*]] = load i32, ptr [[I1_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP14:%.*]] = load i32, ptr [[N_ASCAST]], align 4
 // CHECK-NEXT:    [[CMP3:%.*]] = icmp slt i32 [[TMP13]], [[TMP14]]
 // CHECK-NEXT:    br i1 [[CMP3]], label [[FOR_BODY4:%.*]], label [[FOR_END9:%.*]]
-// CHECK:       for.body4:
+// CHECK:       for.body
 // CHECK-NEXT:    [[TMP15:%.*]] = load i32, ptr [[I1_ASCAST]], align 4
 // CHECK-NEXT:    [[IDXPROM5:%.*]] = sext i32 [[TMP15]] to i64
 // CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[IDXPROM5]]
@@ -165,12 +165,12 @@ int main() {
 // CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP17]], [[TMP16]]
 // CHECK-NEXT:    store i32 [[ADD]], ptr [[TMP0]], align 4
 // CHECK-NEXT:    br label [[FOR_INC7:%.*]]
-// CHECK:       for.inc7:
+// CHECK:       for.inc
 // CHECK-NEXT:    [[TMP18:%.*]] = load i32, ptr [[I1_ASCAST]], align 4
 // CHECK-NEXT:    [[INC8:%.*]] = add nsw i32 [[TMP18]], 1
 // CHECK-NEXT:    store i32 [[INC8]], ptr [[I1_ASCAST]], align 4
 // CHECK-NEXT:    br label [[FOR_COND2]], !llvm.loop [[LOOP15:![0-9]+]]
-// CHECK:       for.end9:
+// CHECK:       for.end
 // CHECK-NEXT:    call void @__kmpc_free_shared(ptr [[A]], i64 [[TMP7]])
 // CHECK-NEXT:    call void @__kmpc_target_deinit()
 // CHECK-NEXT:    ret void
@@ -228,7 +228,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_COMB_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_COMB_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -245,7 +244,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-NEXT:    [[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
 // CHECK-NEXT:    [[DOTOMP_COMB_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_COMB_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_COMB_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_COMB_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -267,7 +265,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[I_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP4]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -346,13 +343,13 @@ int main() {
 // CHECK-NEXT:    [[TMP39:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
 // CHECK-NEXT:    [[CMP9:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
 // CHECK-NEXT:    br i1 [[CMP9]], label [[COND_TRUE10:%.*]], label [[COND_FALSE11:%.*]]
-// CHECK:       cond.true10:
+// CHECK:       cond.true
 // CHECK-NEXT:    [[TMP40:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
 // CHECK-NEXT:    br label [[COND_END12:%.*]]
-// CHECK:       cond.false11:
+// CHECK:       cond.false
 // CHECK-NEXT:    [[TMP41:%.*]] = load i32, ptr [[DOTOMP_COMB_UB_ASCAST]], align 4
 // CHECK-NEXT:    br label [[COND_END12]]
-// CHECK:       cond.end12:
+// CHECK:       cond.end
 // CHECK-NEXT:    [[COND13:%.*]] = phi i32 [ [[TMP40]], [[COND_TRUE10]] ], [ [[TMP41]], [[COND_FALSE11]] ]
 // CHECK-NEXT:    store i32 [[COND13]], ptr [[DOTOMP_COMB_UB_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP42:%.*]] = load i32, ptr [[DOTOMP_COMB_LB_ASCAST]], align 4
@@ -383,7 +380,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -405,7 +401,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-NEXT:    [[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
 // CHECK-NEXT:    [[DOTOMP_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -432,7 +427,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[I_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP4]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -500,12 +494,12 @@ int main() {
 // CHECK:       for.end:
 // CHECK-NEXT:    store i32 0, ptr [[J11_ASCAST]], align 4
 // CHECK-NEXT:    br label [[FOR_COND12:%.*]]
-// CHECK:       for.cond12:
+// CHECK:       for.cond
 // CHECK-NEXT:    [[TMP24:%.*]] = load i32, ptr [[J11_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP25:%.*]] = load i32, ptr [[N_ASCAST]], align 4
 // CHECK-NEXT:    [[CMP13:%.*]] = icmp slt i32 [[TMP24]], [[TMP25]]
 // CHECK-NEXT:    br i1 [[CMP13]], label [[FOR_BODY14:%.*]], label [[FOR_END22:%.*]]
-// CHECK:       for.body14:
+// CHECK:       for.body
 // CHECK-NEXT:    [[TMP26:%.*]] = load i32, ptr [[J11_ASCAST]], align 4
 // CHECK-NEXT:    [[IDXPROM15:%.*]] = sext i32 [[TMP26]] to i64
 // CHECK-NEXT:    [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, ptr [[VLA7_ASCAST]], i64 [[IDXPROM15]]
@@ -517,12 +511,12 @@ int main() {
 // CHECK-NEXT:    [[ADD19:%.*]] = add nsw i32 [[TMP29]], [[TMP27]]
 // CHECK-NEXT:    store i32 [[ADD19]], ptr [[ARRAYIDX18]], align 4
 // CHECK-NEXT:    br label [[FOR_INC20:%.*]]
-// CHECK:       for.inc20:
+// CHECK:       for.inc
 // CHECK-NEXT:    [[TMP30:%.*]] = load i32, ptr [[J11_ASCAST]], align 4
 // CHECK-NEXT:    [[INC21:%.*]] = add nsw i32 [[TMP30]], 1
 // CHECK-NEXT:    store i32 [[INC21]], ptr [[J11_ASCAST]], align 4
 // CHECK-NEXT:    br label [[FOR_COND12]], !llvm.loop [[LOOP17:![0-9]+]]
-// CHECK:       for.end22:
+// CHECK:       for.end
 // CHECK-NEXT:    [[TMP31:%.*]] = load ptr addrspace(5), ptr [[SAVED_STACK_ASCAST]], align 4
 // CHECK-NEXT:    call void @llvm.stackrestore.p5(ptr addrspace(5) [[TMP31]])
 // CHECK-NEXT:    br label [[OMP_BODY_CONTINUE:%.*]]
@@ -597,7 +591,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -615,7 +608,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-NEXT:    [[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
 // CHECK-NEXT:    [[DOTOMP_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -639,7 +631,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[I_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP4]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -760,7 +751,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[J:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -775,7 +765,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-NEXT:    [[J_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[J]] to ptr
 // CHECK-NEXT:    [[DOTOMP_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -796,7 +785,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[J_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP5]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -963,7 +951,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_2:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -982,7 +969,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_2_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_2]] to ptr
-// CHECK-NEXT:    [[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
 // CHECK-NEXT:    [[DOTOMP_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -1009,7 +995,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB3:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB3]], ptr [[DOTCAPTURE_EXPR_2_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[I_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP5]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -1129,7 +1114,6 @@ int main() {
 // CHECK-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-NEXT:    [[J:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -1144,7 +1128,6 @@ int main() {
 // CHECK-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-NEXT:    [[J_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[J]] to ptr
 // CHECK-NEXT:    [[DOTOMP_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_LB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_UB]] to ptr
 // CHECK-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -1165,7 +1148,6 @@ int main() {
 // CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-NEXT:    store i32 0, ptr [[J_ASCAST]], align 4
 // CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP5]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
diff --git a/clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c b/clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c
index cc0cc0def48b81..2e2689da692592 100644
--- a/clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c
+++ b/clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c
@@ -65,7 +65,6 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-AMD-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_COMB_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_COMB_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -81,7 +80,6 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR_1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_1]] to ptr
-// CHECK-AMD-NEXT:    [[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
 // CHECK-AMD-NEXT:    [[DOTOMP_COMB_LB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_COMB_LB]] to ptr
 // CHECK-AMD-NEXT:    [[DOTOMP_COMB_UB_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_COMB_UB]] to ptr
 // CHECK-AMD-NEXT:    [[DOTOMP_STRIDE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTOMP_STRIDE]] to ptr
@@ -100,7 +98,6 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
 // CHECK-AMD-NEXT:    [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
 // CHECK-AMD-NEXT:    store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
-// CHECK-AMD-NEXT:    store i32 0, ptr [[I_ASCAST]], align 4
 // CHECK-AMD-NEXT:    [[TMP2:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ASCAST]], align 4
 // CHECK-AMD-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP2]]
 // CHECK-AMD-NEXT:    br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
@@ -177,13 +174,13 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[TMP36:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
 // CHECK-AMD-NEXT:    [[CMP9:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
 // CHECK-AMD-NEXT:    br i1 [[CMP9]], label [[COND_TRUE10:%.*]], label [[COND_FALSE11:%.*]]
-// CHECK-AMD:       cond.true10:
+// CHECK-AMD:       cond.true
 // CHECK-AMD-NEXT:    [[TMP37:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1_ASCAST]], align 4
 // CHECK-AMD-NEXT:    br label [[COND_END12:%.*]]
-// CHECK-AMD:       cond.false11:
+// CHECK-AMD:       cond.false
 // CHECK-AMD-NEXT:    [[TMP38:%.*]] = load i32, ptr [[DOTOMP_COMB_UB_ASCAST]], align 4
 // CHECK-AMD-NEXT:    br label [[COND_END12]]
-// CHECK-AMD:       cond.end12:
+// CHECK-AMD:       cond.end
 // CHECK-AMD-NEXT:    [[COND13:%.*]] = phi i32 [ [[TMP37]], [[COND_TRUE10]] ], [ [[TMP38]], [[COND_FALSE11]] ]
 // CHECK-AMD-NEXT:    store i32 [[COND13]], ptr [[DOTOMP_COMB_UB_ASCAST]], align 4
 // CHECK-AMD-NEXT:    [[TMP39:%.*]] = load i32, ptr [[DOTOMP_COMB_LB_ASCAST]], align 4
@@ -213,7 +210,6 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[TMP:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4, addrspace(5)
-// CHECK-AMD-NEXT:    [[I:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-AMD-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4, addrspace(5)
@@ -229,7 +225,6 @@ void write_to_aligned_array(int *a, int N) {
 // CHECK-AMD-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
 // CHECK-AMD-NEXT:    [[DOTCAPTURE_EXPR__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR_]] to ptr
 // CHECK-AMD...
[truncated]

…rong based on DWARF

Work-sharing loop iterator variable had incorrect debug info attached to it.

It was due to redundant/incorrect debug info overwrites the correct debug info
appearing later in LLVM IR.
------------------
define internal void @main.omp_outlined_debug__( )
  %i = alloca i32, align 4   ;;------------------ (Incorrect)
  %i4 = alloca i32, align 4  ;;------------------ (Correct)
    #dbg_declare(ptr %i, !82, !DIExpression(), !67) ;;--- (Incorrect)
  store i32 0, ptr %i, align 4, !dbg !83 ;;--- (Incorrect)

omp.precond.then:                                 ; preds = %entry
    #dbg_declare(ptr %i4, !82, !DIExpression(), !67) ;;--- (Correct)

omp.inner.for.body:                               ; preds = %omp.inner.for.cond
  %18 = load i32, ptr %.omp.iv, align 4, !dbg !85
  %mul = mul nsw i32 %18, 1, !dbg !83
  %add = add nsw i32 0, %mul, !dbg !83
  store i32 %add, ptr %i4, align 4, !dbg !83 ;;--- (Correct)
------------------

Suppressing the incorrect IR, enables emission of correct debug info.
@shiltian
Copy link
Contributor

shiltian commented Jan 8, 2025

The fix doesn't look right to me...

@alexey-bataev
Copy link
Member

Maybe it fixes debug info, but definitely breaks some of the OpenMP support

@alokkrsharma
Copy link
Contributor Author

alokkrsharma commented Jan 15, 2025

Maybe it fixes debug info, but definitely breaks some of the OpenMP support

Thanks for your comment alexey-bataev. Though I intend to delete only unused instructions but it is possible that I have overlooked something. Would you please help me with some example, I will improve the fix with concerns addressed.

@alokkrsharma
Copy link
Contributor Author

The fix doesn't look right to me...

Thanks for your comment shiltian. I would love to address your concern and improve the patch. Please help me with the example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Iteration variable in OpenMP work-sharing loops is wrong based on DWARF
4 participants