Skip to content

[DebugInfo] getMergedLocation: match scopes based on their location #132286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 18, 2025

Conversation

dzhidzhoev
Copy link
Member

GetNearestCommonScope finds a common scope for two locations by finding the common parent scope object.

With this commit, GetNearestCommonScope considers two scopes equal if they have the same file, line, and column values. Thus, it can find the "nearest common included location" when merging locations from different files. This may be useful for a case described here #125780 (comment).

DIScope's pointer equality isn't enough for scope equality check, since, for example, two #include "x.inc" directives showing up on different lines produce different DILocalScope objects representing the same file content. Thus, two (even the same) locations from "x.inc" may have different parent scope objects representing the same source location.

If input DILocations are from different files, or a common scope is in another file than input locations, a textual inclusion case is assumed, and the location of common scope ("nearest common included location") is returned from getMergedLocation.
It fixes #122846.

@llvmbot
Copy link
Member

llvmbot commented Mar 20, 2025

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-debuginfo

Author: Vladislav Dzhidzhoev (dzhidzhoev)

Changes

GetNearestCommonScope finds a common scope for two locations by finding the common parent scope object.

With this commit, GetNearestCommonScope considers two scopes equal if they have the same file, line, and column values. Thus, it can find the "nearest common included location" when merging locations from different files. This may be useful for a case described here #125780 (comment).

DIScope's pointer equality isn't enough for scope equality check, since, for example, two #include "x.inc" directives showing up on different lines produce different DILocalScope objects representing the same file content. Thus, two (even the same) locations from "x.inc" may have different parent scope objects representing the same source location.

If input DILocations are from different files, or a common scope is in another file than input locations, a textual inclusion case is assumed, and the location of common scope ("nearest common included location") is returned from getMergedLocation.
It fixes #122846.


Patch is 32.78 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132286.diff

4 Files Affected:

  • (modified) llvm/lib/IR/DebugInfoMetadata.cpp (+64-8)
  • (added) llvm/test/DebugInfo/AArch64/merge-nested-block-loc.ll (+194)
  • (added) llvm/test/DebugInfo/AArch64/merge-nested-block-loc2.ll (+229)
  • (modified) llvm/unittests/IR/MetadataTest.cpp (+204-7)
diff --git a/llvm/lib/IR/DebugInfoMetadata.cpp b/llvm/lib/IR/DebugInfoMetadata.cpp
index f975d4ca33ad9..48eaa4fd56af2 100644
--- a/llvm/lib/IR/DebugInfoMetadata.cpp
+++ b/llvm/lib/IR/DebugInfoMetadata.cpp
@@ -13,7 +13,7 @@
 #include "llvm/IR/DebugInfoMetadata.h"
 #include "LLVMContextImpl.h"
 #include "MetadataImpl.h"
-#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/StringSwitch.h"
 #include "llvm/BinaryFormat/Dwarf.h"
 #include "llvm/IR/DebugProgramInstruction.h"
@@ -118,6 +118,22 @@ DILocation *DILocation::getMergedLocations(ArrayRef<DILocation *> Locs) {
   return Merged;
 }
 
+using LineColumn = std::pair<unsigned /* Line */, unsigned /* Column */>;
+
+/// Returns the location of DILocalScope, if present, or a default value.
+static LineColumn getLocalScopeLocationOr(DIScope *S, LineColumn Default) {
+  assert(isa<DILocalScope>(S) && "Expected DILocalScope.");
+
+  if (isa<DILexicalBlockFile>(S))
+    return Default;
+  if (auto *LB = dyn_cast<DILexicalBlock>(S))
+    return {LB->getLine(), LB->getColumn()};
+  if (auto *SP = dyn_cast<DISubprogram>(S))
+    return {SP->getLine(), 0u};
+
+  llvm_unreachable("Unhandled type of DILocalScope.");
+}
+
 DILocation *DILocation::getMergedLocation(DILocation *LocA, DILocation *LocB) {
   if (!LocA || !LocB)
     return nullptr;
@@ -188,27 +204,67 @@ DILocation *DILocation::getMergedLocation(DILocation *LocA, DILocation *LocB) {
       return nullptr;
 
     // Return the nearest common scope inside a subprogram.
-    auto GetNearestCommonScope = [](DIScope *S1, DIScope *S2) -> DIScope * {
-      SmallPtrSet<DIScope *, 8> Scopes;
+    auto GetNearestCommonScope =
+        [](const DILocation *L1,
+           const DILocation *L2) -> std::pair<DIScope *, LineColumn> {
+      DIScope *S1 = L1->getScope();
+      DIScope *S2 = L2->getScope();
+
+      SmallMapVector<std::tuple<DIFile *, LineColumn>,
+                     SmallSetVector<DIScope *, 8>, 8>
+          Scopes;
+
+      // When matching DILexicalBlockFile's, ignore column numbers, so that
+      // DILocation's having different columns within the same
+      // DILexicalBlockFile will match.
+      auto getLocForBlockFile = [](LineColumn L) {
+        L.second = 0;
+        return L;
+      };
+
+      LineColumn Loc1(L1->getLine(), L1->getColumn());
       for (; S1; S1 = S1->getScope()) {
-        Scopes.insert(S1);
+        Loc1 = getLocalScopeLocationOr(S1, getLocForBlockFile(Loc1));
+        Scopes[{S1->getFile(), Loc1}].insert(S1);
+
         if (isa<DISubprogram>(S1))
           break;
       }
 
+      LineColumn Loc2(L2->getLine(), L2->getColumn());
       for (; S2; S2 = S2->getScope()) {
-        if (Scopes.count(S2))
-          return S2;
+        Loc2 = getLocalScopeLocationOr(S2, getLocForBlockFile(Loc2));
+
+        auto ScopesAtLoc = Scopes.find({S2->getFile(), Loc2});
+        // No scope found with the same file, line and column as S2.
+        if (ScopesAtLoc == Scopes.end())
+          continue;
+
+        // Return S2 if it is L1's parent.
+        if (ScopesAtLoc->second.contains(S2))
+          return std::make_pair(S2, Loc2);
+
+        // Return any L1's parent with the same file, line and column as S2.
+        if (!ScopesAtLoc->second.empty())
+          return std::make_pair(*ScopesAtLoc->second.begin(), Loc2);
+
         if (isa<DISubprogram>(S2))
           break;
       }
 
-      return nullptr;
+      return std::make_pair(nullptr,
+                            LineColumn(L2->getLine(), L2->getColumn()));
     };
 
-    auto Scope = GetNearestCommonScope(L1->getScope(), L2->getScope());
+    auto [Scope, ScopeLoc] = GetNearestCommonScope(L1, L2);
     assert(Scope && "No common scope in the same subprogram?");
 
+    // Use inclusion location if files are different.
+    if (Scope->getFile() != L1->getFile() || L1->getFile() != L2->getFile()) {
+      return DILocation::get(C, ScopeLoc.first, ScopeLoc.second, Scope,
+                             InlinedAt);
+    }
+
     bool SameLine = L1->getLine() == L2->getLine();
     bool SameCol = L1->getColumn() == L2->getColumn();
     unsigned Line = SameLine ? L1->getLine() : 0;
diff --git a/llvm/test/DebugInfo/AArch64/merge-nested-block-loc.ll b/llvm/test/DebugInfo/AArch64/merge-nested-block-loc.ll
new file mode 100644
index 0000000000000..7fa0024847283
--- /dev/null
+++ b/llvm/test/DebugInfo/AArch64/merge-nested-block-loc.ll
@@ -0,0 +1,194 @@
+; RUN: opt -mtriple=aarch64-unknown-linux-gnu -S %s -passes=sroa -o - | FileCheck %s
+
+; In this test we want to ensure that the location of phi instruction merged from locations
+; of %mul3 and %mul9 belongs to the correct scope (DILexicalBlockFile), so that line
+; number of that location belongs to the corresponding file.
+
+; Generated with clang from
+; # 1 "1.c" 1
+; # 1 "1.c" 2
+; int foo(int a) {
+;   int i = 0;
+;   if ((a & 1) == 1) {
+;     a -= 1;
+; # 1 "m.c" 1
+; # 40 "m.c"
+; i += a;
+; i -= 10*a;
+; i *= a*a;
+; # 6 "1.c" 2
+;  } else {
+;     a += 3;
+; # 1 "m.c" 1
+; # 40 "m.c"
+; i += a;
+; i -= 10*a;
+; i *= a*a;
+; # 9 "1.c" 2
+;  }
+;   return i;
+; }
+
+source_filename = "repro.c"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+target triple = "aarch64-unknown-linux-gnu"
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @foo(i32 noundef %a) !dbg !9 {
+; CHECK:    phi i32 {{.*}}, !dbg [[PHILOC:![0-9]+]]
+;
+entry:
+  %a.addr = alloca i32, align 4, !DIAssignID !17
+  #dbg_assign(i1 undef, !15, !DIExpression(), !17, ptr %a.addr, !DIExpression(), !18)
+  %i = alloca i32, align 4, !DIAssignID !19
+  #dbg_assign(i1 undef, !16, !DIExpression(), !19, ptr %i, !DIExpression(), !18)
+  store i32 %a, ptr %a.addr, align 4, !tbaa !20, !DIAssignID !24
+  #dbg_assign(i32 %a, !15, !DIExpression(), !24, ptr %a.addr, !DIExpression(), !18)
+  call void @llvm.lifetime.start.p0(i64 4, ptr %i), !dbg !25
+  store i32 0, ptr %i, align 4, !dbg !26, !tbaa !20, !DIAssignID !27
+  #dbg_assign(i32 0, !16, !DIExpression(), !27, ptr %i, !DIExpression(), !18)
+  %0 = load i32, ptr %a.addr, align 4, !dbg !28, !tbaa !20
+  %and = and i32 %0, 1, !dbg !30
+  %cmp = icmp eq i32 %and, 1, !dbg !31
+  br i1 %cmp, label %if.then, label %if.else, !dbg !31
+
+if.then:                                          ; preds = %entry
+  %1 = load i32, ptr %a.addr, align 4, !dbg !32, !tbaa !20
+  %sub = sub nsw i32 %1, 1, !dbg !32
+  store i32 %sub, ptr %a.addr, align 4, !dbg !32, !tbaa !20, !DIAssignID !34
+  #dbg_assign(i32 %sub, !15, !DIExpression(), !34, ptr %a.addr, !DIExpression(), !18)
+  %2 = load i32, ptr %a.addr, align 4, !dbg !35, !tbaa !20
+  %3 = load i32, ptr %i, align 4, !dbg !38, !tbaa !20
+  %add = add nsw i32 %3, %2, !dbg !38
+  store i32 %add, ptr %i, align 4, !dbg !38, !tbaa !20, !DIAssignID !39
+  #dbg_assign(i32 %add, !16, !DIExpression(), !39, ptr %i, !DIExpression(), !18)
+  %4 = load i32, ptr %a.addr, align 4, !dbg !40, !tbaa !20
+  %mul = mul nsw i32 10, %4, !dbg !41
+  %5 = load i32, ptr %i, align 4, !dbg !42, !tbaa !20
+  %sub1 = sub nsw i32 %5, %mul, !dbg !42
+  store i32 %sub1, ptr %i, align 4, !dbg !42, !tbaa !20, !DIAssignID !43
+  #dbg_assign(i32 %sub1, !16, !DIExpression(), !43, ptr %i, !DIExpression(), !18)
+  %6 = load i32, ptr %a.addr, align 4, !dbg !44, !tbaa !20
+  %7 = load i32, ptr %a.addr, align 4, !dbg !45, !tbaa !20
+  %mul2 = mul nsw i32 %6, %7, !dbg !46
+  %8 = load i32, ptr %i, align 4, !dbg !47, !tbaa !20
+  %mul3 = mul nsw i32 %8, %mul2, !dbg !47
+  store i32 %mul3, ptr %i, align 4, !dbg !47, !tbaa !20, !DIAssignID !48
+  #dbg_assign(i32 %mul3, !16, !DIExpression(), !48, ptr %i, !DIExpression(), !18)
+  br label %if.end, !dbg !49
+
+if.else:                                          ; preds = %entry
+  %9 = load i32, ptr %a.addr, align 4, !dbg !51, !tbaa !20
+  %add4 = add nsw i32 %9, 3, !dbg !51
+  store i32 %add4, ptr %a.addr, align 4, !dbg !51, !tbaa !20, !DIAssignID !53
+  #dbg_assign(i32 %add4, !15, !DIExpression(), !53, ptr %a.addr, !DIExpression(), !18)
+  %10 = load i32, ptr %a.addr, align 4, !dbg !54, !tbaa !20
+  %11 = load i32, ptr %i, align 4, !dbg !56, !tbaa !20
+  %add5 = add nsw i32 %11, %10, !dbg !56
+  store i32 %add5, ptr %i, align 4, !dbg !56, !tbaa !20, !DIAssignID !57
+  #dbg_assign(i32 %add5, !16, !DIExpression(), !57, ptr %i, !DIExpression(), !18)
+  %12 = load i32, ptr %a.addr, align 4, !dbg !58, !tbaa !20
+  %mul6 = mul nsw i32 10, %12, !dbg !59
+  %13 = load i32, ptr %i, align 4, !dbg !60, !tbaa !20
+  %sub7 = sub nsw i32 %13, %mul6, !dbg !60
+  store i32 %sub7, ptr %i, align 4, !dbg !60, !tbaa !20, !DIAssignID !61
+  #dbg_assign(i32 %sub7, !16, !DIExpression(), !61, ptr %i, !DIExpression(), !18)
+  %14 = load i32, ptr %a.addr, align 4, !dbg !62, !tbaa !20
+  %15 = load i32, ptr %a.addr, align 4, !dbg !63, !tbaa !20
+  %mul8 = mul nsw i32 %14, %15, !dbg !64
+  %16 = load i32, ptr %i, align 4, !dbg !65, !tbaa !20
+  %mul9 = mul nsw i32 %16, %mul8, !dbg !65
+  store i32 %mul9, ptr %i, align 4, !dbg !65, !tbaa !20, !DIAssignID !66
+  #dbg_assign(i32 %mul9, !16, !DIExpression(), !66, ptr %i, !DIExpression(), !18)
+  br label %if.end
+
+if.end:                                           ; preds = %if.else, %if.then
+  %17 = load i32, ptr %i, align 4, !dbg !67, !tbaa !20
+  call void @llvm.lifetime.end.p0(i64 4, ptr %i), !dbg !68
+  ret i32 %17, !dbg !69
+}
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture)
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture)
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3}
+!llvm.ident = !{!8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 21.0.0git", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "repro.c", directory: "")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!8 = !{!"clang version 21.0.0git"}
+!9 = distinct !DISubprogram(name: "foo", scope: !10, file: !10, line: 1, type: !11, scopeLine: 1, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !14)
+!10 = !DIFile(filename: "1.c", directory: "")
+!11 = !DISubroutineType(types: !12)
+!12 = !{!13, !13}
+!13 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!14 = !{!15, !16}
+!15 = !DILocalVariable(name: "a", arg: 1, scope: !9, file: !10, line: 1, type: !13)
+!16 = !DILocalVariable(name: "i", scope: !9, file: !10, line: 2, type: !13)
+!17 = distinct !DIAssignID()
+!18 = !DILocation(line: 0, scope: !9)
+!19 = distinct !DIAssignID()
+!20 = !{!21, !21, i64 0}
+!21 = !{!"int", !22, i64 0}
+!22 = !{!"omnipotent char", !23, i64 0}
+!23 = !{!"Simple C/C++ TBAA"}
+!24 = distinct !DIAssignID()
+!25 = !DILocation(line: 2, column: 3, scope: !9)
+!26 = !DILocation(line: 2, column: 7, scope: !9)
+!27 = distinct !DIAssignID()
+!28 = !DILocation(line: 3, column: 8, scope: !29)
+!29 = distinct !DILexicalBlock(scope: !9, file: !10, line: 3, column: 7)
+!30 = !DILocation(line: 3, column: 10, scope: !29)
+!31 = !DILocation(line: 3, column: 15, scope: !29)
+!32 = !DILocation(line: 4, column: 7, scope: !33)
+!33 = distinct !DILexicalBlock(scope: !29, file: !10, line: 3, column: 21)
+!34 = distinct !DIAssignID()
+!35 = !DILocation(line: 40, column: 6, scope: !36)
+!36 = !DILexicalBlockFile(scope: !33, file: !37, discriminator: 0)
+!37 = !DIFile(filename: "m.c", directory: "")
+!38 = !DILocation(line: 40, column: 3, scope: !36)
+!39 = distinct !DIAssignID()
+!40 = !DILocation(line: 41, column: 9, scope: !36)
+!41 = !DILocation(line: 41, column: 8, scope: !36)
+!42 = !DILocation(line: 41, column: 3, scope: !36)
+!43 = distinct !DIAssignID()
+!44 = !DILocation(line: 42, column: 6, scope: !36)
+!45 = !DILocation(line: 42, column: 8, scope: !36)
+!46 = !DILocation(line: 42, column: 7, scope: !36)
+!47 = !DILocation(line: 42, column: 3, scope: !36)
+!48 = distinct !DIAssignID()
+!49 = !DILocation(line: 6, column: 2, scope: !50)
+!50 = !DILexicalBlockFile(scope: !33, file: !10, discriminator: 0)
+!51 = !DILocation(line: 7, column: 7, scope: !52)
+!52 = distinct !DILexicalBlock(scope: !29, file: !10, line: 6, column: 9)
+!53 = distinct !DIAssignID()
+!54 = !DILocation(line: 40, column: 6, scope: !55)
+!55 = !DILexicalBlockFile(scope: !52, file: !37, discriminator: 0)
+!56 = !DILocation(line: 40, column: 3, scope: !55)
+!57 = distinct !DIAssignID()
+!58 = !DILocation(line: 41, column: 9, scope: !55)
+!59 = !DILocation(line: 41, column: 8, scope: !55)
+!60 = !DILocation(line: 41, column: 3, scope: !55)
+!61 = distinct !DIAssignID()
+!62 = !DILocation(line: 42, column: 6, scope: !55)
+!63 = !DILocation(line: 42, column: 8, scope: !55)
+!64 = !DILocation(line: 42, column: 7, scope: !55)
+!65 = !DILocation(line: 42, column: 3, scope: !55)
+!66 = distinct !DIAssignID()
+!67 = !DILocation(line: 10, column: 10, scope: !9)
+!68 = !DILocation(line: 11, column: 1, scope: !9)
+!69 = !DILocation(line: 10, column: 3, scope: !9)
+
+; CHECK: [[SP:![0-9]+]] = distinct !DISubprogram(name: "foo", scope: [[FILE1:![0-9]+]], file: [[FILE1]], line: 1
+; CHECK: [[FILE1]] = !DIFile(filename: "1.c", directory: "")
+; CHECK: [[LB1:![0-9]+]] = distinct !DILexicalBlock(scope: [[SP]], file: [[FILE1]], line: 3, column: 7)
+; CHECK: [[LB2:![0-9]+]] = distinct !DILexicalBlock(scope: [[LB1]], file: [[FILE1]], line: 3, column: 21)
+; CHECK: [[LBF:![0-9]+]] = !DILexicalBlockFile(scope: [[LB2]], file: [[FILE2:![0-9]+]], discriminator: 0)
+; CHECK: [[FILE2]] = !DIFile(filename: "m.c", directory: "")
+; CHECK: [[PHILOC]] = !DILocation(line: 42, column: 3, scope: [[LBF]])
diff --git a/llvm/test/DebugInfo/AArch64/merge-nested-block-loc2.ll b/llvm/test/DebugInfo/AArch64/merge-nested-block-loc2.ll
new file mode 100644
index 0000000000000..dd52bf7907a76
--- /dev/null
+++ b/llvm/test/DebugInfo/AArch64/merge-nested-block-loc2.ll
@@ -0,0 +1,229 @@
+; RUN: opt -mtriple=aarch64-unknown-linux-gnu -S %s -passes=sroa -o - | FileCheck %s
+
+; In this test we want to ensure that getMergedLocations uses common include
+; location if incoming locations belong to different files.
+; The location of phi instruction merged from locations of %mul3 and %mul10
+; should be the location of do-loop lexical block from y.c.
+
+; Generated with clang from
+;
+; main.c:
+;   int foo(int a) {
+;     int i = 0;
+;     if ((a & 1) == 1) {
+;       a -= 1;
+;   #define A
+;   #include "y.c"
+;    } else {
+;       a += 3;
+;   #undef A
+;   #include "y.c"
+;    }
+;     return i;
+;   }
+;
+; y.c:
+;   # 300 "y.c" 1
+;   do {
+;   #ifdef A
+;   #include "z1.c"
+;   #else
+;   #include "z2.c"
+;   #endif
+;   } while (0);
+;
+; z1.c:
+;   # 100 "z1.c" 1
+;   i += a;
+;   i -= 10*a;
+;   i *= a*a;
+;
+; z2.c:
+;   # 200 "z1.c" 1
+;   i += a;
+;   i -= 10*a;
+;   i *= a*a;
+;
+; Preprocessed source:
+;
+; # 1 "main.c"
+; int foo(int a) {
+;   int i = 0;
+;   if ((a & 1) == 1) {
+;     a -= 1;
+; # 300 "y.c" 1
+; do {
+; # 100 "z1.c" 1
+; i += a;
+; i -= 10*a;
+; i *= a*a;
+; # 303 "y.c" 2
+; } while (0);
+; # 7 "main.c" 2
+;  } else {
+;     a += 3;
+; # 300 "y.c" 1
+; do {
+; # 200 "z2.c" 1
+; i += a;
+; i -= 10*a;
+; i *= a*a;
+; # 305 "y.c" 2
+; } while (0);
+; # 11 "main.c" 2
+;  }
+;   return i;
+; }
+
+source_filename = "main.preproc.c"
+target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32"
+target triple = "arm64-apple-macosx15.0.0"
+
+; Function Attrs: noinline nounwind ssp uwtable(sync)
+define i32 @foo(i32 noundef %a) !dbg !9 {
+; CHECK:    phi i32 {{.*}}, !dbg [[PHILOC:![0-9]+]]
+;
+entry:
+  %a.addr = alloca i32, align 4
+  %i = alloca i32, align 4
+  store i32 %a, ptr %a.addr, align 4
+    #dbg_declare(ptr %a.addr, !15, !DIExpression(), !16)
+    #dbg_declare(ptr %i, !17, !DIExpression(), !18)
+  store i32 0, ptr %i, align 4, !dbg !18
+  %0 = load i32, ptr %a.addr, align 4, !dbg !19
+  %and = and i32 %0, 1, !dbg !21
+  %cmp = icmp eq i32 %and, 1, !dbg !22
+  br i1 %cmp, label %if.then, label %if.else, !dbg !22
+
+if.then:                                          ; preds = %entry
+  %1 = load i32, ptr %a.addr, align 4, !dbg !23
+  %sub = sub nsw i32 %1, 1, !dbg !23
+  store i32 %sub, ptr %a.addr, align 4, !dbg !23
+  br label %do.body, !dbg !25
+
+do.body:                                          ; preds = %if.then
+  %2 = load i32, ptr %a.addr, align 4, !dbg !28
+  %3 = load i32, ptr %i, align 4, !dbg !32
+  %add = add nsw i32 %3, %2, !dbg !32
+  store i32 %add, ptr %i, align 4, !dbg !32
+  %4 = load i32, ptr %a.addr, align 4, !dbg !33
+  %mul = mul nsw i32 10, %4, !dbg !34
+  %5 = load i32, ptr %i, align 4, !dbg !35
+  %sub1 = sub nsw i32 %5, %mul, !dbg !35
+  store i32 %sub1, ptr %i, align 4, !dbg !35
+  %6 = load i32, ptr %a.addr, align 4, !dbg !36
+  %7 = load i32, ptr %a.addr, align 4, !dbg !37
+  %mul2 = mul nsw i32 %6, %7, !dbg !38
+  %8 = load i32, ptr %i, align 4, !dbg !39
+  %mul3 = mul nsw i32 %8, %mul2, !dbg !39
+  store i32 %mul3, ptr %i, align 4, !dbg !39
+  br label %do.end, !dbg !40
+
+do.end:                                           ; preds = %do.body
+  br label %if.end, !dbg !42
+
+if.else:                                          ; preds = %entry
+  %9 = load i32, ptr %a.addr, align 4, !dbg !44
+  %add4 = add nsw i32 %9, 3, !dbg !44
+  store i32 %add4, ptr %a.addr, align 4, !dbg !44
+  br label %do.body5, !dbg !46
+
+do.body5:                                         ; preds = %if.else
+  %10 = load i32, ptr %a.addr, align 4, !dbg !48
+  %11 = load i32, ptr %i, align 4, !dbg !52
+  %add6 = add nsw i32 %11, %10, !dbg !52
+  store i32 %add6, ptr %i, align 4, !dbg !52
+  %12 = load i32, ptr %a.addr, align 4, !dbg !53
+  %mul7 = mul nsw i32 10, %12, !dbg !54
+  %13 = load i32, ptr %i, align 4, !dbg !55
+  %sub8 = sub nsw i32 %13, %mul7, !dbg !55
+  store i32 %sub8, ptr %i, align 4, !dbg !55
+  %14 = load i32, ptr %a.addr, align 4, !dbg !56
+  %15 = load i32, ptr %a.addr, align 4, !dbg !57
+  %mul9 = mul nsw i32 %14, %15, !dbg !58
+  %16 = load i32, ptr %i, align 4, !dbg !59
+  %mul10 = mul nsw i32 %16, %mul9, !dbg !59
+  store i32 %mul10, ptr %i, align 4, !dbg !59
+  br label %do.end11, !dbg !60
+
+do.end11:                                         ; preds = %do.body5
+  br label %if.end
+
+if.end:                                           ; preds = %do.end11, %do.end
+  %17 = load i32, ptr %i, align 4, !dbg !62
+  ret i32 %17, !dbg !63
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3}
+!llvm.ident = !{!8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 21.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: Apple, sysroot: "/")
+!1 = !DIFile(filename: "main.preproc.c", directory: "")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!8 = !{!"clang version 21.0.0git"}
+!9 = distinct !DISubprogram(name: "foo", scope: !10, file: !10, line: 1, type: !11, scopeLine: 1, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !14)
+!10 = !DIFile(filename: "main.c", directory: "")
+!11 = !DISubroutineType(types: !12)
+!12 = !{!13, !13}
+!13 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!14 = !{}
+!15 = !DILocalVariable(name: "a", arg: 1, scope: !9, file: !10, line: 1, type: !13)
+!16 = !DILocation(line: 1, column: 13, scope: !9)
+!17 = !DILocalVariable(name: "i", scope: !9, file: !10, line: 2, type: !13)
+!18 = !DILocation(line: 2, column: 7, scope: !9)
+!19 = !DILocation(line: 3, column: 8, scope: !20)
+!20 = distinct !DILexicalBlock(scope: !9, file: !10, line: 3, column: 7)
+!21 = !DILocation(line: 3, column: 10, scope: !20)
+!22 = !DILocation(line: 3, column: 15, scope: !20)
+!23 = !DILocation(line: 4, column: 7, scope: !24)
+!24 = distinct !DILexicalBlock(scope: !20, file: !10, line: 3, column: 21)
+!25 = !DILocation(line: 300, column: 1, scope: !26)
+!26 = !DILexicalBlockFile(scope: !24, file: !27, discriminator: 0)
+!27 = !DIFile(filename: "y.c", directory: "")
+!28 = !DILocation(line: 100, column: 6, scope: !29)
+!29 = !DILexicalBlockFile(scope: !31, file: !30, discriminator: 0)
+!30 = !DI...
[truncated]

Copy link
Contributor

@SLTozer SLTozer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the .ll tests - if possible, could you reduce them so that they contain the minimal input needed to test the expected behaviour? The most important part being the metadata imo - it's much easier to understand what the test is testing for when we only have the metadata entries that are relevant/necessary for the test, and only the instructions being merged have those DILocation attachments.

@dzhidzhoev
Copy link
Member Author

Regarding the .ll tests - if possible, could you reduce them so that they contain the minimal input needed to test the expected behaviour? The most important part being the metadata imo - it's much easier to understand what the test is testing for when we only have the metadata entries that are relevant/necessary for the test, and only the instructions being merged have those DILocation attachments.

Fixed

@dwblaikie
Copy link
Collaborator

the textual description of this issue I'm finding a bit hard to follow - could you sketch out a little example, perhaps - some approximation of the source situation, the two location chains in question, the location chain that's produced currently when they're merged, and the location chain that'll be produced after this patch?

@dzhidzhoev
Copy link
Member Author

dzhidzhoev commented Mar 28, 2025

the textual description of this issue I'm finding a bit hard to follow - could you sketch out a little example, perhaps - some approximation of the source situation, the two location chains in question, the location chain that's produced currently when they're merged, and the location chain that'll be produced after this patch?

Sorry, I should probably rewrite that (but not sure about the wording yet).

The issue #122846 arose while using an external non-C++ language frontend, which has a feature of inlining on the frontend side, that produces semantically identical subtrees of DILocalScopes at call sites of a function being inlined. In some cases, a call site and a callee function are from different source files.

On the current mainline, MergeLocPair() within getMergedLocation() doesn't consider the case of DILocation's or their nearest common scope being from different files at all. It can be solved just by ensuring the equality of input DILocations' files and output DILocation's file after the GetNearestCommonScope() call.

However, with that approach, many source locations will be lost with the frontend I mentioned before. What this patch is trying to achieve is that if there are two DILocalScope subtrees that de facto represent the same source code (probably included from different files/parent scopes), and DILocations being merged belong to these subtrees, GetNearestCommonScope should be able to return a scope that it would return if both input DILocations belonged to a single DILocalScope subtree (not to different subtrees representing the same source). This is implemented by having GetNearestCommonScope consider DILocalScopes equal not based on pointer equality, but on the equality of their locations (line number, column number, and file).

There's an example in merge-nested-block-loc2.ll. We may consider two files, the locations from which will be merged:

z1.c:

  # 100 "z1.c" 1
  i += a;
  i -= 10*a;
  i *= a*a;

z2.c:

  # 200 "z2.c" 1
  i += a;
  i -= 10*a;
  i *= a*a;

They are both included from another file.

y.c:

  # 300 "y.c" 1
  do {
  #ifdef A
  #include "z1.c"
  #else
  #include "z2.c"
  #endif
  } while (0);

Which is included from the "root" file two times. There, include directives "simulate" the inlining of a single function (y.c) at two different locations, with different arguments (preprocessor symbol A as an "argument").

main.c:

int foo(int a) {
  int i = 0;
  if ((a & 1) == 1) {
    a -= 1;
#define A
#include "y.c"
 } else {
    a += 3;
#undef A
#include "y.c"
 }
  return i;
}

DILocalScopes tree produced from this source code will be similar to that:

DILocalScopes of y.c included from if-branch of foo() ==> DILocalScopes of z1.c
    ^
    |
DILocalScopes for foo() from main.c
    |
    v
DILocalScopes of y.c included from else-branch of foo() ==> DILocalScopes of z2.c

On the current mainline, merging two locations from z1.c and z2.c will result in a line-0 location having a scope from main.c.
With this patch, the output location will have a scope from y.c, its line and column numbers will also be from y.c.

@dwblaikie
Copy link
Collaborator

Thanks, this is helping.

The chains are still a bit confusing to me in the example - perhaps use specific instructions, and their scope chain? (some instruction should have a single scope link, which continues up until it reaches the function - so I don't quite understand the function with up and down arrows, and sideways arrows)

Should be two instructions, each with linear scope chains that reach the function - could you describe those, and how the merging currently works (the steps, and how it reaches the (undesirable to you) outcome)? and how the new algorithm reaches a different conclusion?

@dzhidzhoev
Copy link
Member Author

dzhidzhoev commented Mar 31, 2025

I'll paste the example from the previous comment here to put everything in one place.

z1.c:

  # 100 "z1.c" 1
  i += a;
  i -= 10*a;
  i *= a*a;

z2.c:

  # 200 "z2.c" 1
  i += a;
  i -= 10*a;
  i *= a*a;

y.c:

  # 300 "y.c" 1
  do {
  #ifdef A
  #include "z1.c"
  #else
  #include "z2.c"
  #endif
  } while (0);

main.c:

int foo(int a) {
  int i = 0;
  if ((a & 1) == 1) {
    a -= 1;
#define A
#include "y.c"
 } else {
    a += 3;
#undef A
#include "y.c"
 }
  return i;
}

Let's consider the DILocations of i *= a*a; from z1.c:102 (L1) and i *= a*a; from z2.c:202 (L2). They are being merged.

We have the following parent scopes chain for L1, discovered by GetNearestCommonScope:

!6  =          !DILocation(line: 102, column: 3, scope: !7)
!7  =          !DILexicalBlockFile(scope: !9,  file: !8,  discriminator: 0)
!9  = distinct !DILexicalBlock    (scope: !11, file: !10, line: 300, column: 4)
!11 =          !DILexicalBlockFile(scope: !12, file: !10, discriminator: 0)
!12 = distinct !DILexicalBlock    (scope: !13, file: !1,  line: 3, column: 21)
!13 = distinct !DILexicalBlock    (scope: !3,  file: !1,  line: 3, column: 7)
!3  = distinct !DISubprogram      (name: "foo", scope: !1, file: !1, line: 1)

!8 =  !DIFile(filename: "z1.c", directory: "")
!10 = !DIFile(filename: "y.c", directory: "")
!1 =  !DIFile(filename: "main.c", directory: "")

The parent scopes chain for L2 looks like that:

!14 =          !DILocation(line: 202, column: 3, scope: !15)
!15 =          !DILexicalBlockFile(scope: !17, file: !16, discriminator: 0)
!17 = distinct !DILexicalBlock    (scope: !18, file: !10, line: 300, column: 4)
!18 =          !DILexicalBlockFile(scope: !19, file: !10, discriminator: 0)
!19 = distinct !DILexicalBlock    (scope: !13, file: !1, line: 7, column: 9)
!13 = distinct !DILexicalBlock    (scope: !3, file: !1, line: 3, column: 7)
!3  = distinct !DISubprogram      (name: "foo", scope: !1, file: !1, line: 1)

!16 = !DIFile(filename: "z2.c", directory: "")
!10 = !DIFile(filename: "y.c", directory: "")
!1 =  !DIFile(filename: "main.c", directory: "")

The current implementation stores a pointer to every scope from L1's chain into Scopes set. Then, it goes over the parent scopes chain of L2 until it finds a DIScope* that is contained in Scopes set. In this example, it is !13. It is taken as a parent scope for the output DILocation. Since L1's and L2's line numbers don't match, the output is !DILocation(line: 0, scope: !13).

With the proposed PR, every scope from L1's chain will be put in Scopes map. The key of the map is the scope's file, line number, and column number tuple. For DILexicalBlockFiles, the line number of a previous scope in the chain is used as a location (since DILexicalBlockFile doesn't have line or column fields). For this example, the map will look like this:

Scopes[{!8,  102, 0}] = {!7}
Scopes[{!10, 300, 4}] = {!9}
Scopes[{!10, 300, 0}] = {!11}
Scopes[{!1, 3, 21}] = {!12}
Scopes[{!1, 3, 7}] = {!13}
Scopes[{!1, 1, 0}] = {!3}

Then, it goes over L2's parent scopes chain, and for every scope S from L2's chain, it determines if there is a scope with the same key in Scopes. If the scope with the same key is found, two options are considered:

  1. scope S is present in Scopes[Key(S)] => return S.
  2. scope S is not present in Scopes[Key(S)] => return any scope from Scopes[Key(S)].

In our example, the following keys are examined until a suitable scope is found:

Scope !15 - Key {!16, 202, 0}
Scope !17 - Key {!10, 300, 4}
Scopes[{!10, 300, 4}] = {!9}

Thus, scope !9 is considered as an output for GetNearestCommonScope.

After Scope = GetNearestCommonScope(L1, L2) call, getMergedLocation checks if L1->getFile(), L2->getFile(), Scope->getFile() are pairwise equal.

  1. If all files are equal, it means there are scopes S1 and S2 from L1 and L2 parent scopes chains correspondingly, which have the same line number, column number, and file as S has. We use S as a parent scope for output location. Output line number and column number are derived from L1, L2 (if they match).
  2. If files are not equal, a case of "#include" into function body is assumed. L1 and L2 locations are ignored, and the output of getMergedLocation() is the location of S (as a scope from which L1 and L2 are included).

In the given example, the output location will be !DILocation(line: 300, column: 4, scope: !9).

The chains are still a bit confusing to me in the example - perhaps use specific instructions, and their scope chain? (some instruction should have a single scope link, which continues up until it reaches the function - so I don't quite understand the function with up and down arrows, and sideways arrows)

(arrows go from parent scopes to children)

@dwblaikie
Copy link
Collaborator

Thanks for all the details - so maybe to paraphrase/check my understanding, this boils down to:
Without this patch, we chose the nearest common scope.
With this patch, we choose the first scope in LHS that has the same file/line/column as any scope in RHS? (& use the shared Scope if there happens to be one, otherwise picks an arbitrary one)

In the given example, the output location will be !DILocation(line: 300, column: 4, scope: !9).

That seems like a problem/not acceptable to me, if I'm following correctly.

That would place the merged instruction inside the scope that belongs to the "if" branch of this block. This would be inconsistent with the general debug info location principles we have so far (that we should never create "false" coverage - ie: it would appear as if the "if" condition was true (because code "inside" it is executing) even if the condition was false). Is that the case/am I understanding this correctly?

I guess in this case, which is a bit different from other cases we've considered, it's about the scope and not the location of the instruction itself - but I think the same principle should apply.

I /think/ it'd be OK to place this in scope !13 with perhaps a newly created/artificial DILexicalBlockFile with the shared source file/line/column on it, maybe? Though I'm not sure exactly how we could/should pick that shared one - "do any collide" seems a bit too vague to me. Perhaps a strict suffix/tail in the source location chain could be preserved? Or just the final location? Not sure...

(there is some ongoing discussion about these principles (see #129960 for instance) - but we should be treading pretty carefully in any case, to figure out what, if any, change in principles we are planning to make here)

@dzhidzhoev
Copy link
Member Author

dzhidzhoev commented Apr 1, 2025

Thanks for all the details - so maybe to paraphrase/check my understanding, this boils down to: Without this patch, we chose the nearest common scope. With this patch, we choose the first scope in LHS that has the same file/line/column as any scope in RHS? (& use the shared Scope if there happens to be one, otherwise picks an arbitrary one)

Yes.

In the given example, the output location will be !DILocation(line: 300, column: 4, scope: !9).

That seems like a problem/not acceptable to me, if I'm following correctly.

That would place the merged instruction inside the scope that belongs to the "if" branch of this block. This would be inconsistent with the general debug info location principles we have so far (that we should never create "false" coverage - ie: it would appear as if the "if" condition was true (because code "inside" it is executing) even if the condition was false). Is that the case/am I understanding this correctly?

Yes. Is it a problem considering that scope !9 refers to y.c, which is included from both branches in main.c, and it is "equivalent" to !17 in the sense of file/line/column?
If we had

if (...) {
#include "z1.c"
} else {
#include "z2.c"
}

in our example, the algorithm would produce !DILocation(scope: !13, line: 3, column: 7) (scope of main()), which does not belong to any of the branches. I thought intermediate scopes (between DILocation's scope and DISubprogram) did not matter from the debugger's prospective.

I /think/ it'd be OK to place this in scope !13 with perhaps a newly created/artificial DILexicalBlockFile with the shared source file/line/column on it, maybe? Though I'm not sure exactly how we could/should pick that shared one - "do any collide" seems a bit too vague to me. Perhaps a strict suffix/tail in the source location chain could be preserved? Or just the final location? Not sure...

We could try the following:

  1. Find the nearest common scope (the current mainline implementation). Scope = GetNearestCommonScope(...).
  2. If any of L1, L2 or Scope files are different:
    1. Use the algorithm from this PR to determine Scope2 (the nearest scope from L2's chain with the common file/line/column as some scope in L1's chain).
    2. NewScope = Scope2.clone() (create an artificial scope), put it inside Scope (as we are sure it does not belong to divergent branches with respect to L1, L2).
    3. If any of L1, L2 or NewScope files are different, return the location of NewScope (a location from which L1, L2 were included).
    4. Scope = NewScope;
  3. Match L1, L2 as it is done currently in the mainline, return DILocation(Scope, Line, Col).

@dzhidzhoev
Copy link
Member Author

dzhidzhoev commented Apr 1, 2025

Though I'm not sure exactly how we could/should pick that shared one - "do any collide" seems a bit too vague to me.

It doesn't seem like a vague match criterion to me, since if we have DILexicalBlock *S1, *S2, if S1 == S2, then they collide. So we should not lose "precision" compared to the mainline (though DILexicalBlockFiles that have different children DILocations may add some imprecision, as we define the location of DILexicalBlockFile as the line of the previous location in a chain).

Perhaps a strict suffix/tail in the source location chain could be preserved?

Similar logic was implemented for the algorithm for matching inlinedAt chains (which wraps MergeLocPair()). I have tried to generalize it here #125780.

@dwblaikie
Copy link
Collaborator

Thanks for all the details - so maybe to paraphrase/check my understanding, this boils down to: Without this patch, we chose the nearest common scope. With this patch, we choose the first scope in LHS that has the same file/line/column as any scope in RHS? (& use the shared Scope if there happens to be one, otherwise picks an arbitrary one)

Yes.

In the given example, the output location will be !DILocation(line: 300, column: 4, scope: !9).

That seems like a problem/not acceptable to me, if I'm following correctly.
That would place the merged instruction inside the scope that belongs to the "if" branch of this block. This would be inconsistent with the general debug info location principles we have so far (that we should never create "false" coverage - ie: it would appear as if the "if" condition was true (because code "inside" it is executing) even if the condition was false). Is that the case/am I understanding this correctly?

Yes. Is it a problem considering that scope !9 refers to y.c, which is included from both branches in main.c, and it is "equivalent" to !17 in the sense of file/line/column? If we had

if (...) {
#include "z1.c"
} else {
#include "z2.c"
}

in our example, the algorithm would produce !DILocation(scope: !13, line: 3, column: 7) (scope of main()), which does not belong to any of the branches. I thought intermediate scopes (between DILocation's scope and DISubprogram) did not matter from the debugger's prospective.

At least they matter for name lookup. For instance if inside each side of the if/else a local variable x was declared, then with the proposed location merging, being at that location and printing x would refer to the x in the if, not the else, with no hint that it's ambiguous. Current merging/the direction I'd consider, would ensure x was not valid at all, which is about the best we can do since DWARF doesn't provide a way to describe the ambiguity.

I /think/ it'd be OK to place this in scope !13 with perhaps a newly created/artificial DILexicalBlockFile with the shared source file/line/column on it, maybe? Though I'm not sure exactly how we could/should pick that shared one - "do any collide" seems a bit too vague to me. Perhaps a strict suffix/tail in the source location chain could be preserved? Or just the final location? Not sure...

We could try the following:

  1. Find the nearest common scope (the current mainline implementation). Scope = GetNearestCommonScope(...).

  2. If any of L1, L2 or Scope files are different:

    1. Use the algorithm from this PR to determine Scope2 (the nearest scope from L2's chain with the common file/line/column as some scope in L1's chain).
    2. NewScope = Scope2.clone() (create an artificial scope), put it inside Scope (as we are sure it does not belong to divergent branches with respect to L1, L2).
    3. If any of L1, L2 or NewScope files are different, return the location of NewScope (a location from which L1, L2 were included).
    4. Scope = NewScope;
  3. Match L1, L2 as it is done currently in the mainline, return DILocation(Scope, Line, Col).

Yeah, something like that ^ seems plausible to me at the moment.

About the only thing I'm currently concerned about in that direction is the "nearest scope from L2 with a common file/line/column as a scope in L1" - that algorithm doesn't obviously sound symmetric, but I guess it probably is. Is it symmetric? (like is "this_thing(L1, L2)" the same as "this_thing(L2, L1)") that'd be a nice property to have/preserve & reduce that chance this some nuanced/fussy algorithm isn't even fussier.

@SLTozer Guess you've got some context from the previous reviews - thoughts on all this?

@dzhidzhoev
Copy link
Member Author

Thanks for all the details - so maybe to paraphrase/check my understanding, this boils down to: Without this patch, we chose the nearest common scope. With this patch, we choose the first scope in LHS that has the same file/line/column as any scope in RHS? (& use the shared Scope if there happens to be one, otherwise picks an arbitrary one)

Yes.

In the given example, the output location will be !DILocation(line: 300, column: 4, scope: !9).

That seems like a problem/not acceptable to me, if I'm following correctly.
That would place the merged instruction inside the scope that belongs to the "if" branch of this block. This would be inconsistent with the general debug info location principles we have so far (that we should never create "false" coverage - ie: it would appear as if the "if" condition was true (because code "inside" it is executing) even if the condition was false). Is that the case/am I understanding this correctly?

Yes. Is it a problem considering that scope !9 refers to y.c, which is included from both branches in main.c, and it is "equivalent" to !17 in the sense of file/line/column? If we had

if (...) {
#include "z1.c"
} else {
#include "z2.c"
}

in our example, the algorithm would produce !DILocation(scope: !13, line: 3, column: 7) (scope of main()), which does not belong to any of the branches. I thought intermediate scopes (between DILocation's scope and DISubprogram) did not matter from the debugger's prospective.

At least they matter for name lookup. For instance if inside each side of the if/else a local variable x was declared, then with the proposed location merging, being at that location and printing x would refer to the x in the if, not the else, with no hint that it's ambiguous. Current merging/the direction I'd consider, would ensure x was not valid at all, which is about the best we can do since DWARF doesn't provide a way to describe the ambiguity.

Thank you for the explanation!

About the only thing I'm currently concerned about in that direction is the "nearest scope from L2 with a common file/line/column as a scope in L1" - that algorithm doesn't obviously sound symmetric, but I guess it probably is. Is it symmetric? (like is "this_thing(L1, L2)" the same as "this_thing(L2, L1)") that'd be a nice property to have/preserve & reduce that chance this some nuanced/fussy algorithm isn't even fussier.

I wouldn't say it is strictly symmetric since it always returns scopes from L1's chain.
If the question is whether this_thing(L1, L2) returns a scope with the same file/line/column as this_thing(L2, L1), I don't think so.
Let's consider two chains of Scopes:

L1 <= DILexicalBlock(file, line1, col) <= DILexicalBlock(file, line2, col) <= SP
L2 <= DILexicalBlock(file, line2, col) <= DILexicalBlock(file, line1, col) <= SP

Then,

this_thing(L1, L2) = DILexicalBlock(file, line2, col)
this_thing(L2, L1) = DILexicalBlock(file, line1, col)

I /think/ it'd be OK to place this in scope !13 with perhaps a newly created/artificial DILexicalBlockFile with the shared source file/line/column on it, maybe? Though I'm not sure exactly how we could/should pick that shared one - "do any collide" seems a bit too vague to me. Perhaps a strict suffix/tail in the source location chain could be preserved? Or just the final location? Not sure...

We could try the following:

  1. Find the nearest common scope (the current mainline implementation). Scope = GetNearestCommonScope(...).

  2. If any of L1, L2 or Scope files are different:

    1. Use the algorithm from this PR to determine Scope2 (the nearest scope from L2's chain with the common file/line/column as some scope in L1's chain).
    2. NewScope = Scope2.clone() (create an artificial scope), put it inside Scope (as we are sure it does not belong to divergent branches with respect to L1, L2).
    3. If any of L1, L2 or NewScope files are different, return the location of NewScope (a location from which L1, L2 were included).
    4. Scope = NewScope;
  3. Match L1, L2 as it is done currently in the mainline, return DILocation(Scope, Line, Col).

Yeah, something like that ^ seems plausible to me at the moment.

I'll include the changes into this PR.

@SLTozer
Copy link
Contributor

SLTozer commented Apr 3, 2025

Let's consider two chains of Scopes:

Yes, I was thinking that this would be possible, and asymmetry isn't something that I believe can otherwise happen in this function (e.g. getMergedLocation(L1, L2) is always the same as getMergedLocation(L2, L1)). It sounds like a bad property to have, but I don't know if it actually causes any problems anywhere - it's more "odd" than anything, but from my (C++) perspective the textual inclusion behaviour is odd as well. I think you could create a stack of scopes like that in C/C++, but at the moment, I don't think either of the asymmetric results would be more incorrect than a line 0 (since either one is an accurate depiction of some layer of the textual-inclusion stack), or indeed any other source location, and I'm hard pressed to think of a practical scenario where this would cause more problems for debugging or profiling than omitting the information outright.

@dzhidzhoev
Copy link
Member Author

I'll include the changes into this PR.

Done.

Let's consider two chains of Scopes:

Yes, I was thinking that this would be possible, and asymmetry isn't something that I believe can otherwise happen in this function

Also, the lexical blocks structure from the counterexample that I provided, IMO, shouldn't pop up in practice. I can hardly imagine for which source code two lexical block chains with swapped blocks will be produced if usually lexical blocks form a tree, and "#include" directives also form a tree. It would be interesting to know about such examples.

Copy link
Contributor

@SLTozer SLTozer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm more confident with the direction of this patch now - thanks for your patience with the updates. Code changes largely LGTM at this point too. Just double checking: @dwblaikie are you happy with the approach described above? I think this gives us strictly more information, and doesn't give us anything that's "incorrect"; hopefully it doesn't result in generating too many new scopes, but the logic seems like it's rarely applicable to most languages at least.

Comment on lines 154 to 156
// When matching DILexicalBlockFile's, ignore column numbers, so that
// DILocation's having different columns within the same
// DILexicalBlockFile will match.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a behaviour we definitely want? On the one hand, it seems worth matching DILexicalBlockFiles if they have different column numbers, and setting a line 0 column in the result; on the other hand, if we have two DILexicalBlockFiles that have identical column numbers, then we would want to preserve that column number.

No need to complicate the patch further by allowing either exact matches or line 0 columns if it's non-trivial, just noting this and checking whether it's an important property or just a matter of convenience.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a behaviour we definitely want? On the one hand, it seems worth matching DILexicalBlockFiles if they have different column numbers, and setting a line 0 column in the result; on the other hand, if we have two DILexicalBlockFiles that have identical column numbers, then we would want to preserve that column number.

I added this initially, when the algorithm was meant to be used not in addition to but instead of getNearestCommonScope(). It was necessary to make it work the same way as getNearestCommonScope() in a simple case of

!0 = !DILocation(scope: !2, line: 1, col: 1)
!1 = !DILocation(scope: !2, line: 1, col: 2)
!2 = !DILexicalBlockFile(...)

It may also be useful at the current state of the PR, but we can simplify it for now, and probably add this behavior later when needed.

@dwblaikie
Copy link
Collaborator

I think I'm more confident with the direction of this patch now - thanks for your patience with the updates. Code changes largely LGTM at this point too. Just double checking: @dwblaikie are you happy with the approach described above? I think this gives us strictly more information, and doesn't give us anything that's "incorrect"; hopefully it doesn't result in generating too many new scopes, but the logic seems like it's rarely applicable to most languages at least.

yeah, thanks for taking a look - I'm OK whenever you're OK with this, given the current state/direction.

@dzhidzhoev dzhidzhoev force-pushed the merge-scopes-by-loc branch from c7ded9d to b61a02b Compare April 7, 2025 13:41
GetNearestCommonScope finds a common scope for two locations by finding
the common parent scope object.

With this commit, GetNearestCommonScope considers two scopes equal if
they have the same file, line and column values. Thus, it can find
the "nearest common included location" when merging locations from
different files. This may be useful for a case described here
llvm#125780 (comment).

DIScope's pointer equality isn't enough for scope equality check, since, for
example, two `#include "x.inc"` directives showing up on different lines
produce different DILocalScope objects representing the same file content.
Thus, two (even same) locations from "x.inc" may have different
parent scope objects representing the same source location.

If input DILocations are from different files, or common scope is in
another file than input locations, a textual inclusion case is assumed,
and the location of common scope ("nearest common included location") is
returned as a result of getMergedLocation.
It fixes llvm#122846.
@dzhidzhoev dzhidzhoev force-pushed the merge-scopes-by-loc branch from b61a02b to 326dd47 Compare April 11, 2025 15:37
@dzhidzhoev
Copy link
Member Author

Rebased on top of #129960.

Copy link
Contributor

@SLTozer SLTozer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides one inline comment, LGTM

// Try using the nearest scope with common location if files are different.
if (Scope->getFile() != L1->getFile() || L1->getFile() != L2->getFile()) {
auto [CommonLocScope, CommonLoc] =
getNearestMatchingScope<ScopeLocationsMatcher>(L1, L2);
Copy link
Contributor

@SLTozer SLTozer Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine a small optimization where for this second search, we cap the search up through L1's scope chain at Scope, i.e. we never consider any parent scopes of Scope; but that's a minor point that may not be worth making the code less legible for and this patch doesn't noticeably affect performance, so it's not necessary to merge this.

Comment on lines 1089 to 1091
M->dump();
M->getScope()->dump();
LBF1->dump();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some debug dumps left hanging around.

@dzhidzhoev dzhidzhoev merged commit 6462fad into llvm:main Apr 18, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DILocation::getMergedLocation produces invalid result while merging locations from DILexicalBlockFile
4 participants