Skip to content

[ubsan] Add more -fsanitize-annotate-debug-info checks #141997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

thurstond
Copy link
Contributor

@thurstond thurstond commented May 29, 2025

This extends #138577 to more UBSan checks, by changing SanitizerDebugLocation (formerly SanitizerScope) to add annotations if enabled for the specified ordinals.

Annotations will use the ordinal name if there is exactly one ordinal specified in the SanitizerDebugLocation; otherwise, it will use the handler name.

Updates the tests from #141814.

thurstond added 6 commits May 29, 2025 00:00
This extends llvm#138577 to more
UBSan checks.

Note that the annotations are less detailed: they will always be
__ubsan_check_singularity, rather than using the SanitizerKind (previous
behavior, which is not always possible for all UBSan checks) or SanitizerHandler.
This is a (minor) regression compared to
llvm#128977 and
llvm#139809.

Updates the tests from llvm#128976,
llvm#139149 and llvm#141814.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels May 29, 2025
@llvmbot
Copy link
Member

llvmbot commented May 29, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Thurston Dang (thurstond)

Changes

This extends #138577 to more UBSan checks, by changing SanitizerScope to add annotations if enabled for the specified ordinals.

Note that the annotations are less detailed: they will always be __ubsan_check_singularity, rather than using the SanitizerKind (previous behavior, which is not always possible for all UBSan checks) or SanitizerHandler. This is a (minor) regression compared to #128977 and #139809.

Updates the tests from #128976, #139149 and #141814.


Patch is 53.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/141997.diff

16 Files Affected:

  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+10-6)
  • (modified) clang/lib/CodeGen/CGCall.cpp (+3-3)
  • (modified) clang/lib/CodeGen/CGClass.cpp (+5-6)
  • (modified) clang/lib/CodeGen/CGDecl.cpp (+5-4)
  • (modified) clang/lib/CodeGen/CGExpr.cpp (+47-46)
  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+48-16)
  • (modified) clang/lib/CodeGen/CGObjC.cpp (+3-3)
  • (modified) clang/lib/CodeGen/CodeGenFunction.cpp (+20-10)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+11-5)
  • (modified) clang/lib/CodeGen/ItaniumCXXABI.cpp (+6-8)
  • (modified) clang/test/CodeGen/bounds-checking-debuginfo.c (+4-4)
  • (modified) clang/test/CodeGen/cfi-check-fail-debuginfo.c (+2-2)
  • (modified) clang/test/CodeGen/cfi-icall-generalize-debuginfo.c (+4-4)
  • (modified) clang/test/CodeGen/cfi-icall-normalize2-debuginfo.c (+2-2)
  • (modified) clang/test/CodeGen/ubsan-function-debuginfo.c (+18-15)
  • (modified) clang/test/CodeGen/unsigned-promotion-debuginfo.c (+16-13)
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 89b321090f2d8..b858770155472 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -2006,10 +2006,11 @@ Value *CodeGenFunction::EmitCheckedArgForBuiltin(const Expr *E,
   if (!SanOpts.has(SanitizerKind::Builtin))
     return ArgValue;
 
-  SanitizerScope SanScope(this);
+  auto CheckOrdinal = SanitizerKind::SO_Builtin;
+  SanitizerScope SanScope(this, {CheckOrdinal});
   Value *Cond = Builder.CreateICmpNE(
       ArgValue, llvm::Constant::getNullValue(ArgValue->getType()));
-  EmitCheck(std::make_pair(Cond, SanitizerKind::SO_Builtin),
+  EmitCheck(std::make_pair(Cond, CheckOrdinal),
             SanitizerHandler::InvalidBuiltin,
             {EmitCheckSourceLocation(E->getExprLoc()),
              llvm::ConstantInt::get(Builder.getInt8Ty(), Kind)},
@@ -2022,10 +2023,10 @@ Value *CodeGenFunction::EmitCheckedArgForAssume(const Expr *E) {
   if (!SanOpts.has(SanitizerKind::Builtin))
     return ArgValue;
 
-  SanitizerScope SanScope(this);
+  auto CheckOrdinal = SanitizerKind::SO_Builtin;
+  SanitizerScope SanScope(this, {CheckOrdinal});
   EmitCheck(
-      std::make_pair(ArgValue, SanitizerKind::SO_Builtin),
-      SanitizerHandler::InvalidBuiltin,
+      std::make_pair(ArgValue, CheckOrdinal), SanitizerHandler::InvalidBuiltin,
       {EmitCheckSourceLocation(E->getExprLoc()),
        llvm::ConstantInt::get(Builder.getInt8Ty(), BCK_AssumePassedFalse)},
       std::nullopt);
@@ -2048,7 +2049,10 @@ static Value *EmitOverflowCheckedAbs(CodeGenFunction &CGF, const CallExpr *E,
       return EmitAbs(CGF, ArgValue, true);
   }
 
-  CodeGenFunction::SanitizerScope SanScope(&CGF);
+  SmallVector<SanitizerKind::SanitizerOrdinal, 1> Ordinals;
+  if (SanitizeOverflow)
+    Ordinals.push_back(SanitizerKind::SO_SignedIntegerOverflow);
+  CodeGenFunction::SanitizerScope SanScope(&CGF, Ordinals);
 
   Constant *Zero = Constant::getNullValue(ArgValue->getType());
   Value *ResultAndOverflow = CGF.Builder.CreateBinaryIntrinsic(
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index bd920a2e3f2dd..fedabea72d600 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -4156,7 +4156,7 @@ void CodeGenFunction::EmitReturnValueCheck(llvm::Value *RV) {
     Handler = SanitizerHandler::NullabilityReturn;
   }
 
-  SanitizerScope SanScope(this);
+  SanitizerScope SanScope(this, {CheckKind});
 
   // Make sure the "return" source location is valid. If we're checking a
   // nullability annotation, make sure the preconditions for the check are met.
@@ -4541,7 +4541,7 @@ void CodeGenFunction::EmitNonNullArgCheck(RValue RV, QualType ArgType,
     Handler = SanitizerHandler::NullabilityArg;
   }
 
-  SanitizerScope SanScope(this);
+  SanitizerScope SanScope(this, {CheckKind});
   llvm::Value *Cond = EmitNonNullRValueCheck(RV, ArgType);
   llvm::Constant *StaticData[] = {
       EmitCheckSourceLocation(ArgLoc),
@@ -5976,7 +5976,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
       // attribute to insert handler calls.
       if (SanOpts.hasOneOf(SanitizerKind::Address |
                            SanitizerKind::KernelAddress)) {
-        SanitizerScope SanScope(this);
+        SanitizerScope SanScope(this, {});
         llvm::IRBuilder<>::InsertPointGuard IPGuard(Builder);
         Builder.SetInsertPoint(CI);
         auto *FnType = llvm::FunctionType::get(CGM.VoidTy, /*isVarArg=*/false);
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index 251b059c256f6..70f885de05cf9 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -1678,7 +1678,7 @@ namespace {
   static void EmitSanitizerDtorCallback(
       CodeGenFunction &CGF, StringRef Name, llvm::Value *Ptr,
       std::optional<CharUnits::QuantityType> PoisonSize = {}) {
-    CodeGenFunction::SanitizerScope SanScope(&CGF);
+    CodeGenFunction::SanitizerScope SanScope(&CGF, {});
     // Pass in void pointer and size of region as arguments to runtime
     // function
     SmallVector<llvm::Value *, 2> Args = {Ptr};
@@ -2885,7 +2885,7 @@ void CodeGenFunction::EmitVTablePtrCheck(const CXXRecordDecl *RD,
           SanitizerMask::bitPosToMask(M), TypeName))
     return;
 
-  SanitizerScope SanScope(this);
+  SanitizerScope SanScope(this, {M});
   EmitSanitizerStatReport(SSK);
 
   llvm::Metadata *MD =
@@ -2942,11 +2942,10 @@ bool CodeGenFunction::ShouldEmitVTableTypeCheckedLoad(const CXXRecordDecl *RD) {
 llvm::Value *CodeGenFunction::EmitVTableTypeCheckedLoad(
     const CXXRecordDecl *RD, llvm::Value *VTable, llvm::Type *VTableTy,
     uint64_t VTableByteOffset) {
-  SanitizerScope SanScope(this);
+  auto CheckOrdinal = SanitizerKind::SO_CFIVCall;
+  SanitizerScope SanScope(this, {CheckOrdinal});
 
   EmitSanitizerStatReport(llvm::SanStat_CFI_VCall);
-  ApplyDebugLocation ApplyTrapDI(
-      *this, SanitizerAnnotateDebugInfo(SanitizerKind::SO_CFIVCall));
 
   llvm::Metadata *MD =
       CGM.CreateMetadataIdentifierForType(QualType(RD->getTypeForDecl(), 0));
@@ -2965,7 +2964,7 @@ llvm::Value *CodeGenFunction::EmitVTableTypeCheckedLoad(
   if (SanOpts.has(SanitizerKind::CFIVCall) &&
       !getContext().getNoSanitizeList().containsType(SanitizerKind::CFIVCall,
                                                      TypeName)) {
-    EmitCheck(std::make_pair(CheckResult, SanitizerKind::SO_CFIVCall),
+    EmitCheck(std::make_pair(CheckResult, CheckOrdinal),
               SanitizerHandler::CFICheckFail, {}, {});
   }
 
diff --git a/clang/lib/CodeGen/CGDecl.cpp b/clang/lib/CodeGen/CGDecl.cpp
index f4549ab3033b2..139cc567f101b 100644
--- a/clang/lib/CodeGen/CGDecl.cpp
+++ b/clang/lib/CodeGen/CGDecl.cpp
@@ -765,14 +765,15 @@ void CodeGenFunction::EmitNullabilityCheck(LValue LHS, llvm::Value *RHS,
 
   // Check if the right hand side of the assignment is nonnull, if the left
   // hand side must be nonnull.
-  SanitizerScope SanScope(this);
+  auto CheckOrdinal = SanitizerKind::SO_NullabilityAssign;
+  SanitizerScope SanScope(this, {CheckOrdinal});
   llvm::Value *IsNotNull = Builder.CreateIsNotNull(RHS);
   llvm::Constant *StaticData[] = {
       EmitCheckSourceLocation(Loc), EmitCheckTypeDescriptor(LHS.getType()),
       llvm::ConstantInt::get(Int8Ty, 0), // The LogAlignment info is unused.
       llvm::ConstantInt::get(Int8Ty, TCK_NonnullAssign)};
-  EmitCheck({{IsNotNull, SanitizerKind::SO_NullabilityAssign}},
-            SanitizerHandler::TypeMismatch, StaticData, RHS);
+  EmitCheck({{IsNotNull, CheckOrdinal}}, SanitizerHandler::TypeMismatch,
+            StaticData, RHS);
 }
 
 void CodeGenFunction::EmitScalarInit(const Expr *init, const ValueDecl *D,
@@ -2852,7 +2853,7 @@ void CodeGenFunction::EmitParmDecl(const VarDecl &D, ParamValue Arg,
   if (requiresReturnValueNullabilityCheck()) {
     auto Nullability = Ty->getNullability();
     if (Nullability && *Nullability == NullabilityKind::NonNull) {
-      SanitizerScope SanScope(this);
+      SanitizerScope SanScope(this, {});
       RetValNullabilityPrecondition =
           Builder.CreateAnd(RetValNullabilityPrecondition,
                             Builder.CreateIsNotNull(Arg.getAnyValue()));
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index bae28b45afaa3..5bd782215b1b4 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -748,7 +748,9 @@ void CodeGenFunction::EmitTypeCheck(TypeCheckKind TCK, SourceLocation Loc,
   if (Ty.isVolatileQualified())
     return;
 
-  SanitizerScope SanScope(this);
+  SanitizerScope SanScope(
+      this, {SanitizerKind::SO_Null, SanitizerKind::SO_ObjectSize,
+             SanitizerKind::SO_Alignment, SanitizerKind::SO_Vptr});
 
   SmallVector<std::pair<llvm::Value *, SanitizerKind::SanitizerOrdinal>, 3>
       Checks;
@@ -989,7 +991,7 @@ static llvm::Value *getArrayIndexingBound(CodeGenFunction &CGF,
     if (CE->getCastKind() == CK_ArrayToPointerDecay &&
         !CE->getSubExpr()->isFlexibleArrayMemberLike(CGF.getContext(),
                                                      StrictFlexArraysLevel)) {
-      CodeGenFunction::SanitizerScope SanScope(&CGF);
+      CodeGenFunction::SanitizerScope SanScope(&CGF, {});
 
       IndexedType = CE->getSubExpr()->getType();
       const ArrayType *AT = IndexedType->castAsArrayTypeUnsafe();
@@ -1002,7 +1004,7 @@ static llvm::Value *getArrayIndexingBound(CodeGenFunction &CGF,
     }
   }
 
-  CodeGenFunction::SanitizerScope SanScope(&CGF);
+  CodeGenFunction::SanitizerScope SanScope(&CGF, {});
 
   QualType EltTy{Base->getType()->getPointeeOrArrayElementType(), 0};
   if (llvm::Value *POS = CGF.LoadPassedObjectSize(Base, EltTy)) {
@@ -1224,10 +1226,8 @@ void CodeGenFunction::EmitBoundsCheckImpl(const Expr *E, llvm::Value *Bound,
   if (!Bound)
     return;
 
-  SanitizerScope SanScope(this);
-
   auto CheckKind = SanitizerKind::SO_ArrayBounds;
-  ApplyDebugLocation ApplyTrapDI(*this, SanitizerAnnotateDebugInfo(CheckKind));
+  SanitizerScope SanScope(this, {CheckKind});
 
   bool IndexSigned = IndexType->isSignedIntegerOrEnumerationType();
   llvm::Value *IndexVal = Builder.CreateIntCast(Index, SizeTy, IndexSigned);
@@ -1245,30 +1245,23 @@ void CodeGenFunction::EmitBoundsCheckImpl(const Expr *E, llvm::Value *Bound,
 }
 
 llvm::DILocation *CodeGenFunction::SanitizerAnnotateDebugInfo(
-    SanitizerKind::SanitizerOrdinal CheckKindOrdinal) {
-  std::string Label;
-  switch (CheckKindOrdinal) {
-#define SANITIZER(NAME, ID)                                                    \
-  case SanitizerKind::SO_##ID:                                                 \
-    Label = "__ubsan_check_" NAME;                                             \
-    break;
-#include "clang/Basic/Sanitizers.def"
-  default:
-    llvm_unreachable("unexpected sanitizer kind");
-  }
-
-  // Sanitize label
-  for (unsigned int i = 0; i < Label.length(); i++)
-    if (!std::isalpha(Label[i]))
-      Label[i] = '_';
-
+    ArrayRef<SanitizerKind::SanitizerOrdinal> Ordinals) {
   llvm::DILocation *CheckDI = Builder.getCurrentDebugLocation();
-  // TODO: deprecate ClArrayBoundsPseudoFn
-  if (((ClArrayBoundsPseudoFn &&
-        CheckKindOrdinal == SanitizerKind::SO_ArrayBounds) ||
-       CGM.getCodeGenOpts().SanitizeAnnotateDebugInfo.has(CheckKindOrdinal)) &&
-      CheckDI)
-    CheckDI = getDebugInfo()->CreateSyntheticInlineAt(CheckDI, Label);
+
+  // TODO: the annotation could be more precise:
+  // 1) use the ordinal name if there is only one ordinal; and/or,
+  // 2) use the overarching SanitizerHandler if there are multiple ordinals
+  //    (this may be confusing to users)
+  for (auto Ord : Ordinals) {
+    // TODO: deprecate ClArrayBoundsPseudoFn
+    if (((ClArrayBoundsPseudoFn && Ord == SanitizerKind::SO_ArrayBounds) ||
+         CGM.getCodeGenOpts().SanitizeAnnotateDebugInfo.has(Ord)) &&
+        CheckDI) {
+      CheckDI = getDebugInfo()->CreateSyntheticInlineAt(
+          CheckDI, "__ubsan_check_singularity");
+      break;
+    }
+  }
 
   return CheckDI;
 }
@@ -1994,8 +1987,11 @@ bool CodeGenFunction::EmitScalarRangeCheck(llvm::Value *Value, QualType Ty,
   if (!getRangeForType(*this, Ty, Min, End, /*StrictEnums=*/true, IsBool))
     return true;
 
+  SanitizerKind::SanitizerOrdinal Kind =
+      NeedsEnumCheck ? SanitizerKind::SO_Enum : SanitizerKind::SO_Bool;
+
   auto &Ctx = getLLVMContext();
-  SanitizerScope SanScope(this);
+  SanitizerScope SanScope(this, {Kind});
   llvm::Value *Check;
   --End;
   if (!Min) {
@@ -2009,8 +2005,6 @@ bool CodeGenFunction::EmitScalarRangeCheck(llvm::Value *Value, QualType Ty,
   }
   llvm::Constant *StaticArgs[] = {EmitCheckSourceLocation(Loc),
                                   EmitCheckTypeDescriptor(Ty)};
-  SanitizerKind::SanitizerOrdinal Kind =
-      NeedsEnumCheck ? SanitizerKind::SO_Enum : SanitizerKind::SO_Bool;
   EmitCheck(std::make_pair(Check, Kind), SanitizerHandler::LoadInvalidValue,
             StaticArgs, Value);
   return true;
@@ -3931,7 +3925,14 @@ void CodeGenFunction::EmitCfiCheckStub() {
 // can be nullptr if the calling module has -fsanitize-trap behavior for this
 // check kind; in this case __cfi_check_fail traps as well.
 void CodeGenFunction::EmitCfiCheckFail() {
-  SanitizerScope SanScope(this);
+  // TODO: the SanitizerKind is not yet determined for this check (and might
+  // not even be available, if Data == nullptr). However, we still want to
+  // annotate the instrumentation. We approximate this by using all the CFI
+  // kinds.
+  SanitizerScope SanScope(
+      this, {SanitizerKind::SO_CFIVCall, SanitizerKind::SO_CFINVCall,
+             SanitizerKind::SO_CFIDerivedCast,
+             SanitizerKind::SO_CFIUnrelatedCast, SanitizerKind::SO_CFIICall});
   FunctionArgList Args;
   ImplicitParamDecl ArgData(getContext(), getContext().VoidPtrTy,
                             ImplicitParamKind::Other);
@@ -3994,8 +3995,6 @@ void CodeGenFunction::EmitCfiCheckFail() {
                          {Addr, AllVtables}),
       IntPtrTy);
 
-  // TODO: the instructions above are not annotated with debug info. It is
-  // inconvenient to do so because we have not determined SanitizerKind yet.
   const std::pair<int, SanitizerKind::SanitizerOrdinal> CheckKinds[] = {
       {CFITCK_VCall, SanitizerKind::SO_CFIVCall},
       {CFITCK_NVCall, SanitizerKind::SO_CFINVCall},
@@ -4007,7 +4006,8 @@ void CodeGenFunction::EmitCfiCheckFail() {
     int Kind = CheckKindOrdinalPair.first;
     SanitizerKind::SanitizerOrdinal Ordinal = CheckKindOrdinalPair.second;
 
-    ApplyDebugLocation ApplyTrapDI(*this, SanitizerAnnotateDebugInfo(Ordinal));
+    // TODO: we could apply SanitizerAnnotateDebugInfo(Ordinal) instead of
+    //       relying on the SanitizerScope with all CFI ordinals
 
     llvm::Value *Cond =
         Builder.CreateICmpNE(CheckKind, llvm::ConstantInt::get(Int8Ty, Kind));
@@ -4030,9 +4030,10 @@ void CodeGenFunction::EmitCfiCheckFail() {
 
 void CodeGenFunction::EmitUnreachable(SourceLocation Loc) {
   if (SanOpts.has(SanitizerKind::Unreachable)) {
-    SanitizerScope SanScope(this);
+    auto CheckOrdinal = SanitizerKind::SO_Unreachable;
+    SanitizerScope SanScope(this, {CheckOrdinal});
     EmitCheck(std::make_pair(static_cast<llvm::Value *>(Builder.getFalse()),
-                             SanitizerKind::SO_Unreachable),
+                             CheckOrdinal),
               SanitizerHandler::BuiltinUnreachable,
               EmitCheckSourceLocation(Loc), {});
   }
@@ -6271,7 +6272,8 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType,
       !isa<FunctionNoProtoType>(PointeeType)) {
     if (llvm::Constant *PrefixSig =
             CGM.getTargetCodeGenInfo().getUBSanFunctionSignature(CGM)) {
-      SanitizerScope SanScope(this);
+      auto CheckOrdinal = SanitizerKind::SO_Function;
+      SanitizerScope SanScope(this, {CheckOrdinal});
       auto *TypeHash = getUBSanFunctionTypeHash(PointeeType);
 
       llvm::Type *PrefixSigType = PrefixSig->getType();
@@ -6331,7 +6333,7 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType,
           Builder.CreateICmpEQ(CalleeTypeHash, TypeHash);
       llvm::Constant *StaticData[] = {EmitCheckSourceLocation(E->getBeginLoc()),
                                       EmitCheckTypeDescriptor(CalleeType)};
-      EmitCheck(std::make_pair(CalleeTypeHashMatch, SanitizerKind::SO_Function),
+      EmitCheck(std::make_pair(CalleeTypeHashMatch, CheckOrdinal),
                 SanitizerHandler::FunctionTypeMismatch, StaticData,
                 {CalleePtr});
 
@@ -6350,10 +6352,9 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType,
   // function pointer is a member of the bit set for the function type.
   if (SanOpts.has(SanitizerKind::CFIICall) &&
       (!TargetDecl || !isa<FunctionDecl>(TargetDecl))) {
-    SanitizerScope SanScope(this);
+    auto CheckOrdinal = SanitizerKind::SO_CFIICall;
+    SanitizerScope SanScope(this, {CheckOrdinal});
     EmitSanitizerStatReport(llvm::SanStat_CFI_ICall);
-    ApplyDebugLocation ApplyTrapDI(
-        *this, SanitizerAnnotateDebugInfo(SanitizerKind::SO_CFIICall));
 
     llvm::Metadata *MD;
     if (CGM.getCodeGenOpts().SanitizeCfiICallGeneralizePointers)
@@ -6374,10 +6375,10 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType,
         EmitCheckTypeDescriptor(QualType(FnType, 0)),
     };
     if (CGM.getCodeGenOpts().SanitizeCfiCrossDso && CrossDsoTypeId) {
-      EmitCfiSlowPathCheck(SanitizerKind::SO_CFIICall, TypeTest, CrossDsoTypeId,
-                           CalleePtr, StaticData);
+      EmitCfiSlowPathCheck(CheckOrdinal, TypeTest, CrossDsoTypeId, CalleePtr,
+                           StaticData);
     } else {
-      EmitCheck(std::make_pair(TypeTest, SanitizerKind::SO_CFIICall),
+      EmitCheck(std::make_pair(TypeTest, CheckOrdinal),
                 SanitizerHandler::CFICheckFail, StaticData,
                 {CalleePtr, llvm::UndefValue::get(IntPtrTy)});
     }
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index 111b5805c6a94..43ac70bd81b7c 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -999,7 +999,8 @@ void ScalarExprEmitter::EmitFloatConversionCheck(
   if (!isa<llvm::IntegerType>(DstTy))
     return;
 
-  CodeGenFunction::SanitizerScope SanScope(&CGF);
+  auto CheckOrdinal = SanitizerKind::SO_FloatCastOverflow;
+  CodeGenFunction::SanitizerScope SanScope(&CGF, {CheckOrdinal});
   using llvm::APFloat;
   using llvm::APSInt;
 
@@ -1056,7 +1057,7 @@ void ScalarExprEmitter::EmitFloatConversionCheck(
   llvm::Constant *StaticArgs[] = {CGF.EmitCheckSourceLocation(Loc),
                                   CGF.EmitCheckTypeDescriptor(OrigSrcType),
                                   CGF.EmitCheckTypeDescriptor(DstType)};
-  CGF.EmitCheck(std::make_pair(Check, SanitizerKind::SO_FloatCastOverflow),
+  CGF.EmitCheck(std::make_pair(Check, CheckOrdinal),
                 SanitizerHandler::FloatCastOverflow, StaticArgs, OrigSrc);
 }
 
@@ -1134,18 +1135,28 @@ void ScalarExprEmitter::EmitIntegerTruncationCheck(Value *Src, QualType SrcType,
       (!SrcSigned && DstSigned))
     return;
 
-  CodeGenFunction::SanitizerScope SanScope(&CGF);
-
   std::pair<ScalarExprEmitter::ImplicitConversionCheckKind,
             std::pair<llvm::Value *, SanitizerKind::SanitizerOrdinal>>
-      Check =
-          EmitIntegerTruncationCheckHelper(Src, SrcType, Dst, DstType, Builder);
-  // If the comparison result is 'i1 false', then the truncation was lossy.
+      Check;
+
+  {
+    // We don't know the check kind until we call
+    // EmitIntegerTruncationCheckHelper, but we want to annotate
+    // EmitIntegerTruncationCheckHelper's instructions too.
+    CodeGenFunction::SanitizerScope SanScope(
+        &CGF, {SanitizerKind::SO_ImplicitUnsignedIntegerTruncation,
+               SanitizerKind::SO_ImplicitSignedIntegerTruncation});
+    Check =
+        EmitIntegerTruncationCheckHelper(Src, SrcType, Dst, DstType, Builder);
+    // If the comparison result is 'i1 false', then the truncation was lossy.
+  }
 
   // Do we care about this type of truncation?
   if (!CGF.SanOpts.has(Check.second.second))
     return;
 
+  CodeGenFunction::SanitizerScope SanScope(&CGF, {Check.second.second});
+
   // Does some SSCL ignore this type?
   if (CGF.getContext().isTypeIgnoredBySanitizer(
           SanitizerMask::bitPosToMask(Check.second.second), DstType))
@@ -1272,7 +1283,9 @@ void ScalarExprEmitter::EmitIntegerSignChangeCheck(Value *Src, QualType SrcType,
     return;
   // That's it. We can't rule out any more cases with the data we have.
 
-  CodeGenFunction::SanitizerScope SanScope(&CGF);
+  CodeGenFunction::SanitizerScope SanScope(
+      &CGF, {SanitizerKind::SO_ImplicitUnsignedIntegerTruncation,
+             SanitizerKind::SO_ImplicitSignedIntegerTruncation});
 
   std::pair<ScalarExprEmitter::ImplicitConversionCheckKind,
             std::pair<llvm::Value *, SanitizerKind::SanitizerOrdinal>>
@@ -1393,7 +1406,8 @@ void CodeGenFunction::EmitBitfieldConversionCheck(Value *Src, QualType SrcType,
   bool SrcSigned = SrcType->isSignedIntegerOrEnumerationType();
   bool DstSigned = DstType->isSignedIntegerOrEnumerationType();
 
-  CodeGenFunction::SanitizerScope SanScope(this);
+  CodeGenFunction::SanitizerScope SanScope(
+      this, {SanitizerKind::SO_ImplicitBitfieldConversion});
 
   std::pair<ScalarExprEmitter::ImplicitConversionCheckKind,
             std...
[truncated]

@thurstond thurstond requested a review from vitalybuka May 30, 2025 18:43
Copy link
Contributor

@fmayer fmayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this going to be useful? We should make some effort to try to determine where the overhead comes from. I don't like that the specific array_bounds inline function is going away even though we have a specific codepath for it.

@thurstond thurstond marked this pull request as draft May 30, 2025 20:59
@thurstond
Copy link
Contributor Author

How is this going to be useful? We should make some effort to try to determine where the overhead comes from. I don't like that the specific array_bounds inline function is going away even though we have a specific codepath for it.

I've now changed the annotations to use the handler name e.g., __ubsan_check_out_of_bounds, __ubsan_check_cfi_check_fail, __ubsan_check_type_mismatch.

@thurstond thurstond marked this pull request as ready for review June 2, 2025 21:52
@thurstond thurstond requested a review from fmayer June 2, 2025 21:52
CodeGenFunction *CGF, ArrayRef<SanitizerKind::SanitizerOrdinal> Ordinals,
SanitizerHandler Handler)
: SanitizerScope(CGF) {
this->ApplyTrapDI = new ApplyDebugLocation(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not do manual memory management. Use a unique_ptr and forward declared class, that should work because this class has a explicit destructor.

Also drop all the this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2a816fe

drop all the this

It's cleaner?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::make_unuque<>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why do we need heap allocation here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::make_unuque<>

Done

But why do we need heap allocation here?

ApplyDebugLocation is scoped.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess right answer is to avoid include, which probably achievable in other way.

Copy link
Contributor

@fmayer fmayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@vitalybuka
Copy link
Collaborator

  • fsanitize-annotate-debug-info

I guess we don't even want to annotate traps. Now we annotate because how scoped thing work,
but if CreateTrapFailureMessageFor replaced it, it should be fine?

@fmayer
Copy link
Contributor

fmayer commented Jun 2, 2025

  • fsanitize-annotate-debug-info

I guess we don't even want to annotate traps. Now we annotate because how scoped thing work, but if CreateTrapFailureMessageFor replaced it, it should be fine?

Why don't we want to annotate traps?

@vitalybuka
Copy link
Collaborator

vitalybuka commented Jun 2, 2025

  • fsanitize-annotate-debug-info

I guess we don't even want to annotate traps. Now we annotate because how scoped thing work, but if CreateTrapFailureMessageFor replaced it, it should be fine?

Why don't we want to annotate traps?

We annotate for profiling. Traps executed rarely, once per process. So they are not interested for us. Only conditions used for check.
Unless you have use-case which I don't know about yet.

@vitalybuka
Copy link
Collaborator

  • fsanitize-annotate-debug-info

I guess we don't even want to annotate traps. Now we annotate because how scoped thing work, but if CreateTrapFailureMessageFor replaced it, it should be fine?

Why don't we want to annotate traps?

We annotate for profiling. Traps executed rarely, once per process. So they are not interested for us. Only conditions used for check.

But ideally both should work. Our inline frame + message frame should work.

@vitalybuka
Copy link
Collaborator

vitalybuka commented Jun 3, 2025

But ideally both should work. Our inline frame + message frame should work.

Looks like we still want annotations on trap.
@fmayer showed me stack traces with annotation from existing thing. It's nice to see check name as top frame.

I am not sure what is needed to make those frames to produce LLDB recognizable frame.
@delcypher Would it be possible to expand frames of this patch to work as needed?

@delcypher
Copy link
Contributor

@thurstond

the debugger reaches the trap: -fsanitize-annotate-debug-info will have the second-top-most frame, while @anthonyhatran's feature will have the top-most frame, which is logically correct.

While logically correct I think we might have to teach LLDB how to handle this. LLDB has a special "frame recognizer" where it looks for frames using this fake debug info mechanism (e.g. __builtin_verbose_trap) and it assumes the frame below it is the real source code and automatically selects this frame when trapping so that the user sees the correct source location when the trap is hit. If the frame below the fake frame isn't the user's code and is instead another fake frame this won't work properly.

@MiB137 Can probably say more here.

@delcypher
Copy link
Contributor

@vitalybuka

I am not sure what is needed to make those frames to produce LLDB recognizable frame.
@delcypher Would it be possible to expand frames of this patch to work as needed?

@MiB137 is better positioned to comment on this than me. It's probably possible to make LLDB work with which ever top frame we decide but we'd probably want the fake frame names be easily recognizable and stable so that LLDB's feature continues to work reliably. I personally think having the user readable message (i.e the __builtin_verbose_trap like message) at the top makes more sense because that will be the behavior without -fsanitize-annotate-debug-info. It sounds like you are doing this work for profiling. In which case are you able to teach your analysis tools to ignore the frame with the user readable message?

Given that -fsanitize-annotate-debug-info is off by default my suggestion is that @anthonyhatran land the simplest version of his patch possible by not concerning himself with -fsanitize-annotate-debug-info at this stage. And after it's landed we can then decide how to make the two features interact in a way that the LLDB folks are happy with.

@delcypher
Copy link
Contributor

@Michael137 Ping

@Michael137
Copy link
Member

Michael137 commented Jun 4, 2025

While logically correct I think we might have to teach LLDB how to handle this. LLDB has a special "frame recognizer" where it looks for frames using this fake debug info mechanism (e.g. __builtin_verbose_trap) and it assumes the frame below it is the real source code and automatically selects this frame when trapping so that the user sees the correct source location when the trap is hit. If the frame below the fake frame isn't the user's code and is instead another fake frame this won't work properly.

Yea everything that @delcypher said here is accurate from the LLDB-side. Here's the logic that LLDB uses to pick which frame to display:

static StackFrameSP FindMostRelevantFrame(Thread &selected_thread) {
// Defensive upper-bound of when we stop walking up the frames in
// case we somehow ended up looking at an infinite recursion.
const size_t max_stack_depth = 128;
// Start at parent frame.
size_t stack_idx = 1;
StackFrameSP most_relevant_frame_sp =
selected_thread.GetStackFrameAtIndex(stack_idx);
while (most_relevant_frame_sp && stack_idx <= max_stack_depth) {
auto const &sc =
most_relevant_frame_sp->GetSymbolContext(eSymbolContextEverything);
ConstString frame_name = sc.GetFunctionName();
if (!frame_name)
return nullptr;
// Found a frame outside of the `std` namespace. That's the
// first frame in user-code that ended up triggering the
// verbose_trap. Hence that's the one we want to display.
if (!frame_name.GetStringRef().starts_with("std::"))
return most_relevant_frame_sp;
++stack_idx;
most_relevant_frame_sp = selected_thread.GetStackFrameAtIndex(stack_idx);
}
return nullptr;
}

Currently we pick the frame just above the fake inlined frame (and skip over any std:: frames). We can definitely adjust the heuristic to accommodate the case where we have another fake frame above the UBSan trap one.

It's probably possible to make LLDB work with which ever top frame we decide but we'd probably want the fake frame names be easily recognizable and stable so that LLDB's feature continues to work reliably.

Agreed, for the __builtin_verbose_trap we picked __clang_trap_msg as the prefix for the fake frame. And that's how LLDB knows to activate the frame recognizer:

auto symbol_regex_sp = std::make_shared<RegularExpression>(
llvm::formatv("^{0}", ClangTrapPrefix).str());

We'll want a similar prefix for any other fake frames that LLDB is supposed to recognize

Given that -fsanitize-annotate-debug-info is off by default my suggestion is that @anthonyhatran land the simplest version of his patch possible by not concerning himself with -fsanitize-annotate-debug-info at this stage. And after it's landed we can then decide how to make the two features interact in a way that the LLDB folks are happy with.

Agreed

@vitalybuka
Copy link
Collaborator

vitalybuka commented Jun 4, 2025

Should we just rename stuff of this patch into __clang_trap_msg ?

@vitalybuka
Copy link
Collaborator

vitalybuka commented Jun 4, 2025

Should we just rename stuff of this patch into __clang_trap_msg ?

Actually no.

_clang_trap_msg* is only for trap.
_ubsan_check* is for instructions evaluating check (as now there is a bonus: that _ubsan_check* works as _clang_trap_msg* on traps).

Attaching _clang_trap_msg* to evaluation code probably not nice.

Now we can attach _ubsan_check* to both - eval and trap.
Later _clang_trap_msg* can be attached on top of our frame to trap.

I assume _clang_trap_msg* can be as precise as _ubsan_check, so we can switch to that one from _ubsan_check for crash reports.

I guess we can land this one, as we can see clear way forward with both features?

@thurstond thurstond requested review from vitalybuka and fmayer June 4, 2025 19:05
@delcypher
Copy link
Contributor

@vitalybuka

Should we just rename stuff of this patch into __clang_trap_msg ?

Actually no.

_clang_trap_msg* is only for trap. _ubsan_check* is for instructions evaluating check (as now there is a bonus: that _ubsan_check* works as _clang_trap_msg* on traps).

Attaching _clang_trap_msg* to evaluation code probably not nice.

Agreed. __clang_trap_msg serves a very different purpose (show a human readable message in the debugger) to __ubsan_check_ (to aid profiling).

Now we can attach _ubsan_check* to both - eval and trap. Later _clang_trap_msg* can be attached on top of our frame to trap.

That seems reasonable to me. as long as the __ubsan_check name is stable we can teach LLDB to skip this frame when a trap is hit.

I assume _clang_trap_msg* can be as precise as ubsan_check, so we can switch to that one from ubsan_check for crash reports.

By "switch to" do you mean remove __ubsan_check from the debug info on the trap instruction and only use __clang_trap_msg? For now I think the implementation (on the clang side) would likely be simpler if we allowed both inline frames in the debug info.

I guess we can land this one, as we can see clear way forward with both features?

SGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants