Skip to content

[UBSan] Implement src:*=sanitize for UBSan #140529

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

qinkunbao
Copy link
Member

@qinkunbao qinkunbao commented May 19, 2025

Background: #139128

It is a draft implementation for "src:*=sanitize". It should be applied to all sanitizers.

Any srcs assigned to the sanitize category will have their sanitizer instrumentation remained ignored by "src:". For example,

src:*
src:*/test1.cc=sanitize

test1.cc will still have the UBSan instrumented.

Conflicting entries are resolved by the latest entry, which takes precedence.

src:*
src:*/mylib/*=sanitize
src:*/mylib/test.cc

test.cc does not have the UBSan check (In this case, src:*/mylib/test.cc overrides src:*/mylib/*=sanitize for test.cc).

src:*
src:*/mylib/test.cc
src:*/mylib/*=sanitize

test1.cc has the UBSan instrumented (In this case, src:*/mylib/*=sanitize overrides src:*/mylib/test.cc).

Documents update will be in a new PR.

Created using spr 1.3.6
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 19, 2025
@llvmbot
Copy link
Member

llvmbot commented May 19, 2025

@llvm/pr-subscribers-clang

Author: Qinkun Bao (qinkunbao)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/140529.diff

4 Files Affected:

  • (modified) clang/include/clang/Basic/SanitizerSpecialCaseList.h (+6-1)
  • (modified) clang/lib/Basic/NoSanitizeList.cpp (+7)
  • (modified) clang/lib/Basic/SanitizerSpecialCaseList.cpp (+16)
  • (added) clang/test/CodeGen/ubsan-src-ignorelist-category.test (+37)
diff --git a/clang/include/clang/Basic/SanitizerSpecialCaseList.h b/clang/include/clang/Basic/SanitizerSpecialCaseList.h
index d024b7dfc2e85..25d518e7128cf 100644
--- a/clang/include/clang/Basic/SanitizerSpecialCaseList.h
+++ b/clang/include/clang/Basic/SanitizerSpecialCaseList.h
@@ -43,13 +43,18 @@ class SanitizerSpecialCaseList : public llvm::SpecialCaseList {
   bool inSection(SanitizerMask Mask, StringRef Prefix, StringRef Query,
                  StringRef Category = StringRef()) const;
 
+  // Query ignorelisted entries if any bit in Mask matches the entry's section.
+  // Return 0 if not found. If found, return the line number (starts with 1).
+  unsigned inSectionBlame(SanitizerMask Mask, StringRef Prefix, StringRef Query,
+                          StringRef Category = StringRef()) const;
+
 protected:
   // Initialize SanitizerSections.
   void createSanitizerSections();
 
   struct SanitizerSection {
     SanitizerSection(SanitizerMask SM, SectionEntries &E)
-        : Mask(SM), Entries(E){};
+        : Mask(SM), Entries(E) {};
 
     SanitizerMask Mask;
     SectionEntries &Entries;
diff --git a/clang/lib/Basic/NoSanitizeList.cpp b/clang/lib/Basic/NoSanitizeList.cpp
index e7e63c1f419e6..811480f914ec5 100644
--- a/clang/lib/Basic/NoSanitizeList.cpp
+++ b/clang/lib/Basic/NoSanitizeList.cpp
@@ -44,6 +44,13 @@ bool NoSanitizeList::containsFunction(SanitizerMask Mask,
 
 bool NoSanitizeList::containsFile(SanitizerMask Mask, StringRef FileName,
                                   StringRef Category) const {
+  unsigned nosanline = SSCL->inSectionBlame(Mask, "src", FileName, Category);
+  unsigned sanline = SSCL->inSectionBlame(Mask, "src", FileName, "sanitize");
+  // If we have two cases such as `src:a.cpp=sanitize` and `src:a.cpp`, the
+  // current entry override the previous entry.
+  if (nosanline > 0 && sanline > 0) {
+    return nosanline > sanline;
+  }
   return SSCL->inSection(Mask, "src", FileName, Category);
 }
 
diff --git a/clang/lib/Basic/SanitizerSpecialCaseList.cpp b/clang/lib/Basic/SanitizerSpecialCaseList.cpp
index 2dbf04c6ede97..7da36f3801453 100644
--- a/clang/lib/Basic/SanitizerSpecialCaseList.cpp
+++ b/clang/lib/Basic/SanitizerSpecialCaseList.cpp
@@ -63,3 +63,19 @@ bool SanitizerSpecialCaseList::inSection(SanitizerMask Mask, StringRef Prefix,
 
   return false;
 }
+
+unsigned SanitizerSpecialCaseList::inSectionBlame(SanitizerMask Mask,
+                                                  StringRef Prefix,
+                                                  StringRef Query,
+                                                  StringRef Category) const {
+  for (auto &S : SanitizerSections) {
+    if (S.Mask & Mask) {
+      unsigned lineNum =
+          SpecialCaseList::inSectionBlame(S.Entries, Prefix, Query, Category);
+      if (lineNum > 0) {
+        return lineNum;
+      }
+    }
+  }
+  return 0;
+}
diff --git a/clang/test/CodeGen/ubsan-src-ignorelist-category.test b/clang/test/CodeGen/ubsan-src-ignorelist-category.test
new file mode 100644
index 0000000000000..e0efd65df8652
--- /dev/null
+++ b/clang/test/CodeGen/ubsan-src-ignorelist-category.test
@@ -0,0 +1,37 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist -emit-llvm %t/test1.c -o - | FileCheck %s -check-prefix=CHECK-ALLOWLIST
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist -emit-llvm %t/test2.c -o - | FileCheck %s -check-prefix=CHECK-IGNORELIST
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict1 -emit-llvm %t/test1.c -o - | FileCheck %s -check-prefix=CHECK-ALLOWLISTOVERIDE1
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict2 -emit-llvm %t/test1.c -o - | FileCheck %s -check-prefix=CHECK-ALLOWLISTOVERIDE2
+
+
+// Verify ubsan only emits checks for files in the allowlist
+
+//--- src.ignorelist
+src:*
+src:*/test1.c=sanitize
+
+//--- src.ignorelist.contradict1
+src:*
+src:*/test1.c=sanitize
+src:*/test1.c
+
+//--- src.ignorelist.contradict2
+src:*
+src:*/test1.c
+src:*/test1.c=sanitize
+
+//--- test1.c
+int add1(int a, int b) {
+// CHECK-ALLOWLIST: llvm.sadd.with.overflow.i32
+// CHECK-ALLOWLISTOVERIDE1-NOT: llvm.sadd.with.overflow.i32
+// CHECK-ALLOWLISTOVERIDE2: llvm.sadd.with.overflow.i32
+    return a+b;
+}
+
+//--- test2.c
+int add2(int a, int b) {
+// CHECK-IGNORELIST-NOT: llvm.sadd.with.overflow.i32
+    return a+b;
+}

Created using spr 1.3.6
Copy link

github-actions bot commented May 19, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Created using spr 1.3.6
@qinkunbao qinkunbao requested a review from vitalybuka May 19, 2025 15:15
@qinkunbao qinkunbao changed the title Implement src:*=sanitize for UBSan [UBSan] Implement src:*=sanitize for UBSan May 19, 2025
@qinkunbao qinkunbao requested a review from thurstond May 19, 2025 18:43
Created using spr 1.3.6
qinkunbao added 2 commits May 19, 2025 14:55
Created using spr 1.3.6

[skip ci]
Created using spr 1.3.6
@vitalybuka vitalybuka changed the base branch from main to users/qinkunbao/spr/demo-test-for-httpsgithubcomllvmllvm-projectpull140529 May 19, 2025 21:55
@vitalybuka vitalybuka changed the base branch from users/qinkunbao/spr/demo-test-for-httpsgithubcomllvmllvm-projectpull140529 to main May 19, 2025 21:59
qinkunbao added 2 commits May 19, 2025 15:13
Created using spr 1.3.6

[skip ci]
Created using spr 1.3.6
@vitalybuka vitalybuka changed the base branch from main to users/qinkunbao/spr/demo-test-for-httpsgithubcomllvmllvm-projectpull140529 May 19, 2025 22:14
vitalybuka pushed a commit that referenced this pull request May 19, 2025
Base automatically changed from users/qinkunbao/spr/demo-test-for-httpsgithubcomllvmllvm-projectpull140529 to main May 19, 2025 22:22
Created using spr 1.3.6
@@ -44,6 +44,12 @@ bool NoSanitizeList::containsFunction(SanitizerMask Mask,

bool NoSanitizeList::containsFile(SanitizerMask Mask, StringRef FileName,
StringRef Category) const {
unsigned NoSanLine = SSCL->inSectionBlame(Mask, "src", FileName, Category);
Copy link
Collaborator

@vitalybuka vitalybuka May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inSectionBlame returns a single int, but
SSCL can be created from multiple ignore list files, you can repeat -fsanitize-ignorelist=

I think you need to create a separate patch to change inSectionBlame,
to return pair<uint, uint> or new struct {uint fileIdx, uint lineNo}.

Note, there is llvm-project/llvm/unittests/Support/SpecialCaseListTest.cpp, you can test it there.

// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist -emit-llvm %t/test2.c -o - | FileCheck %s --check-prefixes=CHECK2

// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict1 -emit-llvm %t/test1.c -o - | FileCheck %s --check-prefixes=CHECK1,IGNORE
// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict1 -emit-llvm %t/test2.c -o - | FileCheck %s --check-prefixes=CHECK2

// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict2 -emit-llvm %t/test1.c -o - | FileCheck %s --check-prefixes=CHECK1,IGNORE
// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict2 -emit-llvm %t/test1.c -o - | FileCheck %s --check-prefixes=CHECK1,SANITIZE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please in separate PR extend this test for:

  1. category - seems like your fix has no issues with that, but it would be nice to have test coverage
  2. multiple files (or maybe llvm-project/llvm/unittests/Support/SpecialCaseListTest.cpp will be enough)

@vitalybuka
Copy link
Collaborator

llvm-project/clang/docs/SanitizerSpecialCaseList.rst needs to be updated

And nice to mention in llvm-project/clang/docs/ReleaseNotes.rst

vitalybuka added a commit that referenced this pull request May 19, 2025
vitalybuka added a commit that referenced this pull request May 19, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 19, 2025
Created using spr 1.3.6
@vitalybuka
Copy link
Collaborator

Note: Use to download my updates spr patch 140529

@vitalybuka
Copy link
Collaborator

Also I don't mind we fix multifile and sections in followup patches.

But this one needs at least documentation update.

@qinkunbao
Copy link
Member Author

The PR is ready for review.

Update:

  1. Fix the failure for c.ignorelist.contradict3 c.ignorelist.contradict4
  2. Update the document.
  3. Add unit tests.
  4. Remove the second lookup in containsFile.

TODO:

  1. Fix multifile and sections (in a new PR).

@qinkunbao qinkunbao requested review from vitalybuka and thurstond May 20, 2025 18:17
Created using spr 1.3.6
@qinkunbao qinkunbao requested a review from vitalybuka May 20, 2025 21:13
@@ -103,6 +101,22 @@ supported sanitizers.
char c = toobig; // also not instrumented
}

Conflicting entries are resolved by the latest entry, which takes precedence.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If multiple entries match the source, than the latest entry takes the precedence.

@@ -58,7 +58,7 @@ Usage with UndefinedBehaviorSanitizer
ability to adjust instrumentation based on type.

By default, supported sanitizers will have their instrumentation disabled for
types specified within an ignorelist.
entris specified within an ignorelist.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entries

``sanitize`` category will have their sanitizer instrumentation remain. If the
same type appears within or across ignorelists with different categories the
``sanitize`` category takes precedence -- regardless of order.
The ``=sanitize`` category is also supported. Any entries assigned to the
Copy link
Collaborator

@vitalybuka vitalybuka May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Any entries assigned to the..." -> "Match by entry =sanitize category enables sanitizer instrumentation, even if it was ignored by entries before."

@@ -63,6 +63,11 @@ Error SpecialCaseList::Matcher::insert(StringRef Pattern, unsigned LineNumber,
.moveInto(Pair.first))
return Err;
Pair.second = LineNumber;
} else {
Copy link
Collaborator

@vitalybuka vitalybuka May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move llvm/lib/Support/SpecialCaseList.cpp and llvm/unittests/Support/SpecialCaseListTest.cpp into a separate patch

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we need a patch to Matcher::Globs -> vector

As I see we don't use Map property, but it will mess up the order.

Copy link
Collaborator

@vitalybuka vitalybuka May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And "Pair.second = LineNumber;" is not a solution:
More generic case it will solve #140855 (please review and merge)

We need to walk Globs/Regex in reverse order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants