[TySan] Add initial documentation for Type Sanitizer #123595

Merged
merged 1 commit into from
Jan 28, 2025

Conversation

gbMattN
Contributor

@gbMattN gbMattN commented Jan 20, 2025

Add some initial documentation for type sanitizer [From issue #122522]

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jan 20, 2025
@llvmbot
Member

llvmbot commented Jan 20, 2025

@llvm/pr-subscribers-clang

Author: None (gbMattN)

Changes

Add some initial documentation for type sanitizer [From issue #122522]


Full diff: https://github.com/llvm/llvm-project/pull/123595.diff

1 Files Affected:

  • (added) clang/docs/TypeSanitizer.rst (+152)
diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
new file mode 100644
index 00000000000000..ceb2fca37df904
--- /dev/null
+++ b/clang/docs/TypeSanitizer.rst
@@ -0,0 +1,152 @@
+================
+TypeSanitizer
+================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
+instrumentation module and a run-time library. The tool detects violations such as the use 
+of an illegally cast pointer, or misuse of a union.
+
+The violations TypeSanitizer catches may cause the compiler to emit incorrect code.
+
+Typical slowdown introduced by TypeSanitizer is about **4x** [[CHECK THIS]]. Typical memory overhead introduced by TypeSanitizer is about **9x**. 
+
+How to build
+============
+
+Build LLVM/Clang with `CMake <https://llvm.org/docs/CMake.html>`_ and enable
+the ``compiler-rt`` runtime. An example CMake configuration that will allow
+for the use/testing of TypeSanitizer:
+
+.. code-block:: console
+
+   $ cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="compiler-rt" <path to source>/llvm
+
+Usage
+=====
+
+Compile and link your program with the ``-fsanitize=type`` flag.  The
+TypeSanitizer run-time library should be linked to the final executable, so
+make sure to use ``clang`` (not ``ld``) for the final link step. To
+get reasonable performance, add ``-O1`` or higher.
+By default, TypeSanitizer doesn't print the full stack trace in error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1``
+to print the full trace. To get nicer stack traces in error messages, add ``-fno-omit-frame-pointer`` and
+``-g``.  To get perfect stack traces, you may need to disable inlining (just use ``-O1``) and tail call elimination
+(``-fno-optimize-sibling-calls``).
+
+.. code-block:: console
+
+    % cat example_AliasViolation.c
+    int main(int argc, char **argv) {
+      int x = 100;
+      float *y = (float*)&x;
+      *y += 2.0f;          // Strict aliasing violation
+      return 0;
+    }
+
+    # Compile and link
+    % clang -g -fsanitize=type example_AliasViolation.c
+
+If a strict aliasing violation is detected, the program will print an error message to stderr. 
+The program won't terminate, which will allow you to detect many strict aliasing violations in one 
+run.
+
+.. code-block:: console
+    % ./a.out
+    ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1145ff41 bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
+    READ of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
+        #0 0x5b3b1145ff40 in main example_AliasViolation.c:4:10
+
+    ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1146008a bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
+    WRITE of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
+        #0 0x5b3b11460089 in main example_AliasViolation.c:4:10
+
+Error terminology
+------------------
+
+Some terms that appear in TypeSanitizer errors are derived from
+`TBAA Metadata <https://llvm.org/docs/LangRef.html#tbaa-metadata>`_. This section provides a
+brief glossary of these terms.
+
+* ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++ 
+  type ``char``.
+* ``type p[x]``: Sometimes a program could generate distinct TBAA metadata that resolve to the same name. 
+  To make them unique, they have the character 'p' and a number prepended to their name.
+
+These terms are a result of non-user-facing processes, and not always self-explanatory. There is some 
+interest in changing TypeSanitizer in the future to translate these terms before printing them to users.
+
+Sanitizer features
+==================
+
+``__has_feature(type_sanitizer)``
+------------------------------------
+
+In some cases one may need to execute different code depending on whether
+TypeSanitizer is enabled.
+:ref:`\_\_has\_feature <langext-__has_feature-__has_extension>` can be used for
+this purpose.
+
+.. code-block:: c
+
+    #if defined(__has_feature)
+    #  if __has_feature(type_sanitizer)
+    // code that builds only under TypeSanitizer
+    #  endif
+    #endif
+
+``__attribute__((no_sanitize("type")))``
+-----------------------------------------------
+
+You may not want TypeSanitizer to instrument some code. Use the function
+attribute ``no_sanitize("type")`` to disable type-aliasing instrumentation.
+Depending on what happens in non-instrumented code, instrumented code may
+emit false positives or false negatives. Other compilers may not support this
+attribute, so we suggest using it together with ``__has_feature(type_sanitizer)``.
+
+``__attribute__((disable_sanitizer_instrumentation))``
+--------------------------------------------------------
+
+The ``disable_sanitizer_instrumentation`` attribute can be applied to functions
+to prevent all kinds of instrumentation. As a result, it may introduce false
+positives and incorrect stack traces. Therefore, it should be used with care,
+and only if absolutely required; for example for certain code that cannot
+tolerate any instrumentation and resulting side-effects. This attribute
+overrides ``no_sanitize("type")``.
+
+Ignorelist
+----------
+
+TypeSanitizer supports ``src`` and ``fun`` entity types in
+:doc:`SanitizerSpecialCaseList`, which can be used to suppress aliasing
+violation reports in the specified source files or functions. As
+with other methods of ignoring instrumentation, this can result in false
+positives or false negatives.
+
+Limitations
+-----------
+
+* TypeSanitizer uses more real memory than a native run. It uses 8 bytes of
+  shadow memory for each byte of user memory.
+* There are transformation passes which run before TypeSanitizer. If these 
+  passes optimize out an aliasing violation, TypeSanitizer cannot catch it.
+* Currently, all instrumentation is inlined. This can result in a **15x** 
+  (on average) increase in generated file size, and **3x** to **7x** increase 
+  in compile time. In some documented cases this can cause the compiler to hang.
+  There are plans to improve this in the future.
+* Codebases that use unions and struct-initialized variables can see incorrect 
+  results, as TypeSanitizer doesn't yet instrument these reliably.
+
+Current Status
+--------------
+
+TypeSanitizer is brand new, and still in development. There are some known 
+issues, especially in areas where clang doesn't generate valid TBAA metadata. 
+
+We are actively working on enhancing the tool --- stay tuned.  Help,
+issues, pull requests, and ideas are all more than welcome.
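
The ``no_sanitize("type")`` section in the diff above suggests pairing the attribute with ``__has_feature(type_sanitizer)`` so that other compilers still accept the code. A minimal sketch of one way to do that follows; the macro name `NO_SANITIZE_TYPE` and the function `byte_sum` are hypothetical names chosen for illustration and are not part of Clang or of this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro: expands to the attribute only when the
 * compiler reports that TypeSanitizer is enabled, and to nothing otherwise. */
#if defined(__has_feature)
#  if __has_feature(type_sanitizer)
#    define NO_SANITIZE_TYPE __attribute__((no_sanitize("type")))
#  endif
#endif
#ifndef NO_SANITIZE_TYPE
#  define NO_SANITIZE_TYPE
#endif

/* Illustrative function excluded from TypeSanitizer instrumentation.
 * It reads the buffer through unsigned char, which may alias any object. */
NO_SANITIZE_TYPE
unsigned byte_sum(const void *data, size_t len) {
    assert(data != NULL || len == 0);
    const unsigned char *p = data;
    unsigned sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum += p[i];
    return sum;
}
```

Keeping the check in one macro means the attribute is written once and individual functions only carry the macro name.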

Contributor

@fhahn fhahn left a comment

Thanks for putting this up!


TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
instrumentation module and a run-time library. The tool detects violations such as the use
of an illegally cast pointer, or misuse of a union.
Contributor

I think this should say something about accessing memory with a different type than the dynamic type of the object or something like that?


The violations TypeSanitizer catches may cause the compiler to emit incorrect code.

Typical slowdown introduced by TypeSanitizer is about **4x** [[CHECK THIS]]. Typical memory overhead introduced by TypeSanitizer is about **9x**.
Contributor

Might be good to just say that TypeSanitizer is still experimental and currently can have a very large runtime, memory and code size overhead :)

It also seems to have a sizable compile-time overhead.


* ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++
type ``char``.
* ``type p[x]``: Sometimes a program could generate distinct TBAA metadata that resolve to the same name.
Contributor

``p2 int`` means ``int **``; the number indicates the number of indirections (not sure what the best terminology would be here).

Contributor Author

Ah! My mistake, will change that

Comment on lines 82 to 83
These terms are a result of non-user-facing processes, and not always self-explanatory. There is some
interest in changing TypeSanitizer in the future to translate these terms before printing them to users.
Contributor

Might be worth just saying that it is still experimental and that the user-facing error messages should be improved in the future to remove LLVM IR-specific references.

--------------

TypeSanitizer is brand new, and still in development. There are some known
issues, especially in areas where clang doesn't generate valid TBAA metadata.
Contributor

Suggested change
-issues, especially in areas where clang doesn't generate valid TBAA metadata.
+issues, especially in areas where Clang doesn't generate valid TBAA metadata.

I am not sure that saying Clang doesn't generate valid TBAA metadata is correct here. It may not generate the TBAA metadata that TySan expects in some cases, but what Clang emits should be conservatively correct.

Contributor Author

Aha, right: it's not that it isn't valid, it just isn't descriptive/extensive enough for the sanitizer. Will change that.

@gbMattN gbMattN requested a review from fhahn January 20, 2025 17:05
Collaborator

@erichkeane erichkeane left a comment

The language here is a little bit staccato for my taste, but it's probably a good enough start. A few suggestions; otherwise I'm happy whenever @fhahn and the others are.

============

TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
instrumentation module and a run-time library. The tool detects violations where you access
Collaborator

Violations of what? Would love another sentence or two here explaining better what it is detecting. Something like (please don't use unless it is correct!):

"This tool detects violations of the strict-aliasing rule, which prohibits access of a memory location as a type that is different from the dynamic type of the object at that location."

Or something?
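
To make the rule quoted above concrete, here is a small C sketch contrasting an access through an incompatible type with a well-defined alternative that copies the bytes with `memcpy`. The function name `float_bits` is hypothetical, and the bit pattern mentioned in the comment assumes an IEEE-754 single-precision `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Violating form, shown only as a comment: the float object is read
 * through an lvalue of type uint32_t, which the strict-aliasing rule
 * does not permit.
 *
 *     uint32_t bits = *(uint32_t *)&f;
 */

/* Well-defined form: copy the object representation byte by byte.
 * Assumes float and uint32_t have the same size (checked below). */
uint32_t float_bits(float f) {
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE-754 platform, `float_bits(1.0f)` yields `0x3F800000`. Compilers typically lower a fixed-size `memcpy` like this to a plain register move at `-O1` and above.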

Contributor Author

I added some more text based off Florian's original pitch

Collaborator

@AaronBallman AaronBallman left a comment

Thank you for working on these docs!

Usage
=====

Compile and link your program with ``-fsanitize=type`` flag. The
Collaborator

Can the type sanitizer be enabled along with other sanitizers, or is it mutually exclusive with some? (If it's mutually exclusive with some, then we should document which ones, otherwise I think the docs are fine as-is.)

Contributor Author

From a check, I think it's currently mutually exclusive with all of them. I will put this in the Limitations section.

goussepi pushed a commit that referenced this pull request Jan 24, 2025
TySan supports some preprocessor checks and ignorelists, but they are
currently untested. This PR adds some tests to make sure they all work.

@fhahn @AaronBallman, this is based off the discussion in the
documentation PR [#123595]
Collaborator

@AaronBallman AaronBallman left a comment

LGTM assuming @fhahn is happy as well. Thank you for the docs!

@vitalybuka
Collaborator

Maybe a paragraph or a link to a doc about how it works would be useful?
A basic understanding of the algorithm may help users avoid false expectations.

Contributor

@fhahn fhahn left a comment

LGTM thanks!

I think it would be great to add a paragraph or a link to a doc about how it works, as suggested by @vitalybuka, but this could also be done as a follow-up.

@gbMattN
Contributor Author

gbMattN commented Jan 28, 2025

I could add a link to your initial pull request? You have a decent writeup there on how it works.

@fhahn
Contributor

fhahn commented Jan 28, 2025

> I could add a link to your initial pull request? You have a decent writeup there on how it works.

It's probably better to add a paragraph instead of linking to a PR. Of course, the description can be taken from the PR.

@gbMattN gbMattN force-pushed the users/nagym/tysan-initial-documentation branch from 33ed4a2 to a27e1b7 Compare January 28, 2025 16:18
@gbMattN gbMattN force-pushed the users/nagym/tysan-initial-documentation branch from a27e1b7 to b9e542b Compare January 28, 2025 16:30
@goussepi goussepi merged commit 822954b into llvm:main Jan 28, 2025
6 of 8 checks passed
The TypeSanitizer Algorithm
===========================
For each TBAA type-access descriptor, encoded in LLVM IR using TBAA Metadata, the instrumentation
pass generates descriptor tales. Thus there is a unique pointer to each type (and access descriptor).
Member

Small typo: s/tales/tables/

Contributor Author

Thanks, will fix!

Labels
clang Clang issues not falling into any other category
9 participants