Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TySan] Fix struct access with different bases #108385

Open
wants to merge 1 commit into
base: users/fhahn/tysan-a-type-sanitizer-runtime-library
Choose a base branch
from

Conversation

gbMattN
Copy link
Contributor

@gbMattN gbMattN commented Sep 12, 2024

Fixes issue #105960

If a member in a struct is also a struct, accessing a member partway through this inner struct currently causes a false positive. This is because when checking aliasing, the access offset is seen as greater than the starting offset of the inner struct, so the loop continues one iteration, and believes we are accessing the member after the inner struct.

The next member's offset is greater than the offset we are looking for, so when we subtract the next member's offset from what we are looking for, the offset underflows.

To fix this, we check if the member we think we are accessing has a greater offset than the offset we are looking for. If so, we take a step back. We cannot do this in the loop, since the loop does not check the final member. This means the penultimate member would still cause false positives.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 12, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (gbMattN)

Changes

Fixes issue #105960

If a member in a struct is also a struct, accessing a member partway through this inner struct currently causes a false positive. This is because when checking aliasing, the access offset is seen as greater than the starting offset of the inner struct, so the loop continues one iteration, and believes we are accessing the member after the inner struct.

The next member's offset is greater than the offset we are looking for, so when we subtract the next member's offset from what we are looking for, the offset underflows.

To fix this, we check if the member we think we are accessing has a greater offset than the offset we are looking for. If so, we take a step back. We cannot do this in the loop, since the loop does not check the final member. This means the penultimate member would still cause false positives.


Full diff: https://github.com/llvm/llvm-project/pull/108385.diff

2 Files Affected:

  • (modified) compiler-rt/lib/tysan/tysan.cpp (+4)
  • (added) compiler-rt/test/tysan/struct-offset-different-base.cpp (+31)
diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
index f627851d049e6a..f2cb6faddf45ac 100644
--- a/compiler-rt/lib/tysan/tysan.cpp
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -128,6 +128,10 @@ static bool isAliasingLegalUp(tysan_type_descriptor *TDA,
           break;
       }
 
+      //You can't have negative offset, you must be partially inside the last type
+      if (TDA->Struct.Members[Idx].Offset > OffsetA)
+        Idx -=1;
+        
       OffsetA -= TDA->Struct.Members[Idx].Offset;
       TDA = TDA->Struct.Members[Idx].Type;
     } else {
diff --git a/compiler-rt/test/tysan/struct-offset-different-base.cpp b/compiler-rt/test/tysan/struct-offset-different-base.cpp
new file mode 100644
index 00000000000000..c1ef5f8669c280
--- /dev/null
+++ b/compiler-rt/test/tysan/struct-offset-different-base.cpp
@@ -0,0 +1,31 @@
+// RUN: %clangxx_tysan -O0 %s -o %t && %run %t >%t.out 2>&1
+// RUN: FileCheck %s < %t.out
+
+#include <stdio.h>
+
+struct inner {
+	char buffer;
+	int i;
+};
+
+void init_inner(inner *list) {
+	list->i = 0;
+}
+
+struct outer {
+	inner foo;
+    char buffer;
+};
+
+int main(void) {
+	outer *l = new outer();
+
+    init_inner(&l->foo);
+    
+    int access_offsets_with_different_base = l->foo.i;
+	printf("%d\n", access_offsets_with_different_base);
+    
+	return 0;
+}
+
+// CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation

@gbMattN gbMattN force-pushed the users/nagym/fix-member-access-from-different-bases branch from 2dffe46 to d312bd9 Compare September 12, 2024 13:24
@gbMattN
Copy link
Contributor Author

gbMattN commented Sep 12, 2024

(Manually pinging potential reviewers) @tavianator @fhahn

@gbMattN gbMattN force-pushed the users/nagym/fix-member-access-from-different-bases branch 2 times, most recently from 91f560d to 65f1e1f Compare September 12, 2024 13:33
Copy link
Contributor

@tavianator tavianator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes my reduced testcase but not the unreduced one. I'll try to make a new reduction.

Comment on lines 133 to 134
Idx -=1;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Idx -=1;
Idx -= 1;

@tavianator
Copy link
Contributor

Here's the new testcase. Not sure if this bug is related or not. It has to do with memcpy(); if you replace the call with the commented-out line above it, it works.

struct node {
	struct node *next;
};

struct list {
	struct node *head, **tail;
};

int main(void) {
	struct list *list = __builtin_malloc(sizeof(*list));
	list->head = 0;
	list->tail = &list->head;

	struct node *node = __builtin_malloc(sizeof(*node));
	node->next = 0;

	*list->tail = node;
	list->tail = &node->next;

	while (list->head) {
		struct node *node = list->head;
		// list->head = node->next;
		__builtin_memcpy(&list->head, &node->next, sizeof(list->head));
		node->next = 0;
	}

	return 0;
}
tavianator@tachyon $ ~/code/llvm/llvm-project/build/bin/clang -Wall -g -fsanitize=type foo.c -o foo
tavianator@tachyon $ ./foo
==5885==ERROR: TypeSanitizer: type-aliasing-violation on address 0x55af02a8c2a0 (pc 0x55aef600fb36 bp 0x7ffcbf810cf0 sp 0x7ffcbf810c90 tid 5885)
READ of size 8 at 0x55af02a8c2a0 with type any pointer (in list at offset 0) accesses an existing object of type any pointer (in node at offset 0)
    #0 0x55aef600fb35 in main /home/tavianator/code/bfs/foo.c:20:15

@tavianator
Copy link
Contributor

I guess the bug there is that the memcpy() interceptor literally copies the dynamic type from node->next to list->head. Then list->head is accessed but tysan thinks the memory has type struct node::next which doesn't match.

@gbMattN
Copy link
Contributor Author

gbMattN commented Sep 13, 2024

Documenting this here as its part of the same issue: the following reproducer can be made (see the pull request above)

#include <string.h>
#include <stdlib.h>

struct inner {
	struct inner *n;
};

struct outer {
	struct inner *i;
};

struct outer* getOuter(){
	struct outer *out = malloc(sizeof(struct outer));
	struct inner *in = malloc(sizeof(struct inner));

	in->n = 0;
	out->i = in;

	return out;
}

int main(void) {
	
	struct outer* out = getOuter();

	while (out->i) {
		//out->i = out->i->n;
		memcpy(&out->i, &out->i->n, sizeof(out->i));
	}

	return 0;
}

If memcpy is replaced by the commented code, no error is detected. Both code runs the same checking function, but they are inserted at different places in the Transformation pass. This implies that the wrong checks are being inserted for memcpy calls.
The failing check is checking any pointer (in outer at offset 0) against any pointer (in inner at offset 0), but due to how the outer is set up, its member is recorded simply as "any pointer", with no reference to inner anymore. The commented out path doesn't call tysan_check, meaning that their actual TDs should be an exact match.

@tavianator
Copy link
Contributor

I have consulted with an expert in the strict aliasing rules and we came to the horrifying (to me) conclusion that TySan is actually correct in this case, at least according to the C standard.

@gbMattN
Copy link
Contributor Author

gbMattN commented Sep 16, 2024

! Oh wow! ... Should the commented out line cause a type violation too? Or is everything in the case above (commented line fine, memcpy not) really the correct behaviour?

@tavianator
Copy link
Contributor

tavianator commented Sep 16, 2024

! Oh wow! ... Should the commented out line cause a type violation too?

No, out->i = out->i->n; is fine because the type of the expression out->i->n is just struct inner *, so that's the type that will be given to the storage for out->i. EDIT: I think that was inaccurate--for an assignment lvalue = rvalue it is the type of the lvalue that matters. So out->i = ... will set the type to struct outer::i. (Because out is dynamically allocated, it has no declared type and writes will set the effective type.)

But memcpy(&out->i, &out->i->n, sizeof(out->i)) is specified to exactly copy the effective type from the source to the destination (again because out is dynamically allocated). The type that gets copied includes knowledge of exactly which struct field it is (struct inner::n), and TySan is faithfully copying that over. The later access with type struct outer::i doesn't match.

There are more details in this paper, for example: https://web.archive.org/web/20190219170809/https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.pdf

@gbMattN gbMattN force-pushed the users/nagym/fix-member-access-from-different-bases branch from 65f1e1f to 4bde5de Compare September 23, 2024 10:28
tavianator added a commit to tavianator/bfs that referenced this pull request Sep 30, 2024
This avoids what might be a strict aliasing violation in some models.

Link: llvm/llvm-project#108385 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants