Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TSan] Ignore reads if not stored early #74575

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

felilxtomski
Copy link

@felilxtomski felilxtomski commented Dec 6, 2023

As documented in this paper https://publications.rwth-aachen.de/record/840022/files/840022.pdf, we could trace back a significant runtime overhead introduced by certain HPC/scientific applications to concurrent shared read accesses.
A typical scenario for such read accesses is matric-vector multiplication which is frequently used to solve linar equation system. Accidentally, similar operations are also present in different machine learning algorithms.
The performance issue typically arises when the code executes with more than 4 threads and gets worse when the threads are spread across different NUMA domains / sockets.

The proposed change is to skip logging of reads, of they are not logged early. This means that previous reads by the current threads will still be updated. Empty shadow cells will also be used for logging.
This change also avoids that previous writes get randomly overwritten by a read access.

@jprotze jprotze requested a review from dvyukov December 6, 2023 17:14
@jprotze
Copy link
Collaborator

jprotze commented Dec 6, 2023

Should this be a runtime option, so that is can actively be en/disabled?

@jprotze jprotze added the compiler-rt:tsan Thread sanitizer label Dec 6, 2023
jprotze pushed a commit to RWTH-HPC/llvm-project that referenced this pull request Dec 8, 2023
As documented in this paper https://publications.rwth-aachen.de/record/840022/files/840022.pdf,
we could trace back a significant runtime overhead introduced by certain HPC/scientific
applications to concurrent shared read accesses.
A typical scenario for such read accesses is matrix-vector multiplication which is
frequently used to solve linar equation system. Accidentally, similar operations are
also present in different machine learning algorithms.
The performance issue typically arises when the code executes with more than 4 threads
and gets worse when the threads are spread across different NUMA domains / sockets.

The proposed change is to skip logging of reads, of they are not logged early. This
means that previous reads by the current threads will still be updated. Empty shadow
cells will also be used for logging.
This change also avoids that previous writes get randomly overwritten by a read access.

Under review as llvm#74575
@dvyukov
Copy link
Collaborator

dvyukov commented Dec 12, 2023

If the 8-byte granule contains only 1 variable, or multiple uniformly accessed variables, then this may be OK as is, not sure completely.
But I am concerned about the case when the granule contains non-uniformly accessed variables, e.g. just a set of bool fields in a class. In this case, we may not track accessed to variables accessed later at all and systematically miss races. Am I missing something here?

@dvyukov
Copy link
Collaborator

dvyukov commented Dec 12, 2023

Should this be a runtime option, so that is can actively be en/disabled?

This looks like a property of region of memory, rather than property of a program. Think of e.g. ML model serving server that does both matmult and lots of generic server logic.

I am also concerned about performance overhead of a global flag check in handling of every memory access.

I am thinking is we could check that all existing accesses access the same byte range as the current read, then skipping the read looks more reasonable. This is still subject to some potential false negatives, which are extremely hard to discover later in real life (much worse than fixable false positives). So I thinking if we could also skip using some pseudo-random check. All of this needs benchmarking on real programs to estimate effects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler-rt:tsan Thread sanitizer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants