Commit 672d544

felilxtomskijprotze
authored and committed
[TSan] Ignore reads if not stored early
As documented in this paper (https://publications.rwth-aachen.de/record/840022/files/840022.pdf), we traced a significant runtime overhead observed for certain HPC/scientific applications back to concurrent shared read accesses. A typical scenario for such read accesses is matrix-vector multiplication, which is frequently used to solve linear equation systems. Incidentally, similar operations are also present in various machine learning algorithms. The performance issue typically arises when the code executes with more than 4 threads and gets worse when the threads are spread across different NUMA domains / sockets.

The proposed change is to skip logging of reads if they are not logged early. This means that previous reads by the current thread will still be updated, and empty shadow cells will still be used for logging. This change also prevents previous writes from being randomly overwritten by a read access.

Under review as llvm#74575
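The policy described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Access`, `ShadowCell` and `Record` are hypothetical names invented for illustration, `tid == 0` is used as the "empty slot" marker, and the real runtime's shadow encoding and race checks are omitted entirely.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy model, not the TSan data layout.
constexpr int kSlots = 4;  // plays the role of kShadowCnt

struct Access {
  uint32_t tid;  // 0 marks an empty slot in this toy model
  bool is_read;
};

struct ShadowCell {
  std::array<Access, kSlots> slots{};  // value-initialized: all slots empty
};

// Returns true if the access is recorded (or already covered by a stronger
// entry of the same thread), false if it is dropped.
bool Record(ShadowCell& cell, Access cur, unsigned pseudo_random) {
  assert(cur.tid != 0);
  // 1. An earlier entry of the same thread is updated in place. A write
  //    entry is kept rather than replaced by a read.
  for (auto& s : cell.slots) {
    if (s.tid == cur.tid) {
      if (s.is_read || !cur.is_read)
        s = cur;
      return true;
    }
  }
  // 2. Empty slots are still used for logging.
  for (auto& s : cell.slots) {
    if (s.tid == 0) {
      s = cur;
      return true;
    }
  }
  // 3. No slot was found "early": reads give up instead of evicting.
  if (cur.is_read)
    return false;
  // 4. Only writes replace a pseudo-randomly chosen slot.
  cell.slots[pseudo_random % kSlots] = cur;
  return true;
}
```

With four readers already occupying the cell, a fifth reader is dropped and the existing entries stay untouched, while a later write may still evict one of them.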
1 parent f8575ff commit 672d544

1 file changed: +7 -1 lines changed

compiler-rt/lib/tsan/rtl/tsan_rtl_access.cpp

Lines changed: 7 additions & 1 deletion
@@ -224,6 +224,8 @@ bool CheckRaces(ThreadState* thr, RawShadow* shadow_mem, Shadow cur,
   // the current access info, so we are done.
   if (LIKELY(stored))
     return false;
+  if (LIKELY(typ & kAccessRead))
+    return false;
   // Choose a random candidate slot and replace it.
   uptr index =
       atomic_load_relaxed(&thr->trace_pos) / sizeof(Event) % kShadowCnt;
@@ -345,8 +347,12 @@ STORE : {
     const m128 empty = _mm_cmpeq_epi32(shadow, zero);
     const int empty_mask = _mm_movemask_epi8(empty);
     index = __builtin_ffs(empty_mask);
-    if (UNLIKELY(index == 0))
+    if (UNLIKELY(index == 0)) {
+      // If we reach here, we give up storing reads
+      if (typ & kAccessRead)
+        return false;
       index = (atomic_load_relaxed(&thr->trace_pos) / 2) % 16;
+    }
   }
   StoreShadow(&shadow_mem[index / 4], cur.raw());
   // We could zero other slots determined by rewrite_mask.
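The index arithmetic in the second hunk can be followed with a portable sketch. It assumes four 32-bit shadow slots and replaces the SSE intrinsics with plain loops. `EmptyByteMask`, `FindFirstSet` and `ChooseSlot` are hypothetical helper names, and returning `-1` for a dropped read is a convention of this sketch only (the patched code returns `false` from the enclosing function).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Portable stand-in for _mm_cmpeq_epi32 followed by _mm_movemask_epi8:
// every empty (zero) 32-bit slot contributes four set bits, one per byte.
int EmptyByteMask(const std::array<uint32_t, 4>& shadow) {
  int mask = 0;
  for (int slot = 0; slot < 4; ++slot)
    if (shadow[slot] == 0)
      mask |= 0xF << (slot * 4);
  return mask;
}

// Portable stand-in for __builtin_ffs: 1-based position of the lowest set
// bit, or 0 if no bit is set.
int FindFirstSet(int mask) {
  if (mask == 0)
    return 0;
  int pos = 1;
  while ((mask & 1) == 0) {
    mask >>= 1;
    ++pos;
  }
  return pos;
}

// Returns the slot to store into, or -1 if a read is dropped.
int ChooseSlot(const std::array<uint32_t, 4>& shadow, bool is_read,
               uint64_t trace_pos) {
  int index = FindFirstSet(EmptyByteMask(shadow));
  if (index == 0) {
    // No empty slot: with the patch, reads give up here.
    if (is_read)
      return -1;
    // Writes pick a pseudo-random byte position derived from trace_pos.
    index = static_cast<int>((trace_pos / 2) % 16);
  }
  assert(index >= 0 && index < 16);
  return index / 4;  // convert the byte position to a slot number
}
```

Because the first-set position is 1-based and each slot spans four mask bits, the possible results 1, 5, 9 and 13 map to slots 0 through 3 after the division by 4.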
