Status: Prototype on-hold
Documentation: Documentation/ktsan.txt (somewhat outdated)
Found bugs: here
Contacts: Dmitry Vyukov <@dvyukov>, Andrey Konovalov <@xairy>
Kernel Thread Sanitizer (KTSAN) is a happens-before dynamic data-race detector for the Linux kernel.
KTSAN adapts the data-race detection algorithm of the userspace ThreadSanitizer (version 2, not to be confused with version 1) to the Linux kernel.
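As background on what "happens-before" means here, the sketch below shows a vector-clock race check in the general style of ThreadSanitizer v2. It is a minimal illustration, not KTSAN's implementation. It assumes a fixed number of threads and a single synchronization object, and all names (`vclock_t`, `vc_release`, `vc_acquire`, `happens_before`, `is_race`) are hypothetical. An earlier access recorded as (thread id, clock) happens-before the current thread's access if the current thread's vector clock has already observed that clock value. Two accesses race if at least one is a write and neither is ordered before the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration only; KTSAN's real data structures differ. */
#define MAX_THREADS 4

typedef struct {
    uint32_t clk[MAX_THREADS]; /* clk[t] = latest known clock of thread t */
} vclock_t;

/* dst = element-wise max(dst, src). */
static void vc_join(vclock_t *dst, const vclock_t *src)
{
    for (int t = 0; t < MAX_THREADS; t++)
        if (src->clk[t] > dst->clk[t])
            dst->clk[t] = src->clk[t];
}

/* Release (e.g. unlock): publish the thread's clock into the sync object,
 * then advance the thread's own clock so later accesses are not covered. */
static void vc_release(vclock_t *thr, uint8_t tid, vclock_t *sync)
{
    vc_join(sync, thr);
    thr->clk[tid]++;
}

/* Acquire (e.g. lock): learn everything the releasing thread had seen. */
static void vc_acquire(vclock_t *thr, const vclock_t *sync)
{
    vc_join(thr, sync);
}

/* Access by `tid` at clock `epoch` happens-before the current access
 * if the current thread has observed that clock value of `tid`. */
static bool happens_before(uint8_t tid, uint32_t epoch, const vclock_t *cur)
{
    return epoch <= cur->clk[tid];
}

/* A race needs at least one write and no happens-before ordering. */
static bool is_race(uint8_t prev_tid, uint32_t prev_epoch, bool prev_write,
                    const vclock_t *cur, bool cur_write)
{
    if (!prev_write && !cur_write)
        return false;
    return !happens_before(prev_tid, prev_epoch, cur);
}
```

For example, if thread 0 writes at clock 1 and then releases a lock, a write by thread 1 is reported as a race until thread 1 acquires that lock.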
Due to the significant complexity of the bug-detection algorithm when adapted to the Linux kernel, as well as large CPU and RAM overheads, the project was put on hold.
See Kernel Concurrency Sanitizer (KCSAN) for an alternative approach that uses watchpoints.
The latest KTSAN version, based on Linux 5.3, can be found in the ktsan branch. The original prototype, based on Linux 4.2, can be found under the tag ktsan_v4.2-with-fixes (which also includes fixes for the data races it found).
For more details about KTSAN, see:
- KernelThreadSanitizer (KTSAN): a data race detector for the Linux kernel
- Automatic detection of race conditions in the Linux OS kernel [in Russian]
- See this for unresolved issues in KTSAN.
- Make some internal structures per CPU instead of per thread (the vector clock (VC) cache, possibly others). The VCs themselves stay per thread.
- Monitor some kernel scheduler events for threads (a thread starting or stopping execution on a CPU).
- Disable interrupts (CLI/STI) while handling TSAN events (kernel scheduler events, synchronization events).
- Use 4 bytes per slot: 1 byte for the thread ID, 2 bytes for the clock, and 1 byte for everything else (flags, ...).
- Different threads might share the same thread ID (only 256 distinct values are available).
- When a clock overflows, it is possible to give the thread a new thread ID and connect the "old" and "new" threads with a happens-before relation.
- Find races in both kmalloc and vmalloc ranges.
- Use a two-level shadow memory mapping scheme for now.
- Do a flush when we run out of clocks. The flush might work as follows: there is a global epoch variable that is incremented on each flush, and each thread has a local epoch variable. When a thread starts running, it flushes itself if its local epoch is less than the global one.
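The 4-byte slot layout proposed in the list above (1 byte thread ID, 2 bytes clock, 1 byte flags) could be packed into a single 32-bit word roughly as follows. This is a minimal sketch: the bit positions, the `SLOT_FLAG_WRITE` flag, and the helper names are hypothetical assumptions, not KTSAN code. The `clock_tick` helper shows where the clock-overflow case from the list would surface. When it returns false, the caller would have to assign the thread a new ID and link the old and new IDs with a happens-before edge (not shown).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of one 4-byte slot (assumed bit positions):
 *   bits  0..7  : thread id (1 byte, so only 256 distinct ids)
 *   bits  8..23 : clock     (2 bytes)
 *   bits 24..31 : flags     (1 byte; bit 0 = write, an assumption) */
#define SLOT_FLAG_WRITE 0x01u

typedef uint32_t slot_t;

static slot_t slot_pack(uint8_t tid, uint16_t clock, uint8_t flags)
{
    return (slot_t)tid | ((slot_t)clock << 8) | ((slot_t)flags << 24);
}

static uint8_t slot_tid(slot_t s)    { return (uint8_t)(s & 0xFFu); }
static uint16_t slot_clock(slot_t s) { return (uint16_t)((s >> 8) & 0xFFFFu); }
static uint8_t slot_flags(slot_t s)  { return (uint8_t)(s >> 24); }

/* Advance a 16-bit thread clock. Returns false (leaving the clock
 * unchanged) when it would wrap, signalling that the thread needs a
 * fresh thread id before it can record further accesses. */
static bool clock_tick(uint16_t *clock)
{
    if (*clock == UINT16_MAX)
        return false;
    (*clock)++;
    return true;
}
```

Packing everything into one 32-bit word means a slot can be read or written with a single aligned memory access, which is one plausible motivation for the 4-byte budget.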
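The list above only names a "two-level shadow memory mapping scheme". One common form of such a scheme uses a first-level table indexed by page number whose entries point to second-level arrays of shadow slots. The second-level arrays are allocated lazily the first time a page is touched. The sketch below is a hypothetical illustration of that general idea for a toy 32-bit address space. The page size, the 8-byte granule per slot, and the `shadow_for` name are all assumptions, not KTSAN's actual layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parameters for a toy 32-bit address space. */
#define PAGE_SHIFT     12                       /* 4 KiB pages */
#define GRANULE_SHIFT  3                        /* one slot per 8 bytes */
#define L1_ENTRIES     (1u << (32 - PAGE_SHIFT))
#define SLOTS_PER_PAGE (1u << (PAGE_SHIFT - GRANULE_SHIFT))

typedef uint32_t slot_t;

/* First level: one pointer per page. Second level: that page's slots,
 * allocated (zeroed) only when the page is first accessed. */
static slot_t *l1[L1_ENTRIES];

/* Return the shadow slot covering `addr`, or NULL if allocation fails. */
static slot_t *shadow_for(uint32_t addr)
{
    uint32_t page = addr >> PAGE_SHIFT;
    uint32_t idx = (addr & ((1u << PAGE_SHIFT) - 1)) >> GRANULE_SHIFT;

    if (l1[page] == NULL) {
        l1[page] = calloc(SLOTS_PER_PAGE, sizeof(slot_t));
        if (l1[page] == NULL)
            return NULL;
    }
    return &l1[page][idx];
}
```

The lazy second level keeps memory proportional to the pages actually touched, at the cost of one extra pointer dereference per lookup.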
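The epoch-based flush described in the last list item can be sketched as a lazy reset. A global flush only increments a counter, and each thread catches up the next time it starts running. This is a single-threaded illustration under stated assumptions. The names are hypothetical, "flushing a thread" is assumed to mean zeroing its vector clock (the list does not specify exactly which state is flushed), and real kernel code would need atomic access to the global epoch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_THREADS 4

/* Hypothetical per-thread state: a local epoch plus the thread's
 * vector clock, which is what this sketch resets on a flush. */
struct thr_state {
    uint64_t epoch;
    uint16_t clk[MAX_THREADS];
};

/* Incremented once per global flush. Not atomic in this sketch. */
static uint64_t global_epoch;

/* Called when clocks run out: start a new epoch. Per-thread state is
 * not touched here; each thread flushes itself lazily. */
static void global_flush(void)
{
    global_epoch++;
}

/* Called when a thread starts running: flush if it is behind. */
static void thread_start(struct thr_state *thr)
{
    if (thr->epoch < global_epoch) {
        memset(thr->clk, 0, sizeof(thr->clk)); /* drop stale clocks */
        thr->epoch = global_epoch;
    }
}
```

The lazy design avoids stopping every thread at flush time. A thread that never runs again never pays for the flush.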