[Sketch] Rtsan/intercept more apple locks sanitizer dense map #40
Conversation
void __rtsan::Context::RealtimePush() { realtime_depth_++; }
void __rtsan::Context::RealtimePush() {
  __sanitizer::SpinMutexLock lock{&spin_mutex_};
  depths_[pthread_self()].realtime++;
Need to think about handling the case where depths_ needs to be resized here, which would (potentially) allocate more memory.
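For illustration, one shape a fix could take is to size the map once at construction, off the realtime path, so that operator[] only has to grow it if we ever exceed that thread count. This is a sketch rather than the PR's code: it assumes the __sanitizer::DenseMap port keeps LLVM DenseMap's initial-reserve constructor, that a DenseMapInfo specialisation covers pthread_t (a pointer type on Apple platforms), and the Depths value type is a guess at what the map stores.

```cpp
#include <pthread.h>

#include "sanitizer_common/sanitizer_dense_map.h"
#include "sanitizer_common/sanitizer_mutex.h"

namespace __rtsan {

struct Depths {      // assumed mapped value: per-thread counters
  int realtime{0};
  int bypass{0};
};

class Context {
public:
  // Assumption: DenseMap(InitialReserve) pre-allocates buckets up front, so
  // the increment below does not allocate until the map must grow past the
  // initial size.
  Context() : depths_(max_concurrent_threads_) {}

  void RealtimePush() {
    __sanitizer::SpinMutexLock lock{&spin_mutex_};
    depths_[pthread_self()].realtime++;  // can still allocate once we outgrow the reserve
  }

private:
  static constexpr int max_concurrent_threads_{4096};
  __sanitizer::SpinMutex spin_mutex_;
  __sanitizer::DenseMap<pthread_t, Depths> depths_;
};

} // namespace __rtsan
```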
  ExpectBypassed(true);
}

TEST_F(TestRtsanContext, IsProbablyThreadSafe) {
I presume this test almost certainly can't be kept, but you never know.
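For reference, a sketch of the kind of test this is presumably doing: hammer the counters from many threads and let ThreadSanitizer (or a crash) flag races. It assumes Context is default-constructible in the test and that RealtimePop exists as the counterpart to RealtimePush; neither appears in the excerpt above.

```cpp
#include <thread>
#include <vector>

TEST_F(TestRtsanContext, IsProbablyThreadSafe) {
  __rtsan::Context context{};  // assumed to be constructible like this in tests

  constexpr int kNumThreads = 16;
  constexpr int kIterationsPerThread = 10000;

  // Concurrent push/pop from many threads: a racy implementation shows up
  // under TSan or as corrupted depth counts.
  std::vector<std::thread> threads;
  for (int i = 0; i < kNumThreads; ++i) {
    threads.emplace_back([&context] {
      for (int j = 0; j < kIterationsPerThread; ++j) {
        context.RealtimePush();
        context.RealtimePop();  // assumed counterpart, not shown above
      }
    });
  }
  for (auto &t : threads)
    t.join();
}
```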
  return REAL(os_unfair_lock_lock)(lock);
}

INTERCEPTOR(void, os_unfair_lock_unlock, os_unfair_lock_t lock) {
This is the interceptor that triggered infinite recursion previously.
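To make the failure mode concrete: if the check inside this interceptor lazily initialises per-thread state, and that initialisation path itself unlocks an os_unfair_lock, we re-enter the interceptor before initialisation finishes. A common workaround is a reentrancy guard, sketched below; note it uses C++ thread_local, which is exactly the TLS dependency this PR is trying to avoid, so it is illustrative rather than a drop-in fix, and the RTSan check body is elided.

```cpp
#include <os/lock.h>

#include "interception/interception.h"

// Guard flag; relies on working TLS, which is the very thing whose lazy
// initialisation caused the recursion in the first place.
static thread_local bool inside_unlock_interceptor = false;

INTERCEPTOR(void, os_unfair_lock_unlock, os_unfair_lock_t lock) {
  if (inside_unlock_interceptor)
    return REAL(os_unfair_lock_unlock)(lock);  // bail out of the re-entrant call

  inside_unlock_interceptor = true;
  // ... RTSan's "is this a realtime context?" check would run here; if it
  // initialises per-thread state and that path unlocks an os_unfair_lock,
  // the guard above is what stops the infinite recursion.
  inside_unlock_interceptor = false;

  return REAL(os_unfair_lock_unlock)(lock);
}
```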
Alright, some updates from my investigation.

We can see the BSD implementation of pthread_once most easily here: https://github.com/lattera/freebsd/blob/master/lib/libthr/thread/thr_once.c. This is MOSTLY very straightforward. In general, I think this backs up us doing some simple userspace locking and CAS looping to set up our data structure.

As for get/set specific, in BSD at least, this is just a basic lock-protected hash table: https://github.com/lattera/freebsd/blob/master/lib/libthr/thread/thr_spec.c. What's possibly interesting is that the key table is a static array with 256 slots:

  struct pthread_key _thread_keytable[PTHREAD_KEYS_MAX];

So TECHNICALLY, to maintain what we are doing now, we could just match this, cap out at 256 entries, and not worry about dynamic allocation.

My takeaway is that the approach in this PR is completely viable; we are basically replicating in userland what pthread does. We also have some subtle performance things in the existing implementation that have been lurking that I wasn't aware of.

As for the locking they do in pthread, they use THR_LOCK_ACQUIRE, which looks like it calls a recursive spinlock: https://github.com/lattera/freebsd/blob/master/lib/libthr/thread/thr_umtx.h

ALL THAT IS TO SAY, I think we are doing a perfectly fine job here of approximating this in userspace with this approach.
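To tie that back to this PR, here is a minimal sketch of the BSD-style shape described above: a fixed-capacity table keyed by pthread_t behind a simple spinlock, so lookups never allocate once the table exists. All the names here (ThreadDepths, kMaxThreads, GetDepthsForCurrentThread) are hypothetical, and the test-and-set lock merely stands in for __sanitizer::SpinMutex.

```cpp
#include <pthread.h>

#include <atomic>
#include <cstddef>

namespace {

struct ThreadDepths {
  pthread_t thread_id{};
  bool in_use{false};
  int realtime{0};
  int bypass{0};
};

// Mirrors BSD's static _thread_keytable: a fixed-size array means claiming
// or looking up a slot never allocates. 256 matches the key table in the
// linked source; a per-thread table would probably want a larger cap.
constexpr std::size_t kMaxThreads = 256;
ThreadDepths g_depths[kMaxThreads];

// Test-and-set spinlock standing in for __sanitizer::SpinMutex.
std::atomic_flag g_lock = ATOMIC_FLAG_INIT;

struct SpinLockGuard {
  SpinLockGuard() {
    while (g_lock.test_and_set(std::memory_order_acquire)) {
      // spin; a real implementation would pause/yield here
    }
  }
  ~SpinLockGuard() { g_lock.clear(std::memory_order_release); }
};

// Finds (or claims) the slot belonging to the calling thread. Returns
// nullptr if the table is full, which is the trade-off of a hard cap.
ThreadDepths *GetDepthsForCurrentThread() {
  pthread_t const self = pthread_self();
  SpinLockGuard guard;
  for (auto &slot : g_depths) {
    if (slot.in_use && pthread_equal(slot.thread_id, self))
      return &slot;
  }
  for (auto &slot : g_depths) {
    if (!slot.in_use) {
      slot.thread_id = self;
      slot.in_use = true;
      slot.realtime = 0;
      slot.bypass = 0;
      return &slot;
    }
  }
  return nullptr;
}

} // namespace
```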
void __rtsan::Context::BypassPop() { bypass_depth_--; }
void __rtsan::Context::BypassPop() {
  __sanitizer::SpinMutexLock lock{&spin_mutex_};
  pthread_t const thread_id = pthread_self();
Note to selves... could pthread_self() ever lock? If it could, might we deadlock if we get unlucky with the spin mutex above?
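If that turns out to be a real risk, the cheap mitigation is to take the thread id before acquiring the spin mutex, so nothing that could conceivably block runs while the lock is held. A sketch against the names in the diff; the .bypass member on the mapped value is an assumption, mirroring .realtime above.

```cpp
void __rtsan::Context::BypassPop() {
  // Query the thread id first: even if pthread_self() could ever block
  // internally, it would not do so while spin_mutex_ is held.
  pthread_t const thread_id = pthread_self();
  __sanitizer::SpinMutexLock lock{&spin_mutex_};
  depths_[thread_id].bypass--;
}
```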
private:
  int realtime_depth_{0};
  int bypass_depth_{0};
  static constexpr int max_concurrent_threads_{4096};
Note to self - is this truly a max, or is it more of an initial capacity? Ultimately we need to look at __sanitizer::DenseMap to see whether it allocates in operator[] if out of space. We should also have a quick think about when it must/can be locked, if we're going to support its resizing internally.
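One way to make it a true max rather than just a starting size is to refuse to grow: check the current size before touching a key that is not present yet. This is a sketch, not the PR's code; it assumes the __sanitizer::DenseMap port keeps LLVM DenseMap's size() and count(), and the DepthsFor helper, the Depths value type, and the failure policy are all hypothetical.

```cpp
// Caller must already hold spin_mutex_. Depths is the assumed mapped type.
Depths &__rtsan::Context::DepthsFor(pthread_t thread_id) {
  if (depths_.count(thread_id) == 0 &&
      depths_.size() >= static_cast<unsigned>(max_concurrent_threads_)) {
    // Out of pre-sized slots: growing here would allocate under the spin
    // lock, possibly on a realtime thread. The policy is undecided: report
    // and die, or fall back to a shared overflow entry.
    __sanitizer::Die();  // placeholder policy
  }
  return depths_[thread_id];
}
```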
  }
};

void MaybeCleanup(pthread_t thread_id);
Note to self: naming can be improved here
  depths_.erase(thread_id);
}

Context &__rtsan::GetContext() { return GetContextImpl(); }
Note to self: this is a pointless abstraction, let's bring the implementation of GetContextImpl inside this function
This is just a draft PR to show a possible approach to userspace TLS in RTSan, thereby avoiding recursion from pthread_once implementations calling an intercepted function during __rtsan::Context initialisation.
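To spell out how the fragments above fit together: the lookup side only works without TLS because there is a single process-wide Context whose per-thread counters live in the pthread_t-keyed map shown in the diff. The sketch below is an assumption about that overall shape (the global's name and initialisation strategy are guesses, not the PR's code); the point is that GetContext never goes through pthread_once, pthread_getspecific, or C++ thread_local, which is the recursion being avoided.

```cpp
namespace __rtsan {

// Assumption: the context lives as an ordinary global (or is fully
// initialised before any interceptor can run), so fetching it from an
// interceptor never triggers lazy per-thread initialisation.
static Context rtsan_context;

Context &GetContext() { return rtsan_context; }

} // namespace __rtsan
```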