Conversation

@davidtrevelyan

This is just a draft PR to show a possible approach to userspace TLS in RTSan, thereby avoiding recursion from pthread_once implementations calling an intercepted function during __rtsan::Context initialisation.

@davidtrevelyan davidtrevelyan requested a review from cjappl October 30, 2025 16:32
void __rtsan::Context::RealtimePush() { realtime_depth_++; }
void __rtsan::Context::RealtimePush() {
  __sanitizer::SpinMutexLock lock{&spin_mutex_};
  depths_[pthread_self()].realtime++;
@davidtrevelyan (Author) Oct 30, 2025

Need to think about handling the case where depths_ would need to be resized and (potentially) allocate more memory
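One hypothetical way to avoid that (just a sketch, not what this PR currently does; RegisterCurrentThread, ThreadDepths and the std::unordered_map/std::mutex stand-ins are invented for illustration) would be to do the inserting/allocating work once per thread on a slow path, so that RealtimePush itself only ever looks up an existing entry:

#include <cassert>
#include <mutex>
#include <pthread.h>
#include <unordered_map>

// Hypothetical sketch: per-thread depth counters inserted on a slow path,
// so the hot path never grows the map. std::mutex and std::unordered_map
// stand in for spin_mutex_ and depths_; pthread_t is assumed to be an
// integral or pointer type (as on Linux/macOS) so it is hashable.
struct ThreadDepths {
  int realtime = 0;
  int bypass = 0;
};

class ContextSketch {
public:
  // Slow path: may allocate; called once per thread before any push/pop.
  void RegisterCurrentThread() {
    std::lock_guard<std::mutex> lock(mutex_);
    depths_.emplace(pthread_self(), ThreadDepths{});
  }

  // Hot path: lookup only, never resizes the map.
  void RealtimePush() {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = depths_.find(pthread_self());
    assert(it != depths_.end() && "thread was never registered");
    it->second.realtime++;
  }

private:
  std::mutex mutex_;
  std::unordered_map<pthread_t, ThreadDepths> depths_;
};

Of course, finding a per-thread registration point that doesn't reintroduce the pthread_once problem is exactly the open question here, so this only moves the allocation out of the hot path; it doesn't remove it.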

  ExpectBypassed(true);
}

TEST_F(TestRtsanContext, IsProbablyThreadSafe) {
@davidtrevelyan (Author)

I presume this test almost certainly can't be kept, but you never know.

  return REAL(os_unfair_lock_lock)(lock);
}

INTERCEPTOR(void, os_unfair_lock_unlock, os_unfair_lock_t lock) {
@davidtrevelyan (Author)

This is the one that triggered infinite recursion previously

@cjappl (Collaborator) commented Nov 7, 2025

Alright, some updates from my investigation. We can see the BSD implementation of pthread_once most easily here:

https://github.com/lattera/freebsd/blob/master/lib/libthr/thread/thr_once.c

This is MOSTLY very straightforward. In general:

  • Make sure the thread is initialized (of course this will be handled for us)
  • Do a CAS loop with some very careful mutexes to make sure something is only done one time
  • Deal with cancellation (complex, not sure if we need to do this...)
  • Set state to ONCE_DONE

I think this backs up the idea of us doing some simple userspace locking and CAS looping to set up our data structure.
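For reference, the shape of that CAS loop (minus the cancellation handling) is roughly the following; this is a minimal portable-C++ sketch of the pattern, not the libthr code:

#include <atomic>

// Minimal sketch of a pthread_once-style state machine driven by a CAS
// loop: NEVER_DONE -> IN_PROGRESS -> DONE. The cancellation handling from
// thr_once.c is deliberately omitted.
enum OnceState { ONCE_NEVER_DONE, ONCE_IN_PROGRESS, ONCE_DONE };

inline void OnceRun(std::atomic<int> &state, void (*init_routine)()) {
  for (;;) {
    int expected = ONCE_NEVER_DONE;
    if (state.compare_exchange_strong(expected, ONCE_IN_PROGRESS)) {
      init_routine();  // we won the race: runs exactly once
      state.store(ONCE_DONE, std::memory_order_release);
      return;
    }
    if (expected == ONCE_DONE)  // someone else already finished
      return;
    // Another thread is mid-initialization: spin until it finishes.
    while (state.load(std::memory_order_acquire) == ONCE_IN_PROGRESS) {
    }
  }
}

In our case the "init routine" would be whatever sets up the per-thread entry, done directly rather than via pthread_once, so it never re-enters an intercepted lock function.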

As for pthread_getspecific/pthread_setspecific, in BSD at least, this is just a basic lock-protected hash table.

https://github.com/lattera/freebsd/blob/master/lib/libthr/thread/thr_spec.c

What's possibly interesting is that this is a static array:

struct pthread_key _thread_keytable[PTHREAD_KEYS_MAX];

With 256 slots, so TECHNICALLY, to maintain what we are doing now, we could just match this and cap out at 256 and not worry about dynamic allocation.
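If we went that route, the analogous structure on our side might look something like this (a sketch only: kMaxThreads, DepthSlot and the plain pthread mutex are made up for illustration; the real thing would presumably reuse the existing spin mutex):

#include <pthread.h>

// Hypothetical BSD-thr_spec-style static table: a fixed number of slots,
// each claimed by exactly one thread, so nothing is ever allocated.
constexpr int kMaxThreads = 256;  // mirrors PTHREAD_KEYS_MAX's 256 slots

struct DepthSlot {
  pthread_t owner{};
  bool in_use = false;
  int realtime = 0;
  int bypass = 0;
};

static DepthSlot g_slots[kMaxThreads];
static pthread_mutex_t g_slots_lock = PTHREAD_MUTEX_INITIALIZER;

// Returns this thread's slot, claiming a free one on first use, or
// nullptr once all kMaxThreads slots are taken (the "cap out" case).
DepthSlot *GetSlotForCurrentThread() {
  const pthread_t self = pthread_self();
  pthread_mutex_lock(&g_slots_lock);
  DepthSlot *free_slot = nullptr;
  for (DepthSlot &slot : g_slots) {
    if (slot.in_use && pthread_equal(slot.owner, self)) {
      pthread_mutex_unlock(&g_slots_lock);
      return &slot;
    }
    if (!slot.in_use && free_slot == nullptr)
      free_slot = &slot;
  }
  if (free_slot != nullptr) {
    free_slot->in_use = true;
    free_slot->owner = self;
  }
  pthread_mutex_unlock(&g_slots_lock);
  return free_slot;
}

The linear scan is O(kMaxThreads) under the lock, which is the obvious cost of keeping it allocation-free; 256 is small enough that it may not matter, but it's worth measuring.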

My takeaway is that the approach in this PR is completely viable, and we are basically replicating in userland what pthread does. We also have some subtle performance characteristics lurking in the existing implementation that I wasn't aware of:

  • Possible CAS loop when initializing the context keys
  • Always locking and unlocking under the hood with any system call.

As for the locking they do in pthread, they use THR_LOCK_ACQUIRE:

https://github.com/lattera/freebsd/blob/401a161083850a9a4ce916f37520c084cff1543b/lib/libthr/thread/thr_spec.c#L63

This is defined here:

https://github.com/lattera/freebsd/blob/401a161083850a9a4ce916f37520c084cff1543b/lib/libthr/thread/thr_private.h#L565-L569

Which looks like it calls a recursive spinlock:

https://github.com/lattera/freebsd/blob/master/lib/libthr/thread/thr_umtx.h

ALL THAT IS TO SAY: I think we are doing a perfectly fine job here of approximating this in userspace with this approach.

void __rtsan::Context::BypassPop() { bypass_depth_--; }
void __rtsan::Context::BypassPop() {
  __sanitizer::SpinMutexLock lock{&spin_mutex_};
  pthread_t const thread_id = pthread_self();
@davidtrevelyan (Author)

Note to selves... could pthread_self() ever lock? In which case, could we ever deadlock if we get unlucky with the spin mutex above?
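pthread_self() is typically just a read of the thread pointer, but if it ever could block, one mitigation (purely a sketch of the ordering, with std::mutex and std::unordered_map standing in for spin_mutex_ and depths_) is to read the thread id before taking the lock, so the critical section contains only the map update:

#include <pthread.h>
#include <mutex>
#include <unordered_map>

std::mutex g_lock;                           // stand-in for spin_mutex_
std::unordered_map<pthread_t, int> g_depth;  // stand-in for depths_

void BypassPopSketch() {
  // Query the thread id *before* acquiring the lock, so nothing that
  // could conceivably block runs while the lock is held.
  const pthread_t thread_id = pthread_self();
  std::lock_guard<std::mutex> lock(g_lock);
  g_depth[thread_id]--;
}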

private:
  int realtime_depth_{0};
  int bypass_depth_{0};
  static constexpr int max_concurrent_threads_{4096};
@davidtrevelyan (Author)

Note to self - is this truly a max, or is it more of an initial capacity? Ultimately we need to look at __sanitizer::DenseMap to see whether it allocates in operator[] if out of space. We should also have a quick think about when it must/can be locked, if we're going to support its resizing internally.

}
};

void MaybeCleanup(pthread_t thread_id);
@davidtrevelyan (Author)

Note to self: naming can be improved here

  depths_.erase(thread_id);
}

Context &__rtsan::GetContext() { return GetContextImpl(); }
@davidtrevelyan (Author) Nov 16, 2025

Note to self: this is a pointless abstraction, let's bring the implementation of GetContextImpl inside this function
