[release/7.0] Fix pthread_cond_wait race on macOS #82893
Merged
Backport of #82709 to release/7.0
/cc @janvorli
Customer Impact
Applications compiled with NativeAOT can hang intermittently at startup on macOS. This was occurring with our own crossgen2 in the CI.
The problem is caused by the implementation of `pthread_cond_broadcast` not adhering to the documentation in a race condition case. There is a tiny window of opportunity within which the related `pthread_cond_wait` isn't woken by the `pthread_cond_broadcast` when the latter is not invoked with the related mutex taken.
Testing
Stress-tested by repeatedly running crossgen2 compiled with NativeAOT on macOS without any arguments. Without the fix, it hung within tens or hundreds of thousands of iterations. With the fix, it ran without problems for 5.5 million iterations.
Risk
Very low. The change just moves the `pthread_cond_broadcast` call inside the mutex, and the documentation for that function says it should not matter whether it is called with the mutex held or not.
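For illustration, the pattern described above can be sketched as follows. This is a minimal sketch, not the code changed in this PR: it assumes a simple one-shot event built from a mutex, a condition variable, and a flag, and the names `demo_event`, `demo_event_wait`, `demo_event_set`, and `run_demo` are hypothetical. The point it shows is that the flag update and the `pthread_cond_broadcast` call both happen while the mutex is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-shot event; illustrative names, not the runtime's types. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            signaled;
} demo_event;

static void demo_event_init(demo_event *e) {
    pthread_mutex_init(&e->mutex, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

/* Waiter: re-check the predicate in a loop, since wakeups may be spurious. */
static void demo_event_wait(demo_event *e) {
    pthread_mutex_lock(&e->mutex);
    while (!e->signaled) {
        pthread_cond_wait(&e->cond, &e->mutex);
    }
    pthread_mutex_unlock(&e->mutex);
}

/* Signaler: set the flag AND broadcast while still holding the mutex. */
static void demo_event_set(demo_event *e) {
    pthread_mutex_lock(&e->mutex);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->mutex);
}

static void *demo_waiter(void *arg) {
    demo_event_wait((demo_event *)arg);
    return NULL;
}

/* Starts one waiter, signals it, and returns 1 once the waiter has exited. */
static int run_demo(void) {
    demo_event e;
    pthread_t waiter;
    demo_event_init(&e);
    int rc = pthread_create(&waiter, NULL, demo_waiter, &e);
    assert(rc == 0);
    demo_event_set(&e);
    pthread_join(waiter, NULL);
    pthread_cond_destroy(&e.cond);
    pthread_mutex_destroy(&e.mutex);
    return e.signaled ? 1 : 0;
}
```

Calling `run_demo()` from a `main` function exercises a single wait/set pair. A single run is not expected to reproduce the hang, which needed a very large number of iterations to show up even without the fix.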