-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Threading unit tests and linux threading implementation #1317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hot damn this is amazing work! |
|
Note to reviewers: If you check out the branch and run into issues, please send me a coredump if one is produced. Let me know what distro you run, which kernel version, the compiler with version, machine specs (cpu, number of cores, memory, load) and reproduceability. Having all this information is incredibly useful to find and fix threading bugs. Any comment is appreciated and I will try to make requested changes within a week. |
|
First of all. This is excellent work. This has been a long-standing missing component of the emulator. I cloned your repository and then rebased it against the master for the latest fixes. I implemented the filesystem headers required and tried running it. I had some issues with it crashing and ran it under valgrind a few times and noticed this. To be clear, the crash wasn't caused by this output. The vulkan backend was crashing. Given my troubles with Vulkan, could you verify this behavior and fix if necessary? I have also added the three functions that were required to get the filesystem working. |
|
Consider this high on my priority list, I'll review it as soon as I am able. Great work! |
|
@chris-hawley I plan to tackle vulkan on linux next. Other than the filesystem stubs which are easy to fix with hard-coded values, I identified two issues :
I am pretty familiar with Vulkan and graphics so I plan on taking a more in-depth look at the implementation and following up with fixes in another pull request. As for:
I haven't seen this message before, do you have a branch I can check out? |
|
I'll create a branch called linux-testing in my xenia fork. |
|
Great. By the way, you can see my Vulkan work and changes on this branch. Subject to change. |
|
branch is all set to go |
|
@chris-hawley The is due to the xcb connection/native_platform_handle not being set (0e5af7d) and the crash happens within either gtk or the graphics driver (in my case nvidia). If you cherry-pick a9c2053 and 0e5af7d from my branch, you won't get those issues. Anyways, vulkan crashing is a separate issue which will be resolved with another PR. |
|
that fixed the vulkan crashing. |
|
Also, we have a discord if you would rather do this back and forth there. discord |
|
|
d4970e2 to
ea2edef
Compare
|
I fixed some memory leaks that were pointed out by valgrind. |
ea2edef to
d866588
Compare
DrChat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preliminary review with some cosmetic requests.
This is really cool, and using conditional variables is a very interesting solution! Good work!
d866588 to
27be7c8
Compare
|
Thanks for the review @DrChat. I have made the requested changes and rebased on master. |
66b10e9 to
e2fe990
Compare
|
So, I didn't merge this at the time because of the complex nature of this pull request - it's modifying a delicate part of the emulator, which can introduce very subtle bugs. The changes here look pretty well thought out though, and I'm leaning towards merging this. One thing though: What about that deadlock on Travis from the unit tests? |
// TODO(bwrsandman): due_times of under 1ms deadlock under travisThis was something which only happened on Travis. I wasn't able to get to the bottom of this since none of my machines could replicate. If I had to guess it's the nature of the container and the kernel used is somehow interpreting less than 1 ms as infinite. |
e4b89bc to
0a7294b
Compare
|
Rebased to latest master |
Test logical_processor_count() 3 times to test static return value stays correct. Run EnableAffinityConfiguration(). No asserts possible. Test setting thread id, test using uint32_t max to reset. Test setting thread name. No asserts possible. Test running MaybeYield(). No obvious more complex test case. Test running SyncMemory(). No obvious more complex test case.
Add Sleep Test for 50ms. Fix Sleep under linux that was using microseconds as nanoseconds. Factor timespec creation to template function using div/mod and nanoseconds from duration cast.
Implement HighResolutionTimer for Posix by using native timers. Callbacks are triggered with realtime interrupts if they are supported. Create an enum to track user-defined interrupts as well as an initializer and handler to register these interrupts per thread. Add test cases for timers for both single and multiple. Fix Sleep function to continue sleeping if interrupted by system. Add local .gdbinit to ignore signal 34 which is used by high res timer
Remove atomic boolean in fence. Variable signaled_ is already protected by mutex. Remove wait loop with single predicate wait protected with mutex. Add Fence Signal and Wait tests Test signaling without waiting. Test signaling before waiting. Test signaling twice before waiting. Test synchronizing threads with fence. Few REQUIRES were used to test as there are no return codes. A failing test may hang indefinitely or cause a segfault which would still register as a fail.
Linux: Remove copy and destroy call in make_unique invokation which closes handles on all events. Testing: Add Wait test for Events set and unset.
Remove file-descriptor specific wait implementation to PosixFdHandle class which breaks on waits of non-fd handles. Replace with PosixConditionHandle and extend to support auto reset and initial values. Simplify mutex and conditional variable use with stdlib versions which wrap these primitives but provide better C++ interface. Test Event and Reset
Make conditional_variable and mutex static and create generalisation of Wait for vector of handles. Use std::any for waitany and std::all for waitall
cd681c3 to
3363dd5
Compare
Add PosixConditionBase as base class for Waitables to use common primitives mutex and conditional variable Add abstract signaled() and post_execution() to use single WaitMultiple implementation.
Test acquiring and releasing semaphores on same and on different threads. Test previous_count values. Test WaitAll and WaitAny. Add tests for invalid semaphore creation parameters but disactivated as they do not pass on any platform. These should be enabled and the implementations fixed to match documentation.
Keep track of recursive locks with owner and count of locks. Only allow recursive locks from same thread and increment count. Only allow first locks from when count is zero. Test acquiring and releasing mutant on same and on different threads. Test Release return values. Test WaitAll and WaitAny.
Test Manual Reset and Synchronization timers single threaded. Test Cancelling timers. Test WaitMultiple. Ignore real-time event 35 in .gdbinit which is used to signal timer. Callbacks don't seem to be called so testing them is difficult.
Add Basic Tests on Threads
Use real-time event interrupt to communicate suspend in timely manner. Use conditional_variable to implement suspend wait and resume trigger. Ignore real-time event 36 in .gdbinit which is used to signal suspend. Test suspending threads.
Add thread local bool for alertable state. Use real-time event interrupt to run callback. Fix sleep duration from miliseconds (microseconds / 1000) to seconds in sleep command. Add note for future implementation. Ignore real-time event 37 in .gdbinit which is used to signal callback. Test AlertableSleep Test Thread QueueUserCallback. TODO: Test alerted wait result when using IO functions.
Implement TLSHandle with pthread_key_t. Test Alloc, Free, Get and Set.
Add Signal abstract function to handles. Test SignalAndWait.
Give other threads access to initially suspended threads by signalling conditional variable before waiting for state to be changed again.
Shorten names to 16. Rename Win32 to Windowing. Shorten GraphicsSystem thread names due to 16 length limit of pthread. Without this change, both show up as GraphicsSystem. Remove redundant "Worker" and "Thread" from names. Remove redundant thread handle from thread name.
Add suspend count to thread implementation. Increment suspend count on suspend and decrement on resume. Wait on suspend count to be decremented to 0. Return suspend count on suspend and on resume before incr/decr. Fix naming of resume suspend count to make clear that suspend count is before incr/decr. Add test.
Move wait implementation to not use native_handle. Implement native_handle for each primitive using posix natives.
3363dd5 to
7f13543
Compare
Current Status on master
On Linux, the Xenia app and many of the test ui tests do not run correctly due to waits on
WaitHandlesflowing through and returning early.Fix status of this PR
I have been working on this PR for about a year and it now has reached a point where I consider it done and ready for review.
This PR implements most of
threading_posix.ccby TDD to make sure behaviour is the same as win32.This is a follow-up to (#1108) in order to get xenia to actually run on Linux. Following this, a few GTK-Vulkan fixes need to be done as well as implementing POSIX file system stubs. With those, a GTK window actually appears and responds to events.
Fix description
Borrowed commit from #1076 for Linux debug symbols.
Fixed occurrences of micro/millisecond confusion on timers.
Posix threads do not support suspension in the same way as windows threads. Real-time interrupts used to give this functionality and to implement thread callback when in alertable state.
More real-time interrupts used to signal timers.
Since there are more than two signals used,
SIGUSR1andSIGUSR2are not used. Signals are allocated betweenSIGRTMINandSIGRTMAXwhich are dynamic and can change per machine or kernel version but should start fromSIGUSR2 + 1.Real-time interrupts handlers are installed in a lazy-load fashion. This could be moved to an initialize phase to avoid lazy-load, but no obvious location for this was found.
Waiting is implemented using static condition variables. To do this, each
WaitHandleis also an instance ofPosixConditionBase, the latter which has the same functionality of wait and wait multiple. This implementation was more successful than using the existingPosixFdHandlewhich was then removed in order to use a more generalizable super class which gave the added bonus of waiting on multiple types of primitives at the same time since they all use the same conditional variables.Alertable state is set as a thread local static variable.
I ran these tests multiple times concurrently in Windows and Linux on my machine, my laptop and on travis. I will also test it on my LattePanda Alpha SBC to test a lower-end machine. Throughout the development, there have been fails which are not always reproducible and I brought this number down with fixes and hopefully there aren't any rare bugs left.
Things left to check later
Suspending an already suspended thread was not tested nor implemented and left for a later time.Implementing affinity and priority was attempted but in the end abandoned due to Linux's technicalities.
I could be wrong but there doesn't seem to be any way to set a thread's priority which is higher than the main thread's without running as a super user so we could only set it to Normal or Lower.
Sleeping for times lower than 1ms works on my setups but causes deadlocks on Travis. This is no problem for the unit tests. The times were just increased by factors of 10 to 100. This, however, means that tests which run in about 5s could potentially run in < 1s if that deadlock issue is fixed.
There will always be fundamental differences between Windows and Linux which could be tested against. My hope is that the existence of these unit tests can make pin pointing these differences more easy.