-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Memory tests #1108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory tests #1108
Conversation
|
Tests for auto dest = reinterpret_cast<uint32_t*>(dest_ptr);
auto src = reinterpret_cast<const uint16_t*>(src_ptr); |
ac0e0b1 to
15d28d2
Compare
DrChat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch - and appreciate the tests! Those are actually really useful!
I have a couple of requests for you and then we should be good to go.
src/xenia/base/memory.cc
Outdated
| size_t unaligned_words = (reinterpret_cast<uintptr_t>(src_ptr) & 0xF) / 2; | ||
| size_t unaligned_words = | ||
| (0x10 - (reinterpret_cast<uintptr_t>(src_ptr) & 0xF)) / sizeof(uint16_t); | ||
| for (; unaligned_words > 0 && i < count; unaligned_words--, i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! One thing I wanted to avoid was if src_ptr is actually aligned, this will come out with unaligned_words = 8 and needlessly copy 8 words using the slow route.
src/xenia/base/memory.cc
Outdated
| __m128i input = _mm_load_si128(reinterpret_cast<const __m128i*>(&src[i])); | ||
| __m128i output = _mm_shuffle_epi8(input, shufmask); | ||
| _mm_store_si128(reinterpret_cast<__m128i*>(&dest[i]), output); | ||
| _mm_storeu_si128(reinterpret_cast<__m128i*>(&dest[i]), output); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to use _mm_store_si128 on aligned pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue I ran into here is that there is no guarentee that dest_ptr is properly aligned.
While on Windows, there is no issue going from an aligned src_ptr to unaligned dest_ptr, on Linux this causes a segfault.
I could either:
- Assert alignment on
dest_ptr - Conditionally use
_mm_storeu_si128only on unaligneddest_ptr
Which is preferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I just realized that as well. If both pointers aren't unaligned in the same manner, the src pointer will be aligned and the dest pointer just gets unaligned.
Do you have a backtrace for those crashes you were getting on linux? I think it's a good idea to assert alignment on both pointers, since this is an aligned routine after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding to that, src_ptr and dst_ptr should both be aligned (or unaligned by the same amount of bits) or else they only way to avoid a segfault on Linux is to run _mm_storeu_si128
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a backtrace: https://gist.github.com/bwrsandman/7d91bb6a1af79366e15161f992858a09
It happens on the call to _mm_load_si128
5c5a106 to
ba8df67
Compare
|
More restrictions for |
defa810 to
0778c73
Compare
DrChat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one more change - let's make the aligned routines require src/dest alignment, then this should be good to pull! :)
src/xenia/base/memory.cc
Outdated
| void copy_and_swap_16_aligned(void* dest_ptr, const void* src_ptr, | ||
| size_t count) { | ||
| assert_zero(reinterpret_cast<uintptr_t>(src_ptr) & 0x1); | ||
| assert_true((reinterpret_cast<uintptr_t>(src_ptr) & 0xF) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this - let's just make sure src_ptr and dest_ptr are aligned to a 16-byte boundary (in the aligned routines).
Any cases you run into where this is called with unaligned pointers you can just change to use the *_unaligned routines.
It's really confusing to make sure both pointers are unaligned the same way, and very atypical for this to occur :)
src/xenia/base/memory.cc
Outdated
| size_t i = 0; | ||
| size_t unaligned_words = (reinterpret_cast<uintptr_t>(src_ptr) & 0xF) / 2; | ||
| size_t unaligned_words = | ||
| ((0x10 - (reinterpret_cast<uintptr_t>(src_ptr) & 0xF)) & 0xF) / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix! (though won't be needed with the above changes)
src/xenia/base/memory.cc
Outdated
| void copy_and_swap_32_aligned(void* dest_ptr, const void* src_ptr, | ||
| size_t count) { | ||
| assert_zero(reinterpret_cast<uintptr_t>(src_ptr) & 0x3); | ||
| assert_true((reinterpret_cast<uintptr_t>(src_ptr) & 0xF) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto on the rest of these.
| } | ||
|
|
||
| TEST_CASE("copy_and_swap_16_aligned", "Copy and Swap") { | ||
| uint16_t a[8] = {0x1111}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the test cases with src/dest on the stack, I think the alignas(16) keyword would work.
0778c73 to
3d0afc3
Compare
Fix segmentation fault happening on linux when `src` or `dst` is not aligned to 16 bytes. Assert against use of `src_ptr` and `dest_ptr` which are not unaligned to 16 bits.
3d0afc3 to
24e7bb5
Compare
|
Now asserting 16bit alignment for each align copy. They all are tested for 16 bit offsets and cause no issues on Linux or Windows. |
|
Excellent! Have you encountered any cases in Xenia where we're using aligned routines when we shouldn't? |
|
No, as far as I can tell, there aren't. |
Add base tests for memory.
Running memory tests on Linux revealed inconsistent behaviour of
_mm_load_si128and_mm_store_si128where pointers not aligned with 128 bits address space would cause segmentation fault.A patch for this issue is included.