Skip to content

Conversation

@bwrsandman
Copy link
Contributor

Add base tests for memory.
Running memory tests on Linux revealed inconsistent behaviour of _mm_load_si128 and _mm_store_si128 where pointers not aligned with 128 bits address space would cause segmentation fault.
A patch for this issue is included.

@bwrsandman
Copy link
Contributor Author

bwrsandman commented Mar 10, 2018

Tests for copy_and_swap_16_in_32_unaligned and copy_and_swap_16_in_32_aligned were left out since I don't see what they are supposed to do and therefore cannot create good test cases.
I'd be happy to add some in given a bit more context.
I'm particularly confused about the reinterpret_cast<const uint64_t*> part. Intuition tells me it should be this instead:

  auto dest = reinterpret_cast<uint32_t*>(dest_ptr);
  auto src = reinterpret_cast<const uint16_t*>(src_ptr);

Copy link
Member

@DrChat DrChat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch - and appreciate the tests! Those are actually really useful!

I have a couple of requests for you and then we should be good to go.

size_t unaligned_words = (reinterpret_cast<uintptr_t>(src_ptr) & 0xF) / 2;
size_t unaligned_words =
(0x10 - (reinterpret_cast<uintptr_t>(src_ptr) & 0xF)) / sizeof(uint16_t);
for (; unaligned_words > 0 && i < count; unaligned_words--, i++) {
Copy link
Member

@DrChat DrChat Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! One thing I wanted to avoid was if src_ptr is actually aligned, this will come out with unaligned_words = 8 and needlessly copy 8 words using the slow route.

__m128i input = _mm_load_si128(reinterpret_cast<const __m128i*>(&src[i]));
__m128i output = _mm_shuffle_epi8(input, shufmask);
_mm_store_si128(reinterpret_cast<__m128i*>(&dest[i]), output);
_mm_storeu_si128(reinterpret_cast<__m128i*>(&dest[i]), output);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to use _mm_store_si128 on aligned pointers.

Copy link
Contributor Author

@bwrsandman bwrsandman Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I ran into here is that there is no guarentee that dest_ptr is properly aligned.
While on Windows, there is no issue going from an aligned src_ptr to unaligned dest_ptr, on Linux this causes a segfault.

I could either:

  • Assert alignment on dest_ptr
  • Conditionally use _mm_storeu_si128 only on unaligned dest_ptr

Which is preferable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I just realized that as well. If both pointers aren't unaligned in the same manner, the src pointer will be aligned and the dest pointer just gets unaligned.

Do you have a backtrace for those crashes you were getting on linux? I think it's a good idea to assert alignment on both pointers, since this is an aligned routine after all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to that, src_ptr and dst_ptr should both be aligned (or unaligned by the same amount of bits) or else they only way to avoid a segfault on Linux is to run _mm_storeu_si128

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a backtrace: https://gist.github.com/bwrsandman/7d91bb6a1af79366e15161f992858a09
It happens on the call to _mm_load_si128

@bwrsandman bwrsandman force-pushed the memory_tests branch 6 times, most recently from 5c5a106 to ba8df67 Compare April 9, 2018 05:24
@bwrsandman
Copy link
Contributor Author

More restrictions for src_ptr and dest_ptr to have similar aligments have been added
I had to change the tests to use these alignments.
The extra bytes slowly copied on aligned data should now be gone.

@bwrsandman bwrsandman force-pushed the memory_tests branch 2 times, most recently from defa810 to 0778c73 Compare April 10, 2018 06:17
Copy link
Member

@DrChat DrChat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more change - let's make the aligned routines require src/dest alignment, then this should be good to pull! :)

void copy_and_swap_16_aligned(void* dest_ptr, const void* src_ptr,
size_t count) {
assert_zero(reinterpret_cast<uintptr_t>(src_ptr) & 0x1);
assert_true((reinterpret_cast<uintptr_t>(src_ptr) & 0xF) ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this - let's just make sure src_ptr and dest_ptr are aligned to a 16-byte boundary (in the aligned routines).
Any cases you run into where this is called with unaligned pointers you can just change to use the *_unaligned routines.

It's really confusing to make sure both pointers are unaligned the same way, and very atypical for this to occur :)

size_t i = 0;
size_t unaligned_words = (reinterpret_cast<uintptr_t>(src_ptr) & 0xF) / 2;
size_t unaligned_words =
((0x10 - (reinterpret_cast<uintptr_t>(src_ptr) & 0xF)) & 0xF) /
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix! (though won't be needed with the above changes)

void copy_and_swap_32_aligned(void* dest_ptr, const void* src_ptr,
size_t count) {
assert_zero(reinterpret_cast<uintptr_t>(src_ptr) & 0x3);
assert_true((reinterpret_cast<uintptr_t>(src_ptr) & 0xF) ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on the rest of these.

}

TEST_CASE("copy_and_swap_16_aligned", "Copy and Swap") {
uint16_t a[8] = {0x1111};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the test cases with src/dest on the stack, I think the alignas(16) keyword would work.

Fix segmentation fault happening on linux when `src` or `dst` is not aligned to
16 bytes.
Assert against use of `src_ptr` and `dest_ptr` which are not unaligned to 16
bits.
@bwrsandman
Copy link
Contributor Author

Now asserting 16bit alignment for each align copy. They all are tested for 16 bit offsets and cause no issues on Linux or Windows.
Removed the unaligned slow copies on aligned copies since the assets no longer allow for this.

@DrChat
Copy link
Member

DrChat commented Apr 21, 2018

Excellent! Have you encountered any cases in Xenia where we're using aligned routines when we shouldn't?

@bwrsandman
Copy link
Contributor Author

No, as far as I can tell, there aren't.
If we run into any cases in Linux in the future, these asserts will tip us off.

@DrChat DrChat merged commit 6209c4a into xenia-project:master Apr 21, 2018
@bwrsandman bwrsandman deleted the memory_tests branch April 21, 2018 18:41
@Margen67 Margen67 added the tests label Jul 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants