Skip to content

Conversation

binhdvo
Copy link
Contributor

@binhdvo binhdvo commented Dec 14, 2021

Addresses performance issues caused by gcc failing to properly inline ZSTD_memmove when used inside ZSTD_copy16 macro. This change (from the original ZSTD_memcpy) was necessary because changes to the dctx handling now make overlap in memory a possibility. That original change was introduced in 6a7ede3 and had a negative impact of about 15% when compiling with the -m32 flag and using gcc.

Performance when applying this change to that commit is reduced to a 5% drop for -m32 using gcc. Performance improvement of this fix when applied to the current HEAD and using -m32 is 10%.

Performance testing indicates the loss was primarily due to this inlining behavior and not alignment issues. Continuing to try different approaches to see if there is a way to recoup the remaining performance loss.

@terrelln
Copy link
Contributor

What does the x86-64 performance look like with this change?

@binhdvo
Copy link
Contributor Author

binhdvo commented Dec 14, 2021

What does the x86-64 performance look like with this change?

It was neutral as compared to the current HEAD

@ghost
Copy link

ghost commented Dec 14, 2021

There is a SSE2 path:

#elif defined(ZSTD_ARCH_X86_SSE2)
    _mm_storeu_si128((__m128i*)dst, _mm_loadu_si128((const __m128i*)src));

But even use -m32 -msse2 options, gcc doesn't use that path:
https://godbolt.org/z/Yz3qsaca7

~~ Maybe ZSTD_ARCH_X86_SSE2 macro can be improved to support 32bit+SSE2.~~

edit: -m32 -msse2 shoudl enable ZSTD_ARCH_X86_SSE2 macro.
It seems clang use SSE2 by default, gcc doesn't.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Dec 14, 2021

~~ Maybe ZSTD_ARCH_X86_SSE2 macro can be improved to support 32bit+SSE2.~~

-m32 -msse2 enables ZSTD_ARCH_X86_SSE2.
But -m32 alone doesn't (and shouldn't, since x86 cpus without sse2 exist, in contrast with x64).
Verified for both gcc and clang.

@binhdvo binhdvo merged commit 64205b7 into facebook:dev Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants