Fix performance degradation with -m32 #2926
Merged
Addresses a performance regression caused by gcc failing to properly inline ZSTD_memmove when it is used inside the ZSTD_copy16 macro. The switch from the original ZSTD_memcpy to ZSTD_memmove was necessary because changes to the dctx handling now make overlapping memory a possibility. That switch was introduced in 6a7ede3 and cost about 15% in performance when compiling with gcc and the -m32 flag.
Applying this change on top of that commit reduces the loss to a 5% drop for gcc with -m32. Applied to the current HEAD, the fix improves -m32 performance by 10%.
Performance testing indicates the loss was primarily due to this inlining behavior rather than alignment issues. I am continuing to try different approaches to see whether the remaining performance loss can be recouped.
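For readers unfamiliar with the trade-off described above, here is a minimal sketch of one way to get an overlap-safe 16-byte copy without relying on the compiler to inline a `memmove` call: stage the bytes in a small stack buffer and use two fixed-size `memcpy` calls. The names `copy16_via_buffer`, `copy16_via_memmove`, and `copy16_overlap_matches` are hypothetical and chosen for illustration; this is not claimed to be the exact code in this pull request.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not the zstd source): overlap-safe 16-byte copy.
 * All 16 source bytes are read into tmp before any byte of dst is written,
 * so the result is correct even when src and dst overlap. Each memcpy has a
 * constant size, which compilers commonly expand inline. */
static void copy16_via_buffer(void *dst, const void *src)
{
    unsigned char tmp[16];
    assert(dst != NULL && src != NULL);
    memcpy(tmp, src, 16);
    memcpy(dst, tmp, 16);
}

/* Reference version: memmove is overlap-safe by definition, but may be
 * emitted as a library call rather than inlined. */
static void copy16_via_memmove(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    memmove(dst, src, 16);
}

/* Returns 1 if both versions produce identical bytes for an overlapping
 * copy (destination starts 4 bytes after the source), 0 otherwise. */
static int copy16_overlap_matches(void)
{
    unsigned char a[32], b[32];
    int i;
    for (i = 0; i < 32; i++) {
        a[i] = (unsigned char)i;
        b[i] = (unsigned char)i;
    }
    copy16_via_buffer(a + 4, a);
    copy16_via_memmove(b + 4, b);
    return memcmp(a, b, 32) == 0;
}
```

Whether the buffered form is actually faster depends on the compiler, target, and flags, so it should be benchmarked (for example with gcc and -m32) rather than assumed.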