Zero register optimization for AVX-512-VBMI #14241

Merged
merged 2 commits into from
Aug 28, 2023


Whatcookie
Member

I had an idea for an optimization and decided to write it before I forgot about it.

Take advantage of the fact that 128-bit AVX instructions zero the upper 128 bits of the destination register. This gives a nice optimization when one input vector is zero, using the same 256-bit-wide vpermb trick: since index bit 0x10 selects the second (zeroed) vector, with a 256-bit-wide vpermb that bit conveniently selects the already-zeroed upper bits.
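
To illustrate the idea, here is a minimal scalar sketch in C++. It is not the code from this PR: it models a 256-bit vpermb as a plain byte-table lookup so it runs without AVX-512-VBMI hardware. All names (`vpermb256_model`, `widen_zero_upper`, `select_with_zero_reference`, `select_with_zero_vpermb`) are hypothetical and exist only for this example. It assumes the non-zero input sits in the low 128 bits of a 256-bit register whose upper half is already zero.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;
using Bytes32 = std::array<std::uint8_t, 32>;

// Scalar model of a 256-bit vpermb: each result byte is table[index & 0x1f].
Bytes32 vpermb256_model(const Bytes32& idx, const Bytes32& table)
{
    Bytes32 out{};
    for (std::size_t i = 0; i < 32; ++i)
        out[i] = table[idx[i] & 0x1f];
    return out;
}

// Models a 128-bit value held in a 256-bit register after a 128-bit AVX
// instruction wrote it: the upper 16 bytes are zero.
Bytes32 widen_zero_upper(const Bytes16& lo)
{
    Bytes32 r{}; // value-initialized, so bytes 16..31 stay zero
    std::copy(lo.begin(), lo.end(), r.begin());
    return r;
}

// Reference semantics: a two-source byte select where index bit 0x10 picks
// the second source, and that second source is all zeros.
Bytes16 select_with_zero_reference(const Bytes16& idx, const Bytes16& a)
{
    Bytes16 out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = (idx[i] & 0x10) ? std::uint8_t{0} : a[idx[i] & 0x0f];
    return out;
}

// Same result with a single-source 256-bit permute: index bit 0x10 now
// addresses bytes 16..31 of the table, which are already zero.
Bytes16 select_with_zero_vpermb(const Bytes16& idx, const Bytes16& a)
{
    const Bytes32 r = vpermb256_model(widen_zero_upper(idx), widen_zero_upper(a));
    Bytes16 out{};
    std::copy(r.begin(), r.begin() + 16, out.begin());
    return out;
}
```

In this model the two select functions agree for every index byte, because both ignore index bits above 0x10. The zeroed upper half of the register stands in for the second vector, so no separate zero operand is needed.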

- Take advantage of the fact that AVX instructions zero the upper 128 bits for a nice optimization when one input vector is zeroed
@elad335 elad335 added the LLVM Related to LLVM instruction decoders label Jul 22, 2023
@elad335 elad335 requested a review from Nekotekina August 6, 2023 08:43
@Nekotekina Nekotekina merged commit 290ff5b into RPCS3:master Aug 28, 2023
3 participants