[SYCL][CUDA] Fix generating permute bytes from register pair when the initial values are undefined. #12068

Closed
wants to merge 1 commit into from

Conversation

mmoadeli
Contributor

@mmoadeli mmoadeli commented Dec 4, 2023

When generating the permute bytes for the prmt instruction, an undefined initial value initialises the int32 that holds the mask to all 1s (0xFFFFFFFF). The mask values for the following bytes are then ORed into that int32, but ORing into all 1s changes nothing, so those bytes are never populated correctly. As a result, the final value stays at the constant -1, irrespective of the actual mask values that follow the first one.
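The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the code from this PR or from LLVM: the names `kUndef`, `buildMaskNaive`, and `buildMaskMasked` are hypothetical, undefined bytes are assumed to be represented as -1, and the selectors are assumed to be packed four bits apiece as in the prmt control word. The masked variant shows one possible remedy, which may differ from the fix actually submitted.

```cpp
#include <array>
#include <cstdint>

// Hypothetical marker for a byte whose selector is undefined (assumption).
constexpr int kUndef = -1;

// Naive packing: ORs each selector into its 4-bit slot without masking.
// If the first selector is kUndef (-1), the conversion to uint32_t yields
// 0xFFFFFFFF, and every later OR leaves the mask unchanged.
std::uint32_t buildMaskNaive(const std::array<int, 4> &sel) {
  std::uint32_t mask = 0;
  for (unsigned i = 0; i < 4; ++i)
    mask |= static_cast<std::uint32_t>(sel[i]) << (4 * i);
  return mask;
}

// Masked packing: an undefined selector contributes 0, and every defined
// selector is truncated to its own 4-bit slot before being ORed in.
std::uint32_t buildMaskMasked(const std::array<int, 4> &sel) {
  std::uint32_t mask = 0;
  for (unsigned i = 0; i < 4; ++i) {
    std::uint32_t nibble =
        sel[i] == kUndef ? 0u : static_cast<std::uint32_t>(sel[i]) & 0xFu;
    mask |= nibble << (4 * i);
  }
  return mask;
}
```

With the selectors `{kUndef, 5, 6, 7}`, the naive version returns 0xFFFFFFFF (that is, -1 as an int32), while the masked version returns 0x7650, so the three defined selectors survive.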

@mmoadeli mmoadeli requested review from a team as code owners December 4, 2023 20:07
@mmoadeli mmoadeli requested review from bso-intel and AlexeySachkov and removed request for bso-intel December 4, 2023 20:07
@mmoadeli mmoadeli linked an issue Dec 4, 2023 that may be closed by this pull request
Contributor

@AlexeySachkov AlexeySachkov left a comment

Should we push this directly to the upstream llvm/llvm-project?

@mmoadeli
Contributor Author

mmoadeli commented Dec 5, 2023

Should we push this directly to the upstream llvm/llvm-project?

I can push to llvm/llvm-project. It usually takes ages for them to act, though.

@bader
Contributor

bader commented Dec 5, 2023

Should we push this directly to the upstream llvm/llvm-project?

I can push to llvm/llvm-project. It usually takes ages for them to act, though.

This is the way.

jsji pushed a commit that referenced this pull request Jan 19, 2024
#11840 is resolved by upstreaming
the fix in #12068 to
llvm/llvm-project#74437.
@npmiller
Contributor

npmiller commented Mar 6, 2024

@npmiller npmiller closed this Mar 6, 2024
Development

Successfully merging this pull request may close these issues.

Vector conversion does not work correctly on CUDA
4 participants