[ESIMD] Add lsc_slm_block_load() with merging semantics #8552
The new lsc_slm_block_load() has an additional operand 'old_values' that contains the values returned from the function if the predicate passed to it is 0.
Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>
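The merging semantics can be illustrated with a small host-side C++ model. This is a sketch under stated assumptions, not the ESIMD implementation: the function name slm_block_load_merge, the byte buffer standing in for SLM, and the plain bool predicate are all hypothetical stand-ins for the device-side simd types. It shows only the contract from the description: with the predicate set, NElts contiguous elements are read starting at a byte offset; with the predicate at 0, no memory is read and 'old_values' is returned unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side model of a predicated SLM block load with merging
// semantics. NOT the ESIMD API: 'slm' stands in for shared local memory and
// 'pred' for the single predicate passed to the block load.
template <typename T, std::size_t NElts>
std::array<T, NElts> slm_block_load_merge(const unsigned char *slm,
                                          std::uint32_t offset, bool pred,
                                          const std::array<T, NElts> &old_values) {
  if (!pred) {
    // Predicate is 0: no memory access; the caller's old values are returned.
    return old_values;
  }
  // Predicate is 1: copy NElts contiguous elements starting at 'offset' bytes.
  std::array<T, NElts> result{};
  std::memcpy(result.data(), slm + offset, sizeof(T) * NElts);
  return result;
}

// Usage: a 4-element load from byte offset 4 of a small buffer.
inline void check_merge_semantics() {
  std::array<std::uint32_t, 8> backing{10, 11, 12, 13, 14, 15, 16, 17};
  const auto *slm = reinterpret_cast<const unsigned char *>(backing.data());
  const std::array<std::uint32_t, 4> old_values{99, 99, 99, 99};

  // Predicate set: elements 1..4 of 'backing' are loaded (offset 4 bytes).
  auto loaded = slm_block_load_merge<std::uint32_t, 4>(slm, 4, true, old_values);
  assert(loaded[0] == 11 && loaded[3] == 14);

  // Predicate cleared: the result is exactly 'old_values'.
  auto merged = slm_block_load_merge<std::uint32_t, 4>(slm, 4, false, old_values);
  assert(merged == old_values);
}
```

Returning a caller-supplied value instead of leaving the result undefined means the code that consumes the result does not need a separate select on the predicate.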
__ESIMD_NS::simd<uint32_t, N> offsets = offset;
return __esimd_lsc_load_slm<T, cache_hint::none, cache_hint::none,
_AddressScale, _ImmOffset, _DS, _VS, _Transposed,
N>(pred.data(), offsets.data());
AddressScale, ImmOffset, FDS, VS, _Transposed, N>(
Not sure if I'm missing something, but shouldn't _Transposed be Transposed? I don't see _Transposed defined.
Good catch! Fixed now.
I re-ran the LIT test before uploading for review, but forgot to rebuild the compiler after this last-minute change (the removal of those underscores).
/// @tparam NElts is the number of elements to load per address.
/// @tparam DS is the data size.
/// @param offset is the zero-based offset for SLM buffer in bytes.
/// @param pred is the predicate; if it contains 0, then the actual load
"if it contains 0, then the actual load is not performed and the returned value is undefined": is this accurate? I thought it would be old_values.
Another good catch. It was a copy-paste error. Fixed now. Thank you.
…t being tested before 'git push'
Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>
…nsics_p3_lsc_slm_block_load
These timeouts on Linux are unrelated to this PR.
This ESIMD PR is guaranteed to be unrelated to any potential issues in CUDA CI.
The corresponding LIT test: intel/llvm-test-suite#1637