-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][ESIMD] Make lsc_block_load/store try to use 64 bit blocks by default #8316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Complementary test PR: intel/llvm-test-suite#1590 |
Test failure |
# Conflicts: # sycl/include/sycl/ext/intel/experimental/esimd/memory.hpp
This reverts commit 783f6bb. # Conflicts: # sycl/test/esimd/lsc.cpp
/// must be 64. | ||
/// When \c DS equals \c lsc_data_size::u64 or \c sizeof(T) equal to 8 or DS is | ||
/// not set or set to \c lsc_data_size::default_size and the total amount of | ||
/// data i.e. \c sizeof(T) \c * \c NELTS is multiple of 8, the address must be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this statement basically tells that lsc_block_load of simd<int, 2> with lsc_data_size::default_size, then the address 'p' must be aligned by 8?
If so, then that seems wrong to me because for load of simd<int,2> still should be possible for address aligned by only 4.
Also, this comment is not synced with the static assert below, which requires NElems > 64:
constexpr bool Use64BitData =
sizeof(T) == 8 || (DS == lsc_data_size::default_size && NElts > 64 &&
(NElts * sizeof(T)) % 8 == 0);
IMO, even with additional N > 64, it seems questionable: Let's say,
simd<uint8_t, 128> res = lsc_block_load(addr) must work properly when 'addr' is aligned only by 4 bytes, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, using bigger element size (assuming it requires also bigger address alignment) would require extension of this API by adding hint/guarantee from user that the address passed to this routine is properly aligned.
'overaligned_tag<8>' could be used for that purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is wrong comment. I updated it to reflect the real code. The code will try to split the code to 32 bit words and only if the number of words exceeds 64 it will try to split the data to 64 bit words. so simd<uint8_t, 128> res = lsc_block_load(addr) will require 4 bytes alignment as it is possible to use 32 bit words to transfer 128 bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add 'Request changes' tag til resolution of the address alignment question.
HIP Test failures : are not related to the change |
CUDA failure |
ESIMD EMU failures |
typename FlagsT = __ESIMD_NS::element_aligned_tag, | ||
typename = std::enable_if_t<__ESIMD_NS::is_simd_flag_type_v<FlagsT>>> | ||
__ESIMD_API __ESIMD_NS::simd<T, NElts> | ||
typename FlagsT = __ESIMD_NS::overaligned_tag<4>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gregory, I believe what we agreed on the default value previously was: std::max(__ESIMD_NS::overaligned_tag<4>, __ESIMD_NS::element_aligned_tag).
So, passing int64_t pointer would assume element_aligned_tag (i.e. 8-byte alignment) and would not require explicit passing of the tag argument for longs.
From other side, if user explicitly passes overaligned<4> for 'int64_t* ptr', then that may be an indicator of
some alignment problems in users's code and it would be good to report that in compile time.
The latest update of this PR handles this scenario differently,
the explicitly set alignment to 4 would be just ignored.
It is not possible to write std::max as shown above, but we can have something like 'dqword_element_aligned_tag' defined in detail namespace, which would return 4 for byte/short/int and 8 for int64_t.
That also would remove the need in Alignment evaluation on the lines 625-632
That is a temporary solution before we get approval for using element_aligned_tag that breaks backward compatibility with previous calls with int8_t* and int16_t* without explicit alignment tag passed.
static_assert(FDS == lsc_data_size::u16 || FDS == lsc_data_size::u8 || | ||
FDS == lsc_data_size::u32 || FDS == lsc_data_size::u64, | ||
"Conversion data types are not supported"); | ||
static_assert(FlagsT::template alignment<T> >= | ||
__ESIMD_DNS::OperandSize::DWORD, | ||
static_assert(Alignment >= __ESIMD_DNS::OperandSize::DWORD, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If element type in int64_t, the required alignment is >= 8-bytes
AMD GPU failure: |
No description provided.