Skip to content

[SYCL][ESIMD] Make lsc_block_load/store try to use 64 bit blocks by default #8316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Mar 16, 2023

Conversation

fineg74
Copy link
Contributor

@fineg74 fineg74 commented Feb 11, 2023

No description provided.

@fineg74 fineg74 requested a review from a team as a code owner February 11, 2023 06:32
@fineg74
Copy link
Contributor Author

fineg74 commented Feb 11, 2023

Complementary test PR: intel/llvm-test-suite#1590

@fineg74 fineg74 temporarily deployed to aws February 12, 2023 12:07 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws February 13, 2023 17:22 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws February 13, 2023 18:55 — with GitHub Actions Inactive
@fineg74
Copy link
Contributor Author

fineg74 commented Feb 13, 2023

Test failure
ESIMD/imulh_umulh.cpp
is not related to the change

# Conflicts:
#	sycl/include/sycl/ext/intel/experimental/esimd/memory.hpp
@fineg74 fineg74 temporarily deployed to aws February 15, 2023 19:41 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws February 16, 2023 00:44 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws February 18, 2023 19:54 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws February 18, 2023 20:41 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws February 21, 2023 04:36 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws February 21, 2023 05:48 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws February 21, 2023 09:55 — with GitHub Actions Inactive
/// must be 64.
/// When \c DS equals \c lsc_data_size::u64 or \c sizeof(T) equal to 8 or DS is
/// not set or set to \c lsc_data_size::default_size and the total amount of
/// data i.e. \c sizeof(T) \c * \c NELTS is multiple of 8, the address must be
Copy link
Contributor

@v-klochkov v-klochkov Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this statement basically tells that lsc_block_load of simd<int, 2> with lsc_data_size::default_size, then the address 'p' must be aligned by 8?
If so, then that seems wrong to me because for load of simd<int,2> still should be possible for address aligned by only 4.
Also, this comment is not synced with the static assert below, which requires NElems > 64:

  constexpr bool Use64BitData =
      sizeof(T) == 8 || (DS == lsc_data_size::default_size && NElts > 64 &&
                         (NElts * sizeof(T)) % 8 == 0);

IMO, even with additional N > 64, it seems questionable: Let's say,
simd<uint8_t, 128> res = lsc_block_load(addr) must work properly when 'addr' is aligned only by 4 bytes, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, using bigger element size (assuming it requires also bigger address alignment) would require extension of this API by adding hint/guarantee from user that the address passed to this routine is properly aligned.
'overaligned_tag<8>' could be used for that purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is wrong comment. I updated it to reflect the real code. The code will try to split the code to 32 bit words and only if the number of words exceeds 64 it will try to split the data to 64 bit words. so simd<uint8_t, 128> res = lsc_block_load(addr) will require 4 bytes alignment as it is possible to use 32 bit words to transfer 128 bytes

Copy link
Contributor

@v-klochkov v-klochkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add 'Request changes' tag til resolution of the address alignment question.

@fineg74 fineg74 temporarily deployed to aws March 2, 2023 00:40 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws March 2, 2023 02:33 — with GitHub Actions Inactive
@fineg74
Copy link
Contributor Author

fineg74 commented Mar 2, 2023

HIP Test failures :
600.00s: SYCL :: Basic/reqd_work_group_size.cpp
308.00s: SYCL :: AsyncHandler/default_async_handler.cpp
307.44s: SYCL :: Regression/kernel_bundle_ignore_sycl_external.cpp
306.17s: SYCL :: Regression/static-buffer-dtor.cpp
305.89s: SYCL :: KernelAndProgram/spec_constants_after_link.cpp
305.16s: SYCL :: Regression/same_unnamed_kernels.cpp
120.60s: SYCL :: GroupAlgorithm/SYCL2020/sort.cpp
75.67s: SYCL :: Reduction/reduction_internal.cpp
60.85s: SYCL :: Reduction/reduction_span_pack.cpp
55.19s: SYCL :: USM/copy2d.cpp
52.14s: SYCL :: USM/memcpy2d.cpp
44.53s: SYCL :: DeviceLib/math_test_marray_vec.cpp
40.21s: SYCL :: HostInteropTask/host-task-dependency3.cpp
32.99s: SYCL :: Basic/multisource.cpp
28.79s: SYCL :: Basic/vector_byte.cpp
28.03s: SYCL :: Basic/vector_operators.cpp
27.81s: SYCL :: Basic/stream/stream.cpp
27.26s: SYCL :: Regression/optimization_level_debug_info_specopt.cpp
26.89s: SYCL :: DeviceLib/math_test_marray_vec_fp16.cpp
24.18s: SYCL :: Reduction/reduction_range_usm_dw.cpp

are not related to the change

@fineg74
Copy link
Contributor Author

fineg74 commented Mar 2, 2023

CUDA failure
SYCL :: SubGroup/shuffle_fp64.cpp
Is not related to the change

@fineg74
Copy link
Contributor Author

fineg74 commented Mar 2, 2023

ESIMD EMU failures
SYCL :: lsc/lsc_surf_load_u8_u16.cpp
SYCL :: lsc/lsc_surf_load_u8_u16_stateless.cpp
SYCL :: lsc/lsc_surf_store_u8_u16.cpp
SYCL :: lsc/lsc_usm_block_load_u8_u16.cpp
SYCL :: lsc/lsc_usm_block_load_u8_u16_64.cpp
SYCL :: lsc/lsc_usm_store_u8_u16.cpp
SYCL :: lsc/lsc_usm_store_u8_u16_64.cpp
are expected as the change introduces more stringent checks that fail the tests that were not updated. The tests were updated in complementary test PR

@fineg74 fineg74 temporarily deployed to aws March 7, 2023 03:57 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws March 7, 2023 10:00 — with GitHub Actions Inactive
typename FlagsT = __ESIMD_NS::element_aligned_tag,
typename = std::enable_if_t<__ESIMD_NS::is_simd_flag_type_v<FlagsT>>>
__ESIMD_API __ESIMD_NS::simd<T, NElts>
typename FlagsT = __ESIMD_NS::overaligned_tag<4>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gregory, I believe what we agreed on the default value previously was: std::max(__ESIMD_NS::overaligned_tag<4>, __ESIMD_NS::element_aligned_tag).
So, passing int64_t pointer would assume element_aligned_tag (i.e. 8-byte alignment) and would not require explicit passing of the tag argument for longs.

From other side, if user explicitly passes overaligned<4> for 'int64_t* ptr', then that may be an indicator of
some alignment problems in users's code and it would be good to report that in compile time.
The latest update of this PR handles this scenario differently,
the explicitly set alignment to 4 would be just ignored.

It is not possible to write std::max as shown above, but we can have something like 'dqword_element_aligned_tag' defined in detail namespace, which would return 4 for byte/short/int and 8 for int64_t.
That also would remove the need in Alignment evaluation on the lines 625-632

That is a temporary solution before we get approval for using element_aligned_tag that breaks backward compatibility with previous calls with int8_t* and int16_t* without explicit alignment tag passed.

static_assert(FDS == lsc_data_size::u16 || FDS == lsc_data_size::u8 ||
FDS == lsc_data_size::u32 || FDS == lsc_data_size::u64,
"Conversion data types are not supported");
static_assert(FlagsT::template alignment<T> >=
__ESIMD_DNS::OperandSize::DWORD,
static_assert(Alignment >= __ESIMD_DNS::OperandSize::DWORD,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If element type in int64_t, the required alignment is >= 8-bytes

@fineg74 fineg74 temporarily deployed to aws March 7, 2023 19:42 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws March 8, 2023 08:40 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws March 8, 2023 10:15 — with GitHub Actions Inactive
@fineg74
Copy link
Contributor Author

fineg74 commented Mar 8, 2023

AMD GPU failure:
SYCL :: AsyncHandler/default_async_handler.cpp
is not related to the change

@fineg74 fineg74 temporarily deployed to aws March 10, 2023 21:15 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws March 10, 2023 21:43 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws March 13, 2023 22:50 — with GitHub Actions Inactive
@fineg74 fineg74 temporarily deployed to aws March 13, 2023 23:53 — with GitHub Actions Inactive
@v-klochkov v-klochkov merged commit 6f4fde6 into intel:sycl Mar 16, 2023
@fineg74 fineg74 deleted the lscload64 branch March 16, 2023 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants