Skip to content

[SYCL][CUDA] Implementation of matrix ext using new "unified" interface #7077

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 48 commits into from
Dec 12, 2022

Conversation

JackAKirk
Copy link
Contributor

CUDA backend implementation using the "unified" matrix extension interface. The same interface will be used for a future Intel backend implementation of the matrix extension.

JackAKirk and others added 17 commits August 5, 2022 12:59
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
This is a move towards the future looking joint_matrix, joint_matrix_load, joint_matrix_store APIs.
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Also updated the impl functions used in the CUDA backend (Some of these functions may be also used in the HIP AMD case when that is implemented, since the interfaces will match).

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
This is for illustrative purposes: to show the advantage of the proposed change in the joint_matrix_mad interface.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <chezjakirk@gmail.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Used consistant naming convention in impl.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1183

steffenlarsen pushed a commit to intel/llvm-test-suite that referenced this pull request Oct 18, 2022
Updated all tests to use new "unified" interfaces from intel/llvm#7077. The old legacy interface implementation is deprecated but still tested via the _legacy files.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
JackAKirk pushed a commit to JackAKirk/llvm-test-suite that referenced this pull request Oct 18, 2022
Tests require intel/llvm#7077

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
steffenlarsen pushed a commit to intel/llvm-test-suite that referenced this pull request Oct 18, 2022
Xfail tests that are not supported yet.

Tests require intel/llvm#7077

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Copy link
Contributor

@dkhaldi dkhaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk
Copy link
Contributor Author

JackAKirk commented Dec 8, 2022

@bader Can this be merged now that it has the two approvals?

AMD failures are unrelated.

Thanks

@JackAKirk
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1334

@bader
Copy link
Contributor

bader commented Dec 8, 2022

GitHub says we need one more approval from @intel/llvm-reviewers-runtime team.

@JackAKirk
Copy link
Contributor Author

GitHub says we need one more approval from @intel/llvm-reviewers-runtime team.

OK. It would be great if this can get a review quite quickly. I will be on holiday after tomorrow and we wanted to have this before the 2023.1 code freeze also. It would mean that we could publish the joint_matrix optimizations for SYCL-DNN and SYCL-BLAS for the 2023.1 release too.

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay from a high-level perspective and @dkhaldi and @yubingex007-a11y have been thorough. 👍

@JackAKirk
Copy link
Contributor Author

Looks okay from a high-level perspective and @dkhaldi and @yubingex007-a11y have been thorough. +1

Thanks for the review!

Copy link
Contributor

@yubingex007-a11y yubingex007-a11y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JackAKirk
Copy link
Contributor Author

JackAKirk commented Dec 9, 2022

I just had to merge the sycl branch to resolve the conflict with c103a6a where the now unnecessary c++17 checks were removed.
This makes no difference to the patch.

@steffenlarsen
Copy link
Contributor

/verify with intel/llvm-test-suite#1334

@JackAKirk
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1334

@steffenlarsen steffenlarsen merged commit 166bbc3 into intel:sycl Dec 12, 2022
std::ignore = sg;
return wi_data(jm);
#else
// TODO add Intel impl.
Copy link
Contributor

@yubingex007-a11y yubingex007-a11y Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AerialMantis @JackAKirk @dkhaldi
since we can't provide wi_data in both cuda&intel's header, i will make wi_data unified again and provide wi_data of host version, so the return type should be "decltype(auto)".

layout Layout>
struct joint_matrix {

#if defined(__SYCL_DEVICE_ONLY__) && defined(__SPIR__)
Copy link
Contributor

@yubingex007-a11y yubingex007-a11y Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AerialMantis @JackAKirk
sorry, i remember previously it is:

#if defined(__SYCL_DEVICE_ONLY__)
#if defined(__NVPTX__)
  sycl::ext::oneapi::detail::joint_matrix_cuda<T, Use, Rows, Cols, Layout>
      cuda_impl;
#else
  __spv::__spirv_JointMatrixINTEL<
      T, Rows, Cols, spv_matrix_layout_traits<Layout>::value,
      spv_scope_traits<Group>::value, spv_matrix_use_traits<Use>::value> *spvm;
#endif // defined(__SYCL_DEVICE_ONLY__)
#endif

in intel side, we can't let host compilation use sycl::ext::oneapi::detail::joint_matrix_cuda. so i go back to the previous code and i can still get passed in cuda's testcases.

aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…suite#1183)

Updated all tests to use new "unified" interfaces from intel#7077. The old legacy interface implementation is deprecated but still tested via the _legacy files.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…vm-test-suite#1331)

Xfail tests that are not supported yet.

Tests require intel#7077

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants