[SYCL][CUDA] Implementation of matrix ext using new "unified" interface #7077

JackAKirk · 2022-10-17T14:28:28Z

CUDA backend implementation using the "unified" matrix extension interface. The same interface will be used for a future Intel backend implementation of the matrix extension.

New "unified" interface uses SYCL_EXT_ONEAPI_MATRIX_VERSION=4
joint_matrix_load, joint_matrix_store, joint_matrix_mad and joint_matrix interfaces match the new spec from [SYCL][Spec] Update the matrix spec based on new use argument #6662
Separated joint_matrix_* functions into new header matrix-unified.hpp: Intel backend implementations can be called from the same functions in the future.
C++17 everywhere in line with [SYCL] Emit an error on attempt to compile in less than C++17 mode #6678
Updated device code tests to use new interfaces
Completely removed the uint16 implementations: they are replaced by bfloat16, which is being moved out of the experimental namespace
Updated all CUDA runtime matrix tests here: [SYCL][CUDA] Unified matrix interface updated tests llvm-test-suite#1183
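
To illustrate the idea of one set of entry points shared by several backends, here is a minimal host-only C++17 sketch. It is not the code from this PR: the `sketch` namespace, the `SKETCH_BACKEND_B` macro, the two toy backends and the signature of `sketch::mad` are all assumptions made for illustration, and the plain loops stand in for whatever a real device backend would do.

```cpp
#include <array>
#include <cstddef>

namespace sketch {

// Role of a fragment in D = A * B + C (names chosen for this sketch).
enum class use { a, b, accumulator };

// Toy fragment: plain row-major host storage, not a device matrix.
template <typename T, use Use, std::size_t Rows, std::size_t Cols>
struct joint_matrix {
  std::array<T, Rows * Cols> data{};
};

namespace detail {

// Hypothetical backend A: i-j-k loop order.
namespace backend_a {
template <typename T, std::size_t M, std::size_t K, std::size_t N>
constexpr joint_matrix<T, use::accumulator, M, N>
mad(const joint_matrix<T, use::a, M, K> &A,
    const joint_matrix<T, use::b, K, N> &B,
    const joint_matrix<T, use::accumulator, M, N> &C) {
  joint_matrix<T, use::accumulator, M, N> D = C;
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
      for (std::size_t k = 0; k < K; ++k)
        D.data[i * N + j] += A.data[i * K + k] * B.data[k * N + j];
  return D;
}
} // namespace backend_a

// Hypothetical backend B: i-k-j loop order, same result.
namespace backend_b {
template <typename T, std::size_t M, std::size_t K, std::size_t N>
constexpr joint_matrix<T, use::accumulator, M, N>
mad(const joint_matrix<T, use::a, M, K> &A,
    const joint_matrix<T, use::b, K, N> &B,
    const joint_matrix<T, use::accumulator, M, N> &C) {
  joint_matrix<T, use::accumulator, M, N> D = C;
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t j = 0; j < N; ++j)
        D.data[i * N + j] += A.data[i * K + k] * B.data[k * N + j];
  return D;
}
} // namespace backend_b

} // namespace detail

// Unified entry point: callers only ever see this function; the
// preprocessor picks which backend implementation it forwards to.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
constexpr joint_matrix<T, use::accumulator, M, N>
mad(const joint_matrix<T, use::a, M, K> &A,
    const joint_matrix<T, use::b, K, N> &B,
    const joint_matrix<T, use::accumulator, M, N> &C) {
#if defined(SKETCH_BACKEND_B)
  return detail::backend_b::mad(A, B, C);
#else
  return detail::backend_a::mad(A, B, C);
#endif
}

} // namespace sketch
```

Because the role (`use::a`, `use::b`, `use::accumulator`) is part of the fragment type, passing the operands in the wrong order fails to compile in this sketch instead of silently computing the wrong product.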

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

JackAKirk · 2022-10-17T16:16:22Z

/verify with intel/llvm-test-suite#1183

Updated all tests to use new "unified" interfaces from intel/llvm#7077. The old legacy interface implementation is deprecated but still tested via the _legacy files. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Tests require intel/llvm#7077 Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Xfail tests that are not supported yet. Tests require intel/llvm#7077 Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

dkhaldi

LGTM

JackAKirk · 2022-12-08T16:49:25Z

@bader Can this be merged now that it has the two approvals?

AMD failures are unrelated.

Thanks

JackAKirk · 2022-12-08T16:50:50Z

/verify with intel/llvm-test-suite#1334

bader · 2022-12-08T16:51:05Z

GitHub says we need one more approval from @intel/llvm-reviewers-runtime team.

JackAKirk · 2022-12-08T16:58:37Z

> GitHub says we need one more approval from @intel/llvm-reviewers-runtime team.

OK. It would be great if this can get a review quite quickly. I will be on holiday after tomorrow and we wanted to have this before the 2023.1 code freeze also. It would mean that we could publish the joint_matrix optimizations for SYCL-DNN and SYCL-BLAS for the 2023.1 release too.

steffenlarsen

Looks okay from a high-level perspective and @dkhaldi and @yubingex007-a11y have been thorough. 👍

JackAKirk · 2022-12-08T17:25:00Z

> Looks okay from a high-level perspective and @dkhaldi and @yubingex007-a11y have been thorough. 👍

Thanks for the review!

yubingex007-a11y

LGTM

JackAKirk · 2022-12-09T10:09:47Z

I just had to merge the sycl branch to resolve the conflict with c103a6a, where the now-unnecessary C++17 checks were removed.
This makes no difference to the patch.

steffenlarsen · 2022-12-09T11:12:33Z

/verify with intel/llvm-test-suite#1334

JackAKirk · 2022-12-10T00:40:05Z

/verify with intel/llvm-test-suite#1334

steffenlarsen · 2022-12-12T18:52:18Z

Verification failures in Windows CI are unrelated and have been reported:

yubingex007-a11y · 2022-12-14T03:28:21Z

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

+  std::ignore = sg;
+  return wi_data(jm);
+#else
+  // TODO add Intel impl.


@AerialMantis @JackAKirk @dkhaldi
Since we can't provide wi_data in both the CUDA and Intel headers, I will make wi_data unified again and provide a host version of wi_data, so the return type should be "decltype(auto)".
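
For readers unfamiliar with the idiom, the following host-only C++17 sketch shows how a `decltype(auto)` return type lets one unified accessor return a different view type per backend. Every name here (`sketch::sub_group`, `matrix_a`, `matrix_host`, `wi_data_a`, `wi_data_host`, `make_wi_data`) is hypothetical and stands in for the real types; it is not the code under review.

```cpp
#include <cstddef>
#include <tuple>       // std::ignore
#include <type_traits>

namespace sketch {

struct sub_group {};   // placeholder for the group argument

// Two hypothetical fragment types, one per "backend".
struct matrix_a { int storage[4]; };
struct matrix_host {};

// Two hypothetical per-work-item views with unrelated types.
struct wi_data_a {
  int *ptr;
  std::size_t n;
  std::size_t length() const { return n; }
};
struct wi_data_host {
  std::size_t length() const { return 0; }
};

// Each backend provides its own overload returning its own view type.
inline wi_data_a make_wi_data(matrix_a &m) { return {m.storage, 4}; }
inline wi_data_host make_wi_data(matrix_host &) { return {}; }

// Unified accessor: decltype(auto) makes the return type follow whichever
// overload is selected, so no common view type has to be named here.
template <typename Group, typename Matrix>
decltype(auto) get_wi_data(Group sg, Matrix &jm) {
  std::ignore = sg;    // the group is unused in this sketch
  return make_wi_data(jm);
}

} // namespace sketch
```

Since `make_wi_data(jm)` is a prvalue, `decltype(auto)` deduces a plain value type here; it would only deduce a reference if the returned expression were itself an lvalue or xvalue.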

yubingex007-a11y · 2022-12-14T03:37:25Z

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

+          layout Layout>
+struct joint_matrix {
+
+#if defined(__SYCL_DEVICE_ONLY__) && defined(__SPIR__)


@AerialMantis @JackAKirk
Sorry, as I remember it, this was previously:

#if defined(__SYCL_DEVICE_ONLY__)
#if defined(__NVPTX__)
  sycl::ext::oneapi::detail::joint_matrix_cuda<T, Use, Rows, Cols, Layout> cuda_impl;
#else
  __spv::__spirv_JointMatrixINTEL<
      T, Rows, Cols, spv_matrix_layout_traits<Layout>::value,
      spv_scope_traits<Group>::value, spv_matrix_use_traits<Use>::value> *spvm;
#endif // defined(__SYCL_DEVICE_ONLY__)
#endif

On the Intel side, we can't let host compilation use sycl::ext::oneapi::detail::joint_matrix_cuda, so I went back to the previous code, and the CUDA test cases still pass.
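
The effect of the nested guards can be checked with a small host-only C++17 sketch. The member types `cuda_impl_t` and `spirv_handle_t` are hypothetical placeholders for the real backend types, and the `static_assert` is an addition for illustration only: with the outer `__SYCL_DEVICE_ONLY__` guard in place, a host compilation sees neither backend member.

```cpp
#include <type_traits>

namespace sketch {

struct cuda_impl_t { float frag[8]; }; // placeholder for a CUDA fragment type
struct spirv_handle_t;                 // placeholder for an opaque SPIR-V type

struct joint_matrix {
#if defined(__SYCL_DEVICE_ONLY__)
#if defined(__NVPTX__)
  cuda_impl_t cuda_impl;   // only in device compilation for NVPTX
#else
  spirv_handle_t *spvm;    // only in device compilation for other targets
#endif // defined(__NVPTX__)
#endif // defined(__SYCL_DEVICE_ONLY__)
};

#if !defined(__SYCL_DEVICE_ONLY__)
// On the host pass neither member is declared, so the struct is empty.
static_assert(std::is_empty_v<joint_matrix>,
              "host compilation must not see backend-specific members");
#endif

} // namespace sketch
```

By contrast, a guard of the form `defined(__SYCL_DEVICE_ONLY__) && defined(__SPIR__)` followed by an unconditional `#else` would route the host pass into the `#else` branch, which is the situation the comment above describes.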

JackAKirk and others added 17 commits August 5, 2022 12:59

Allow joint_matrix to be loaded from const.

fdc4c42

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

removed duplicates.

68d3150

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Layout accumulator is specified at load/store.

4949464

This is a move towards the future looking joint_matrix, joint_matrix_load, joint_matrix_store APIs. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

joint_matrix_mad takes D matrix as argument.

8c09910

Also updated the impl functions used in the CUDA backend (Some of these functions may be also used in the HIP AMD case when that is implemented, since the interfaces will match). Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Add new mma cases enabled by joint_matrix_mad.

e55e5f0

This is for illustrative purposes: to show the advantage of the proposed change in the joint_matrix_mad interface. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

packed_a, packed_b -> packed

a881055

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Made interface compatible with intel backend.

5b84434

Signed-off-by: JackAKirk <chezjakirk@gmail.com>

Merge branch 'sycl' into nvptx-matrix-const

75774f2

Merge branch 'nvptx-matrix-const' into update-matrix-interface

5c03b3f

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

added unified header, moved nvptx specific impl.

ccdb544

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Merge branch 'sycl' into update-matrix-interface

331760a

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

(very) draft updated interfaces.

46e87a1

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

cuda joint_matrix partial specializations in separate file.

766fd8c

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Merge branch 'sycl' into unified-interface

32dafa3

Refactoring and supporting loading from const.

24d3aa1

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Improve error msg and pass by ref.

b9a051f

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

updated device code tests.

3dbeadb

Used consistent naming convention in impl. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

JackAKirk requested a review from a team as a code owner October 17, 2022 14:28

JackAKirk requested a review from v-klochkov October 17, 2022 14:28

JackAKirk mentioned this pull request Oct 17, 2022

[SYCL][CUDA] Unified matrix interface updated tests intel/llvm-test-suite#1183

Merged

JackAKirk requested a review from dkhaldi October 17, 2022 14:31

JackAKirk added 3 commits October 17, 2022 07:39

Merge branch 'sycl' into unified-interface

49147d3

format.

ee1208e

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

format

446c0a0

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

JackAKirk pushed a commit to JackAKirk/llvm-test-suite that referenced this pull request Oct 18, 2022

Xfail tests that are not supported yet.

f942304

Tests require intel/llvm#7077 Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

JackAKirk mentioned this pull request Oct 18, 2022

[SYCL][CUDA] Xfail matrix tests that are not supported yet. intel/llvm-test-suite#1331

Merged

fix failed tests.

8da0aa7

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

dkhaldi approved these changes Dec 5, 2022

get_wi_data no longer auto to host return removed.

b9ca55c

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

steffenlarsen approved these changes Dec 8, 2022

yubingex007-a11y approved these changes Dec 8, 2022

Merge branch 'sycl' into unified-interface

bb6fc5e

Merge branch 'sycl' into unified-interface

68fcf9a

steffenlarsen merged commit 166bbc3 into intel:sycl Dec 12, 2022

yubingex007-a11y reviewed Dec 14, 2022
