Skip to content

[SYCL][Matrix] Enable wi_slice for joint_matrix #4979

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 22, 2021

Conversation

yubingex007-a11y
Copy link
Contributor

No description provided.

auto wi_slice_c = joint_matrix_get_slice(sub_c); // M.get_wi_slice()
for (int i = 0; i < wi_slice.length(); i++) {
wi_slice_c.data[i] *= 1;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just create a testcase from previous case and here I do sub_c *1=1 with wi_slice.

@dkhaldi dkhaldi requested a review from mbelicki November 17, 2021 18:58
N, matrix_layout::row_major);
auto wi_slice_c = joint_matrix_get_slice(sub_c); // M.get_wi_slice()
for (int i = 0; i < wi_slice.length(); i++) {
wi_slice_c.data[i] *= 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should look like this: wi_slice_c[i] *=1;
is this because you were not able to overload directly operator[]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accC.get_pointer() + (sg_startx * TM) * N +
sg_starty / SG_SZ * TN,
N, matrix_layout::row_major);
auto wi_slice_c = joint_matrix_get_slice(sub_c); // M.get_wi_slice()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the name to get_work_item_data as this is not a "joint" instruction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

accC.get_pointer() + (sg_startx * TM) * N +
sg_starty / SG_SZ * TN,
N, matrix_layout::row_major);
auto wi_slice_c = sub_c.get_wi_slice(); // M.get_wi_slice()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the name of the function to >> get_wi_data

accC.get_pointer() + (sg_startx * TM) * N +
sg_starty / SG_SZ * TN,
N, matrix_layout::row_major);
}); // parallel for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1st: change the position of the mul
C = mad()
for (int i = 0; i < wi_slice_c.length(); i++) {
wi_slice_c[i] *= alpha;
}
2- change the name of the test: matrix-elemwise-ops.cpp

@@ -86,6 +85,10 @@ void matrix_multiply(big_matrix<T1, NUM_ROWS_C, NUM_COLS_C> &C, big_matrix<T2, N
sg_starty / SG_SZ * TN * 4,
N * 4, matrix_layout::packed_b);
sub_c = joint_matrix_mad(sg, sub_a, sub_b, sub_c);
auto wi_slice_c = sub_c.get_wi_data(); // M.get_wi_data()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the comment


template <typename T, std::size_t R, std::size_t C, __spv::MatrixLayout U,
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup>
extern SYCL_EXTERNAL size_t __spirv_JointMatrixGetSliceLength(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add Scope parameter to this function, like we did for __spirv_JointMatrixMadINTEL?

@yubingex007-a11y
Copy link
Contributor Author

ping?

}
wi_elem &operator=(const T &rhs) {
#ifdef __SYCL_DEVICE_ONLY__
M.spvm = __spirv_JointMatrixSetSliceElem(M.spvm, idx, rhs);
Copy link
Contributor

@dkhaldi dkhaldi Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these will be converted to
SPIRVVectorInsertDynamic
and
SPIRVVectorextractDynamic
in SPIRV.
and to:
int_experimental_matrix_wi_slice_extract
int_experimental_matrix_wi_slice_insert
in LLVM IR
so we need to change the names you are using to insert/extract instead of set/get. Something like:
__spirv_JointMatrix_Extract
__spirv_JointMatrix_Insert
is sufficient to match the SPIRV ones.
What do you think @AlexeySotkin?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change it to

Suggested change
M.spvm = __spirv_JointMatrixSetSliceElem(M.spvm, idx, rhs);
M.spvm = __spirv_VectorInsertDynamic(M.spvm, rhs, idx);

template <typename T, size_t NumRows, size_t NumCols,
matrix_layout Layout = matrix_layout::row_major,
typename Group = sycl::sub_group>
class wi_elem {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change it to wi_element

Comment on lines 235 to 236
M.spvm = __spirv_JointMatrixSetSliceElem(
M.spvm, idx, __spirv_JointMatrixGetSliceElem(M.spvm, idx) * rhs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
M.spvm = __spirv_JointMatrixSetSliceElem(
M.spvm, idx, __spirv_JointMatrixGetSliceElem(M.spvm, idx) * rhs);
M.spvm = __spirv_VectorInsertDynamic(
M.spvm, __spirv_VectorExtractDynamic(M.spvm, idx) * rhs, idx,);

wi_slice(joint_matrix<T, NumRows, NumCols, Layout, Group> &Mat) : M(Mat) {}
size_t length() {
#ifdef __SYCL_DEVICE_ONLY__
return __spirv_JointMatrixGetSliceLength(M.spvm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return __spirv_JointMatrixGetSliceLength(M.spvm);
return __spirv_JointMatrixWorkItemLengthINTEL(M.spvm);

Comment on lines 89 to 95
template <typename T>
using __spirv_wi_slice_t = T __attribute__((ext_vector_type(0xffffff)));

template <typename T, std::size_t R, std::size_t C, __spv::MatrixLayout U,
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup>
extern SYCL_EXTERNAL __spirv_wi_slice_t<T> &__spirv_JointMatrixGetSliceData(
__spv::__spirv_JointMatrixINTEL<T, R, C, U, S> *);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current SPIR-V design we don't need these.


template <typename T, std::size_t R, std::size_t C, __spv::MatrixLayout U,
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup>
extern SYCL_EXTERNAL T __spirv_JointMatrixGetSliceElem(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
extern SYCL_EXTERNAL T __spirv_JointMatrixGetSliceElem(
extern SYCL_EXTERNAL T __spirv_VectorExtractDynamic(

Comment on lines 110 to 111
__spirv_JointMatrixSetSliceElem(
__spv::__spirv_JointMatrixINTEL<T, R, C, U, S> *, size_t i, T val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
__spirv_JointMatrixSetSliceElem(
__spv::__spirv_JointMatrixINTEL<T, R, C, U, S> *, size_t i, T val);
__spirv_VectorInsertDynamic(
__spv::__spirv_JointMatrixINTEL<T, R, C, U, S> *, T val, size_t i);

template <typename T, std::size_t R, std::size_t C, __spv::MatrixLayout U,
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup>
extern SYCL_EXTERNAL T __spirv_JointMatrixGetSliceElem(
__spv::__spirv_JointMatrixINTEL<T, R, C, U, S> *, size_t i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The width of size_t type is platform dependent I think. Typically it becomes i32 or i64 in LLVM IR. I would prefer a more specific type, like uint32_t.
Is 32 bits enough for indexing in slices, what do you think @dkhaldi?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

32 bits is enough. But size_t is widely used in other APIs.
Specifically, what if the user is calculating this iterator using some WI id or other id which is also size_t, will that work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we explicitly say that this parameter must be uint32_t there may be a warning about type narrowing during compilation. But now I think maybe I was wrong and we really should stick to size_t. On LLVM IR level we can handle it in the same way as memcpy (https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic), i.e. it is overloaded type.

sub_c = joint_matrix_mad(sg, sub_a, sub_b, sub_c);
auto wi_slice_c = sub_c.get_wi_data();
for (int i = 0; i < wi_slice_c.length(); i++) {
wi_slice_c[i] *= 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

choose a different number other than the neutral element :)

Copy link
Contributor

@dkhaldi dkhaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yubingex007-a11y yubingex007-a11y requested review from bader and removed request for AlexeySotkin December 21, 2021 15:50
@yubingex007-a11y
Copy link
Contributor Author

ping? @intel/llvm-reviewers-runtime

@yubingex007-a11y
Copy link
Contributor Author

yubingex007-a11y commented Dec 22, 2021

@bader ping? could we merge it?

@bader bader changed the title [Matrix] Enable wi_slice for joint_matrix [SYCL][Matrix] Enable wi_slice for joint_matrix Dec 22, 2021
@bader bader merged commit 97127eb into intel:sycl Dec 22, 2021
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Dec 23, 2021
* upstream/sycl: (1382 commits)
  [SYCL][XPTI] Report memory allocation info from SYCL runtime (intel#5172)
  [CI] Switch labels for OCL x64 job (intel#5185)
  [SYCL] Add basic support for the generic_space address space (intel#5148)
  [CI] Update CODEOWNERS for SYCL printf support passes (intel#5199)
  [SYCL][Matrix] Enable wi_slice for joint_matrix (intel#4979)
  [SYCL][Group algorithms] Move group sort extension to experimental (intel#5169)
  [SYCL] Fix kernel bundles don't really carry kernel IDs (intel#5121)
  [SYCL] Initial printf support for non-constant AS format strings (intel#5069)
  [SYCL][NFC] Fix static code analysis concerns (intel#5189)
  [SYCL][Doc] Fix typos to fix doc build (intel#5190)
  [Driver][SYCL] Turn on -fsycl-dead-args-optimization by default (intel#3004)
  [SYCL][L0][Plugin] Add support for batching copy commands (intel#5155)
  [CI] Add cache checkout script to docker containers (intel#5184)
  [SYCL][Doc] Add HIP backend to the filter selector (intel#5176)
  [Doc] Add documentation for Docker images (intel#4778)
  [LIBCLC] Add functionality for in-kernel asserts for CUDA backend (intel#5174)
  Force opt to use new pass manager in exponential-deferred-inlining test after a8c2ba1
  [SYCL] Add vec and marray support to known_identity type trait (intel#5163)
  Correctly resolve merge conflicts
  Update SPV_INTEL_hw_thread_queries to rev 2
  ...
@bader
Copy link
Contributor

bader commented Dec 23, 2021

/summary:run

PI_INVALID_DEVICE);
#endif // __SYCL_DEVICE_ONLY__
}
// TODO: add other arithmetic operators
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yubingex007-a11y please do not forget to add overloading for other operators

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants