-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][Matrix] Enable wi_slice for joint_matrix #4979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
auto wi_slice_c = joint_matrix_get_slice(sub_c); // M.get_wi_slice() | ||
for (int i = 0; i < wi_slice.length(); i++) { | ||
wi_slice_c.data[i] *= 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just create a testcase from previous case and here I do sub_c *1=1 with wi_slice.
N, matrix_layout::row_major); | ||
auto wi_slice_c = joint_matrix_get_slice(sub_c); // M.get_wi_slice() | ||
for (int i = 0; i < wi_slice.length(); i++) { | ||
wi_slice_c.data[i] *= 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should look like this: wi_slice_c[i] *=1;
is this because you were not able to overload directly operator[]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't.
Please see http://cmplrexplorer.intel.com/z/PGc96P
accC.get_pointer() + (sg_startx * TM) * N + | ||
sg_starty / SG_SZ * TN, | ||
N, matrix_layout::row_major); | ||
auto wi_slice_c = joint_matrix_get_slice(sub_c); // M.get_wi_slice() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change the name to get_work_item_data as this is not a "joint" instruction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
accC.get_pointer() + (sg_startx * TM) * N + | ||
sg_starty / SG_SZ * TN, | ||
N, matrix_layout::row_major); | ||
auto wi_slice_c = sub_c.get_wi_slice(); // M.get_wi_slice() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change the name of the function to >> get_wi_data
accC.get_pointer() + (sg_startx * TM) * N + | ||
sg_starty / SG_SZ * TN, | ||
N, matrix_layout::row_major); | ||
}); // parallel for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1st: change the position of the mul
C = mad()
for (int i = 0; i < wi_slice_c.length(); i++) {
wi_slice_c[i] *= alpha;
}
2- change the name of the test: matrix-elemwise-ops.cpp
@@ -86,6 +85,10 @@ void matrix_multiply(big_matrix<T1, NUM_ROWS_C, NUM_COLS_C> &C, big_matrix<T2, N | |||
sg_starty / SG_SZ * TN * 4, | |||
N * 4, matrix_layout::packed_b); | |||
sub_c = joint_matrix_mad(sg, sub_a, sub_b, sub_c); | |||
auto wi_slice_c = sub_c.get_wi_data(); // M.get_wi_data() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the comment
|
||
template <typename T, std::size_t R, std::size_t C, __spv::MatrixLayout U, | ||
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup> | ||
extern SYCL_EXTERNAL size_t __spirv_JointMatrixGetSliceLength( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add Scope
parameter to this function, like we did for __spirv_JointMatrixMadINTEL
?
ping? |
} | ||
wi_elem &operator=(const T &rhs) { | ||
#ifdef __SYCL_DEVICE_ONLY__ | ||
M.spvm = __spirv_JointMatrixSetSliceElem(M.spvm, idx, rhs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these will be converted to
SPIRVVectorInsertDynamic
and
SPIRVVectorextractDynamic
in SPIRV.
and to:
int_experimental_matrix_wi_slice_extract
int_experimental_matrix_wi_slice_insert
in LLVM IR
so we need to change the names you are using to insert/extract instead of set/get. Something like:
__spirv_JointMatrix_Extract
__spirv_JointMatrix_Insert
is sufficient to match the SPIRV ones.
What do you think @AlexeySotkin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should change it to
M.spvm = __spirv_JointMatrixSetSliceElem(M.spvm, idx, rhs); | |
M.spvm = __spirv_VectorInsertDynamic(M.spvm, rhs, idx); |
template <typename T, size_t NumRows, size_t NumCols, | ||
matrix_layout Layout = matrix_layout::row_major, | ||
typename Group = sycl::sub_group> | ||
class wi_elem { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change it to wi_element
M.spvm = __spirv_JointMatrixSetSliceElem( | ||
M.spvm, idx, __spirv_JointMatrixGetSliceElem(M.spvm, idx) * rhs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
M.spvm = __spirv_JointMatrixSetSliceElem( | |
M.spvm, idx, __spirv_JointMatrixGetSliceElem(M.spvm, idx) * rhs); | |
M.spvm = __spirv_VectorInsertDynamic( | |
M.spvm, __spirv_VectorExtractDynamic(M.spvm, idx) * rhs, idx,); |
wi_slice(joint_matrix<T, NumRows, NumCols, Layout, Group> &Mat) : M(Mat) {} | ||
size_t length() { | ||
#ifdef __SYCL_DEVICE_ONLY__ | ||
return __spirv_JointMatrixGetSliceLength(M.spvm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return __spirv_JointMatrixGetSliceLength(M.spvm); | |
return __spirv_JointMatrixWorkItemLengthINTEL(M.spvm); |
template <typename T> | ||
using __spirv_wi_slice_t = T __attribute__((ext_vector_type(0xffffff))); | ||
|
||
template <typename T, std::size_t R, std::size_t C, __spv::MatrixLayout U, | ||
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup> | ||
extern SYCL_EXTERNAL __spirv_wi_slice_t<T> &__spirv_JointMatrixGetSliceData( | ||
__spv::__spirv_JointMatrixINTEL<T, R, C, U, S> *); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current SPIR-V design we don't need these.
|
||
template <typename T, std::size_t R, std::size_t C, __spv::MatrixLayout U, | ||
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup> | ||
extern SYCL_EXTERNAL T __spirv_JointMatrixGetSliceElem( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extern SYCL_EXTERNAL T __spirv_JointMatrixGetSliceElem( | |
extern SYCL_EXTERNAL T __spirv_VectorExtractDynamic( |
__spirv_JointMatrixSetSliceElem( | ||
__spv::__spirv_JointMatrixINTEL<T, R, C, U, S> *, size_t i, T val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__spirv_JointMatrixSetSliceElem( | |
__spv::__spirv_JointMatrixINTEL<T, R, C, U, S> *, size_t i, T val); | |
__spirv_VectorInsertDynamic( | |
__spv::__spirv_JointMatrixINTEL<T, R, C, U, S> *, T val, size_t i); |
template <typename T, std::size_t R, std::size_t C, __spv::MatrixLayout U, | ||
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup> | ||
extern SYCL_EXTERNAL T __spirv_JointMatrixGetSliceElem( | ||
__spv::__spirv_JointMatrixINTEL<T, R, C, U, S> *, size_t i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The width of size_t
type is platform dependent I think. Typically it becomes i32
or i64
in LLVM IR. I would prefer a more specific type, like uint32_t
.
Is 32 bits enough for indexing in slices, what do you think @dkhaldi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
32 bits is enough. But size_t is widely used in other APIs.
Specifically, what if the user is calculating this iterator using some WI id or other id which is also size_t, will that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we explicitly say that this parameter must be uint32_t
there may be a warning about type narrowing during compilation. But now I think maybe I was wrong and we really should stick to size_t
. On LLVM IR level we can handle it in the same way as memcpy
(https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic), i.e. it is overloaded type.
sub_c = joint_matrix_mad(sg, sub_a, sub_b, sub_c); | ||
auto wi_slice_c = sub_c.get_wi_data(); | ||
for (int i = 0; i < wi_slice_c.length(); i++) { | ||
wi_slice_c[i] *= 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
choose a different number other than the neutral element :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ping? @intel/llvm-reviewers-runtime |
@bader ping? could we merge it? |
* upstream/sycl: (1382 commits) [SYCL][XPTI] Report memory allocation info from SYCL runtime (intel#5172) [CI] Switch labels for OCL x64 job (intel#5185) [SYCL] Add basic support for the generic_space address space (intel#5148) [CI] Update CODEOWNERS for SYCL printf support passes (intel#5199) [SYCL][Matrix] Enable wi_slice for joint_matrix (intel#4979) [SYCL][Group algorithms] Move group sort extension to experimental (intel#5169) [SYCL] Fix kernel bundles don't really carry kernel IDs (intel#5121) [SYCL] Initial printf support for non-constant AS format strings (intel#5069) [SYCL][NFC] Fix static code analysis concerns (intel#5189) [SYCL][Doc] Fix typos to fix doc build (intel#5190) [Driver][SYCL] Turn on -fsycl-dead-args-optimization by default (intel#3004) [SYCL][L0][Plugin] Add support for batching copy commands (intel#5155) [CI] Add cache checkout script to docker containers (intel#5184) [SYCL][Doc] Add HIP backend to the filter selector (intel#5176) [Doc] Add documentation for Docker images (intel#4778) [LIBCLC] Add functionality for in-kernel asserts for CUDA backend (intel#5174) Force opt to use new pass manager in exponential-deferred-inlining test after a8c2ba1 [SYCL] Add vec and marray support to known_identity type trait (intel#5163) Correctly resolve merge conflicts Update SPV_INTEL_hw_thread_queries to rev 2 ...
/summary:run |
PI_INVALID_DEVICE); | ||
#endif // __SYCL_DEVICE_ONLY__ | ||
} | ||
// TODO: add other arithmetic operators |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yubingex007-a11y please do not forget to add overloading for other operators
No description provided.