-
Notifications
You must be signed in to change notification settings - Fork 50
New implementation of GEMM without using local memory #80
New implementation of GEMM without using local memory #80
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. In general, some more clarifications on the algorithm would be useful. A README explaining how data is fetch and stored would be useful
include/operations/blas3_trees.hpp
Outdated
"%ld\n", | ||
id.get_global_id(0), tile_size, tile_id, tile_local_id, tiles_per_col, | ||
tile_row, tile_col, wg_row, wg_col);*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove commented code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
include/operations/blas3_trees.hpp
Outdated
@@ -169,6 +170,303 @@ inline bool do_check<false>(bool) { | |||
return true; | |||
} | |||
|
|||
/*! | |||
* @brief NoLocalGemmFactory is a template class whose instantiations provide | |||
* different implementations of the GEMM device function where the is no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of the GEMM device function -> of the GEMM kernel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
include/operations/blas3_trees.hpp
Outdated
* line size, some values fetched will be wasted, which can | ||
* significantly reduce performance. It can be set to a | ||
* multiple of the physical cache line size. In this case, it | ||
* will significantly increase scratchpad memory usage, but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean local memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
include/operations/blas3_trees.hpp
Outdated
auto A = _A.getData().get_pointer().get(); | ||
auto B = _B.getData().get_pointer().get(); | ||
auto C = _C.getData().get_pointer().get(); | ||
const auto tile_per_row = ((m - 1) / block_rows) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number of blocks per row?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
include/operations/blas3_trees.hpp
Outdated
value_type reg_a[item_rows]; | ||
value_type reg_b[item_cols]; | ||
|
||
/*! @brief exiting from any threads outside of the m and n boundary */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brief is only for starting a function or a method. This is just a normal comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
include/operations/blas3_trees.hpp
Outdated
* A, B and C matrices. | ||
*/ | ||
int ma = (local_item_id_row + wg_row); | ||
int na = (local_item_id_col + wg_col); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ma -> dimA_start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
: store<true>(C, reg_res, alpha, beta, ldc, ma, na, boundary_check_c); | ||
} | ||
|
||
void bind(cl::sycl::handler &h) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a doxygen comment clarifying what bind does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
Looks good to me |
No description provided.