Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

New implementation of GEMM without using local memory #80

Merged

Conversation

mehdi-goli
Copy link
Collaborator

No description provided.

@Ruyk Ruyk changed the title Adding no local mem gemm New implementation of GEMM without using local memory Oct 5, 2018
Copy link
Contributor

@Ruyk Ruyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. In general, some more clarifications on the algorithm would be useful. A README explaining how data is fetch and stored would be useful

"%ld\n",
id.get_global_id(0), tile_size, tile_id, tile_local_id, tiles_per_col,
tile_row, tile_col, wg_row, wg_col);*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -169,6 +170,303 @@ inline bool do_check<false>(bool) {
return true;
}

/*!
* @brief NoLocalGemmFactory is a template class whose instantiations provide
* different implementations of the GEMM device function where the is no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of the GEMM device function -> of the GEMM kernel

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

* line size, some values fetched will be wasted, which can
* significantly reduce performance. It can be set to a
* multiple of the physical cache line size. In this case, it
* will significantly increase scratchpad memory usage, but
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean local memory

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

auto A = _A.getData().get_pointer().get();
auto B = _B.getData().get_pointer().get();
auto C = _C.getData().get_pointer().get();
const auto tile_per_row = ((m - 1) / block_rows) + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

number of blocks per row?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

value_type reg_a[item_rows];
value_type reg_b[item_cols];

/*! @brief exiting from any threads outside of the m and n boundary */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brief is only for starting a function or a method. This is just a normal comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

* A, B and C matrices.
*/
int ma = (local_item_id_row + wg_row);
int na = (local_item_id_col + wg_col);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ma -> dimA_start?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

: store<true>(C, reg_res, alpha, beta, ldc, ma, na, boundary_check_c);
}

void bind(cl::sycl::handler &h) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a doxygen comment clarifying what bind does

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@AdamBrouwersHarries
Copy link
Contributor

Looks good to me

@AdamBrouwersHarries AdamBrouwersHarries self-requested a review October 5, 2018 12:48
@mehdi-goli mehdi-goli merged commit 003c432 into codeplaysoftware:master Oct 5, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants