Commit 85a91d5

sryap authored and facebook-github-bot committed
Enable subwarp only for unweighted (#2051)
Summary:
Pull Request resolved: #2051

Disable the subwarp optimization for weighted TBE
- The cause of the NE issue has to be investigated.
- This does not affect the overall E2E performance much since the majority of TBEs in the model are unweighted

Reviewed By: jasonjk-park
Differential Revision: D49159119
fbshipit-source-id: 805639d94d4ce7b3be8f275db4dfd0ecc95a539a
1 parent 3146ef5 commit 85a91d5

1 file changed, +4 -0 lines changed

fbgemm_gpu/codegen/embedding_forward_split_kernel_v2_template.cu

Lines changed: 4 additions & 0 deletions
@@ -940,6 +940,7 @@ __global__ void split_embedding_codegen_forward_{{ wdesc }}_v2_kernel(
 
 // Tail warp
 // STEP_MASK computation assumes STEP = 4
+{% if not weighted %}
 if (load_D - load_d < kWarpSize) {
 const auto tail_warp_size = load_D % kWarpSize;
 if (tail_warp_size <= 8) {
@@ -955,6 +956,9 @@ __global__ void split_embedding_codegen_forward_{{ wdesc }}_v2_kernel(
 else {
 INVOKE_PROCESS_ALL_INDICES(large_Ls, 32, 0xf)
 }
+{% else %}
+INVOKE_PROCESS_ALL_INDICES(large_Ls, 32, 0xf)
+{% endif %}
 
 #undef INVOKE_PROCESS_ALL_INDICES_HELPER
 #undef INVOKE_PROCESS_ALL_INDICES
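
The effect of the new `{% if not weighted %}` guard can be sketched in plain C++ by treating the template variable as a compile-time boolean. This is only an illustration, not code from the repository: `considers_tail_warp` is a hypothetical helper, `kWarpSize = 32` is an assumption, and the subwarp sizes chosen inside the tail-warp branch are omitted because the diff does not show them.

```cpp
#include <cstdint>

// Assumption: a warp has 32 threads. The diff only shows 32 being passed
// for the full-warp case; the real kWarpSize is defined elsewhere.
constexpr int32_t kWarpSize = 32;

// Hypothetical helper: reports whether the generated kernel would even
// evaluate the tail-warp (subwarp) branch for the given column range.
// `weighted` plays the role of the Jinja variable of the same name.
template <bool weighted>
constexpr bool considers_tail_warp(int32_t load_D, int32_t load_d) {
  if constexpr (weighted) {
    // After this commit, the weighted kernel always takes the full-warp
    // path: INVOKE_PROCESS_ALL_INDICES(large_Ls, 32, 0xf).
    return false;
  } else {
    // Unweighted kernel keeps the original condition from the diff.
    return load_D - load_d < kWarpSize;
  }
}

// Compile-time checks of the behavior described above.
static_assert(considers_tail_warp<false>(40, 32), "unweighted tail warp");
static_assert(!considers_tail_warp<false>(128, 32), "unweighted full warp");
static_assert(!considers_tail_warp<true>(40, 32), "weighted never subwarps");
```

In the actual template the choice is made when the kernel source is generated, so the weighted kernel contains no tail-warp code at all; `if constexpr` is used here only to mimic that at compile time.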
