Commit 75136e4

optimize fp8 tile sizes for headdim 64 for faster fp8 decoding
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
1 parent 5714f9d commit 75136e4

1 file changed: +4 -1 lines changed

hopper/tile_size.h

Lines changed: 4 additions & 1 deletion
@@ -52,7 +52,10 @@ constexpr std::tuple<int, int, bool, bool> tile_size_fwd_sm90(
         }
     } else {
         if (headdim <= 64) {
-            return {192, 160, true, true};
+            if (use_one_mma_wg) {
+                return {64, 128, true, true};
+            } else {
+                return {192, 160, true, true};
         } else if (headdim <= 96) {
             return {192, 128, true, true};
         } else if (headdim <= 128) {
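
The effect of the hunk above on the `headdim <= 64` case can be sketched as a small stand-alone function. This is an illustrative mirror of that one branch, not the code in `hopper/tile_size.h`: only `headdim`, `use_one_mma_wg`, and the returned values come from the diff, while the function name `fp8_tile_size_hdim64_sketch` and the `-1` sentinel for other head dimensions are hypothetical. The sketch also closes the inner `if`/`else` explicitly so that it compiles on its own.

```cpp
#include <tuple>

// Hypothetical stand-alone mirror of the headdim <= 64 branch from the diff.
// The tuple layout is copied as-is; the meaning of the individual elements
// is not stated in the commit, so none is assumed here.
constexpr std::tuple<int, int, bool, bool> fp8_tile_size_hdim64_sketch(
        int headdim, bool use_one_mma_wg) {
    if (headdim <= 64) {
        if (use_one_mma_wg) {
            // New in this commit: smaller tile when a single MMA warpgroup is used.
            return {64, 128, true, true};
        } else {
            // Unchanged behavior from before the commit.
            return {192, 160, true, true};
        }
    }
    // Illustrative sentinel only: other head dimensions are handled by
    // further branches in the real file, which this sketch does not model.
    return {-1, -1, false, false};
}
```

With `use_one_mma_wg` set, the first two tuple entries shrink from 192 and 160 to 64 and 128. According to the commit title, this is intended to speed up fp8 decoding.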
