
Commit a14bb2f

Optimizing the performance of fused_layer_norm and top_p_sampling operators (#65711)
* optim fused_layer_norm and top_p_sampling
* update
* update
* update
* support hip
* fix comment
* update
1 parent 49772bc commit a14bb2f

3 files changed: +188 −410 lines changed


paddle/phi/kernels/fusion/gpu/blha_get_max_len.cu

Lines changed: 4 additions & 1 deletion
@@ -65,4 +65,7 @@ PD_REGISTER_KERNEL(blha_get_max_len,
                    ALL_LAYOUT,
                    phi::fusion::BlhaGetMaxLenKernel,
                    int,
-                   int64_t) {}
+                   int64_t) {
+  kernel->OutputAt(0).SetBackend(phi::Backend::CPU);
+  kernel->OutputAt(1).SetBackend(phi::Backend::CPU);
+}
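
The new registration body pins both outputs of blha_get_max_len to the CPU backend. A plausible reading, not stated in the diff itself: the max sequence lengths are small scalars that host-side launch logic needs to branch on, so materializing them on the CPU spares each consumer its own blocking device-to-host read-back. A minimal standalone CUDA sketch of that round trip (hypothetical names, not Paddle code):

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical illustration: a kernel that writes a scalar result to
// device memory. Single-thread max for brevity; real kernels reduce in
// parallel.
__global__ void max_len_kernel(const int* seq_lens, int n, int* d_max) {
  if (blockIdx.x == 0 && threadIdx.x == 0) {
    int m = 0;
    for (int i = 0; i < n; ++i) m = max(m, seq_lens[i]);
    *d_max = m;
  }
}

int main() {
  const int n = 4;
  int h_lens[n] = {3, 7, 5, 1};
  int *d_lens, *d_max;
  cudaMalloc(&d_lens, n * sizeof(int));
  cudaMalloc(&d_max, sizeof(int));
  cudaMemcpy(d_lens, h_lens, n * sizeof(int), cudaMemcpyHostToDevice);
  max_len_kernel<<<1, 1>>>(d_lens, n, d_max);
  // With a GPU-resident output, every host-side consumer needs an explicit
  // device-to-host copy, which also blocks until the kernel finishes:
  int h_max = 0;
  cudaMemcpy(&h_max, d_max, sizeof(int), cudaMemcpyDeviceToHost);
  printf("max len = %d\n", h_max);
  cudaFree(d_lens);
  cudaFree(d_max);
  return 0;
}

Registering the outputs on the CPU backend instead lets the framework hand callers host memory directly, so this copy is not repeated by each downstream use.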

paddle/phi/kernels/fusion/gpu/fused_layernorm_kernel.cu

Lines changed: 1 addition & 1 deletion
@@ -537,7 +537,7 @@ inline GPU(Error_t)
   // Note(Zhengzekang): We choose a fixed blocksize to avoid layernorm diff, by
   // RichardWooSJTU.
 
-  constexpr int block_size_conf_1 = 128;
+  constexpr int block_size_conf_1 = 512;
 
   int dev = 0;
   {
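
Raising block_size_conf_1 from 128 to 512 gives each block four times as many threads for the reduction, which can improve throughput on wide rows. The retained Note also explains why the value stays fixed rather than tuned per shape: the operand pairing in a block reduction depends on the block size, so a different configuration can yield slightly different (though equally valid) floating-point results. A standalone CUDA sketch of a block-size-dependent reduction launched with one fixed configuration (hypothetical names, not the Paddle kernel):

#include <cstdio>
#include <cuda_runtime.h>

template <int kBlockSize>
__global__ void block_sum(const float* x, int n, float* out) {
  __shared__ float smem[kBlockSize];
  float acc = 0.f;
  // Grid-stride loop: each thread accumulates a private partial sum.
  for (int i = blockIdx.x * kBlockSize + threadIdx.x; i < n;
       i += gridDim.x * kBlockSize) {
    acc += x[i];
  }
  smem[threadIdx.x] = acc;
  __syncthreads();
  // Shared-memory tree reduction; the pairing of operands (and therefore
  // the floating-point rounding) changes with kBlockSize.
  for (int s = kBlockSize / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s) smem[threadIdx.x] += smem[threadIdx.x + s];
    __syncthreads();
  }
  if (threadIdx.x == 0) atomicAdd(out, smem[0]);
}

int main() {
  const int n = 1 << 20;
  float *d_x, *d_out;
  cudaMalloc(&d_x, n * sizeof(float));
  cudaMalloc(&d_out, sizeof(float));
  cudaMemset(d_x, 0, n * sizeof(float));
  cudaMemset(d_out, 0, sizeof(float));
  // Fixed launch configuration, mirroring block_size_conf_1 = 512.
  block_sum<512><<<(n + 511) / 512, 512>>>(d_x, n, d_out);
  cudaDeviceSynchronize();
  float h_out = 0.f;
  cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
  printf("sum = %f\n", h_out);
  cudaFree(d_x);
  cudaFree(d_out);
  return 0;
}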
