Optimizing the performance of fused_layer_norm and top_p_sampling operators (PaddlePaddle#65711)

* optim fused_layer_norm and top_p_sampling

* update

* update

* update

* support hip

* fix comment

* update
yuanlehome authored Jul 5, 2024
1 parent 49772bc commit a14bb2f
Showing 3 changed files with 188 additions and 410 deletions.
5 changes: 4 additions & 1 deletion paddle/phi/kernels/fusion/gpu/blha_get_max_len.cu
@@ -65,4 +65,7 @@ PD_REGISTER_KERNEL(blha_get_max_len,
                    ALL_LAYOUT,
                    phi::fusion::BlhaGetMaxLenKernel,
                    int,
-                   int64_t) {}
+                   int64_t) {
+  kernel->OutputAt(0).SetBackend(phi::Backend::CPU);
+  kernel->OutputAt(1).SetBackend(phi::Backend::CPU);
+}
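This registration change pins both outputs of blha_get_max_len to the CPU backend, so the max-length scalars land in host memory where downstream host-side logic (e.g. picking launch configurations) can read them without issuing its own device-to-host copy and sync at every use site. Below is a minimal standalone sketch of that pattern; it is an illustration under assumed semantics, not PaddlePaddle's actual code, and the names MaxLenKernel and h_max are hypothetical.

// max_len_sketch.cu -- reduce GPU-resident sequence lengths to a single max
// and copy that scalar to the host once, mirroring the effect of declaring
// the kernel's outputs as phi::Backend::CPU.
#include <cstdio>
#include <cuda_runtime.h>

// Block-wide max reduction over the sequence-length array.
__global__ void MaxLenKernel(const int* seq_lens, int n, int* out) {
  __shared__ int smem[256];
  int tid = threadIdx.x;
  int v = 0;
  for (int i = tid; i < n; i += blockDim.x) v = max(v, seq_lens[i]);
  smem[tid] = v;
  __syncthreads();
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (tid < s) smem[tid] = max(smem[tid], smem[tid + s]);
    __syncthreads();
  }
  if (tid == 0) *out = smem[0];
}

int main() {
  const int n = 4;
  int h_lens[n] = {7, 19, 3, 11};
  int *d_lens, *d_max;
  cudaMalloc(&d_lens, n * sizeof(int));
  cudaMalloc(&d_max, sizeof(int));
  cudaMemcpy(d_lens, h_lens, n * sizeof(int), cudaMemcpyHostToDevice);
  MaxLenKernel<<<1, 256>>>(d_lens, n, d_max);
  // The one unavoidable sync: fetch the scalar to the host here, once,
  // rather than in every consumer that needs max_len.
  int h_max = 0;
  cudaMemcpy(&h_max, d_max, sizeof(int), cudaMemcpyDeviceToHost);
  printf("max_len = %d\n", h_max);
  cudaFree(d_lens);
  cudaFree(d_max);
  return 0;
}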
2 changes: 1 addition & 1 deletion paddle/phi/kernels/fusion/gpu/fused_layernorm_kernel.cu
@@ -537,7 +537,7 @@ inline GPU(Error_t)
   // Note(Zhengzekang): We choose a fixed blocksize to avoid layernorm diff, by
   // RichardWooSJTU.
 
-  constexpr int block_size_conf_1 = 128;
+  constexpr int block_size_conf_1 = 512;
 
   int dev = 0;
   {
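This one-line tuning raises the first fixed block-size configuration of the fused LayerNorm kernel from 128 to 512 threads. Keeping the block size fixed keeps the shape of the block-wide reduction tree, and hence the floating-point summation order, stable across input shapes (the "layernorm diff" the comment refers to), while a larger block gives each row more threads for its strided loads and reduction. The self-contained sketch below shows this row-per-block pattern with a compile-time block size; LayerNormRow and kBlockSize are illustrative names, not Paddle's implementation.

// layernorm_block_sketch.cu -- simplified row-per-block LayerNorm with a
// fixed, compile-time block size, illustrating why the constant matters.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kBlockSize = 512;  // mirrors block_size_conf_1 after this change

__global__ void LayerNormRow(const float* x, float* y, int cols, float eps) {
  __shared__ float smem[kBlockSize];
  const float* row = x + blockIdx.x * cols;
  float* out = y + blockIdx.x * cols;

  // Pass 1: block-wide sum for the mean; the reduction tree depends only on
  // kBlockSize, so the summation order is identical for every input shape.
  float sum = 0.f;
  for (int i = threadIdx.x; i < cols; i += kBlockSize) sum += row[i];
  smem[threadIdx.x] = sum;
  __syncthreads();
  for (int s = kBlockSize / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s) smem[threadIdx.x] += smem[threadIdx.x + s];
    __syncthreads();
  }
  float mean = smem[0] / cols;
  __syncthreads();  // all threads read smem[0] before it is overwritten

  // Pass 2: block-wide sum of squared deviations for the variance.
  float sq = 0.f;
  for (int i = threadIdx.x; i < cols; i += kBlockSize) {
    float d = row[i] - mean;
    sq += d * d;
  }
  smem[threadIdx.x] = sq;
  __syncthreads();
  for (int s = kBlockSize / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s) smem[threadIdx.x] += smem[threadIdx.x + s];
    __syncthreads();
  }
  float inv_std = rsqrtf(smem[0] / cols + eps);

  // Pass 3: normalize the row.
  for (int i = threadIdx.x; i < cols; i += kBlockSize)
    out[i] = (row[i] - mean) * inv_std;
}

int main() {
  const int rows = 2, cols = 8;
  float h_x[rows * cols], h_y[rows * cols];
  for (int i = 0; i < rows * cols; ++i) h_x[i] = float(i % cols);
  float *d_x, *d_y;
  cudaMalloc(&d_x, sizeof(h_x));
  cudaMalloc(&d_y, sizeof(h_y));
  cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);
  LayerNormRow<<<rows, kBlockSize>>>(d_x, d_y, cols, 1e-5f);  // one block per row
  cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);
  printf("row 0: %.3f %.3f ... %.3f\n", h_y[0], h_y[1], h_y[cols - 1]);
  cudaFree(d_x);
  cudaFree(d_y);
  return 0;
}

The likely motivation for the larger constant, though the commit message does not spell it out, is that a 512-thread block keeps four times as many loads in flight per row for the large hidden sizes typical of the fused transformer path, at the cost of somewhat higher shared-memory and scheduling pressure per block.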
