[CINN] preload scalar tensor for vectorize situation #71249

Merged
merged 1 commit into from
Mar 3, 2025

ZhangX-21
Contributor

PR Category

CINN

PR Types

Improvements

Description

Pcard-88155

paddle-bot bot commented Feb 24, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@ZhangX-21 ZhangX-21 changed the title [CINN] prefetch load scalar tensor for vectorize situation [WIP][CINN] prefetch load scalar tensor for vectorize situation Feb 24, 2025
@ZhangX-21 ZhangX-21 force-pushed the deal_with_scalar_tensor branch 5 times, most recently from e5a5bcf to 04c3c4b Compare February 27, 2025 06:54
@ZhangX-21 ZhangX-21 changed the title [WIP][CINN] prefetch load scalar tensor for vectorize situation [CINN] preload scalar tensor for vectorize situation Feb 27, 2025
@ZhangX-21 ZhangX-21 force-pushed the deal_with_scalar_tensor branch 3 times, most recently from c25dd08 to 39e5d06 Compare February 27, 2025 09:56
zyfncg
zyfncg previously approved these changes Mar 2, 2025
  return vectorize_tensors_.size() != 0 && schedule_block_can_vectorize_;
}

std::unordered_set<std::string> GetVectorizeTensors() const {
  return vectorize_tensors_;
}

std::unordered_set<std::string> GetScalarTensorsWithoutVectorizeAxis() const {
Contributor

Considering backend processing efficiency, I suggest using const & as the return type here.

Contributor Author

Done.
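
The review suggestion above (return `const &` instead of a copy) can be illustrated with a minimal sketch. `VectorizeInfo`, `GetByValue`, and `GetByConstRef` are hypothetical names made up for this example and are not the class or getters in the PR; only the shape of the return type is the point.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for the class under review; not CINN code.
class VectorizeInfo {
 public:
  explicit VectorizeInfo(std::unordered_set<std::string> names)
      : scalar_tensors_(std::move(names)) {}

  // Returning by value copies the whole set on every call.
  std::unordered_set<std::string> GetByValue() const { return scalar_tensors_; }

  // Returning by const reference hands out a read-only view of the member.
  const std::unordered_set<std::string>& GetByConstRef() const {
    return scalar_tensors_;
  }

 private:
  std::unordered_set<std::string> scalar_tensors_;
};

// True when two calls to the const-reference getter alias the same object.
bool ConstRefGetterAliasesMember() {
  std::unordered_set<std::string> names = {"var_2"};
  VectorizeInfo info(names);
  const auto& a = info.GetByConstRef();
  const auto& b = info.GetByConstRef();
  return &a == &b && a.count("var_2") == 1;
}

// True when the by-value getter produces an independent copy.
bool ValueGetterCopies() {
  std::unordered_set<std::string> names = {"var_2"};
  VectorizeInfo info(names);
  auto copy = info.GetByValue();
  copy.insert("extra");
  return info.GetByConstRef().size() == 1 && copy.size() == 2;
}
```

The trade-off is lifetime: a caller holding the reference must not outlive the object that owns the set, which is usually acceptable for a short-lived query inside a pass.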

ir::ir_utils::CollectIRNodesWithoutTensor(body, [&](const Expr* x) {
  if (x->as_tensor() && x->as_tensor()->buffer.defined() &&
      !buffer_names.count(x->as_tensor()->buffer->name) &&
      utils::StartsWith(x->as_tensor()->buffer->name, "pre_load")) {
Contributor

At which stage is the pre_load name introduced?

Contributor Author

During vectorization, a scalar Tensor whose index contains the vectorize axis is given a local buffer.

Original:
vectorized_var_33_0.x = max((float16)((float)(vectorized_var_1.x)), var_2[(((((int)blockIdx.x * 1024) + ((int)threadIdx.x * 4)) % 200704) / 196)])
vectorized_var_33_0.y = max((float16)((float)(vectorized_var_1.y)), var_2[(((((int)blockIdx.x * 1024) + ((int)threadIdx.x * 4) + 1) % 200704) / 196)])
vectorized_var_33_0.z = max((float16)((float)(vectorized_var_1.z)), var_2[(((((int)blockIdx.x * 1024) + ((int)threadIdx.x * 4) + 2) % 200704) / 196)])
vectorized_var_33_0.w = max((float16)((float)(vectorized_var_1.w)), var_2[(((((int)blockIdx.x * 1024) + ((int)threadIdx.x * 4) + 3) % 200704) / 196)])

With preload:
float16  pre_load_var_2_buffer[4];
float16* pre_load_var_2_local_0 = pre_load_var_2_buffer;
pre_load_var_2_local_0[0] = var_2[(((((int)blockIdx.x * 1024) + ((int)threadIdx.x * 4)) % 200704) / 196)];
pre_load_var_2_local_0[1] = var_2[(((((int)blockIdx.x * 1024) + ((int)threadIdx.x * 4) + 1) % 200704) / 196)];
pre_load_var_2_local_0[2] = var_2[(((((int)blockIdx.x * 1024) + ((int)threadIdx.x * 4) + 2) % 200704) / 196)];
pre_load_var_2_local_0[3] = var_2[(((((int)blockIdx.x * 1024) + ((int)threadIdx.x * 4) + 3) % 200704) / 196)];

vectorized_var_33_0.x = max((float16)((float)(vectorized_var_1.x)), pre_load_var_2_local_0[0])
vectorized_var_33_0.y = max((float16)((float)(vectorized_var_1.y)), pre_load_var_2_local_0[1])
vectorized_var_33_0.z = max((float16)((float)(vectorized_var_1.z)), pre_load_var_2_local_0[2])
vectorized_var_33_0.w = max((float16)((float)(vectorized_var_1.w)), pre_load_var_2_local_0[3])

The pre_load name is introduced in PreLoadScalarTensorWithVectorizeAxis, which allocates the Tensor and the buffer.
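
The before/after dumps above can be mimicked on the host to check that the two forms compute the same values. This is only a sketch under stated assumptions: `float` stands in for `float16`, block and thread ids are passed as plain integers, and `ScalarIndex`, `ComputeInline`, `ComputePreloaded`, and `FormsAgree` are hypothetical helper names, not functions from CINN.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

constexpr int kVec = 4;  // vector width, matching .x/.y/.z/.w in the dump

// Index expression copied from the dump; `lane` is the +0..+3 offset.
int ScalarIndex(int block_idx, int thread_idx, int lane) {
  return ((block_idx * 1024 + thread_idx * 4 + lane) % 200704) / 196;
}

// Original form: every lane reads var_2 inline inside the max expression.
std::array<float, kVec> ComputeInline(const std::array<float, kVec>& vec_in,
                                      const std::vector<float>& var_2,
                                      int block_idx, int thread_idx) {
  std::array<float, kVec> out{};
  for (int lane = 0; lane < kVec; ++lane) {
    out[lane] = std::max(vec_in[lane],
                         var_2[ScalarIndex(block_idx, thread_idx, lane)]);
  }
  return out;
}

// Preload form: scalar reads go into a local buffer first, then the
// vectorized computation reads only from that buffer.
std::array<float, kVec> ComputePreloaded(const std::array<float, kVec>& vec_in,
                                         const std::vector<float>& var_2,
                                         int block_idx, int thread_idx) {
  float pre_load_buffer[kVec];
  for (int lane = 0; lane < kVec; ++lane) {
    pre_load_buffer[lane] = var_2[ScalarIndex(block_idx, thread_idx, lane)];
  }
  std::array<float, kVec> out{};
  for (int lane = 0; lane < kVec; ++lane) {
    out[lane] = std::max(vec_in[lane], pre_load_buffer[lane]);
  }
  return out;
}

// Compares both forms over a handful of block/thread ids.
bool FormsAgree() {
  // The index is at most 200703 / 196 == 1023, so 1024 entries suffice.
  std::vector<float> var_2(1024);
  for (int i = 0; i < 1024; ++i) {
    var_2[i] = static_cast<float>(i % 7) - 3.0f;
  }
  const std::array<float, kVec> vec_in = {-1.0f, 0.5f, 2.0f, -4.0f};
  for (int block_idx : {0, 1, 195, 196, 300}) {
    for (int thread_idx : {0, 1, 48, 49, 255}) {
      if (ComputeInline(vec_in, var_2, block_idx, thread_idx) !=
          ComputePreloaded(vec_in, var_2, block_idx, thread_idx)) {
        return false;
      }
    }
  }
  return true;
}
```

Separating the loads from the arithmetic does not change the result; it only moves the scalar reads ahead of the vectorized statements, which is the shape shown in the preload dump.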

@zyfncg zyfncg merged commit 79bdacc into PaddlePaddle:develop Mar 3, 2025
33 checks passed
Enigmatisms pushed a commit to Enigmatisms/Paddle that referenced this pull request Mar 6, 2025