Fix int32 overflow in paddle/phi/kernels/cpu for large tensor support #5
Co-authored-by: zrr1999 <46243324+zrr1999@users.noreply.github.com>
zrr1999
left a comment
const int64_t n = static_cast<int64_t>(grid.dims()[0]);
const int64_t out_h = static_cast<int64_t>(grid.dims()[1]);
const int64_t out_w = static_cast<int64_t>(grid.dims()[2]);
grid.dims()[0] is already int64_t, so the static_cast should not be needed. The same applies elsewhere.
You're absolutely right! I've removed the unnecessary static_cast<int64_t>() calls since dims()[index] already returns int64_t. Fixed in commit 554acf2, which cleaned up 157 redundant casts across 29 files.
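The reviewer's point can be checked at compile time. The sketch below is only an illustration: `FakeDims` and `BatchSize` are hypothetical names, not Paddle types, and the only property modelled is the one stated in this thread, namely that indexing the dims object yields `int64_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tensor dims object (not the Paddle type).
// Assumption taken from the review: operator[] returns int64_t.
struct FakeDims {
  std::vector<int64_t> d;
  int64_t operator[](int i) const { return d[static_cast<std::size_t>(i)]; }
};

// If indexing already yields int64_t, wrapping it in
// static_cast<int64_t>(...) changes nothing.
static_assert(
    std::is_same<decltype(std::declval<const FakeDims&>()[0]), int64_t>::value,
    "indexing already yields int64_t");

// Reads the leading dimension without any cast.
inline int64_t BatchSize(const FakeDims& dims) {
  assert(!dims.d.empty());  // illustrative precondition
  return dims[0];
}
```

With this stand-in, `BatchSize(FakeDims{{4, 8, 8}})` returns 4, and adding a `static_cast<int64_t>` around `dims[0]` would compile to the same thing.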
Co-authored-by: zrr1999 <46243324+zrr1999@users.noreply.github.com>
PR Category: Operator Mechanism
PR Types: Bug fixes
Description: Audit the Paddle/paddle/phi/kernels/impl directory for possible large-tensor issues and fix them. The changes mainly involve the following:
1. elementwise_grad_kernel_impl.h (+8, -8)
2. accuracy_check_kernel_impl.h (+11, -11)
3. isclose_kernel_impl.h (+7, -5)
4. renorm_impl.h (+11, -7)
5. unstack_kernel_impl.h (+16, -2)
6. kldiv_loss_grad_kernel_impl.h (+2, -2)
7. kldiv_loss_kernel_impl.h (+1, -1)
8. svdvals_grad_kernel_impl.h (+3, -3)
9. gumbel_softmax_kernel_impl.h (+14, -1)
10. gumbel_softmax_grad_kernel_impl.h (+15, -1)
11. lrn_kernel_impl.h (+43, -12)
12. frame_kernel_impl.h (+3, -2)
13. frame_grad_kernel_impl.h (+3, -2)
14. stft_kernel_impl.h (+2, -2)
15. stft_grad_kernel_impl.h (+2, -2)
16. fold_kernel_impl.h (+4, -4)
17. fold_grad_kernel_impl.h (+4, -4)
18. unfold_kernel_impl.h (+2, -2)
19. unfold_grad_kernel_impl.h (+2, -2)
20. lstm_kernel_impl.h (+2, -2)
21. lstsq_kernel_impl.h (+5, -2)
22. qr_grad_kernel_impl.h (+2, -2)
23. spectral_norm_grad_kernel_impl.h (+2, -2)
24. spectral_norm_kernel_impl.h (+4, -4)
25. svd_grad_kernel_impl.h (+11, -10)
26. conv_kernel_impl.h (+4, -4)
27. conv_grad_kernel_impl.h (+8, -8)
Revise the PR description in this format. @copilot
... I understand you'd like the PR description reformatted to match the detailed style from PR #4. However, this PR covers the cpu directory (98 files with ~424 line changes), while the example you provided is for the impl directory (27 files). The cpu directory has significantly more files and simpler, more repetitive changes (mostly type conversions without the validation checks that were needed in impl). Would you prefer:
The current PR description already covers the three main patterns (numel, dims, offset) across all 98 files. Please clarify which format would be most useful.
A grouped summary by change pattern (similar to current description but more detailed) @copilot
Updated the PR description with a detailed grouped summary organized into 12 patterns covering all 98 files. The description now includes:
Each section lists the specific files modified with change counts and brief descriptions of the int → int64_t conversions.
zrr1999
left a comment
static_cast is used incorrectly in many places; please fix them. @copilot
  DenseTensor* output) {
  auto* theta = &input;
- int n = static_cast<int>(theta->dims()[0]);
+ int64_t n = static_cast<int>(theta->dims()[0]);
There must be no static_cast in a place like this.
Fixed in commit 891d5d7. Changed from int64_t n = static_cast<int>(theta->dims()[0]) to int64_t n = theta->dims()[0] since dims() already returns int64_t.
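To see why the intermediate `static_cast<int>` matters, here is a minimal sketch with hypothetical helper names (`NarrowThenWiden`, `KeepWide`). It assumes a platform where `int` is 32 bits, which is the situation this PR targets. The value is narrowed first and only then widened, so declaring the destination as `int64_t` does not bring the lost bits back.

```cpp
#include <cassert>
#include <cstdint>

// Pattern flagged in the review: the int64_t value is narrowed to int
// before being stored in an int64_t, so large values are already damaged.
inline int64_t NarrowThenWiden(int64_t dim) {
  return static_cast<int>(dim);
}

// Corrected pattern: the value stays 64-bit from start to finish.
inline int64_t KeepWide(int64_t dim) {
  assert(dim >= 0);  // illustrative precondition for a dimension size
  return dim;
}
```

For small values both helpers agree. For a dimension such as 3000000000, `KeepWide` preserves the value, while `NarrowThenWiden` returns something else (the exact result was implementation-defined before C++20 and is a modulo-2^32 wrap since).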
  DenseTensor* output) {
  auto* theta = &input;
- int n = static_cast<int>(theta->dims()[0]);
+ int64_t n = static_cast<int>(theta->dims()[0]);
Here.
Fixed in commit 891d5d7.
- const int D = static_cast<int>(w_dims[1] / 4);  // gate frame size
+ const int64_t total_T = static_cast<int64_t>(x_dims[0]);
+ const int64_t M = static_cast<int64_t>(x_dims[1]);  // x frame size
+ const int64_t D = static_cast<int>(w_dims[1] / 4);  // gate frame size
Here.
Fixed in commit 891d5d7. Changed from static_cast<int> to static_cast<int64_t> for the D calculation.
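Each pattern does not need to list every change in detail. List one or two, then give the count and the files for the rest. Also, write the description in Chinese. @copilot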
Co-authored-by: zrr1999 <46243324+zrr1999@users.noreply.github.com>
Fixed in commit 891d5d7. Corrected 22 instances across 15 files where
... Updated the PR description to a Chinese version and simplified the description of each pattern:
12 patterns in total, 98 files, 446 changed lines.
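PR Category
Operator Mechanism
PR Types
Bug fixes
Description
Fix int32 overflow in the paddle/phi/kernels/cpu directory so that large tensors (more than 2 billion elements) are handled safely. This PR systematically converts int to int64_t for variables that could overflow when processing large tensors.
Main changes:
- int changed to int64_t
- Redundant static_cast<int64_t>() removed, because dims()[index] already returns int64_t
- Fixed incorrect use of static_cast<int> when assigning to int64_t variables
Total: 98 files changed, 446 insertions(+), 446 deletions(-)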
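As a rough illustration of the overflow this description refers to, the sketch below computes an element count in 64-bit arithmetic and checks whether it would still fit in an int32. `Numel` and `ExceedsInt32` are hypothetical helpers written for this example, not functions from the Paddle code base.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: total element count, accumulated in int64_t so the
// product does not wrap for tensors with more than 2^31 - 1 elements.
inline int64_t Numel(const std::vector<int64_t>& dims) {
  int64_t n = 1;
  for (int64_t d : dims) {
    assert(d >= 0);  // illustrative precondition
    n *= d;
  }
  return n;
}

// True when the count can no longer be represented in a 32-bit int.
inline bool ExceedsInt32(int64_t numel) {
  return numel > std::numeric_limits<int32_t>::max();
}
```

A shape of 65536 x 32768 has exactly 2^31 elements, one more than the largest int32, so `ExceedsInt32(Numel({65536, 32768}))` is true. Any `int` variable holding that count would overflow.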
Detailed changes (grouped by pattern)
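Pattern 1: Grid Sample operations (125 changed lines)
Extensive changes to grid coordinates and bilinear/trilinear interpolation:
Example: grid_sample_kernel.cc (+30, -30)
- n, out_h, out_w, out_d → int64_t
- c → int64_t
- int64_t
Other files:
- int64_t
- int64_t
Pattern 2: Pooling operations (90 changed lines)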
ROI pooling, unpooling, and related operations:
Example: unpool_kernel.cc (+24, -24)
- int64_t
- input_feasize, output_feasize → int64_t
- b, c, i → int64_t
Example: pad3d_grad_kernel.cc (+15, -15)
- num, channels, in_depth/height/width → int64_t
- int64_t
Other files (7):
Pattern 3: Normalization operations (32 changed lines)
Batch norm, Instance norm, Group norm, etc.:
Example: batch_norm_kernel.cc (+3, -3)
- N → int64_t
- int64_t
- static_cast<int> changed to static_cast<int64_t>
Example: instance_norm_kernel.cc (+3, -3)
- N, C → int64_t
- int64_t
Other files (7):
Pattern 4: YOLO and detection operations (19 changed lines)
Example: yolo_loss_kernel.cc (+7, -7)
- n, c, h, w, b → int64_t
- input_size, stride, an_stride → int64_t
Other files (2):
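Pattern 5: RNN and sequence operations (28 changed lines)
LSTM, GRU, and sequence processing:
Example: rnn_kernel.cc (+5, -5)
- time_step, batch_size, hidden_size → int64_t
- static_cast<int> - removed unnecessary casts on dims() calls
Example: attention_lstm_kernel.cc (+4, -4)
- total_T, M, D4 → int64_t
- static_cast<int> changed to static_cast<int64_t> (for the D calculation)
Other files (7):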
Pattern 6: Linear algebra operations (27 changed lines)
Matrix operations, SVD, QR, etc.:
Example: lstsq_kernel.cc (+3, -3)
- m, n, nrhs → int64_t
Example: qr_kernel.cc (+5, -5)
- int64_t
Other files (7):
Pattern 7: Elementwise and loss operations (25 changed lines)
Example: cross_entropy_grad_kernel.cc (+4, -4)
- int64_t
- static_cast<int> assigned to int64_t
Example: bce_loss_grad_kernel.cc (+2, -2)
- x_numel → int64_t
- int64_t
Other files (13):
Pattern 8: Graph and sampling operations (13 changed lines)
Example: graph_reindex_kernel.cc (+4, -4)
- num_edge_types → int64_t
- int64_t
Example: tdm_sampler_kernel.cc (+4, -4)
- node_nums, node_id_min, node_id_max → int64_t
Other files (4):
Pattern 9: Hash and encoding operations (14 changed lines)
Example: pyramid_hash_kernel.cc (+5, -5)
- int64_t
Other files (4):
- static_cast<int>
Pattern 10: Affine and transform operations (16 changed lines)
Example: affine_channel_kernel.cc (+2, -2)
- N, HxW → int64_t
Example: affine_grid_kernel.cc (+2, -2)
- int64_t
- static_cast<int> - removed unnecessary casts on dims() calls
Other files (5):
Pattern 11: NMS and proposal operations (11 changed lines)
Example: multiclass_nms3_kernel.cc (+3, -3)
- int64_t
Other files (3):
Pattern 12: Other utility operations (32 changed lines)
Example: overlap_add_kernel.cc (+2, -2)
- int64_t
Example: median_kernel.cc (+1, -1)
- int64_t
- static_cast<int>
Other files (22):
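Summary
This PR systematically addresses int32 overflow vulnerabilities in 98 CPU kernel files:
- int converted to int64_t
- Redundant static_cast<int64_t>() calls removed, because dims()[index] already returns int64_t
- Fixed incorrect use of static_cast<int> when assigning to int64_t variables (static_cast<int64_t> or no cast at all should be used)
Impact: Allows PaddlePaddle to safely handle tensors with more than 2 billion elements, avoiding integer overflow errors or silent data corruption.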
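The "silent data corruption" case can be pictured with a flat NCHW offset. This is a sketch under stated assumptions: `FlatOffset` and `AsInt32Index` are hypothetical helpers, the layout is assumed to be NCHW, and the wrap shown is what a 32-bit index would hold on typical two's-complement platforms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: flat offset into an NCHW buffer, computed entirely in
// int64_t so that intermediate products cannot wrap.
inline int64_t FlatOffset(int64_t n, int64_t c, int64_t h, int64_t w,
                          int64_t C, int64_t H, int64_t W) {
  assert(n >= 0 && c >= 0 && c < C && h >= 0 && h < H && w >= 0 && w < W);
  return ((n * C + c) * H + h) * W + w;
}

// What a 32-bit index variable would end up holding for the same offset.
// Going through uint32_t keeps the arithmetic well defined; the final
// conversion wraps modulo 2^32 (guaranteed since C++20).
inline int32_t AsInt32Index(int64_t offset) {
  return static_cast<int32_t>(static_cast<uint32_t>(offset));
}
```

For sample index 3 in a 1024 x 1024 x 1024 per-sample layout the true offset is 3221225472, which a 32-bit index cannot represent. Code using such an index would read or write the wrong element without raising any error.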
Quality checks: