Conversation
@zhengshengning zhengshengning commented Sep 18, 2025

PR Category

Operator Mechanism

PR Types

New features

Description

Background

Under the configuration padding_mode == "zeros", mode == "bilinear", and align_corners == true, Paddle's original implementation shows numerical differences from PyTorch. By calling the Spatial Transformer API provided by cuDNN, the results can be aligned with PyTorch's precision under this configuration.

The dispatch is gated by condCudnnGridSampler<T>, which ensures cuDNN is called only on supported devices and data types (e.g. float / double).

This PR optimizes the GPU implementation of grid_sample and aligns its precision. When all of the following conditions hold:

  • padding_mode == zeros
  • mode == bilinear
  • align_corners == true
  • condCudnnGridSampler<T>(x, grid) == true (i.e. the current environment supports the cuDNN grid sampler)

the cuDNN cudnnSpatialTfSamplerForward and cudnnSpatialTfSamplerBackward interfaces are used to compute the forward and backward passes, improving performance and keeping the results numerically consistent with PyTorch under this configuration.

Main changes

✅ Forward

A dispatch check is added to the phi::grid_sample forward implementation; when the conditions above hold, it:

  • creates the cuDNN tensor descriptors x_desc and y_desc
  • creates the cuDNN spatial transformer descriptor st_desc
  • calls cudnnSpatialTfSamplerForward to perform bilinear sampling
  • automatically releases the associated resources (x_desc / y_desc / st_desc)
✅ Backward

The corresponding logic is added to the backward pass:

  • creates the x_desc / dx_desc / y_desc descriptors
  • creates the spatial transformer descriptor st_desc
  • calls cudnnSpatialTfSamplerBackward to compute dx and dgrid
  • automatically releases the resources

Testing

Forward: fully aligned with torch (195 cases).
Backward: uses atomicAdd, so the ordering of atomic operations across parallel threads makes the result nondeterministic (even two torch runs do not match each other).

pcard-67164

@paddle-bot
paddle-bot bot commented Sep 18, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Contributor

@wanghuancoder wanghuancoder left a comment

LGTM

@zrr1999 zrr1999 requested a review from Copilot September 19, 2025 07:03


Member

@zrr1999 zrr1999 left a comment

LGTM

Contributor

@A-nnonymous A-nnonymous left a comment

possible type related issues

cudnn_dtype,
static_cast<int>(N),
static_cast<int>(C),
static_cast<int>(H_in),
Contributor

Does casting N, C, H, W here involve a narrowing conversion? They were declared as int64_t earlier; what happens when a value exceeds 2147483647?

Contributor

Please add an overflow check, or restrict the declared type to int32_t directly.

Contributor Author

The preceding if check uses canUse32bitIndexMath to enforce the int32 limit; sizes beyond int32 never enter this branch, so overflow cannot occur.

Contributor

LGTM

cudnn_dtype,
static_cast<int>(N),
static_cast<int>(C),
static_cast<int>(H_in),
Contributor

LGTM

@zhengshengning zhengshengning merged commit ae06f38 into PaddlePaddle:develop Sep 22, 2025
81 of 85 checks passed
zhengshengning added a commit to zhengshengning/Paddle that referenced this pull request Oct 24, 2025
…onal.grid_sample to align with torch accuracy. (PaddlePaddle#75355)

* accuracy_stable_grid_sample

* fix
zhengshengning added a commit that referenced this pull request Oct 27, 2025
* CallScalarFunction uses the dtype of 'self' as the type of 'other' when opotype is 'div'(#75237)

* LinspaceKernel uses the dtype of 'self' as the type of 'step' when tensor is floating (#75238)

* align LinspaceKernel

* update meta

* update gpu kernel

* fix LinspaceKernelInner

* improve kernel

* fix CudaSigmoidGradFunctor and CudaSiluGradFunctor (#75341)

* Softplus accuracy and torch alignment 1 (#75363)

* [Precision Depth Alignment] paddle.tan reverse calculation: dx = dout *(1 + tan(x)^2) (#75335)

* Tan reverse calculation: dx = dout *(1 + tan(x)^2)

* [Precision Depth Alignment] Add support for CUDNN to paddle.nn.functional.grid_sample to align with torch accuracy.  (#75355)

* accuracy_stable_grid_sample

* fix

* correlation supports big tensor (#75383)

* fix

* fix test

* fix

* paddle.tanh Grad and torch alignment (float16) (#75454)

* [Precision Depth Alignment] paddle.sin and paddle.cos aligns with torch precision. (#75503)

* accuracy_stable_sin

* accuracy_stable_cos

* [Precision Depth Alignment] Divide (#75379)

* fix

* fix

* fix

* fix

* fix

* [Precision Depth Alignment] fix precision for float16 of paddle.tan backward (#75525)

* fix precision for float16 of paddle.tan backward

* fix else branch of CudaTanGradFunctor

* [Precision Depth Alignment] fix precision for  paddle.expm1 (#75549)

* accuracy_stable_expm1

* fix

* Big tensor audit and fixes [Paddle/paddle/phi/kernels/funcs] (#75523)

* fix

* fix

* [Precision Depth Alignment]  fix beta and threshold of paddle.nn.functional.softplus  to double (#75426)

* fix beta and threshold of Softplus to double

* fix test_softplus_activation_fuse_pass v1

* fix test_activation_zero

* fix flaot of SoftplusDoubleGradKernel to double

* add op_patches for softplus

* add yaml for ops/yaml/legacy

* fix infershape/operator for FLOAT64

* fix

* add SoftPlusOpTranscriber

* fix

* fix

* fix1

* fix2

* fix coverage

* fix coverage2

* fix (#75605)

* [Precision Depth Alignment] dot (#75717)

* fix

* fix

* fix dcu

* [Precision Depth Alignment]  paddle.log aligns with torch precision (#75799)

* accuracy_stable_log

* accuracy_stable_log

* fix

* fix

* fix

* fix

* fix5

* [Precision Depth Alignment] fix eps of paddle.logit from float to double (#75816)

* accuracy_stable_logit

* add LogitOpTranscriber

* fix coverage

* fix 0yaml

* [Precision Depth Alignment] paddle.log_sigmoid (#75898)

* accuracy_stable_log_sigmoid

* fix test_activation_stride_op.py

* [Precision Depth Alignment] Modify the negative_slope parameter of the paddle.nn.functional.leaky_relu API to double (#75547)

* [big tensor] Paddle/paddle/phi/kernels/funcs gpuBigtensor (#75856)

* fix funcs

* gpu

* fix

* fix

* update PADDLE_ENFORCE messages

* fix cpu error

* fix dcu

* fix dcu

* fix

* [Fix] log sigmoid complex (#75953)

* feature: Add specialized LogSigmoidFunctor and CudaLogSigmoidFunctor for complex numbers

This commit introduces specialized implementations of LogSigmoidFunctor and CudaLogSigmoidFunctor to handle complex number inputs. The new implementations utilize direct formulas for improved accuracy and stability in calculations involving complex types.

* refactor: Optimize LogSigmoidFunctor and CudaLogSigmoidFunctor for complex types by caching exp(-x) to reduce redundant computations. This change enhances performance while maintaining accuracy in calculations.

* refactor: modified the formula in LogSigmoidFunctor to make it numerical stable

---------

Co-authored-by: Zhan Rongrui <46243324+zrr1999@users.noreply.github.com>
Co-authored-by: 正在学习 <62892980+cszdrg@users.noreply.github.com>
Co-authored-by: Bvicii <98971614+scyyh11@users.noreply.github.com>