
[Operator Mechanism]Align cuBLAS workspace size for SM10 GPUs #79373

Open
feixi139 wants to merge 4 commits into PaddlePaddle:develop from feixi139:ducc-feixi139-develop-worktree


@feixi139

Contributor

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

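Adjusts the cuBLAS workspace size setting on SM10-architecture GPUs so that it matches PyTorch's behavior on Hopper/Blackwell architectures.

Specific changes:

  • Update the GetCublasWorkspaceSize logic in gpu_context.cc;
  • Use a 32 MiB cuBLAS workspace on SM9 and SM10 architectures;
  • Keep the existing workspace setting of about 8.125 MiB unchanged on other architectures.

This change reduces differences in algorithm selection caused by inconsistent cuBLAS workspace sizes, and thereby improves numerical alignment with PyTorch in some matmul scenarios.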
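The size rule listed above can be sketched roughly as follows. This is a minimal illustration and not the code in gpu_context.cc: the function name `SketchCublasWorkspaceSize` and the constant names are hypothetical, and the breakdown of the "about 8.125 MiB" default into 2 × 4096 KiB + 8 × 16 KiB is an assumption that happens to equal exactly 8.125 MiB.

```cpp
#include <cstddef>

// Hypothetical sketch of the size rule described above; the real
// GetCublasWorkspaceSize in gpu_context.cc may be structured differently.
constexpr std::size_t kKiB = 1024;
constexpr std::size_t kMiB = 1024 * kKiB;

// Assumed breakdown of the "about 8.125 MiB" default:
// two 4096 KiB buffers plus eight 16 KiB buffers.
constexpr std::size_t kDefaultWorkspaceBytes = 2 * 4096 * kKiB + 8 * 16 * kKiB;

// 32 MiB workspace used for SM9 (Hopper) and SM10 (Blackwell).
constexpr std::size_t kLargeWorkspaceBytes = 32 * kMiB;

// sm_major is the major compute capability of the device (e.g. 9 for SM90).
constexpr std::size_t SketchCublasWorkspaceSize(int sm_major) {
  return (sm_major == 9 || sm_major == 10) ? kLargeWorkspaceBytes
                                           : kDefaultWorkspaceBytes;
}

// Compile-time check that the assumed default is 8.125 MiB (65/8 MiB).
static_assert(kDefaultWorkspaceBytes * 8 == 65 * kMiB,
              "default workspace should equal 8.125 MiB");
```

Under these assumptions, `SketchCublasWorkspaceSize(8)` yields 8,519,680 bytes and `SketchCublasWorkspaceSize(10)` yields 33,554,432 bytes. If the rule is meant to cover later architectures as well, the condition would presumably become `sm_major >= 9`.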

Does this cause precision changes

PaddlePaddle-bot

This comment was marked as outdated.


wanghuancoder
wanghuancoder previously approved these changes Jun 25, 2026
@PaddlePaddle-bot

PaddlePaddle-bot commented Jun 25, 2026


🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-26 11:59:23 UTC+08:00

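CI report generated from the following code (updated every 30 minutes):
PR commit: 8378256 | Merge base: 406c7af (branch: develop)


1 Required tasks: 46/48 passed

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 209 (128) | 81 | 78 | 0 | 2 | 0 | 1 |

No required tasks have failed so far; 2 are still running and 0 are waiting.

| Task | Error type | Confidence | Log |
| --- | --- | --- | --- |

2 Failure details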


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-06-25 22:07:14

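📋 Review summary

PR overview: Adjusts the cuBLAS/cuBLASLt workspace lower bound for SM9/SM10 and later CUDA architectures, to influence GEMM algorithm selection and align with PyTorch behavior
Scope of change: paddle/phi/backends/gpu/gpu_context.cc
Impact tags: [Operator Mechanism] [Performance Optimization]

Issues

No blocking issues found. PR convention issues are reported in the section below and are not repeated here.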

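📝 PR convention check

The title is missing a space after the tag, and the PR answers "Does this cause precision changes" with "Yes" without giving a clear verification method. The following copyable content is suggested as a replacement.

Suggested title (can be copied directly):

  • [Operator Mechanism] Align cuBLAS workspace size for SM10 GPUs
Suggested PR description (click to expand, can be copied directly)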
### PR Category
<!-- One of [ User Experience | Execute Infrastructure | Operator Mechanism | CINN | Custom Device | Performance Optimization | Distributed Strategy | Parameter Server | Communication Library | Auto Parallel | Inference | Environment Adaptation ] -->
Operator Mechanism

### PR Types
<!-- One of [ New features | Bug fixes | Improvements | Performance | BC Breaking | Deprecations | Docs | Devs | Not User Facing | Security | Others ] -->
Bug fixes

### Description
<!-- Describe what you’ve done -->
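Adjusts the cuBLAS workspace size setting on SM10-architecture GPUs so that it matches PyTorch's behavior on Hopper/Blackwell architectures.

Specific changes:
- Update the `GetCublasWorkspaceSize` logic in `gpu_context.cc`;
- Use a 32 MiB cuBLAS workspace on SM9 and SM10 architectures;
- Keep the existing workspace setting of about 8.125 MiB unchanged on other architectures.

Scope of impact:
- `GPUContext` initialization of the cuBLAS handle workspace on the non-Windows CUDA path;
- May affect cuBLAS algorithm selection and numerical results for matmul on Hopper/SM10 GPUs.

Verification:
- N/A (the current PR description provides no specific verification commands or precision comparison data)

### Does this cause precision changes
<!-- one of the following [ Yes | No ]-->
Yes. The precision change arises because, after the cuBLAS workspace size on SM9/SM10 GPUs is raised from about 8.125 MiB to 32 MiB, cuBLAS may select a different matmul algorithm; the affected scope is the non-Windows CUDA path that uses cuBLAS GEMM/matmul; verification method: N/A (the current PR description provides no specific verification commands or precision comparison data).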

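Overall assessment

This round of review was based on the PR diff, the Paddle checklist/architecture, and the call chain related to gpu_context.cc; it found no resource lifetime, device fallback, or integer overflow issues that would need to block the PR. The PR title and the precision-change verification note still carry over previously unresolved convention suggestions; please complete them before merging.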

@codecov-commenter

codecov-commenter commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@406c7af). Learn more about missing BASE report.

Files with missing lines Patch % Lines
paddle/phi/backends/gpu/gpu_context.cc 60.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #79373   +/-   ##
==========================================
  Coverage           ?   60.00%           
==========================================
  Files              ?        1           
  Lines              ?        5           
  Branches           ?        0           
==========================================
  Hits               ?        3           
  Misses             ?        2           
  Partials           ?        0           


@paddle-bot paddle-bot Bot added the contributor External developers label Jun 25, 2026

@sneaxiy sneaxiy left a comment

Collaborator


LGTM for coverage due to lack of SM90+ GPUs.


Labels

contributor External developers


5 participants