
[Operator Mechanism] Support mm out_dtype for BF16 CUDA #79252

Merged
A-nnonymous merged 10 commits into
PaddlePaddle:develop from
A-nnonymous:mm_out_dtype_bf16_fp32
Jun 9, 2026


@A-nnonymous

Contributor

PR Category

Operator Mechanism

PR Types

New features

Description

Temporarily adds a narrow CUDA BF16 x BF16 -> FP32 path for paddle.mm(out_dtype=paddle.float32), including the op schema, InferMeta, stride dispatch, a fused cuBLAS GEMM, and focused tests.

pcard-91067
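As a numerical illustration of what the BF16 x BF16 -> FP32 path preserves, the pure-Python sketch below (no Paddle or GPU needed) rounds a tiny matrix product either to FP32 or to BF16. The helper names to_bf16, to_fp32, and mm_ref are hypothetical, and the sketch sums in float64 before rounding, whereas the cuBLAS kernel accumulates in FP32; it only shows the effect of the output dtype, not the kernel itself.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value, ties to even (NaN/Inf not handled).

    bfloat16 keeps the top 16 bits of the float32 bit pattern, so rounding
    adds a bias to the 16 discarded bits before shifting them out.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even bf16 mantissa
    bits = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def mm_ref(a, b, round_out):
    """Reference a @ b on nested lists: products and sums in float64,
    then each output element is rounded with round_out (to_fp32 or to_bf16)."""
    k, n = len(b), len(b[0])
    return [[round_out(sum(row[p] * b[p][j] for p in range(k))) for j in range(n)]
            for row in a]


# Both inputs hold values that are exactly representable in bfloat16.
a = [[to_bf16(1.0), to_bf16(1.0)]]
b = [[to_bf16(1.0)], [to_bf16(2.0 ** -8)]]
print(mm_ref(a, b, to_fp32))  # [[1.00390625]]  FP32 output keeps the small term
print(mm_ref(a, b, to_bf16))  # [[1.0]]         BF16 output rounds it away
```

With FP32 output the 2^-8 term survives. Rounding the same sum to BF16 (8 significant bits) lands exactly halfway between 1.0 and 1.0078125 and ties to even, giving 1.0.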

Does this change numerical precision?

Add a narrow CUDA BF16 x BF16 -> FP32 path for paddle.mm(out_dtype=paddle.float32), including schema, infermeta, stride dispatch, fused cuBLAS GEMM, and focused tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

PaddlePaddle-bot commented Jun 4, 2026


🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-09 02:53:51 UTC+08:00

CI report generated from the following code (updated every 30 minutes):
PR commit: 11579f8 | Merge base: 8df3d8a (branch: develop)


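1 Required jobs: 41/48 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 89 (9) | 80 | 73 | 3 | 1 | 0 | 0 |

| Job | Error type | Confidence | Log |
|---|---|---|---|
| Static-Check / Test | PR issue: API approval not synced | High | Job |
| Coverage test | PR issue: insufficient coverage of new code | High | Job |
| Check approval | Approval required | High | Job |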

2 Failure details

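🔴 Static-Check / Test: PR issue (confidence: high)

Error type: PR issue | Confidence: high
Analyzer: generic analysis (fallback)
Failed case: API approval check

| Case | Error summary |
|---|---|
| tools/check_api_approvals.sh | The public signature of paddle.Tensor.mm adds out_dtype, but the API approval was not updated |

Key log:

API Difference is:
- paddle.Tensor.mm(... kwonlyargs=['out'] ...)
+ paddle.Tensor.mm(... kwonlyargs=['out_dtype', 'out'] ...)
##[error]Process completed with exit code 6.
  • Root cause summary: the paddle.Tensor.mm API signature change has not been approved
    The PR adds an out_dtype keyword to mm in python/paddle/tensor/math.py, which changes the API signature hash. The static check requires the API approval to be synced; otherwise it fails as an unapproved public API change.

Suggested fixes:

  1. Update the corresponding approval baseline through the Paddle API approval process, or, if the public signature change does not need to be kept, adjust the implementation.
  2. Related file: python/paddle/tensor/math.py

Related change: python/paddle/tensor/math.py adds the out_dtype parameter.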
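The kwonlyargs diff in the key log can be reproduced with Python's standard inspect module. This is a minimal sketch: mm_before and mm_after are hypothetical stand-ins whose positional parameters are assumed (the log elides them), and hashing the FullArgSpec with SHA-256 is only an illustration; it is not necessarily how tools/check_api_approvals.sh computes its signature hash.

```python
import hashlib
import inspect


# Hypothetical stand-ins for paddle.Tensor.mm before and after this PR.
# The positional parameters are assumptions; the CI log elides them as "...".
def mm_before(x, y, name=None, *, out=None):
    pass


def mm_after(x, y, name=None, *, out_dtype=None, out=None):
    pass


def signature_digest(fn) -> str:
    """Illustrative signature hash: SHA-256 of the function's FullArgSpec repr.

    Not necessarily what tools/check_api_approvals.sh computes.
    """
    spec = inspect.getfullargspec(fn)
    return hashlib.sha256(repr(spec).encode("utf-8")).hexdigest()


print(inspect.getfullargspec(mm_before).kwonlyargs)  # ['out']
print(inspect.getfullargspec(mm_after).kwonlyargs)   # ['out_dtype', 'out']
# Adding a keyword-only argument changes the argspec, hence the digest.
print(signature_digest(mm_before) != signature_digest(mm_after))  # True
```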

🔴 Coverage test: PR issue (confidence: high)

Error type: PR issue | Confidence: high
Analyzer: generic analysis (fallback)
Failed case: coverage gate

| Case | Error summary |
|---|---|
| Assert Diff Coverage, Assert Python Diff Coverage | C++ diff coverage is 0.0% and Python diff coverage is 64.7%, both below the 90% threshold |

Key log:

Assert Diff Coverage
expected >= 90.0 %, actual 0.0 %, failed
Assert Python Diff Coverage
expected >= 90.0 %, actual 64.7 %, failed
Coverage check failed, unit tests have all passed, please do not rerun
  • Root cause summary: insufficient coverage of the new mm out_dtype branches
    The PR adds MmOutDtypeInferMeta, the mm_out_dtype CUDA kernel path, and the out_dtype branch in python/paddle/tensor/math.py, but the coverage environment does not execute these new lines. test_enable_cinn_kernel_cache in the log failed at first and passed on rerun; the final failure is the diff coverage gate, not that CINN test.

Suggested fixes:

  1. Add test cases for the out_dtype argument-validation branches in python/paddle/tensor/math.py that do not depend on the SM80 BF16 success path.
  2. Add coverage that can run in the coverage environment for new C++ logic such as MmOutDtypeInferMeta in paddle/phi/infermeta/binary.cc.
  3. Related files: paddle/phi/infermeta/binary.cc, python/paddle/tensor/math.py, test/legacy_test/test_mm_out.py

Related changes: new mm BF16 -> FP32 out_dtype schema, InferMeta, CUDA kernel, and Python API branch.
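To size the gap to the 90% gate, a small helper can compute how many more changed lines must be covered. A sketch under stated assumptions: covered_lines_needed is a hypothetical name, and the example inputs are the per-file counts from the Codecov report on this PR (math.py: 11 of 17 lines hit; binary.cc: 0 of 17), which may not match the exact line set the CI C++ diff check measures.

```python
def covered_lines_needed(covered: int, total: int, threshold_pct: int = 90) -> int:
    """Additional changed lines that must be covered so covered/total >= threshold_pct %."""
    if total <= 0:
        return 0
    # Integer ceiling of threshold_pct * total / 100 (avoids float rounding at the boundary).
    required = -(-threshold_pct * total // 100)
    return max(0, required - covered)


# Example inputs: per-file line counts from the Codecov report on this PR.
print(covered_lines_needed(11, 17))  # python/paddle/tensor/math.py -> 5
print(covered_lines_needed(0, 17))   # paddle/phi/infermeta/binary.cc -> 16
```

For 17 changed lines the gate needs 16 covered (16/17 is about 94.1%, while 15/17 is about 88.2%).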

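🔴 Check approval: approval required (confidence: high)

Error type: approval required | Confidence: high
Analyzer: builtin
Failed case: approval check

| Case | Error summary |
|---|---|
| Approval | This job requires manual approval; CI continues only after approval is granted |

Key log:

Process completed with exit code 6.
  • Root cause summary: CI requires manual approval
    This job is blocked on approval; it is not a compile or test failure.

Suggested fix:

  1. Obtain manual approval.

Related changes: none.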

Use the canonical matmul path for static mm out_dtype handling and keep legacy compatibility attrs limited to supported types.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Keep matmul compatible with unknown symbolic dimensions and legacy matmul_v2 to PIR translation when out_dtype is unset.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Preserve the legacy static mm path when out_dtype is unset and avoid rejecting unknown symbolic matmul dimensions during InferMeta.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
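The InferMeta change can be pictured as a shape rule that only rejects an inner-dimension mismatch when both sizes are known. A minimal Python sketch, not Paddle's C++ MmOutDtypeInferMeta or matmul InferMeta: infer_mm_shape is hypothetical, and -1 is assumed to mark an unknown symbolic dimension.

```python
from typing import List, Sequence

UNKNOWN = -1  # assumed marker for an unknown (symbolic) dimension


def infer_mm_shape(x_shape: Sequence[int], y_shape: Sequence[int]) -> List[int]:
    """Infer the output shape [M, N] of x:[M, K] @ y:[K, N].

    The inner dimension K is only checked when both sides are known; an
    unknown K is accepted here and left to a runtime check.
    """
    if len(x_shape) != 2 or len(y_shape) != 2:
        raise ValueError("mm expects 2-D inputs")
    m, k_x = x_shape
    k_y, n = y_shape
    if UNKNOWN not in (k_x, k_y) and k_x != k_y:
        raise ValueError(f"inner dimensions differ: {k_x} vs {k_y}")
    return [m, n]


print(infer_mm_shape([8, 64], [64, 16]))              # [8, 16]
print(infer_mm_shape([UNKNOWN, UNKNOWN], [128, 32]))  # [-1, 32]
```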

Allow the explicit static out_dtype path to pass BF16 variables through Python validation and feed BF16 static test data using the existing uint16 encoding helper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
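NumPy has no built-in bfloat16 dtype, so BF16 test data is commonly stored as the upper 16 bits of each float32, held in a uint16 array. The sketch below shows that encoding in pure Python; float_to_bf16_bits and bf16_bits_to_float are hypothetical names, not Paddle's existing helper, and this version truncates the low bits where a real helper may round instead.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Encode x as a bfloat16 bit pattern held in a uint16-range int.

    This sketch truncates: it keeps the top 16 bits of the float32 pattern.
    """
    fp32_bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return fp32_bits >> 16


def bf16_bits_to_float(bits: int) -> float:
    """Decode a uint16 bfloat16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


data = [1.0, -2.0, 0.5, 3.140625]  # all exactly representable in bfloat16
encoded = [float_to_bf16_bits(v) for v in data]
print([hex(b) for b in encoded])  # ['0x3f80', '0xc000', '0x3f00', '0x4049']
print([bf16_bits_to_float(b) for b in encoded] == data)  # True
```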

Add missing default/propagated out_dtype handling for legacy matmul translation, PIR serialization compatibility, and handwritten PIR/DRR matmul rewrites.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Avoid fusing explicit matmul out_dtype paths in PIR rewrite passes, document BF16 GEMM lda/ldb narrowing safety, and add a legacy matmul_v2 translator regression.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
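The lda/ldb narrowing concern comes from the classic cuBLAS GEMM entry points (such as cublasGemmEx) taking m, n, k and the leading dimensions as 32-bit int, while framework shapes and strides are 64-bit. A hedged Python sketch of the guard such narrowing relies on; row_major_leading_dims and check_int32_narrowing are hypothetical, and the commit above describes documenting why the narrowing is safe rather than necessarily adding this check.

```python
INT32_MAX = 2**31 - 1  # classic cublasGemmEx takes m, n, k, lda, ldb, ldc as int


def row_major_leading_dims(m: int, k: int, n: int) -> dict:
    """Leading dimensions of contiguous row-major A:[m,k], B:[k,n], C:[m,n].

    For a contiguous row-major matrix, the leading dimension is its column count.
    """
    return {"lda": k, "ldb": n, "ldc": n}


def check_int32_narrowing(**values: int) -> dict:
    """Raise if any 64-bit size would not survive a cast to a 32-bit int."""
    for name, value in values.items():
        if not 0 <= value <= INT32_MAX:
            raise OverflowError(f"{name}={value} does not fit in int32")
    return values


dims = check_int32_narrowing(m=4096, n=4096, k=4096,
                             **row_major_leading_dims(4096, 4096, 4096))
print(dims["lda"], dims["ldb"], dims["ldc"])  # 4096 4096 4096
```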

Route static mm out_dtype through matmul_v2 so it reaches the phi matmul kernel, preserve user-provided out tensors, and let legacy matmul_v2 fusion pass compatibility accept only missing/default out_dtype while rejecting explicit output dtype paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
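The fusion-pass rule in these commits (fuse only when out_dtype is missing or default, skip explicit output dtypes) reduces to a small predicate. A sketch assuming op attributes arrive as a plain dict and that an unset out_dtype is either absent or None; can_fuse_matmul is hypothetical and is not the PIR/DRR pass code.

```python
from typing import Any, Mapping


def can_fuse_matmul(attrs: Mapping[str, Any]) -> bool:
    """Allow fusion only when out_dtype is absent or left at its default (None here)."""
    return attrs.get("out_dtype") is None


print(can_fuse_matmul({"trans_x": False, "trans_y": False}))         # True
print(can_fuse_matmul({"trans_x": False, "out_dtype": None}))        # True
print(can_fuse_matmul({"trans_x": False, "out_dtype": "float32"}))   # False
```

Keeping the check as a single predicate means every rewrite pass can apply the same rule before matching a matmul.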
@A-nnonymous

Contributor Author

/re-run all-failed


A-nnonymous and others added 2 commits June 8, 2026 17:02
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-06-08 22:39:26

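📋 Review summary

PR overview: adds a CUDA BF16 x BF16 -> FP32 forward path for paddle.mm(out_dtype=paddle.float32).
Scope of changes: paddle.mm Python API, dygraph op YAML, InferMeta, GPU matmul kernel, cuBLAS BF16 GEMM, legacy unit tests.
Impact tags: [Operator Mechanism] [User Experience]

Issues

| Level | File | Summary |
|---|---|---|
| 🔴 Bug | paddle/phi/ops/yaml/inconsistent/dygraph_ops.yaml:297 | The paddle.mm(out_dtype=...) branch does not generate a backward, so training loses gradients |

Status of previous findings

| Finding | Issue | Status |
|---|---|---|
| F1 | paddle.mm(..., out_dtype=paddle.float32) falls through to matmul_v2 in the legacy static graph | ✅ Fixed |

📝 PR convention check

Compliant. The PR Category, PR Types, Description, and precision-change fields are all filled in.

Overall assessment

The forward CUDA BF16 GEMM path, InferMeta, and registration chain are broadly consistent, and the earlier issue of the legacy static graph incorrectly falling through to matmul_v2 has been closed with an explicit NotImplementedError. However, the current public API branch loses mm's existing first-order gradient support. Before merging, either add the backward or explicitly reject calls that require gradients at the Python entry point.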
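The second option the review suggests, rejecting gradient-requiring calls at the Python entry point, could look like the validation sketch below. It also illustrates the kind of argument-checking branch the coverage report asks to test without an SM80 GPU. TensorStub, SUPPORTED_OUT_DTYPE, and check_mm_out_dtype are hypothetical; only the stop_gradient convention (True means no gradient is needed) is taken from Paddle, and this does not claim which option the merged PR chose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TensorStub:
    """Hypothetical stand-in for a framework tensor (only the fields used here)."""
    dtype: str
    stop_gradient: bool = True  # Paddle convention: True means no gradient is needed


# Hypothetical table: (x dtype, y dtype) -> the only accepted out_dtype.
SUPPORTED_OUT_DTYPE = {("bfloat16", "bfloat16"): "float32"}


def check_mm_out_dtype(x: TensorStub, y: TensorStub, out_dtype: Optional[str]) -> None:
    """Validate an mm(out_dtype=...) call before dispatching to a kernel."""
    if out_dtype is None:
        return  # default path: ordinary mm, gradients unaffected
    if SUPPORTED_OUT_DTYPE.get((x.dtype, y.dtype)) != out_dtype:
        raise TypeError(
            f"mm(out_dtype={out_dtype!r}) is not supported for {x.dtype} x {y.dtype}")
    if not (x.stop_gradient and y.stop_gradient):
        raise NotImplementedError(
            "mm(out_dtype=...) has no backward; set stop_gradient=True on both inputs")


check_mm_out_dtype(TensorStub("bfloat16"), TensorStub("bfloat16"), "float32")  # passes
try:
    check_mm_out_dtype(TensorStub("bfloat16", stop_gradient=False),
                       TensorStub("bfloat16"), "float32")
except NotImplementedError as err:
    print(err)
```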

Comment thread paddle/phi/ops/yaml/inconsistent/dygraph_ops.yaml
@codecov-commenter

codecov-commenter commented Jun 8, 2026


Codecov Report

❌ Patch coverage is 32.35294% with 23 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@8df3d8a). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddle/phi/infermeta/binary.cc | 0.00% | 17 Missing ⚠️ |
| python/paddle/tensor/math.py | 64.70% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #79252   +/-   ##
==========================================
  Coverage           ?   32.35%           
==========================================
  Files              ?        2           
  Lines              ?       34           
  Branches           ?        0           
==========================================
  Hits               ?       11           
  Misses             ?       23           
  Partials           ?        0           
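Codecov's patch coverage is hits divided by hits + misses + partials over the changed lines. A short sketch reproducing the 32.35294% figure from the per-file table; the hit counts are derived from the reported percentages and missing-line counts (math.py: 64.70% with 6 missing implies 11 of 17), and patch_coverage is a hypothetical helper.

```python
from fractions import Fraction

# (hits, misses) per file, derived from the Codecov table; partials are 0.
per_file = {
    "paddle/phi/infermeta/binary.cc": (0, 17),
    "python/paddle/tensor/math.py": (11, 6),
}


def patch_coverage(files):
    """Return (hits, changed lines, coverage percent as an exact Fraction)."""
    hits = sum(h for h, _ in files.values())
    lines = sum(h + m for h, m in files.values())
    return hits, lines, Fraction(100 * hits, lines)


hits, lines, pct = patch_coverage(per_file)
print(hits, lines, f"{float(pct):.5f}%")  # 11 34 32.35294%
```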


@XiaoguangHu01 XiaoguangHu01 left a comment

Contributor

LGTM

@wanghuancoder wanghuancoder left a comment

Contributor

LGTM

@sneaxiy sneaxiy left a comment

Collaborator

LGTM

@A-nnonymous A-nnonymous merged commit 82c04f9 into PaddlePaddle:develop Jun 9, 2026
174 of 190 checks passed
@risemeup1111

Contributor

✅ Cherry-pick successful! Created PR: #79282
