[Release/3.4][Operator] cherry-pick mm out_dtype dynamic path by A-nnonymous · Pull Request #79285 · PaddlePaddle/Paddle

A-nnonymous · 2026-06-09T09:24:20Z

PR Category

Operator Mechanism

PR Types

Improvements

Description

This PR manually performs a complete cherry-pick onto release/3.4 of the dynamic-only out_dtype change for paddle.mm from develop PR #79252. It replaces the automatic cherry-pick PR #79282, whose CI exposed problems.

Main changes:

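Adds temporary, minimal out_dtype support to dygraph paddle.mm, limited to CUDA 2-D BF16 x BF16 -> FP32.
Keeps static graph/PIR failing closed when out_dtype is passed explicitly.
Adds a narrowly scoped mm_out_dtype dygraph op, InferMeta, CUDA PHI kernel, and a cuBLAS GEMM path with BF16 inputs and FP32 output.
Supports transposed/strided RHS by making a contiguous BF16 copy, without casting the inputs to FP32.
On release/3.4, additionally adds the missing convert_nptype_to_datatype_or_vartype export. This fixes the NameError that test_static_out_dtype_fails_closed hit in the automatic cherry-pick.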
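To make the scope listed above concrete, here is a minimal Python sketch of the dispatch decision it describes. The sketch accepts out_dtype only in dynamic mode and only for 2-D BF16 x BF16 CUDA inputs producing FP32. A transposed or strided RHS is first copied into a contiguous BF16 buffer. `TensorInfo`, `is_row_major_contiguous`, `plan_mm_out_dtype`, and the step strings are hypothetical names, not Paddle APIs. The real checks are spread across the Python API, InferMeta, and the CUDA PHI kernel. This sketch does not model how the LHS layout is handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical stand-in for the tensor metadata a dispatcher would inspect."""
    place: str                # "gpu" or "cpu"
    dtype: str                # e.g. "bfloat16", "float16", "float32"
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]  # measured in elements


def is_row_major_contiguous(shape: Tuple[int, ...], strides: Tuple[int, ...]) -> bool:
    """True if the strides describe a dense row-major (C-order) layout."""
    expected = 1
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


def plan_mm_out_dtype(x: TensorInfo, y: TensorInfo, out_dtype: Optional[str],
                      dynamic_mode: bool) -> List[str]:
    """Return hypothetical execution steps for mm(x, y, out_dtype=...)."""
    if out_dtype is None:
        return ["regular_mm"]  # no out_dtype: the existing matmul path is untouched
    if not dynamic_mode:
        # Static graph / PIR: fail closed instead of silently ignoring out_dtype.
        raise NotImplementedError("out_dtype is only supported in dynamic mode")
    if x.place != "gpu" or y.place != "gpu":
        raise NotImplementedError("out_dtype requires CUDA tensors")
    if len(x.shape) != 2 or len(y.shape) != 2:
        raise NotImplementedError("out_dtype requires 2-D inputs")
    if (x.dtype, y.dtype, out_dtype) != ("bfloat16", "bfloat16", "float32"):
        raise NotImplementedError("only bfloat16 x bfloat16 -> float32 is supported")
    if x.shape[1] != y.shape[0]:
        raise ValueError(f"inner dimensions differ: {x.shape} @ {y.shape}")
    steps = []
    if not is_row_major_contiguous(y.shape, y.strides):
        # Transposed/strided RHS: make a contiguous BF16 copy; inputs are never cast to FP32.
        steps.append("bf16_contiguous_copy(y)")
    steps.append("gemm(bf16, bf16 -> fp32)")
    return steps


x = TensorInfo("gpu", "bfloat16", (2, 4), (4, 1))
y_t = TensorInfo("gpu", "bfloat16", (4, 3), (1, 4))  # transposed view of a (3, 4) tensor
print(plan_mm_out_dtype(x, y_t, "float32", dynamic_mode=True))
```

In this sketch, every unsupported combination raises instead of falling back to another output dtype. As a result, a caller never silently receives a dtype different from the one requested.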

Verification:

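Confirmed that, after removing the release/3.4-only import shim, the patch-id of the 10 main cherry-picked files matches the net diff of develop PR [Operator Mechanism] Support mm out_dtype for BF16 CUDA #79252.
The extra release/3.4-only changes only add the missing dtype conversion helper/export. Their semantics follow develop's DataType/VarType branches in full rather than using a simple alias.
Ran a Python syntax check and git diff --check.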

devPR: #79252 (comment)

pcard-91067

Does this cause a precision change?

No

addmm incorrectly cast alpha/beta to the tensor dtype (bf16/fp16) before passing them to cuBLAS. This caused significant scalar precision loss (e.g. alpha=2.9270 → bf16(2.921875), a loss of about 0.17%). Use the MPTypeTrait<T>::Type pattern (as baddbmm does) to keep the scalars in float32 for half-precision types, matching PyTorch's opmath_type behavior. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
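The scalar precision loss described in the addmm message can be reproduced in plain Python. The sketch below assumes the old cast rounded the scalar to bfloat16 with round-to-nearest-even; NaN/Inf handling is omitted. `to_bf16` and `to_fp32` are illustrative helpers, not Paddle or cuBLAS functions.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bfloat16 with round-to-nearest-even (NaN/Inf handling omitted)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)                  # bias so ties round to even
    bits &= 0xFFFF0000                                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


alpha = 2.9270
alpha_bf16 = to_bf16(alpha)  # scalar after casting to the bf16 tensor dtype
alpha_fp32 = to_fp32(alpha)  # scalar kept as float32 (the MPTypeTrait-style choice)
rel_loss_bf16 = (alpha - alpha_bf16) / alpha
rel_loss_fp32 = abs(alpha - alpha_fp32) / alpha

print(alpha_bf16)               # 2.921875
print(f"{rel_loss_bf16:.4%}")   # 0.1751%
print(f"{rel_loss_fp32:.1e}")
```

The float32 scalar is off by less than one part in 10^7, while the bf16 scalar is off by about 0.175%. This is why keeping alpha/beta in a wider type matters even when the GEMM operands themselves are bf16.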

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

wanghuancoder

LGTM


codecov-commenter · 2026-06-09T17:47:46Z

Codecov Report

❌ Patch coverage is 32.35294% with 23 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/3.4@6627d52).

Files with missing lines	Patch %	Lines
paddle/phi/infermeta/binary.cc	0.00%	17 Missing ⚠️
python/paddle/tensor/math.py	64.70%	6 Missing ⚠️

❌ Your patch status has failed because the patch coverage (32.35%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@              Coverage Diff               @@
##             release/3.4   #79285   +/-   ##
==============================================
  Coverage               ?   32.35%           
==============================================
  Files                  ?        2           
  Lines                  ?       34           
  Branches               ?        0           
==============================================
  Hits                   ?       11           
  Misses                 ?       23           
  Partials               ?        0
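The coverage figures in this report are mutually consistent, and the arithmetic can be checked directly. Per-file line counts are derived from the table: 0.00% with 17 missing implies 17 lines, and 64.70% with 6 missing implies 11 hits out of 17. Truncating percentages to two decimals is an assumption; it matches the 64.70% figure, since 11/17 is 64.705...%. `pct` and `per_file` are illustrative names.

```python
import math

# (total changed lines, missed lines) per file, derived from the table above.
per_file = {
    "paddle/phi/infermeta/binary.cc": (17, 17),
    "python/paddle/tensor/math.py": (17, 6),
}


def pct(hits: int, lines: int) -> float:
    """Coverage percentage truncated (not rounded) to two decimals (assumption)."""
    return math.floor(10000 * hits / lines) / 100


total_lines = sum(lines for lines, _ in per_file.values())
total_hits = sum(lines - missed for lines, missed in per_file.values())
patch = pct(total_hits, total_lines)
needed = -(-90 * total_lines // 100)  # ceil(0.9 * lines) using integer arithmetic

print(total_hits, total_lines, patch)  # 11 34 32.35
print(needed, needed - total_hits)     # 31 20
```

Under these numbers, reaching the 90% target would require 31 covered lines, 20 more than the 11 currently hit.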


XiaoguangHu01

LGTM

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-10 15:29:50

📋 Review Summary

PR overview: adds minimal out_dtype support (CUDA BF16 inputs, FP32 output) to dygraph paddle.mm, while keeping static graph/PIR failing closed.
Scope of changes: Phi InferMeta, CUDA matmul kernel/BLAS path, dygraph YAML, Python API, and legacy unit tests.
Impact tags: [Operator Mechanism] [User Experience] [Performance Optimization]

Issues

No blocking issues found. PR convention issues are reported in the section below and are not repeated here.

📝 PR Convention Check

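Compliant. The title contains a valid tag, the description links the develop PR being cherry-picked to the release branch, and the precision-change field is filled in as "No".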

Overall Assessment

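This round reviewed, in order of risk, the new mm_out_dtype's Python entry point, dygraph YAML, InferMeta, CUDA kernel registration, BF16 GEMM call chain, out= path, and new tests. No blocking findings so far. Because of the time limit, the PIR/codegen generated artifacts and the broader hardware matrix were not examined and are left for a later, deeper pass.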

A-nnonymous and others added 2 commits May 13, 2026 15:00

[Operator Mechanism] Cherry-pick mm out_dtype to release 3.4

4006a6f

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A-nnonymous requested review from XiaoguangHu01, Xreki, qili93 and zhangbo9674 as code owners June 9, 2026 09:24

Merge branch 'release/3.4' into mm_out_dtype_release34

9a7ae67

wanghuancoder previously approved these changes Jun 9, 2026


[Operator Mechanism] Use release dtype conversion for mm out_dtype

2f4528c

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A-nnonymous dismissed wanghuancoder’s stale review via 2f4528c June 9, 2026 13:38

SigureMo approved these changes Jun 9, 2026


XieYunshen added the skip-ci: coverage label Jun 10, 2026

XiaoguangHu01 approved these changes Jun 10, 2026


swgu98 added the skip-ci: Doc-Preview label Jun 10, 2026

zrr1999 approved these changes Jun 10, 2026


PaddlePaddle-bot reviewed Jun 10, 2026


zhangbo9674 approved these changes Jun 10, 2026


wanghuancoder approved these changes Jun 11, 2026


sneaxiy approved these changes Jun 11, 2026


sneaxiy merged commit d3c4748 into PaddlePaddle:release/3.4 Jun 11, 2026
198 of 211 checks passed

ShigureNyako mentioned this pull request Jun 14, 2026

[release/3.4][Operator Mechanism] Support mm out_dtype for BF16 CUDA #79282

Open

sneaxiy merged 4 commits into PaddlePaddle:release/3.4 from A-nnonymous:mm_out_dtype_release34

A-nnonymous commented Jun 9, 2026 •

edited

Loading

Uh oh!

wanghuancoder left a comment

Uh oh!

codecov-commenter commented Jun 9, 2026

Uh oh!

XiaoguangHu01 left a comment

Uh oh!

PaddlePaddle-bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants

Conversation

A-nnonymous commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Category

PR Types

Description

是否引起精度变化

Uh oh!

wanghuancoder left a comment

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Jun 9, 2026

Codecov Report

Uh oh!

XiaoguangHu01 left a comment

Choose a reason for hiding this comment

Uh oh!

PaddlePaddle-bot left a comment

Choose a reason for hiding this comment

📋 Review 摘要

问题

📝 PR 规范检查

总体评价

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants

A-nnonymous commented Jun 9, 2026 •

edited

Loading