
[Release/3.4][Operator] cherry-pick mm out_dtype dynamic path #79285

Merged
sneaxiy merged 4 commits into PaddlePaddle:release/3.4 from A-nnonymous:mm_out_dtype_release34
Jun 11, 2026

@A-nnonymous A-nnonymous commented Jun 9, 2026

PR Category

Operator Mechanism

PR Types

Improvements

Description

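This PR manually cherry-picks, in full and on top of release/3.4, the dynamic-only out_dtype change to paddle.mm from develop PR #79252. It supersedes the automatic cherry-pick PR #79282, whose CI run exposed problems.

Main changes:

  • Add temporary, minimal out_dtype support to dynamic-graph paddle.mm, limited to CUDA 2-D BF16 x BF16 -> FP32.
  • Keep static graph/PIR failing closed on an explicit out_dtype.
  • Add a narrowly scoped mm_out_dtype dygraph op, InferMeta, CUDA PHI kernel, and a cuBLAS GEMM path with BF16 inputs and FP32 output.
  • Support a transposed/strided RHS by making a contiguous BF16 copy, without casting the inputs to FP32.
  • On release/3.4 only, additionally export convert_nptype_to_datatype_or_vartype, fixing the NameError that test_static_out_dtype_fails_closed hit in the automatic cherry-pick.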
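
To make the BF16 x BF16 -> FP32 contract concrete, here is a minimal pure-Python sketch that emulates the numerics only. It is not the Paddle kernel (which, per the description, goes through cuBLAS), and `to_bf16`, `to_fp32`, and `mm_bf16_inputs` are hypothetical helper names introduced for illustration. The inputs stay in BF16 and only the output dtype changes.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a float to the nearest BF16 value (round-to-nearest-even).

    Illustration only: NaN/Inf handling is intentionally omitted.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even mantissa
    bits = (bits + bias) & 0xFFFF0000   # keep sign, exponent, top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_fp32(x: float) -> float:
    """Round a Python float (double) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mm_bf16_inputs(a, b, out_fp32=True):
    """2-D matmul over BF16-rounded inputs.

    Products are accumulated in higher precision; only the final store
    differs: FP32 when out_fp32 is True, BF16 otherwise.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(to_bf16(a[i][k]) * to_bf16(b[k][j]) for k in range(inner))
            row.append(to_fp32(acc) if out_fp32 else to_bf16(acc))
        out.append(row)
    return out


a = [[256.0, 0.5]]   # both values are exactly representable in BF16
b = [[1.0], [1.0]]
print(mm_bf16_inputs(a, b, out_fp32=True))   # [[256.5]]
print(mm_bf16_inputs(a, b, out_fp32=False))  # [[256.0]]
```

With a BF16 output the 0.5 contribution is rounded away, because BF16 values between 256 and 512 are spaced 2 apart; an FP32 output keeps it.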

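Verification notes:

  • Confirmed that, after removing the release/3.4-specific import shim, the patch-id of the 10 main cherry-picked files matches the net diff of develop PR [Operator Mechanism] Support mm out_dtype for BF16 CUDA #79252.
  • The additional release/3.4-specific changes only fill in the dtype conversion helper/export; its semantics are expanded following develop's DataType/VarType branches rather than using a simple alias.
  • Ran a Python syntax check and git diff --check.

devPR: #79252 (comment)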

pcard-91067

Does this cause a precision change?

A-nnonymous and others added 2 commits May 13, 2026 15:00
addmm incorrectly cast alpha/beta to tensor dtype (bf16/fp16) before
passing to cuBLAS, causing significant scalar precision loss
(e.g. alpha=2.9270 → bf16(2.921875), losing 0.17%).

Use MPTypeTrait<T>::Type pattern (same as baddbmm) to keep scalars in
float32 for half-precision types, matching PyTorch's opmath_type behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
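
The scalar precision loss quoted in this commit message can be reproduced with a small pure-Python sketch. `to_bf16` is a hypothetical helper that emulates BF16 rounding (round-to-nearest-even, no NaN/Inf handling); it is not Paddle or cuBLAS code, and the accumulator value 100.0 is an arbitrary illustration.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a float to the nearest BF16 value (illustration only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


alpha = 2.9270
alpha_bf16 = to_bf16(alpha)
rel_err = (alpha - alpha_bf16) / alpha

print(alpha_bf16)        # 2.921875
print(f"{rel_err:.4%}")  # 0.1751%

# Effect on a scaled product: alpha * acc, with acc standing in for one
# element of x @ y (arbitrary value chosen for illustration).
acc = 100.0
print(alpha * acc - alpha_bf16 * acc)  # roughly 0.5125 absolute error
```

Keeping alpha and beta in float32 for half-precision tensor types, as the commit describes, avoids this error before the scalar ever reaches the GEMM call.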
wanghuancoder previously approved these changes Jun 9, 2026

@wanghuancoder wanghuancoder left a comment

LGTM

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@codecov-commenter

Codecov Report

❌ Patch coverage is 32.35294% with 23 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/3.4@6627d52). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddle/phi/infermeta/binary.cc | 0.00% | 17 Missing ⚠️ |
| python/paddle/tensor/math.py | 64.70% | 6 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (32.35%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@              Coverage Diff               @@
##             release/3.4   #79285   +/-   ##
==============================================
  Coverage               ?   32.35%           
==============================================
  Files                  ?        2           
  Lines                  ?       34           
  Branches               ?        0           
==============================================
  Hits                   ?       11           
  Misses                 ?       23           
  Partials               ?        0           

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-10 15:29:50

📋 Review summary

PR overview: Adds minimal out_dtype support to dynamic-graph paddle.mm for CUDA BF16 inputs with FP32 output, and keeps static graph/PIR failing closed.
Scope of changes: Phi InferMeta, CUDA matmul kernel/BLAS path, dygraph YAML, Python API, and legacy unit tests.
Impact tags: [Operator Mechanism] [User Experience] [Performance Optimization]
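
The "fail closed" behavior summarized above can be sketched as a guard that rejects every combination outside the single supported one. This is a minimal illustration under stated assumptions: `check_mm_out_dtype`, its parameters, the string dtype/place names, and the exception types are all hypothetical and are not taken from the Paddle source.

```python
def check_mm_out_dtype(x_dtype, y_dtype, out_dtype, x_ndim, y_ndim,
                       place, in_dynamic_mode):
    """Hypothetical guard: allow only CUDA 2-D BF16 x BF16 -> FP32.

    Exception types are assumptions made for this sketch.
    """
    if out_dtype is None:
        return  # default path, no out_dtype requested

    if not in_dynamic_mode:
        # Static graph / PIR: refuse instead of silently ignoring out_dtype.
        raise NotImplementedError(
            "explicit out_dtype is only supported in dynamic mode")

    supported = (
        place == "gpu"
        and x_ndim == 2 and y_ndim == 2
        and x_dtype == "bfloat16" and y_dtype == "bfloat16"
        and out_dtype == "float32"
    )
    if not supported:
        raise ValueError(
            "out_dtype currently supports only CUDA 2-D BF16 x BF16 -> FP32")


# Supported combination passes without raising.
check_mm_out_dtype("bfloat16", "bfloat16", "float32", 2, 2, "gpu", True)
```

The point of the design is that an unsupported request raises rather than falling back to a path that would ignore out_dtype and return a different dtype than the caller asked for.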

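Issues

No blocking issues found. PR convention issues are reported in the section below and are not repeated here.

📝 PR convention check

Compliant. The title contains valid tags, the description links the develop PR being cherry-picked to the release branch, and the precision-change field is filled in as "No".

Overall assessment

This round reviewed, in risk-priority order, the new mm_out_dtype Python entry point, dygraph YAML, InferMeta, CUDA kernel registration, BF16 GEMM call chain, the out= path, and the new tests. No blocking findings so far; because of the time limit, the PIR/codegen generated artifacts and a wider hardware matrix were not examined further and are left for a later deep dive.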

@sneaxiy sneaxiy merged commit d3c4748 into PaddlePaddle:release/3.4 Jun 11, 2026
198 of 211 checks passed