[API Compatibility] Add alias for SGD, Adagrad, AdamW#79284

Closed
algorithm1832 wants to merge 15 commits into PaddlePaddle:develop from algorithm1832:improvement_optim_1

Conversation

@algorithm1832

Contributor

PR Category

User Experience

PR Types

Improvements

Description

  • Update api alias with new params
  • Add tests
  • Remove old tests

Used AI Studio

Does this cause a precision change (是否引起精度变化)

@zhwesky2010

Contributor

This PR now has conflicts. Another PR simplified the alias handling, so the alias part may need to be reworked; add new files if necessary.

@paddle-bot paddle-bot Bot added the contributor External developers label Jun 9, 2026
PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

@codecov-commenter

codecov-commenter commented Jun 11, 2026

Codecov Report

❌ Patch coverage is 97.56098% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@4ce8ac4). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| python/paddle/optim/adagrad.py | 88.88% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #79284   +/-   ##
==========================================
  Coverage           ?   97.56%           
==========================================
  Files              ?        5           
  Lines              ?       41           
  Branches           ?        0           
==========================================
  Hits               ?       40           
  Misses             ?        1           
  Partials           ?        0           
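
As a quick sanity check on the figures above, the patch coverage percentage can be reproduced from the Hits and Lines rows. This is only a sketch of the arithmetic; the variable names are illustrative and not part of Codecov's output.

```python
# Values copied from the coverage diff above.
hits = 40
lines = 41

# Patch coverage is the share of changed lines that were executed.
patch_coverage = 100 * hits / lines
print(f"{patch_coverage:.5f}%")  # 97.56098%
```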

Comment thread python/paddle/optim/__init__.py Outdated
from . import lr_scheduler # noqa: F401


class Adagrad(PaddleAdagrad):

Contributor

Won't this make paddle.optim.adagrad.Adagrad resolve to the original class? A new file is needed.

Comment thread python/paddle/optim/__init__.py Outdated
)


class SGD(PaddleSGD):

Contributor

paddle.optim.sgd.SGD -> paddle.optimizer.sgd.SGD (the old SGD)

paddle.optim.SGD is the new SGD

PaddlePaddle-bot

This comment was marked as outdated.

Comment thread python/paddle/optim/sgd.py Outdated
weight_decay: float | Tensor = 0,
nesterov: bool = False,
) -> None:
warnings.warn(

Contributor

Warn only when the argument is actually passed; do not emit the warning unconditionally. Otherwise users will assume this API is broken.
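
One way to follow this suggestion is to give the unsupported arguments a sentinel default, so the wrapper can tell "not passed" apart from "passed with the default value". The sketch below is illustrative only: `SGDCompat` and `_UNSET` are hypothetical names, and the class is a stand-in rather than the wrapper from this PR.

```python
import warnings

# Hypothetical sentinel: distinguishes "not passed" from an explicit value.
_UNSET = object()


class SGDCompat:
    """Illustrative stand-in for the wrapper under review, not Paddle's class."""

    def __init__(self, params, lr=0.001, momentum=_UNSET, nesterov=_UNSET):
        passed = {"momentum": momentum, "nesterov": nesterov}
        ignored = [name for name, value in passed.items() if value is not _UNSET]
        if ignored:
            # Warn only about arguments the caller actually supplied.
            warnings.warn(
                f"{', '.join(ignored)} not supported yet and will be ignored.",
                UserWarning,
                stacklevel=2,
            )
        self.params = list(params)
        self.lr = lr
```

With this shape, `SGDCompat(params)` stays silent, while `SGDCompat(params, momentum=0.9)` warns once and names only `momentum`.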

Comment thread python/paddle/optim/adagrad.py Outdated
foreach: bool | None = None,
) -> None:
warnings.warn(
"lr_decay, foreach are currently not supported in Adagrad and will be ignored. "

Contributor

Warn only when the argument is actually passed; do not emit the warning unconditionally. Otherwise users will assume this API is broken.

Didn't we conclude in the last discussion that lr_decay is fairly easy to implement?

PaddlePaddle-bot

This comment was marked as outdated.

@zhwesky2010 zhwesky2010 left a comment

Contributor

Please check whether the work we discussed last time has been completed, and whether you ran into any difficulties.

grad_clip: GradientClipBase | None = None,
name: str | None = None,
initial_accumulator_value: float = 0.0,
lr_decay: float = 0.0,

@zhwesky2010 zhwesky2010 Jun 14, 2026

Contributor

There is a simple way to implement this: lr_decay can be implemented through an LRScheduler, which already encapsulates variables such as global_step.

Also, have the easy-to-implement parameters we discussed last time been implemented? For maximize, negate the value when you fetch the lr. Has defaults been implemented?

Contributor Author

For maximize, the current plan is to negate the gradient when it is fetched in the base class's step method.

defaults will be supported after maximize is implemented.
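
The plan for maximize can be sketched with a plain gradient-descent step: flipping the sign of the gradient turns a minimizing update into a maximizing one. This is a minimal illustration on Python floats; `sgd_step` is a hypothetical helper, not the base-class `step` method discussed here.

```python
def sgd_step(params, grads, lr, maximize=False):
    """Return updated parameters; negate gradients when maximizing."""
    sign = -1.0 if maximize else 1.0
    return [p - lr * sign * g for p, g in zip(params, grads)]


# Minimizing moves against the gradient, maximizing moves along it.
print(sgd_step([1.0], [2.0], lr=0.1))
print(sgd_step([1.0], [2.0], lr=0.1, maximize=True))
```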

Contributor Author

The lr_decay formula appears to correspond to InverseTimeDecay. I can initialize it directly as a member and then call get_lr() whenever it is needed.
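
For reference, inverse-time decay divides the base learning rate by `1 + decay * step`. The sketch below assumes that lr_decay plays the role of the decay factor and that the step counter starts at 0; `inverse_time_lr` is a hypothetical helper written in plain Python, not a Paddle API.

```python
def inverse_time_lr(base_lr, lr_decay, step):
    """Effective learning rate under inverse-time decay (assumed formula)."""
    return base_lr / (1.0 + lr_decay * step)


# With lr_decay=0 the rate never changes; otherwise it shrinks with each step.
for step in range(4):
    print(step, inverse_time_lr(0.1, 0.0, step), inverse_time_lr(0.1, 0.5, step))
```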

@PaddlePaddle-bot

PaddlePaddle-bot commented Jun 15, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-17 19:07:27

This CI report is generated from the following code (refreshed every 30 minutes):
PR commit: c99f202 | Merge base: 4ce8ac4 (branch: develop)


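1 Required jobs: 43/48 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 80 (0) | 80 | 75 | 5 | 0 | 0 | 0 |

| Job | Error type | Confidence | Log |
| --- | --- | --- | --- |
| Static-Check / Test | PR issue: public API change did not pass approval | High | Job |
| Model-Benchmark / Benchmark test | Environment issue: source directory missing after extracting the build artifact | High | Job |
| Coverage test | Environment issue: failed to fetch Paddle.tar.gz from CFS | High | Job |
| Slice / Slice test | Flaky: performance drop in a single slice benchmark case | Medium | Job |
| Fleet Unit test (single card) | Flaky: performance/numerical assertions unrelated to this PR | Medium | Job |

2 Failure details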

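🔴 Static-Check / Test: PR issue (confidence: high)

Error type: PR issue | Confidence: high
Analyzer: generic analysis (fallback)
Failed case: API approval check

| Case | Error summary |
| --- | --- |
| Check api approval | New or changed public API entries did not pass approval |

Key logs: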

+ paddle.optimizer.AdamW (ArgSpec(), ('document', 'e16a332b24fb338d74ab0f3440958dca'))
+ paddle.optimizer.SGD (ArgSpec(), ('document', '9de29746bc2c73010f7202371bbc6162'))
##[error]Process completed with exit code 6.
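  • Root cause summary: API approval detected public API documentation changes
    The PR adds python/paddle/optim/sgd.py, adagrad.py, and adamw.py, adjusts the exports in python/paddle/optim/__init__.py, and also modifies the lr_decay parameter of paddle.optimizer.Adagrad. In the Check api approval stage, Static-Check detected changes to the documentation/ArgSpec hashes of optimizer-related public APIs, but this PR does not provide the corresponding API approval update.

Suggested fix:

  1. Follow the Paddle API approval process to confirm that the public signature and documentation changes for SGD, AdamW, and Adagrad/lr_decay are intended, and add or update the corresponding API spec approval files. If the public API hashes of paddle.optimizer.* should not change, adjust the wrapper/import implementation so that it does not affect the original API documentation output.

Related changes: python/paddle/optim/__init__.py, python/paddle/optim/sgd.py, python/paddle/optim/adamw.py, python/paddle/optim/adagrad.py, python/paddle/optimizer/adagrad.py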

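🔴 Model-Benchmark / Benchmark test: environment issue (confidence: high)

Error type: environment issue | Confidence: high
Analyzer: generic analysis (fallback)
Failed case: environment preparation stage

| Case | Error summary |
| --- | --- |
| Prepare environment and download paddle | Extracting the build artifact failed; the container has no paddle source directory |

Key logs: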

tar: Error is not recoverable: exiting now
bash: line 12: cd: paddle: No such file or directory
bash: line 13: /workspace/paddle/ci/model_benchmark.sh: No such file or directory
bash: line 14: check_paddle: command not found
##[error]Process completed with exit code 127.
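  • Root cause summary: the benchmark environment did not obtain a usable build artifact
    The job exited before running model_benchmark.sh because build-artifact extraction/directory preparation failed, so the test script itself never ran. The failure point has no direct path relationship to the optimizer API files changed in this PR.

Suggested fix:

  1. Environment issue; please rerun. If the rerun still fails, check whether the upstream build-artifact upload and the CFS/download pipeline produced an extractable build.tar.gz/Paddle source directory.

Related changes: no link to the files modified in this PR was found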

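🔴 Coverage test: environment issue (confidence: high)

Error type: environment issue | Confidence: high
Analyzer: generic analysis (fallback)
Failed case: coverage test environment preparation

| Case | Error summary |
| --- | --- |
| Download paddle.tar.gz and update test branch | Exit code 8 after downloading Paddle.tar.gz from CFS; the test stage was skipped |

Key logs: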

Downloading Paddle.tar.gz from cfs
##[error]Process completed with exit code 8.
rm: cannot remove ‘Paddle.tar.gz’: No such file or directory
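  • Root cause summary: the coverage job did not obtain Paddle.tar.gz
    The failure occurred in the download/update-test-branch stage, and both Test and FA Test were skipped. The missing file in the log shows that the job never obtained the expected Paddle.tar.gz; the unit tests and the coverage logic themselves did not fail.

Suggested fix:

  1. Environment issue; please rerun. If the rerun still fails, check whether the Paddle.tar.gz artifact for this commit exists on CFS, and whether its permissions and the download script are working.

Related changes: no link to the files modified in this PR was found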

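🟡 Slice / Slice test: flaky (confidence: medium)

Error type: flaky | Confidence: medium
Analyzer: generic analysis (fallback)
Failed case: slice benchmark

| Case | Error summary |
| --- | --- |
| Getitem - forward - Slice - Slice with Step - float16 - paddle | Relative performance change of -0.10096326397731986, which tripped the slice benchmark threshold |

Key logs: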

slice测试失败, 存在性能下降case, 失败case性能变化:
{'Getitem - forward - Slice - Slice with Step - float16 - paddle': -0.10096326397731986}
File "/paddle/PaddleTest/framework/slice_benchmark/run.py", line 164, in ci_test
  raise Exception("slice测试失败")
Exception: slice测试失败
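  • Root cause summary: performance fluctuation in a single slice benchmark case
    The PR's changes are concentrated on optimizer alias/API compatibility and Adagrad parameters; it does not modify the slice/getitem kernels, the benchmark scripts, or related dependencies. This failure looks more like a threshold trip caused by benchmark performance fluctuation or environment differences.

Suggested fix:

  1. Known to be flaky; please rerun. If it reproduces consistently, the slice benchmark maintainers should check the performance baseline and runtime environment for this float16 slice-with-step case.

Related changes: no link to the files modified in this PR was found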

🟡 Fleet Unit test (single card): flaky (confidence: medium)

Error type: flaky | Confidence: medium
Analyzer: generic analysis (fallback)
Failed case: Fleet single-card tests

| Case | Error summary |
| --- | --- |
| test_autocudagraph.py::TestEndToEndPerformance::test_resnext50_accuracy_and_speed | CUDAGraph (39.23s) is slower than Eager (35.77s) |
| test_tilelang_csa_indexer_cp.py::TestIndexerBwdCPGradients::test_cp2_long | dW mismatch; diff 5.44e-02 is greater than 0.01 |

Key logs:

AssertionError: 39.23310104385018 not less than 35.766187934204936 : Performance Regression! CUDAGraph (39.23s) is slower than Eager (35.77s).
E   AssertionError: 0.05439642816781998 not less than 0.01 : dW mismatch: sq=256 rank=1 diff=5.44e-02
FAILED tests/single_card_tests/test_autocudagraph.py::TestEndToEndPerformance::test_resnext50_accuracy_and_speed
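  • Root cause summary: fluctuation in Fleet single-card performance/numerical assertions
    The failing cases are in the autocudagraph performance test and the tilelang custom op gradient test. This PR does not modify Fleet, CUDAGraph, the TileLang custom op, or the related test files, so there is no direct link to the optimizer alias/API changes here.

Suggested fix:

  1. Known to be flaky; please rerun. If the rerun reproduces the failures consistently, check the CUDAGraph performance baseline and the gradient tolerance/input configuration of test_tilelang_csa_indexer_cp.py separately.

Related changes: no link to the files modified in this PR was found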

PaddlePaddle-bot

This comment was marked as outdated.

@algorithm1832

Contributor Author

Adding the maximize parameter requires changes across several files, so it will be handled together with defaults in the next PR.

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-15 16:28:02

📋 Review summary

PR overview: adds SGD, Adagrad, and AdamW compatibility wrappers to paddle.optim, and adds lr_decay to paddle.optimizer.Adagrad
Scope of changes: python/paddle/optim, python/paddle/optimizer/adagrad.py, and the related legacy tests
Impact tag: [User Experience]

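Issues

| Level | File | Summary |
| --- | --- | --- |
| 🔴 Bug | python/paddle/optimizer/adagrad.py:173 | The internal scheduler state for lr_decay is not included in the optimizer state_dict, so the decay step count resets after resuming from a checkpoint |
| 🟡 Suggestion | test/legacy_test/test_adagrad_op.py:462 | The new param group test case has no assertions, so it cannot cover per-group lr_decay behavior or save/restore |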

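Status of previous findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | The paddle.optim.SGD wrapper is still not compatible with the original alias's parameters such as learning_rate/parameters/grad_clip | ⚠️ Still present |
| F2 | The paddle.optim.Adagrad wrapper is still not compatible with the original alias's parameters such as learning_rate/parameters/grad_clip/epsilon | ⚠️ Still present |
| F3 | The paddle.optim.AdamW wrapper is still not compatible with several of the original alias's Paddle parameters | ⚠️ Still present |
| F4 | Adagrad.lr_decay is no longer ignored entirely; the wrapper now passes it to the underlying optimizer | ✅ Fixed |
| F5 | SGD's momentum/nesterov still only trigger a warning and are then ignored; plain SGD is what actually runs | ⚠️ Still present |
| F6 | The step ordering and static-graph update semantics of Adagrad.lr_decay are still not fixed | ⚠️ Still present |
| F7 | Per-group lr_decay still depends on the optimizer-level _current_lr_decay_scheduler, so static-graph and cross-group state remain unreliable | ⚠️ Still present |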
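
Findings F1 to F3 ask the wrappers to keep accepting the original argument names alongside the new params/lr style. A minimal sketch of one way to do that is shown below. `BaseSGD` is a stand-in that only records its arguments, the `SGD` wrapper and its 0.001 default are assumptions made for illustration, and none of this is the code from the PR.

```python
class BaseSGD:
    """Stand-in for the wrapped optimizer; it only records its arguments."""

    def __init__(self, learning_rate=0.001, parameters=None, grad_clip=None):
        self.learning_rate = learning_rate
        self.parameters = parameters
        self.grad_clip = grad_clip


class SGD(BaseSGD):
    """Hypothetical wrapper accepting params/lr as well as the original names."""

    def __init__(self, params=None, lr=None, *, parameters=None,
                 learning_rate=None, grad_clip=None):
        if params is not None and parameters is not None:
            raise TypeError("pass either 'params' or 'parameters', not both")
        if lr is not None and learning_rate is not None:
            raise TypeError("pass either 'lr' or 'learning_rate', not both")
        chosen_params = params if params is not None else parameters
        chosen_lr = lr if lr is not None else learning_rate
        if chosen_lr is None:
            chosen_lr = 0.001  # assumed default for this sketch
        super().__init__(
            learning_rate=chosen_lr,
            parameters=chosen_params,
            grad_clip=grad_clip,
        )
```

Both `SGD([w], 0.1)` and `SGD(parameters=[w], learning_rate=0.1)` then build the same optimizer, and mixing the two spellings of one argument fails loudly instead of silently picking one.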

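📝 PR convention check

The title tag [API Compatibility] is not in the Paddle PR Category/Types enumeration; the description structure and the precision field are complete.

Suggested title (can be copied directly):

  • [User Experience] Add API aliases for SGD, Adagrad, and AdamW
Suggested PR description (can be copied directly)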
### PR Category
<!-- One of [ User Experience | Execute Infrastructure | Operator Mechanism | CINN | Custom Device | Performance Optimization | Distributed Strategy | Parameter Server | Communication Library | Auto Parallel | Inference | Environment Adaptation ] -->
User Experience

### PR Types
<!-- One of [ New features | Bug fixes | Improvements | Performance | BC Breaking | Deprecations | Docs | Devs | Not User Facing | Security | Others ] -->
Improvements

### Description
<!-- Describe what you’ve done -->
- Add `paddle.optim.SGD`, `paddle.optim.Adagrad`, and `paddle.optim.AdamW` compatibility wrappers with `params`/`lr` style arguments.
- Add API compatibility tests for positional, keyword, and mixed argument construction.
- Remove these optimizers from strict alias identity tests because `paddle.optim` now wraps `paddle.optimizer` implementations.

### Does this cause a precision change (是否引起精度变化)
<!-- one of the following [ Yes (是) | No (否) ]-->

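Overall assessment

This round covered the public API wrappers, the Adagrad.lr_decay call chain, and the new tests across the 8 changed files. Several earlier compatibility and lr_decay semantics issues remain unfixed, and the newly introduced checkpoint state issue affects training resumption. Fixing these before merging is recommended.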

self._master_weights = {}
self.initial_accumulator_value = initial_accumulator_value
self._lr_decay = lr_decay
self._lr_decay_schedulers = {

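🔴 Bug: The step state of lr_decay is not included in the optimizer checkpoint, so learning-rate decay restarts from the beginning after resuming.

Here lr_decay is wrapped in the private _lr_decay_schedulers, but the base class Optimizer.state_dict() only saves LR_Scheduler when _learning_rate itself is an LRScheduler. In this implementation _learning_rate is still a float, so if training saves optimizer.state_dict() after some steps and then restores it, the last_epoch of InverseTimeDecay is re-initialized from 0 and subsequent Adagrad updates use the wrong effective learning rate.

Suggested fix: include the lr_decay step state in Adagrad.state_dict()/set_state_dict(), or compute the decay from a persistable step accumulator. Also add a regression test that checks that, after saving and loading the optimizer and continuing training, the effective lr matches that of uninterrupted training.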
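
The suggested fix can be illustrated with a toy object that keeps its decay step count in the saved state. This is a sketch under stated assumptions: `DecayState`, the `"lr_decay_step"` key, and the inverse-time formula are all hypothetical, and the real change would live in Adagrad.state_dict()/set_state_dict().

```python
class DecayState:
    """Toy holder for the lr_decay step counter (illustrative only)."""

    def __init__(self, base_lr, lr_decay):
        self.base_lr = base_lr
        self.lr_decay = lr_decay
        self.step_count = 0

    def current_lr(self):
        # Assumed inverse-time decay of the base learning rate.
        return self.base_lr / (1.0 + self.lr_decay * self.step_count)

    def step(self):
        self.step_count += 1

    def state_dict(self):
        # Persist the counter so a resumed run continues the decay.
        return {"lr_decay_step": self.step_count}

    def set_state_dict(self, state):
        self.step_count = state["lr_decay_step"]


# Train three steps, checkpoint, then resume in a fresh object.
original = DecayState(base_lr=0.1, lr_decay=0.2)
for _ in range(3):
    original.step()
checkpoint = original.state_dict()

resumed = DecayState(base_lr=0.1, lr_decay=0.2)
resumed.set_state_dict(checkpoint)
```

A regression test along these lines would compare `resumed.current_lr()` with `original.current_lr()` after the restore, and again after further steps on both.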

lr_decay=0.0,
)

for epoch in range(3):

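🟡 Suggestion: This param group test case has no assertions at all, so it cannot guard against errors in per-group lr_decay state.

The current test only runs three training steps. It would pass even if the second group's lr_decay were ignored, the default scheduler were used by mistake, or the state were reset after save/restore. At a minimum, compare the effective update magnitudes of the two groups, or use an analytically tractable one-dimensional parameter with a fixed gradient and assert the parameter value that the second group produces with lr_decay=0.2. Then add a state_dict restore scenario so that scheduler state issues like this do not keep slipping through.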
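
The kind of assertion suggested here can be sketched with a scalar Adagrad reference and a fixed gradient. The update rule below (inverse-time decay of lr, epsilon added after the square root, step counter starting at 0) is an assumption chosen for illustration; a real test would compare against the optimizer's documented formula.

```python
import math


def adagrad_reference(param, grad, lr, lr_decay, steps, eps=1e-6):
    """Scalar Adagrad with a constant gradient (assumed update rule)."""
    accumulator = 0.0
    for step in range(steps):
        decayed_lr = lr / (1.0 + lr_decay * step)
        accumulator += grad * grad
        param -= decayed_lr * grad / (math.sqrt(accumulator) + eps)
    return param


# Two "groups" that differ only in lr_decay, as in the test under review.
plain = adagrad_reference(1.0, grad=1.0, lr=0.1, lr_decay=0.0, steps=3)
decayed = adagrad_reference(1.0, grad=1.0, lr=0.1, lr_decay=0.2, steps=3)

# The decayed group must have moved strictly less than the plain group.
assert (1.0 - decayed) < (1.0 - plain)
```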

else None
)

param_lr = self._create_param_lr(param_and_grad)

Contributor

What I mean is: do not handle lr separately here. Pass the LRScheduler directly to the base class, which handles LRScheduler itself.

Does this lr_decay correspond to MultiStepDecay?

'initial_accumulator_value',
self._default_dict['initial_accumulator_value'],
)
self._lr_decay = parameters.get(

Contributor

Do not add extra handling; just follow the base class's LRScheduler handling.

]
return parameters

def step(

Contributor

No extra handling here.

@algorithm1832

Contributor Author

If the LRScheduler is passed directly to the base class, only one LRScheduler can exist at a time. If several param_groups set different lr_decay values, a single LRScheduler cannot handle that case.
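
The constraint can be seen in a small sketch: when groups carry different lr_decay values, each group needs its own effective learning rate at a given step, which a single shared schedule cannot express. The group dictionaries and `group_lrs` are hypothetical, and the inverse-time formula is assumed.

```python
# Hypothetical param groups that differ only in lr_decay.
param_groups = [
    {"name": "g0", "lr": 0.1, "lr_decay": 0.0},
    {"name": "g1", "lr": 0.1, "lr_decay": 0.2},
]


def group_lrs(groups, step):
    """Effective learning rate per group under assumed inverse-time decay."""
    return {
        group["name"]: group["lr"] / (1.0 + group["lr_decay"] * step)
        for group in groups
    }


# The groups start equal and then diverge as the step count grows.
print(group_lrs(param_groups, 0))
print(group_lrs(param_groups, 5))
```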

@algorithm1832

Contributor Author

This will be submitted together with other changes, so this PR is being closed.

4 participants