Replace dcu-z100 with test-safety and add test function in batch.py #79297

Open
Ray961123 wants to merge 3 commits into PaddlePaddle:develop from Ray961123:fix-dcu-test-safety-v3

Conversation

@Ray961123

Collaborator

Changes

  • Replace dcu-z100 with test-safety in .github/workflows/_Linux-DCU.yml
  • Add test function _test_batch_placeholder in python/paddle/batch.py

PaddlePaddle-bot

This comment was marked as outdated.

@codecov-commenter

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@1430f5b). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/paddle/batch.py | 50.00% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (50.00%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #79297   +/-   ##
==========================================
  Coverage           ?   50.00%           
==========================================
  Files              ?        1           
  Lines              ?        2           
  Branches           ?        0           
==========================================
  Hits               ?        1           
  Misses             ?        1           
  Partials           ?        0           
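
The 50.00% figure can be reproduced from the hit and miss counts in this report. The following is a minimal sketch of that arithmetic, assuming coverage is the share of tracked lines that were hit (hits over hits + misses + partials). The function name and the `TARGET` constant are illustrative; the numbers come from this report, not from Codecov's code.

```python
def patch_coverage(hits: int, misses: int, partials: int = 0) -> float:
    """Return coverage as the percentage of tracked lines that were hit."""
    total = hits + misses + partials
    if total == 0:
        # Nothing to cover; this sketch treats that as fully covered.
        return 100.0
    return 100.0 * hits / total


TARGET = 90.0  # target coverage quoted in the report

coverage = patch_coverage(hits=1, misses=1, partials=0)
passed = coverage >= TARGET
print(f"{coverage:.2f}% (target {TARGET:.2f}%): {'pass' if passed else 'fail'}")
```

With two tracked lines, covering the one missing line is what moves the patch above the target.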

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-11 18:00:01

📋 Review summary

PR overview: Switches the Linux DCU test job runner group from dcu-z100 to test-safety and adds a test placeholder function to python/paddle/batch.py.
Scope of changes: .github/workflows/_Linux-DCU.yml, python/paddle/batch.py
Impact tags: Execute Infrastructure, User Experience

Issues

No new blocking issues found. PR convention issues are reported in the section below and are not repeated here.

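Status of previous findings

| Finding | Issue | Status |
|---|---|---|
| F1 | A test placeholder function was added to a production module, but nothing verifies the real behavior of batch() | ⚠️ Still present |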

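📝 PR convention check

The PR title lacks an official tag, and the PR description does not use the four-section template the repository requires.

Suggested title (ready to copy):

  • [Execute Infrastructure] Replace DCU test runner group and add batch coverage

Suggested PR description (ready to copy):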
### PR Category
<!-- One of [ User Experience | Execute Infrastructure | Operator Mechanism | CINN | Custom Device | Performance Optimization | Distributed Strategy | Parameter Server | Communication Library | Auto Parallel | Inference | Environment Adaptation ] -->
Execute Infrastructure

### PR Types
<!-- One of [ New features | Bug fixes | Improvements | Performance | BC Breaking | Deprecations | Docs | Devs | Not User Facing | Security | Others ] -->
Improvements

### Description
<!-- Describe what you’ve done -->
- Replace the Linux-DCU test job runner group from `dcu-z100` to `test-safety` in `.github/workflows/_Linux-DCU.yml`.
- Add validation coverage for the `python/paddle/batch.py` module. The current implementation should use a real test under `test/` instead of a production placeholder function.
- Validation: N/A in PR description.

### 是否引起精度变化
<!-- Does this change affect precision? One of the following [ 是 (yes) | 否 (no) ] -->

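Overall assessment

This round covered 2 semantic change points: the DCU workflow runner group change and the new function in batch.py. No definite new defect was found in the workflow change within the diff. The previously reported test placeholder issue is still unfixed; the recommendation is to replace it with tests under test/ that cover the real behavior of paddle.batch.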
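
To illustrate the recommendation, here is a minimal sketch of a test that exercises batching behavior instead of a placeholder. It assumes the public signature `paddle.batch(reader, batch_size, drop_last=False)`. The fallback `batch` function is a stand-in written for this sketch so it runs without Paddle installed, and `sample_reader` and `TestBatch` are hypothetical names. The file name and location under test/ are left to the author.

```python
import unittest

try:
    # Real API: paddle.batch(reader, batch_size, drop_last=False)
    from paddle import batch
except ImportError:
    # Stand-in used only when Paddle is not installed. It mirrors the
    # documented behavior and is NOT the Paddle implementation.
    def batch(reader, batch_size, drop_last=False):
        batch_size = int(batch_size)
        if batch_size <= 0:
            raise ValueError("batch_size should be a positive integer")

        def batch_reader():
            buf = []
            for sample in reader():
                buf.append(sample)
                if len(buf) == batch_size:
                    yield buf
                    buf = []
            # Emit the trailing partial batch unless the caller drops it.
            if not drop_last and buf:
                yield buf

        return batch_reader


def sample_reader():
    """Hypothetical reader creator that yields five integer samples."""
    yield from range(5)


class TestBatch(unittest.TestCase):
    def test_keeps_last_partial_batch(self):
        self.assertEqual(list(batch(sample_reader, 2)()), [[0, 1], [2, 3], [4]])

    def test_drop_last_discards_partial_batch(self):
        batched = batch(sample_reader, 2, drop_last=True)
        self.assertEqual(list(batched()), [[0, 1], [2, 3]])

    def test_rejects_non_positive_batch_size(self):
        with self.assertRaises(ValueError):
            batch(sample_reader, 0)
```

Saved as a test module, this can be run with `python -m unittest`. A test like this covers the existing `batch()` lines directly, so no helper needs to be added to the production module.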

@PaddlePaddle-bot

PaddlePaddle-bot commented Jun 13, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-24 01:38:11 UTC+08:00

This CI report is generated from the following code (updated every 30 minutes):
PR commit: a694823 | Merge base: 1430f5b (branch: develop)


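1 Required jobs: 42/47 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 79 (0) | 79 | 73 | 5 | 0 | 0 | 1 |

| Job | Error type | Confidence | Log |
|---|---|---|---|
| Linux-CPU / Build and test | Environment issue: remote ref for the target branch is missing | High | Job |
| Linux-DCU / Test | PR issue: workflow change requires approval | High | Job |
| Check | PR issue: PR template fields not filled in as required | High | Job |
| Windows-GPU / Build and test | Unknown: CMake-generated source file is missing | Medium | Job |
| Coverage test | Flaky: CINN temporary cache file is missing | Medium | Job |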

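2 Failure details

🔴 Linux-CPU / Build and test: environment issue (confidence: high)

Analyzer: generic analysis (fallback)
Failed cases:

| Case | Error summary |
|---|---|
| Download paddle.tar.gz and merge target branch | The remote develop ref could not be found when merging the target branch inside the container |

Key logs: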

Already up to date.
fatal: Couldn't find remote ref develop
##[error]Process completed with exit code 128.
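  • Root cause summary: CI failed to fetch the target branch ref
    The job failed while downloading the source and merging the target branch; neither the Build nor the Test step ran. The PR only modifies .github/workflows/_Linux-DCU.yml and python/paddle/batch.py, which have no direct connection to the Linux-CPU git fetch/merge flow.

Suggested fix:

  1. Environment issue; please rerun. If it recurs, the CI side needs to check the fetch logic for the upstream/develop remote ref and the git remote configuration inside the source package.

Related changes: none directly related to the files modified in this PR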

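🔴 Linux-DCU / Test: PR issue (confidence: high)

Analyzer: generic analysis (fallback)
Failed cases:

| Case | Error summary |
|---|---|
| Set up runner | Modifying the workflow triggered runner-policy, which requires human approval |

Key logs: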

Runner group name: 'test-safety'
[runner-policy] PR author Ray961123 matched ray961123
[runner-policy] block: static policy requires human approval: workflow_changed:.github/workflows/_Linux-DCU.yml
##[error]Process completed with exit code 1.
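  • Root cause summary: the workflow change triggered a security approval
    The PR changes the DCU test runner group in .github/workflows/_Linux-DCU.yml from dcu-z100 to test-safety. The workflow file change was intercepted by the self-hosted runner's static policy, so the test container was never created and the DCU tests did not start.

Suggested fix:

  1. Rerun after obtaining human approval. If approval should not be triggered, revert or adjust the runner group change at .github/workflows/_Linux-DCU.yml:199-200.

Related changes: .github/workflows/_Linux-DCU.yml:199-200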
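
The blocking behavior seen in the runner-policy log can be sketched as a simple static rule. This is a hypothetical illustration and not the actual runner-policy code: the `Decision` type, `evaluate_static_policy`, and the `.github/workflows/` prefix check are assumptions. Only the `workflow_changed:` reason format is taken from the log.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: any file under this prefix counts as a protected workflow file.
WORKFLOW_PREFIX = ".github/workflows/"


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def evaluate_static_policy(
    changed_files: Iterable[str], human_approved: bool = False
) -> Decision:
    """Block the job when a workflow file changed and no human approved it."""
    touched = sorted(f for f in changed_files if f.startswith(WORKFLOW_PREFIX))
    if touched and not human_approved:
        return Decision(False, "workflow_changed:" + ",".join(touched))
    return Decision(True, "ok")


files = [".github/workflows/_Linux-DCU.yml", "python/paddle/batch.py"]
print(evaluate_static_policy(files))
print(evaluate_static_policy(files, human_approved=True))
```

Under this rule, the batch.py change alone would not be blocked; the workflow edit is what requires approval.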

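🔴 Check: PR issue (confidence: high)

Analyzer: generic analysis (fallback)
Failed cases:

| Case | Error summary |
|---|---|
| Check PR Template | The template fields in the PR description are not filled in according to the Paddle PR template |

Key logs: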

PR Types should be in ['New features', 'Bug fixes', 'Improvements', 'Performance', 'BC Breaking', 'Deprecations', 'Docs', 'Devs', 'Not User Facing', 'Security', 'Others'].
必须填写是否引起精度变化
EXCODE: 7
##[error]Process completed with exit code 7.
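  • Root cause summary: required PR template fields are missing
    The template check parsed the change-description bullets in the PR body as PR Types, and it detected that the "是否引起精度变化" field (whether the change affects precision) was not filled in. The failure is unrelated to source compilation; the PR description template needs to be completed.

Suggested fix:

  1. Update the PR description following .github/PULL_REQUEST_TEMPLATE.md: fill in a valid PR Types value and state explicitly whether the change affects precision.

Related changes: PR description body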
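
A PR body can be checked locally before pushing. The sketch below is a hypothetical checker and not the repository's actual Check PR Template script: `section_body` and `check_pr_body` are illustrative names, and the assumption that each field is a `### ` heading followed by its value comes from the suggested template in this thread. The allowed PR Types list and the two error messages are copied from the key logs.

```python
import re

# Allowed values, copied from the CI log message.
ALLOWED_PR_TYPES = [
    "New features", "Bug fixes", "Improvements", "Performance", "BC Breaking",
    "Deprecations", "Docs", "Devs", "Not User Facing", "Security", "Others",
]
PRECISION_HEADING = "是否引起精度变化"


def section_body(body: str, heading: str) -> str:
    """Return the text under '### heading' with HTML comments removed."""
    pattern = rf"^###[ \t]*{re.escape(heading)}[ \t]*$(.*?)(?=^###\s|\Z)"
    match = re.search(pattern, body, flags=re.MULTILINE | re.DOTALL)
    if not match:
        return ""
    return re.sub(r"<!--.*?-->", "", match.group(1), flags=re.DOTALL).strip()


def check_pr_body(body: str) -> list:
    """Collect template errors; an empty list means the body passes."""
    errors = []
    if section_body(body, "PR Types") not in ALLOWED_PR_TYPES:
        errors.append(f"PR Types should be in {ALLOWED_PR_TYPES}.")
    if section_body(body, PRECISION_HEADING) not in ("是", "否"):
        errors.append("必须填写是否引起精度变化")
    return errors


unfilled = (
    "### PR Types\n<!-- One of [...] -->\nImprovements\n\n"
    "### 是否引起精度变化\n<!-- one of the following [ 是 | 否 ]-->\n"
)
print(check_pr_body(unfilled))
print(check_pr_body(unfilled + "否\n"))
```

The first call reports the unfilled precision field; appending 否 under the last heading clears it.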

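🟡 Windows-GPU / Build and test: unknown (confidence: medium)

Analyzer: generic analysis (fallback)
Failed cases:

| Case | Error summary |
|---|---|
| Build paddle / CMake | The generated source file op_decomp_rule.cc that op_dialect_vjp depends on does not exist |

Key logs: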

CMake Error at cmake/generic.cmake:376 (add_library):
  Cannot find source file:
    C:/actions-runner/_work/Paddle/Paddle/build/paddle/fluid/pir/dialect/operator/ir/op_decomp_rule.cc
CMake Error at cmake/generic.cmake:376 (add_library):
  No SOURCES given to target: op_dialect_vjp
Cmake failed, will exit
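  • Root cause summary: a Windows CMake-generated file is missing
    In paddle/fluid/pir/dialect/CMakeLists.txt, op_dialect_vjp depends on ${PIR_DIALECT_BINARY_DIR}/op_decomp_rule.cc, which should be generated when paddle/fluid/primitive/codegen/CMakeLists.txt calls decomp_rule_gen.py. The PR does not modify these generation rules, and the logs show no connection to the new batch.py placeholder function or the DCU workflow change.

Suggested fix:

  1. Rerun first to check whether this is an intermittent problem in the Windows build environment or generation step. If it reproduces consistently, ask the build system owner to check the output path and result of decomp_rule_gen.py during the Windows configure stage.

Related changes: none directly related to the files modified in this PR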

🟡 Coverage test: flaky (confidence: medium)

Analyzer: generic analysis (fallback)
Failed cases:

| Case | Error summary |
|---|---|
| test/ir/pir/cinn/test_enable_cinn_kernel_cache.py::TestCase::test_case | /tmp/cinn/.../cinn_cuda_kernel.cu is missing when CINN compiles the fatbin |

Key logs:

23/135 Test #266: test_enable_cinn_kernel_cache ...........***Failed
E compiler.cc:954] Compilation failed with output:
nvcc -c ... /tmp/cinn//0/2589847229578688145/cinn_cuda_kernel.cu ...
F compiler.cc:969] CUDA source file is missing. Expected file: /tmp/cinn//0/2589847229578688145/cinn_cuda_kernel.cu.
99% tests passed, 1 tests failed out of 135
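  • Root cause summary: the CINN cache directory is in an abnormal state
    This case uses /tmp/cinn/ as the kernel cache directory by default; when the directory already exists, it goes straight to the cache-loading branch. The logs show that the expected CUDA source file is missing under the temporary cache directory, which matches the pattern of intermittent failures caused by a shared temp directory or incomplete cache state. The PR does not modify CINN, coverage, or this test file.

Suggested fix:

  1. Known flaky; please rerun. If it recurs, the CINN test owner should isolate FLAGS_cinn_kernel_cache_save_path or clean /tmp/cinn/ before the case starts.

Related changes: none directly related to the files modified in this PR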
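
One way to isolate the cache path per test run is to point it at a fresh temporary directory. The sketch below is a hypothetical helper that assumes the flag can be supplied through an environment variable of the same name, set before Paddle reads its flags. `isolated_cinn_cache` is an illustrative name, not an existing Paddle or CINN utility.

```python
import contextlib
import os
import tempfile

FLAG = "FLAGS_cinn_kernel_cache_save_path"


@contextlib.contextmanager
def isolated_cinn_cache():
    """Point the cache flag at a fresh temp dir, then restore the old value."""
    previous = os.environ.get(FLAG)
    with tempfile.TemporaryDirectory(prefix="cinn_cache_") as cache_dir:
        os.environ[FLAG] = cache_dir
        try:
            yield cache_dir
        finally:
            # Restore the environment exactly as it was before entering.
            if previous is None:
                os.environ.pop(FLAG, None)
            else:
                os.environ[FLAG] = previous


with isolated_cinn_cache() as cache_dir:
    # The test body would run here, ideally in a subprocess that inherits
    # the environment so the flag is seen at startup.
    print(os.environ[FLAG] == cache_dir)
```

Because each run gets its own directory that is deleted on exit, a stale or partially written cache from another job cannot be picked up.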

3 participants