cp: [recipe,model,ckpt] DeepSeek-V4 backports (4131, 4271, 4305, 4306, 4338) into r0.5.0 #4337

Merged: ko3n1g merged 6 commits into r0.5.0 from cherry-pick-4131-4271-4305-r0.5.0 on Jun 15, 2026

@cuichenx cuichenx commented Jun 12, 2026

Summary

Combined backport to r0.5.0 for the requested DeepSeek-V4 changes.

Included cherry-picks: #4131, #4271, #4305, #4306, #4338.

#4131 is included because #4305 updates the DeepSeek-V4-Flash SFT recipes introduced by #4131.

#4306 is included because the DSv4 PP>1 import/export round trip can otherwise fail when run_config.yaml serializes the finalized pipeline layout as an object stub instead of a concrete list. The backport regenerates and forwards the resolved layout during export.
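The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from #4306: the function names, the `_target_` stub shape, and the layer names are all hypothetical, and the real layout type in Megatron Bridge may differ.

```python
from typing import Any, Callable, List

Layout = List[List[str]]


def is_concrete_layout(value: Any) -> bool:
    """Return True only for a non-empty list whose entries are per-stage lists."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(stage, list) for stage in value)
    )


def resolve_layout(serialized: Any, regenerate: Callable[[], Layout]) -> Layout:
    """Use the serialized layout if it is concrete, otherwise rebuild it."""
    if is_concrete_layout(serialized):
        return serialized
    # Anything else (for example an object stub) cannot be forwarded as-is.
    return regenerate()


def regenerate_layout() -> Layout:
    # Hypothetical regeneration; a real exporter would derive this from the model config.
    return [["embedding", "decoder"], ["decoder", "loss"]]


# Hypothetical shapes of what a loaded run_config.yaml might contain.
stub_value = {"_target_": "some.module.PipelineLayout"}
concrete_value = [["embedding"], ["decoder", "loss"]]

resolved_from_stub = resolve_layout(stub_value, regenerate_layout)
resolved_from_list = resolve_layout(concrete_value, regenerate_layout)
```

The point of the sketch is only that export never trusts a non-list value: a concrete list passes through unchanged, and anything else triggers regeneration.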

The #4338 follow-up addresses the Claude bot comment about H100 CI determinism by forcing Blackwell fused-kernel support in the shared DeepSeek-V4 recipe test builder; the explicit unavailable-kernel test still overrides the helper to False.
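One way to picture the test-builder change is a shared helper that patches the kernel-support check to a fixed value, with individual tests free to override it. The sketch below uses `unittest.mock.patch.object` from the standard library; the module object, `fused_kernels_supported`, and the builder names are hypothetical stand-ins, not the identifiers used in the actual tests.

```python
import types
from unittest import mock

# Hypothetical stand-in for the recipe module under test.
recipe_module = types.SimpleNamespace()


def _fused_kernels_supported() -> bool:
    # Pretend this host (for example an H100 CI runner) lacks the fused kernels.
    return False


recipe_module.fused_kernels_supported = _fused_kernels_supported


def build_recipe() -> dict:
    # The helper is looked up at call time, so a patch on the module takes effect.
    return {"use_fused_kernels": recipe_module.fused_kernels_supported()}


def build_recipe_for_test(supported: bool = True) -> dict:
    """Shared builder: force the support check so results do not depend on the GPU."""
    with mock.patch.object(
        recipe_module, "fused_kernels_supported", return_value=supported
    ):
        return build_recipe()


default_recipe = build_recipe_for_test()
unavailable_recipe = build_recipe_for_test(supported=False)
```

With this shape, most tests get a deterministic `True`, while a test for the unavailable-kernel path passes `supported=False` explicitly.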

Validation

  • git diff --check origin/r0.5.0..HEAD
  • uv run --no-sync python -m py_compile examples/conversion/convert_checkpoints_multi_gpu.py src/megatron/bridge/models/model_provider.py src/megatron/bridge/models/deepseek/deepseek_v4_bridge.py src/megatron/bridge/recipes/deepseek/deepseek_v4.py tests/unit_tests/models/deepseek/test_deepseek_v4_bridge.py tests/unit_tests/recipes/test_deepseek_recipes.py tests/functional_tests/test_groups/recipes/test_deepseek_recipes_finetune.py
  • uv run --no-sync --with pre-commit pre-commit run --all-files

Targeted unit tests were attempted with uv run --frozen --group test python -m pytest tests/unit_tests/models/deepseek/test_deepseek_v4_bridge.py tests/unit_tests/recipes/test_deepseek_recipes.py, but they could not run on this host: the locked nvidia-resiliency-ext==0.6.0 wheel cannot be installed because the available wheel is tagged manylinux_2_39_x86_64 while the host is limited to manylinux_2_31_x86_64.
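The tag mismatch follows from how manylinux_X_Y tags work under PEP 600: a wheel tagged for glibc X.Y installs only on hosts whose glibc is at least X.Y on the same architecture. A small sketch of that comparison, using hypothetical helper names and the two tags quoted above:

```python
import re
from typing import Tuple

_TAG = re.compile(r"^manylinux_(\d+)_(\d+)_(\w+)$")


def parse_manylinux(tag: str) -> Tuple[int, int, str]:
    """Split a PEP 600 tag into (glibc major, glibc minor, architecture)."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"not a manylinux_X_Y tag: {tag!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


def wheel_installable(wheel_tag: str, host_tag: str) -> bool:
    """True if the wheel's required glibc is no newer than the host's."""
    w_major, w_minor, w_arch = parse_manylinux(wheel_tag)
    h_major, h_minor, h_arch = parse_manylinux(host_tag)
    return w_arch == h_arch and (w_major, w_minor) <= (h_major, h_minor)


# The combination reported in this PR: the wheel needs glibc 2.39, the host offers 2.31.
can_install = wheel_installable("manylinux_2_39_x86_64", "manylinux_2_31_x86_64")
```

Here `can_install` is `False`, which matches the failure described; the reverse pairing (an older wheel on a newer host) would be accepted.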

Meirtz and others added 3 commits June 12, 2026 13:36
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
(cherry picked from commit 00174bc)
…4131)

Signed-off-by: Lingrui Mei <lmei@nvidia.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
(cherry picked from commit 3af343e)
)

Signed-off-by: Lingrui Mei <lmei@nvidia.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
(cherry picked from commit c9ab9cc)

copy-pr-bot Bot commented Jun 12, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cuichenx added 2 commits June 12, 2026 14:32
Signed-off-by: Chen Cui <chcui@nvidia.com>
(cherry picked from commit 5f24233)
Signed-off-by: Chen Cui <chcui@nvidia.com>
(cherry picked from commit 655eb6d)
@cuichenx cuichenx changed the title from "cp: [recipe,model] DeepSeek-V4 SFT backports (4131, 4271, 4305) into r0.5.0" to "cp: [recipe,model] DeepSeek-V4 SFT/H100 backports (4131, 4271, 4305, 4338) into r0.5.0" Jun 12, 2026
@cuichenx cuichenx marked this pull request as ready for review June 12, 2026 21:35
@yaoyu-33 yaoyu-33 added labels Jun 12, 2026:
  • area:recipe: Training recipes and launch configs
  • cherry-pick
  • feature: New capabilities, enhancements, or enablement work
  • needs-review: PR is ready for code review and waiting on a reviewer
  • r0.5.0: Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.
yaoyu-33 previously approved these changes Jun 12, 2026
…ig lacks it (#4306)

Signed-off-by: Lingrui Mei <lmei@nvidia.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
(cherry picked from commit ae87ec3)
@cuichenx cuichenx changed the title from "cp: [recipe,model] DeepSeek-V4 SFT/H100 backports (4131, 4271, 4305, 4338) into r0.5.0" to "cp: [recipe,model,ckpt] DeepSeek-V4 backports (4131, 4271, 4305, 4306, 4338) into r0.5.0" Jun 12, 2026
@ko3n1g ko3n1g merged commit 9297e31 into r0.5.0 Jun 15, 2026
171 of 173 checks passed
@ko3n1g ko3n1g deleted the cherry-pick-4131-4271-4305-r0.5.0 branch June 15, 2026 06:32