feat(model): thread weight_dtype through HF export for plain-dtype DeepSeek-V4 output by Meirtz · Pull Request #4301 · NVIDIA-NeMo/Megatron-Bridge

Meirtz · 2026-06-11T11:01:32Z

What

Thread weight_dtype: Optional[torch.dtype] = None through the HF export path — export_hf_weights / save_hf_pretrained / stream_weights_megatron_to_hf — carried per-task via a new optional WeightConversionTask.weight_dtype field. When set, the DeepSeek-V4 bridge emits plain weights in that dtype (no *.scale companions) instead of re-creating the source repo's quantized layout. Default (None) keeps today's behavior. CLI: --export-weight-dtype on the export subcommand.
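As a rough illustration of the CLI surface, here is a minimal argparse sketch of an optional `--export-weight-dtype` flag whose absence maps to `None` (today's behavior). The parser name, the help text, and the set of accepted names are assumptions; only `bfloat16` is shown in this PR, and the real export subcommand may parse the flag differently.

```python
import argparse
from typing import Optional, Sequence

# Assumed set of accepted names; only "bfloat16" appears in this PR.
DTYPE_NAMES = ("bfloat16", "float16", "float32")


def parse_export_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Hypothetical parser: the flag is optional and defaults to None."""
    parser = argparse.ArgumentParser(prog="export-sketch")
    parser.add_argument(
        "--export-weight-dtype",
        choices=DTYPE_NAMES,
        default=None,
        help="Emit plain weights in this dtype; omit to keep the quantized layout.",
    )
    return parser.parse_args(argv)


default_args = parse_export_args([])
bf16_args = parse_export_args(["--export-weight-dtype", "bfloat16"])
```

With PyTorch installed, the parsed name could then be resolved to a dtype object with `getattr(torch, bf16_args.export_weight_dtype)`.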

Why

DSv4 HF export unconditionally re-creates the source repo's quantized weight/scale layout (maybe_modify_converted_hf_weight → requantize_hf_weight_scale_pairs, from #3969). That's right for checkpoint conversion, but bf16-SFT'd weights get silently post-hoc quantized — a user found *.scale tensors in their SFT export and asked about train/inference parity.

Design (revised after reviewer feedback): the requantize hook runs on both export consumers — online weight streaming to rollout engines (export_hf_weights, e.g. verl RL weight sync) and on-disk checkpoints (save_hf_pretrained) — so a bridge-level boolean cannot configure them independently. A dtype-typed parameter on each public API lets callers choose per path (e.g. bf16 to rollout for RL parity, quantized to disk for serving-format artifacts, or vice versa). Hook signatures are unchanged (the dtype rides on the task), so the other bridges overriding this hook (dsv3, gemma4, kimi, mimo, flux) are unaffected; DSv3 can adopt the same field later.
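The per-path choice can be sketched as follows. This is a dependency-free illustration, not the Megatron-Bridge implementation: `stream_to_rollout` and `save_to_disk` are hypothetical stand-ins for export_hf_weights and save_hf_pretrained, `fake_requantize` is a placeholder for the real requantization, and dtype strings replace `torch.dtype` values.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Weights = Dict[str, List[float]]  # stand-in for name -> tensor
Item = Tuple[str, List[float], str]  # (name, values, dtype label)


def fake_requantize(name: str, values: List[float]) -> Iterator[Item]:
    """Placeholder: emit a quantized weight plus its '.scale' companion."""
    yield name, values, "F8_E4M3"
    yield name.replace(".weight", ".scale"), [1.0], "F8_E8M0"


def stream_to_rollout(weights: Weights, weight_dtype: Optional[str] = None) -> Iterator[Item]:
    """Online consumer (stands in for export_hf_weights)."""
    for name, values in weights.items():
        if weight_dtype is None:
            # Default: re-create the source repo's weight/scale layout.
            yield from fake_requantize(name, values)
        else:
            # Plain weights in the requested dtype, no scale companion.
            yield name, values, weight_dtype


def save_to_disk(weights: Weights, weight_dtype: Optional[str] = None) -> Dict[str, str]:
    """On-disk consumer (stands in for save_hf_pretrained); returns name -> dtype."""
    return {name: dtype for name, _, dtype in stream_to_rollout(weights, weight_dtype)}


weights = {"layers.0.mlp.weight": [0.5, -0.25]}
# Each call site picks its own dtype: bf16 to rollout, quantized to disk.
rollout = {name: dtype for name, _, dtype in stream_to_rollout(weights, weight_dtype="BF16")}
on_disk = save_to_disk(weights)
```

Because the dtype is an argument at each call site rather than a property of the bridge, the two consumers can disagree without any shared state.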

Verified

the saver streams only yielded tensors (omitting .scale keys is safe); exported config.json is built fresh (torch_dtype: bfloat16, no quantization fields); safetensors index regenerated from written tensors;
unit tests cover dtype-set (plain weights, non-float tensors untouched) and default (requantize) paths.

Notes

AI-assisted (Claude); analysis, validation and review by the human author.

🤖 Generated with Claude Code

Meirtz · 2026-06-11T11:44:04Z

Reworked per reviewer feedback (offline discussion): the hook serves two export consumers — online weight streaming to rollout engines and on-disk checkpoints — so the bridge-level boolean is gone. Now weight_dtype: Optional[torch.dtype] on export_hf_weights / save_hf_pretrained, carried per-task (WeightConversionTask.weight_dtype), hook signatures unchanged. d10a8e7e.

Meirtz · 2026-06-11T13:59:07Z

Full-model E2E validation (DeepSeek-V4-Flash, 43 layers, real weights, TP1/PP4/EP8 on 8×GB300; same imported Megatron checkpoint for both runs):

| export | tensors | `.scale` tensors | dtypes |
| --- | --- | --- | --- |
| default (quantized) | 69,187 | 34,167 | F8_E4M3, F8_E8M0, I8 (MXFP4), BF16, F32, I32 |
| `--export-weight-dtype bfloat16` | 35,020 | 0 | BF16, I32 |

35,020 + 34,167 = 69,187: the bf16 artifact contains every non-scale tensor from the quantized export and no scale companions (I32 = tid2eid routing buffers, correctly left untouched).
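The consistency check behind that sum can be sketched like this. The tensor names and the two toy exports are made up for illustration, and the assumption that scale companions are identified by a `.scale` suffix follows the wording in this thread rather than the saver's actual logic.

```python
from collections import Counter
from typing import Dict, Tuple


def summarize(dtype_by_name: Dict[str, str]) -> Tuple[int, int, Counter]:
    """Return (total tensors, tensors ending in '.scale', dtype histogram)."""
    n_scale = sum(1 for name in dtype_by_name if name.endswith(".scale"))
    return len(dtype_by_name), n_scale, Counter(dtype_by_name.values())


# Toy stand-ins for the two exports (names are invented).
quantized = {
    "a.weight": "F8_E4M3", "a.scale": "F8_E8M0",
    "b.weight": "I8", "b.scale": "F8_E8M0",
    "gate.tid2eid": "I32",
}
plain = {"a.weight": "BF16", "b.weight": "BF16", "gate.tid2eid": "I32"}

q_total, q_scale, _ = summarize(quantized)
p_total, p_scale, p_dtypes = summarize(plain)

# The plain export should equal the quantized export minus its scale companions.
consistent = (p_scale == 0) and (p_total + q_scale == q_total)

# The same identity with the counts reported for the full-model run.
reported_ok = 35_020 + 34_167 == 69_187
```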

Two notes from the run: (1) the smoke caught a real bug in the first version of this PR — WeightConversionTask is a frozen dataclass, so the dtype is now applied via dataclasses.replace (ce66e82), and the unit tests now use real task instances instead of mocks; (2) with weight_dtype set, the saver's completeness check counts the source's .scale entries as "not written" (export proceeds with --not-strict) — cosmetic, can be polished in a follow-up by excluding scale keys from the expected set.
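Both notes can be illustrated with a small sketch. `Task` is a hypothetical stand-in for WeightConversionTask, a string replaces `torch.dtype`, and `expected_keys` is only one possible shape for the follow-up, not code from this PR.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for WeightConversionTask."""
    param_name: str
    weight_dtype: Optional[str] = None  # string stands in for torch.dtype


task = Task(param_name="layers.0.mlp.weight")

# Note (1): direct assignment fails on a frozen dataclass.
try:
    task.weight_dtype = "bfloat16"
    assignment_failed = False
except dataclasses.FrozenInstanceError:
    assignment_failed = True

# dataclasses.replace builds a new instance with the field overridden.
stamped = dataclasses.replace(task, weight_dtype="bfloat16")


def expected_keys(source_keys: Set[str], weight_dtype: Optional[str]) -> Set[str]:
    """Note (2), possible follow-up: drop '.scale' keys from the expected set
    when a plain dtype is requested, so they are not reported as missing."""
    if weight_dtype is None:
        return set(source_keys)
    return {key for key in source_keys if not key.endswith(".scale")}
```

A mock object would accept the attribute assignment silently, which is why tests built on real task instances catch this class of bug.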

Meirtz · 2026-06-11T18:29:36Z

/ok to test 1b93e3c

cuichenx

LGTM

cuichenx

please check comments above

Meirtz · 2026-06-16T13:03:36Z

/ok to test b577be3

…tput

Export has two consumers: online weight sync for RL rollout (export_hf_weights) and on-disk checkpoints (save_hf_pretrained). Each gains an optional weight_dtype that flows through WeightConversionTask into the export stream.

Per review (HollowMan6): the plain-dtype cast is now generic, not DSv4-only. build_conversion_tasks stamps weight_dtype onto each task (no post-hoc dataclasses.replace except for caller-supplied tasks), and the cast lives in the shared stream path covering both the standard and grouped-export branches. The DSv4 hook simply skips requantization when weight_dtype is set and returns the converted weights unchanged, letting the generic path cast the dtype, which keeps plain-dtype export identical across bridges.

Adds --export-weight-dtype to the multi-gpu convert example.

Validated end-to-end on 32x GB300: bf16 export = 35020 tensors / 0 scales; quantized export = 69187 / 34167.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
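The layering described in this commit message (dtype stamped when tasks are built, a bridge hook that steps aside, and a generic cast in the shared stream path) might look roughly like the sketch below. Every name here (`Tensor`, `Task`, `build_tasks`, `bridge_hook`, `stream`) is a toy stand-in, the requantization is faked, and dtype strings replace `torch.dtype`; none of this is the actual bridge code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

FLOAT_DTYPES = {"F32", "BF16", "F16"}


@dataclass(frozen=True)
class Tensor:
    """Toy tensor that only tracks a dtype label."""
    dtype: str

    def to(self, dtype: str) -> "Tensor":
        return Tensor(dtype)


@dataclass(frozen=True)
class Task:
    name: str
    weight_dtype: Optional[str] = None


def build_tasks(names: Iterable[str], weight_dtype: Optional[str] = None) -> List[Task]:
    """Stamp the dtype at construction time, so no later replace is needed."""
    return [Task(name, weight_dtype) for name in names]


def bridge_hook(task: Task, converted: Dict[str, Tensor]) -> Dict[str, Tensor]:
    """Bridge-specific hook: skip requantization when a plain dtype is requested."""
    if task.weight_dtype is not None:
        return converted
    out = dict(converted)
    for key in converted:
        if key.endswith(".weight"):  # fake requantization into a weight/scale pair
            out[key] = Tensor("F8_E4M3")
            out[key[: -len("weight")] + "scale"] = Tensor("F8_E8M0")
    return out


def stream(tasks: List[Task], model: Dict[str, Tensor]) -> Iterator[Tuple[str, Tensor]]:
    """Shared path: run the hook, then cast floating tensors generically."""
    for task in tasks:
        hooked = bridge_hook(task, {task.name: model[task.name]})
        for key, tensor in hooked.items():
            if task.weight_dtype is not None and tensor.dtype in FLOAT_DTYPES:
                tensor = tensor.to(task.weight_dtype)
            yield key, tensor


model = {"mlp.weight": Tensor("F32"), "gate.tid2eid": Tensor("I32")}
plain = {k: t.dtype for k, t in stream(build_tasks(model, "BF16"), model)}
default = {k: t.dtype for k, t in stream(build_tasks(model), model)}
```

Keeping the cast outside the hook means a bridge with no quantization hook at all would get the same plain-dtype behavior, and non-floating tensors such as the I32 routing buffers pass through unchanged.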

Meirtz · 2026-06-17T15:02:05Z

/ok to test 5a97742

HollowMan6

Thank you @Meirtz ! Now the code logic looks much better and I don't have more comments

Meirtz added the feature (New capabilities, enhancements, or enablement work) and area:model (Model implementations and HF bridge logic) labels Jun 11, 2026

yaoyu-33 added the needs-review (PR is ready for code review and waiting on a reviewer) label Jun 11, 2026

Meirtz changed the title ~~feat(model): optional bf16 (non-quantized) HF export for DeepSeek-V4~~ feat(model): thread weight_dtype through HF export for plain-dtype DeepSeek-V4 output Jun 11, 2026

Meirtz force-pushed the fix/dsv4-bf16-export branch from ce66e82 to 1b93e3c Compare June 11, 2026 18:28

Meirtz requested review from cuichenx and yaoyu-33 June 11, 2026 18:38

cuichenx requested a review from HollowMan6 June 15, 2026 22:35

cuichenx approved these changes Jun 15, 2026

HollowMan6 reviewed Jun 15, 2026

Comment thread src/megatron/bridge/models/deepseek/deepseek_v4_bridge.py Outdated

HollowMan6 reviewed Jun 15, 2026

Comment thread src/megatron/bridge/models/conversion/model_bridge.py Outdated

cuichenx previously requested changes Jun 15, 2026

Meirtz force-pushed the fix/dsv4-bf16-export branch from f1bcb78 to 0fe66bf Compare June 16, 2026 07:29

Meirtz force-pushed the fix/dsv4-bf16-export branch from 0fe66bf to f7c0987 Compare June 16, 2026 07:39

Meirtz requested review from HollowMan6 and cuichenx June 16, 2026 12:59

Meirtz force-pushed the fix/dsv4-bf16-export branch from f7c0987 to b577be3 Compare June 16, 2026 13:03

HollowMan6 approved these changes Jun 18, 2026

Meirtz merged 2 commits into NVIDIA-NeMo:main from Meirtz:fix/dsv4-bf16-export
