
feat(diffusion): add LongLive WAN training path #4272

Open
AndysonYs wants to merge 2 commits into NVIDIA-NeMo:main from AndysonYs:longlive-wan-mvp



AndysonYs commented Jun 10, 2026


What does this PR do?

Adds the initial LongLive WAN MVP requested in #4215, trained from offline (precomputed) latents. It covers clean-history plus noisy-target temporal chunks, windowed-attention defaults, and SP/TP validation.
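To make "clean-history plus noisy-target" concrete, here is a minimal, non-authoritative sketch of how one training sample could be assembled. Assumptions (none taken from the PR's code): each latent frame is reduced to a single float, the target chunk is noised with the common rectified-flow interpolation x_sigma = (1 - sigma) * x0 + sigma * eps with velocity target eps - x0, frames after the target chunk are dropped, and the names `ChunkSample`, `make_chunk_sample`, and `masked_mse` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ChunkSample:
    """Hypothetical container for one clean-history + noisy-target sample."""
    inputs: List[float]    # per-frame model inputs (history clean, target noised)
    loss_mask: List[int]   # 1 where the loss applies (target frames only)
    velocity: List[float]  # flow-matching regression target per frame


def make_chunk_sample(latents: Sequence[float], chunk_size: int, target_chunk: int,
                      sigma: float, noise: Sequence[float]) -> ChunkSample:
    """Keep frames before the target chunk clean (teacher forcing) and noise
    the target chunk with x_sigma = (1 - sigma) * x0 + sigma * eps.
    Frames after the target chunk are not included (an assumption)."""
    start = target_chunk * chunk_size
    end = start + chunk_size
    if end > len(latents):
        raise ValueError("target chunk runs past the end of the clip")
    inputs, mask, velocity = [], [], []
    for i in range(end):
        x0 = latents[i]
        if i < start:
            inputs.append(x0)        # clean history frame
            mask.append(0)           # no loss on history
            velocity.append(0.0)     # placeholder, masked out
        else:
            eps = noise[i - start]
            inputs.append((1.0 - sigma) * x0 + sigma * eps)
            mask.append(1)
            velocity.append(eps - x0)  # d(x_sigma)/d(sigma)
    return ChunkSample(inputs, mask, velocity)


def masked_mse(pred: Sequence[float], sample: ChunkSample) -> float:
    """Mean squared error averaged over target frames only."""
    num = sum(m * (p - v) ** 2
              for p, v, m in zip(pred, sample.velocity, sample.loss_mask))
    return num / sum(sample.loss_mask)
```

With six frames, `chunk_size=2`, and `target_chunk=1`, frames 0 and 1 stay clean and unpenalized while frames 2 and 3 carry the loss, so any prediction on the history frames leaves the loss unchanged.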

Changelog

  • Add longlive_wan_step registration so WAN recipes can select the LongLive forward step from scripts/training/run_recipe.py.
  • Add LongLive WAN chunk selection, target-only loss masking, teacher-forcing mask helpers, and CP/SP partition utilities.
  • Add LongLiveWanForwardStep and LongLiveWanFlowMatchingPipeline for clean-history plus noisy-target temporal chunk training.
  • Add LongLive WAN 1.3B and 5B SP long-video recipes with explicit sliding-window attention settings to avoid dense [S, S] masks for long sequences.
  • Update WAN mock/dataset config paths to support LongLive latent shape overrides used by the long-video recipe.
  • Keep the Megatron-Core submodule clean by making explicit dense self-attention masks opt-in, used only when the decoder supports self_attention_mask.
  • Add focused unit tests for LongLive chunking, noising, masking, recipe wiring, and dense-mask/windowed-attention selection.
  • Add scripts/validation/wan_sp_tp_tiny_parity.py to verify tiny WAN TP/SP inference parity with exact tensor equality.
  • Document the LongLive WAN MVP commands and the 5B SP long-video smoke path in the WAN example README.
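The motivation for explicit sliding-window settings is memory: a dense [S, S] mask grows quadratically with sequence length. The sketch below is illustrative only and does not reproduce the recipes' actual attention configuration. It assumes a chunk-causal window in which a query in chunk c attends to every key in chunks max(0, c - W + 1) through c, and it stores that as a per-query key range [lo, hi) (two integers per query) instead of a dense boolean matrix. `window_bounds`, `dense_reference`, and `bounds_match_dense` are hypothetical names.

```python
from typing import List, Tuple


def window_bounds(seq_len: int, tokens_per_chunk: int,
                  window_chunks: int) -> List[Tuple[int, int]]:
    """Per-query key range [lo, hi) for chunk-causal sliding-window attention.

    O(S) storage, versus O(S^2) for a dense [S, S] mask."""
    bounds = []
    for q in range(seq_len):
        chunk = q // tokens_per_chunk
        first_chunk = max(0, chunk - window_chunks + 1)
        lo = first_chunk * tokens_per_chunk
        hi = min(seq_len, (chunk + 1) * tokens_per_chunk)
        bounds.append((lo, hi))
    return bounds


def dense_reference(seq_len: int, tokens_per_chunk: int,
                    window_chunks: int) -> List[List[bool]]:
    """Independent dense [S, S] mask (True = may attend); only for tiny S."""
    mask = []
    for q in range(seq_len):
        qc = q // tokens_per_chunk
        row = []
        for k in range(seq_len):
            kc = k // tokens_per_chunk
            row.append(kc <= qc and qc - kc < window_chunks)
        mask.append(row)
    return mask


def bounds_match_dense(seq_len: int, tokens_per_chunk: int,
                       window_chunks: int) -> bool:
    """Check that the compact ranges encode exactly the dense mask."""
    bounds = window_bounds(seq_len, tokens_per_chunk, window_chunks)
    dense = dense_reference(seq_len, tokens_per_chunk, window_chunks)
    return all(dense[q][k] == (lo <= k < hi)
               for q, (lo, hi) in enumerate(bounds)
               for k in range(seq_len))
```

For scale: a dense boolean mask at S = 100,000 has 10^10 entries, roughly 10 GB at one byte per element, while the range form holds 2S integers.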

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

Additional Information

  • Related to [feature] LongLiveWan Long-Video Training Recipe #4215.
  • This PR implements the initial offline WAN latents/text embeddings target from [feature] LongLiveWan Long-Video Training Recipe #4215; online raw-video VAE encoding with temporal halo exchange remains a future extension.
  • Does not add new required or optional dependencies.
  • pre-commit run --all-files passed.
  • python -m compileall -q scripts/validation/wan_sp_tp_tiny_parity.py src/megatron/bridge/diffusion/models/wan/longlive_wan_step.py src/megatron/bridge/diffusion/models/wan/longlive_wan_utils.py src/megatron/bridge/diffusion/models/wan/wan_model.py src/megatron/bridge/diffusion/recipes/wan/wan.py tests/unit_tests/diffusion/model/wan/test_longlive_wan_step.py tests/unit_tests/diffusion/recipes/wan/test_wan_recipe.py passed.
  • Slurm unit job 3244385: 35 passed, 26 warnings in 3.56s on 4x GB200.
  • Slurm tiny SP/TP parity job 3244425: strict_equal=True, max_abs=0.00000000e+00.
  • Slurm long-video smoke job 3244426: completed 1/1 iteration on 4x GB200 with 0 skipped and 0 NaN iterations.
  • Local uv was unavailable in the conda environment, so pre-commit was run directly after installing pre-commit into the existing mb-longlive-wan environment.
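The tiny SP/TP parity result above reports two numbers, strict_equal and max_abs. As a hedged illustration of what those metrics mean (not the code in scripts/validation/wan_sp_tp_tiny_parity.py), the sketch below compares flattened reference and candidate outputs element by element. `parity_report` and `format_report` are hypothetical helpers; NaNs are treated as a mismatch and propagate into max_abs.

```python
import math
from typing import Sequence, Tuple


def parity_report(reference: Sequence[float],
                  candidate: Sequence[float]) -> Tuple[bool, float]:
    """Return (strict_equal, max_abs) for two flattened output tensors."""
    if len(reference) != len(candidate):
        raise ValueError("shape mismatch between reference and candidate")
    diffs = [abs(a - b) for a, b in zip(reference, candidate)]
    if any(math.isnan(d) for d in diffs):
        max_abs = math.nan  # any NaN makes the comparison meaningless
    else:
        max_abs = max(diffs, default=0.0)
    # Exact bitwise-style equality; NaN != NaN, so NaNs fail this check.
    strict_equal = all(a == b for a, b in zip(reference, candidate))
    return strict_equal, max_abs


def format_report(strict_equal: bool, max_abs: float) -> str:
    """Render the result in the same style as the Slurm log line."""
    return f"strict_equal={strict_equal}, max_abs={max_abs:.8e}"
```

Requiring strict_equal rather than a tolerance makes any divergence between the TP/SP and reference paths visible, even differences far below typical floating-point tolerances.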

Signed-off-by: Shuai Yang <shyang@nvidia.com>

copy-pr-bot (bot) commented Jun 10, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yaoyu-33 added the labels area:diffusion (DFM module), feature (New capabilities, enhancements, or enablement work), and needs-review (PR is ready for code review and waiting on a reviewer) on Jun 10, 2026
Signed-off-by: Shuai Yang <shyang@nvidia.com>
yaoyu-33 (Contributor) commented:

Could you please check the default LongLive 1.3B recipe? It looks internally inconsistent: longlive_wan_1_3b_pretrain_config() inherits the text-to-video config with context_parallel_size=4 and the WAN provider default qkv_format="thd", but LongLiveWanFlowMatchingPipeline.validate_qkv_format() rejects anything except qkv_format="sbhd".

Please make the default recipe runnable, or mark it as non-runnable and update the README/tests accordingly.
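One way to keep the default recipe and the pipeline from drifting apart is a unit test that feeds the recipe's resolved settings through the same qkv_format check. The sketch below is hypothetical throughout: `HypotheticalWanRecipe`, `SUPPORTED_QKV_FORMATS`, and `is_runnable` stand in for the real recipe and pipeline, and only the two facts cited in the review are encoded (an inherited default of qkv_format="thd" with context_parallel_size=4, and a validator that accepts only "sbhd"). It does not claim that switching to "sbhd" is the correct fix for the real 1.3B recipe.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HypotheticalWanRecipe:
    """Stand-in for the recipe fields the review mentions (not the real class)."""
    context_parallel_size: int = 4
    qkv_format: str = "thd"  # provider default cited in the review


# What validate_qkv_format() is described as accepting in the review.
SUPPORTED_QKV_FORMATS = ("sbhd",)


def validate_qkv_format(cfg: HypotheticalWanRecipe) -> None:
    """Mirror of the described check: reject anything except 'sbhd'."""
    if cfg.qkv_format not in SUPPORTED_QKV_FORMATS:
        raise ValueError(
            f"LongLive pipeline requires qkv_format in {SUPPORTED_QKV_FORMATS}, "
            f"got {cfg.qkv_format!r}"
        )


def is_runnable(cfg: HypotheticalWanRecipe) -> bool:
    """True if the recipe's settings pass the pipeline's validation."""
    try:
        validate_qkv_format(cfg)
    except ValueError:
        return False
    return True
```

A test asserting `is_runnable(default_recipe)` would fail on the configuration described above. Depending on how the author resolves the review, the test would then either pin an explicit override in the recipe or assert that the recipe is documented as non-runnable.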


Labels

area:diffusion (DFM module), community-request, feature (New capabilities, enhancements, or enablement work), needs-review (PR is ready for code review and waiting on a reviewer)


4 participants