prepare chunk indices before cache initialize#4458

Merged
lvhan028 merged 2 commits into InternLM:main from grimoire:prepare-chunk-indices
Mar 26, 2026
Merged

Conversation


grimoire (Collaborator) commented Mar 24, 2026

The chunk gated delta kernel requires chunk_indices, and computing them forces a CUDA stream synchronization.

This PR computes chunk_indices before forward execution and cache initialization.
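To illustrate what chunk-index preparation involves, here is a pure-Python sketch (names and logic are illustrative, not the actual fla.ops.utils.prepare_chunk_indices implementation; the real routine works on CUDA tensors, which is why turning sequence lengths into host-side loop bounds forces a stream synchronization):

```python
def prepare_chunk_indices(cu_seqlens, chunk_size):
    """Sketch: map a variable-length batch onto (sequence, chunk) pairs.

    cu_seqlens: cumulative sequence lengths, e.g. [0, 5, 12] for two
    sequences of lengths 5 and 7. In the real kernel these values live
    on the GPU, so reading them on the host requires a stream sync.
    """
    indices = []
    for seq_id, (start, end) in enumerate(zip(cu_seqlens, cu_seqlens[1:])):
        num_chunks = -(-(end - start) // chunk_size)  # ceiling division
        for chunk_id in range(num_chunks):
            indices.append((seq_id, chunk_id))
    return indices

# Two sequences (lengths 5 and 7) with chunk_size=4 -> two chunks each.
print(prepare_chunk_indices([0, 5, 12], 4))
```

Doing this once, up front, means the synchronization cost is paid before the cache and forward pass are set up rather than in the middle of them.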


Copilot AI left a comment


Pull request overview

This PR adjusts the PyTorch engine’s prefill path for SSM / gated-delta (flash-linear-attention) models so that chunk-gated-delta “chunk indices” preparation (which forces a CUDA stream sync) happens during step-context construction, before state-cache initialization and forward execution.

Changes:

  • Move state-cache initialization for SSM from the input-update path into model_forward(), after build_context().
  • In the CUDA backend update_step_context(), eagerly call fla.ops.utils.prepare_chunk_indices(...) during prefill to trigger the required synchronization earlier.
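The two changes above amount to a reordering of the prefill path. A minimal sketch of the resulting call order (all function names here are hypothetical stand-ins; only the ordering mirrors the PR):

```python
call_log = []

def build_context(is_prefill):
    # Step-context construction. For prefill, chunk indices are now
    # computed eagerly here, so the CUDA stream sync happens at this
    # point rather than inside the forward pass.
    if is_prefill:
        call_log.append("prepare_chunk_indices")
    call_log.append("build_context")

def init_ssm_state_cache():
    # SSM state-cache initialization, moved out of the input-update
    # path so it runs after build_context().
    call_log.append("init_state_cache")

def model_forward(is_prefill=True):
    build_context(is_prefill)   # sync triggered inside, if prefill
    init_ssm_state_cache()      # cache init happens after the sync
    call_log.append("forward")

model_forward()
print(call_log)
```

The point of the ordering is that the unavoidable synchronization completes before any state-cache work begins, instead of stalling the stream mid-pipeline.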

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description

  • lmdeploy/pytorch/engine/model_agent/agent.py — Moves SSM state-cache initialization to after build_context() (and removes the prior prefill-only init hook).
  • lmdeploy/pytorch/backends/cuda/op_backend.py — Adds gated-delta chunk-index preparation during the prefill step-context update.



RunningLeon (Collaborator) left a comment


LGTM

@lvhan028 lvhan028 merged commit 342f18e into InternLM:main Mar 26, 2026
4 of 6 checks passed


4 participants