prepare chunk indices before cache initialize#4458

Merged
lvhan028 merged 2 commits into InternLM:main from grimoire:prepare-chunk-indices
Mar 26, 2026
Merged

Conversation


grimoire (Collaborator) commented Mar 24, 2026

The chunk gated delta kernel requires chunk_indices, and computing them forces a CUDA stream synchronization.

This PR computes chunk_indices before forward execution and cache initialization.
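To illustrate what chunk-index preparation involves, here is a pure-Python sketch (names and logic are illustrative, not the actual fla.ops.utils.prepare_chunk_indices implementation; the real routine works on CUDA tensors, which is why turning sequence lengths into host-side loop bounds forces a stream synchronization):

```python
def prepare_chunk_indices(cu_seqlens, chunk_size):
    """Sketch: map a variable-length batch onto (sequence, chunk) pairs.

    cu_seqlens: cumulative sequence lengths, e.g. [0, 5, 12] for two
    sequences of lengths 5 and 7. In the real kernel these values live
    on the GPU, so reading them on the host requires a stream sync.
    """
    indices = []
    for seq_id, (start, end) in enumerate(zip(cu_seqlens, cu_seqlens[1:])):
        num_chunks = -(-(end - start) // chunk_size)  # ceiling division
        for chunk_id in range(num_chunks):
            indices.append((seq_id, chunk_id))
    return indices

# Two sequences (lengths 5 and 7) with chunk_size=4 -> two chunks each.
print(prepare_chunk_indices([0, 5, 12], 4))
```

Doing this once, up front, means the synchronization cost is paid before the cache and forward pass are set up rather than in the middle of them.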


Copilot AI left a comment


Pull request overview

This PR adjusts the PyTorch engine’s prefill path for SSM / gated-delta (flash-linear-attention) models so that chunk-gated-delta “chunk indices” preparation (which forces a CUDA stream sync) happens during step-context construction, before state-cache initialization and forward execution.

Changes:

  • Move state-cache initialization for SSM from the input-update path into model_forward(), after build_context().
  • In the CUDA backend update_step_context(), eagerly call fla.ops.utils.prepare_chunk_indices(...) during prefill to trigger the required synchronization earlier.
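The two changes above amount to a reordering of the prefill path. A minimal sketch of the resulting call order (all function names here are hypothetical stand-ins; only the ordering mirrors the PR):

```python
call_log = []

def build_context(is_prefill):
    # Step-context construction. For prefill, chunk indices are now
    # computed eagerly here, so the CUDA stream sync happens at this
    # point rather than inside the forward pass.
    if is_prefill:
        call_log.append("prepare_chunk_indices")
    call_log.append("build_context")

def init_ssm_state_cache():
    # SSM state-cache initialization, moved out of the input-update
    # path so it runs after build_context().
    call_log.append("init_state_cache")

def model_forward(is_prefill=True):
    build_context(is_prefill)   # sync triggered inside, if prefill
    init_ssm_state_cache()      # cache init happens after the sync
    call_log.append("forward")

model_forward()
print(call_log)
```

The point of the ordering is that the unavoidable synchronization completes before any state-cache work begins, instead of stalling the stream mid-pipeline.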

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description

  • lmdeploy/pytorch/engine/model_agent/agent.py — Moves SSM state-cache initialization to after build_context() (and removes the prior prefill-only init hook).
  • lmdeploy/pytorch/backends/cuda/op_backend.py — Adds gated-delta chunk-index preparation during the prefill step-context update.



RunningLeon (Collaborator) left a comment


LGTM

@lvhan028 lvhan028 merged commit 342f18e into InternLM:main Mar 26, 2026
4 of 6 checks passed


4 participants