
fix(turbomind): fix dimension mismatch in ApplyTokenBitmaskInplace#4456

Open
windreamer wants to merge 1 commit into InternLM:main from windreamer:fix/guided-decoding-dimension-mismatch

Conversation

@windreamer
Collaborator

The bug occurs when the batch contains requests that have finished generation but are still waiting in the batch for synchronization. In this case, generation_size (the number of requests still generating) is less than matchers.size() (the total number of requests in the batch).

Root cause:

  • In generation.cc Forward(), logits is sliced to match generation_size: env.produce("logits", logits.slice(0, gs));
  • But in guided_decoding.cc ApplyMask(), bitmask was sliced using d.matchers.size() instead of the actual logits batch size.

This causes the TM_CHECK(logits_shape.first == bitmask_shape.first) assertion in apply_token_bitmask_inplace_cuda.cu to fail whenever the two batch dimensions differ.

fix: #4453

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a Turbomind guided-decoding crash caused by a batch-dimension mismatch between sliced logits (only active/generating requests) and the guided-decoding bitmask (previously sliced using the full batch request count).

Changes:

  • Update GuidedDecoding::ApplyMask() to slice the bitmask using logits.shape(0) rather than d.matchers.size().
  • Add clarifying comments explaining why logits.shape(0) is the correct dimension to use when some requests in the batch are no longer generating.


@windreamer windreamer changed the title fix(turbomind): fix dimension mismatch in ApplyTokenBitmaskInplace [DO NOT MERGE] fix(turbomind): fix dimension mismatch in ApplyTokenBitmaskInplace Mar 26, 2026
@windreamer windreamer force-pushed the fix/guided-decoding-dimension-mismatch branch 5 times, most recently from 8fe41f0 to 20de205 Compare March 26, 2026 06:31
@windreamer windreamer changed the title [DO NOT MERGE] fix(turbomind): fix dimension mismatch in ApplyTokenBitmaskInplace fix(turbomind): fix dimension mismatch in ApplyTokenBitmaskInplace Mar 26, 2026
Fix: Use logits.shape(0) instead of d.matchers.size() to ensure the
bitmask has the same batch dimension as logits.

Co-authored-by: openhands <openhands@all-hands.dev>
@windreamer windreamer force-pushed the fix/guided-decoding-dimension-mismatch branch from 20de205 to 270e541 Compare March 26, 2026 10:08

Development

Successfully merging this pull request may close these issues.

[Bug] Turbomind random crash when handle batch request
