Skip to content

[BugFix] lookup for all ranks#1033

Open
flesher0813 wants to merge 2 commits into
ModelEngine-Group:developfrom
flesher0813:develop
Open

[BugFix] lookup for all ranks#1033
flesher0813 wants to merge 2 commits into
ModelEngine-Group:developfrom
flesher0813:develop

Conversation

@flesher0813

@flesher0813 flesher0813 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Purpose

Fix GQA cache lookup correctness by checking cached block availability across all TP ranks instead of only rank 0. This prevents scheduler-side false hits where rank 0 files exist but other rank-specific files have been removed by GC, which can later cause worker load failures.

Modifications

  • Update get_num_new_matched_tokens() to use the minimum prefix hit across all ranks as the effective external cache hit.
  • Add _lookup_external_prefix_all_ranks to search for all ranks' blocks

Test

Tested with 2tp/8tp/2dp 2tp/4tp 2dcp/2tp 2pcp with QwQ-32B or Qwen2.5-1.5B

Comment thread ucm/integration/vllm/ucm_connector.py Outdated

min_hit_idx = len(external_block_ids) - 1
for rank, rank_hasher in enumerate(self._rank_hashers):
rank_block_ids = (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop increases the time spent on Lookup operations. In scenarios with high TP rank, will the performance test pass?

if role == KVConnectorRole.SCHEDULER:
self.request_hasher = RequestHasher(vllm_config, 0)
self._rank_hashers = [
RequestHasher(vllm_config, rank % self.tp_size)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The _rank_hashers creation uses rank % self.tp_size, but this is redundant since rank is already in range(1, self.tp_size). Consider simplifying to for rank in range(1, self.tp_size) directly, unless there's a specific reason for the modulo operation.

for block_idx in range(len(candidate_block_ids)):
begin = block_idx * num_other_ranks
end = begin + num_other_ranks
if not all(founds[begin:end]):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Warning: The all() check iterates through founds[begin:end]. If founds is empty or the slice is empty, all([]) returns True, which might not be the intended behavior. Consider adding an explicit check for empty slices.


try {
results = std::make_shared<std::vector<uint8_t>>(num, false);
status = std::make_shared<std::atomic<int32_t>>(ok);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Warning: If prefixLookupSrv_.NWorker() returns 0, waiter->Set(0) would cause waiter->Wait() to return immediately without doing any lookup. The results would remain all false. Consider adding a check for zero workers.


if role == KVConnectorRole.SCHEDULER:
self.request_hasher = RequestHasher(vllm_config, 0)
rank_ids = list(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The rank_ids calculation uses dict.fromkeys to deduplicate. Consider documenting the expected behavior when dcp_world_size > tp_size - this could produce unexpected results.

const auto index = res.Value();
for (ssize_t i = 0; i <= index; ++i) { results[i] = true; }
return results;
if (num == 0) { return std::vector<uint8_t>{}; }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Warning: The new Lookup implementation uses shared_ptr for thread-safe access. However, if prefixLookupSrv_.NWorker() returns 0, waiter->Set(0) would return immediately. Consider adding: if (nWorker == 0) { return std::vector<uint8_t>(num, false); }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants