[BugFix] lookup for all ranks by flesher0813 · Pull Request #1033 · ModelEngine-Group/unified-cache-management

flesher0813 · 2026-06-16T12:54:41Z

Purpose

Fix GQA cache lookup correctness by checking cached block availability across all TP ranks instead of only rank 0. This prevents scheduler-side false hits where rank 0 files exist but other rank-specific files have been removed by GC, which can later cause worker load failures.

Modifications

Update get_num_new_matched_tokens() to use the minimum prefix hit across all ranks as the effective external cache hit.
Add _lookup_external_prefix_all_ranks() to look up the cached blocks of every rank, not just rank 0.
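To illustrate the "minimum prefix hit across all ranks" rule described above, here is a minimal sketch. It is not the PR's implementation: the helper names `prefix_hit_len` and `min_prefix_hit_all_ranks`, the string block IDs, and the in-memory `store` set are all hypothetical stand-ins for the real store lookup.

```python
from typing import Callable, Sequence


def prefix_hit_len(block_ids: Sequence[str], exists: Callable[[str], bool]) -> int:
    """Count leading blocks that exist; stop at the first miss (hypothetical helper)."""
    hit = 0
    for block_id in block_ids:
        if not exists(block_id):
            break
        hit += 1
    return hit


def min_prefix_hit_all_ranks(
    per_rank_block_ids: Sequence[Sequence[str]],
    exists: Callable[[str], bool],
) -> int:
    """Effective hit is the shortest prefix that every rank can still load."""
    if not per_rank_block_ids:
        return 0
    return min(prefix_hit_len(ids, exists) for ids in per_rank_block_ids)


# Assumed scenario: rank 0 still has three blocks, but GC removed rank 1's third block.
store = {"r0-b0", "r0-b1", "r0-b2", "r1-b0", "r1-b1"}
per_rank = [
    ["r0-b0", "r0-b1", "r0-b2"],
    ["r1-b0", "r1-b1", "r1-b2"],
]
hit_blocks = min_prefix_hit_all_ranks(per_rank, store.__contains__)
```

Checking rank 0 alone would report three hit blocks here. Taking the minimum over ranks reports two, which is the longest prefix every worker can actually load.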

Test

Tested with the following parallel configurations, using QwQ-32B or Qwen2.5-1.5B: 2tp, 8tp, 2dp 2tp, 4tp 2dcp, and 2tp 2pcp.

mag1c-h · 2026-06-17T01:14:22Z

+
+        min_hit_idx = len(external_block_ids) - 1
+        for rank, rank_hasher in enumerate(self._rank_hashers):
+            rank_block_ids = (


This loop adds time to every Lookup operation. Will the performance test still pass when the number of TP ranks is large?

ygwpz · 2026-06-17T06:20:23Z

        if role == KVConnectorRole.SCHEDULER:
            self.request_hasher = RequestHasher(vllm_config, 0)
+            self._rank_hashers = [
+                RequestHasher(vllm_config, rank % self.tp_size)


💡 Suggestion: The _rank_hashers creation uses rank % self.tp_size, but this is redundant since rank is already in range(1, self.tp_size). Consider simplifying to for rank in range(1, self.tp_size) directly, unless there's a specific reason for the modulo operation.
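A quick check of the claim above, assuming the comprehension iterates over range(1, tp_size) as the comment states (the diff excerpt is truncated, so the actual iterable is not shown). The function names are hypothetical.

```python
def rank_ids_with_modulo(tp_size: int) -> list[int]:
    # Mirrors the `rank % self.tp_size` form, under the assumed range(1, tp_size).
    return [rank % tp_size for rank in range(1, tp_size)]


def rank_ids_plain(tp_size: int) -> list[int]:
    # The simplified form proposed in the review comment.
    return list(range(1, tp_size))


# For 1 <= rank < tp_size, rank % tp_size == rank, so both forms agree.
same_for_all_sizes = all(
    rank_ids_with_modulo(n) == rank_ids_plain(n) for n in range(1, 17)
)

# The modulo only changes the result if the iterable exceeds tp_size,
# for example when ranks are enumerated over a larger world size.
wrapped = [rank % 4 for rank in range(1, 8)]
```

If the iterable can run past tp_size, the modulo wraps ranks back into range and produces duplicates, so it would not be redundant in that case.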

ygwpz · 2026-06-17T06:20:26Z

+        for block_idx in range(len(candidate_block_ids)):
+            begin = block_idx * num_other_ranks
+            end = begin + num_other_ranks
+            if not all(founds[begin:end]):


⚠️ Warning: The all() check iterates through founds[begin:end]. If founds is empty or the slice is empty, all([]) returns True, which might not be the intended behavior. Consider adding an explicit check for empty slices.
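The vacuous-truth concern can be made concrete with a small sketch. It assumes founds is a flat, block-major list (for each candidate block, one entry per other rank), which matches the begin/end arithmetic in the excerpt. The function name `count_prefix_hits` and the length check are hypothetical, not the PR's code.

```python
from typing import Sequence


def count_prefix_hits(
    founds: Sequence[bool], num_candidates: int, num_other_ranks: int
) -> int:
    """Return how many leading blocks were found on every other rank."""
    expected = num_candidates * num_other_ranks
    if len(founds) != expected:
        # A short result list would yield empty slices, and all([]) is True,
        # so reject it instead of silently reporting hits.
        raise ValueError(f"expected {expected} lookup results, got {len(founds)}")
    if num_other_ranks == 0:
        # No other ranks to verify: every candidate counts as a hit by design.
        return num_candidates
    for block_idx in range(num_candidates):
        begin = block_idx * num_other_ranks
        end = begin + num_other_ranks
        if not all(founds[begin:end]):
            return block_idx
    return num_candidates


# Two candidate blocks, two other ranks; the second block is missing on one rank.
hits = count_prefix_hits([True, True, True, False], 2, 2)
```

Whether an empty slice should count as a hit depends on the caller. With no other ranks it is the right answer; with a truncated result list it hides a failure, which is why the sketch separates the two cases.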

ygwpz · 2026-06-17T06:26:10Z

+
+    try {
+        results = std::make_shared<std::vector<uint8_t>>(num, false);
+        status = std::make_shared<std::atomic<int32_t>>(ok);


⚠️ Warning: If prefixLookupSrv_.NWorker() returns 0, waiter->Set(0) would cause waiter->Wait() to return immediately without doing any lookup. The results would remain all false. Consider adding a check for zero workers.

ygwpz · 2026-06-17T06:27:06Z


        if role == KVConnectorRole.SCHEDULER:
            self.request_hasher = RequestHasher(vllm_config, 0)
+            rank_ids = list(


💡 Suggestion: The rank_ids calculation uses dict.fromkeys to deduplicate. Consider documenting the expected behavior when dcp_world_size > tp_size - this could produce unexpected results.
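For reference, this is how dict.fromkeys behaves when used for deduplication. The input values are made up for illustration; how the PR derives its candidate ranks is not visible in the truncated excerpt.

```python
from typing import Iterable, List


def dedupe_preserving_order(ranks: Iterable[int]) -> List[int]:
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7),
    # so this keeps the first occurrence of each rank in its original position.
    return list(dict.fromkeys(ranks))


# Hypothetical candidate list with repeats.
candidates = [0, 2, 0, 2, 1]
unique_ranks = dedupe_preserving_order(candidates)
```

Because duplicates collapse silently, the resulting list can be shorter than callers expect. That is the case worth documenting for configurations such as dcp_world_size > tp_size.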

ygwpz · 2026-06-17T06:27:10Z

-    const auto index = res.Value();
-    for (ssize_t i = 0; i <= index; ++i) { results[i] = true; }
-    return results;
+    if (num == 0) { return std::vector<uint8_t>{}; }


⚠️ Warning: The new Lookup implementation uses shared_ptr to share results and status with the workers. However, if prefixLookupSrv_.NWorker() returns 0, waiter->Set(0) would make waiter->Wait() return immediately, before any lookup has run. Consider adding: if (nWorker == 0) { return std::vector<uint8_t>(num, false); }

[BugFix] lookup for all ranks

8cf15ab

flesher0813 requested review from Infinite666, harrisonyhq, mag1c-h, qyh111 and ygwpz as code owners June 16, 2026 12:54

mag1c-h reviewed Jun 17, 2026

[BugFix] lookup for all ranks

a7ee90b

ygwpz reviewed Jun 17, 2026

flesher0813 wants to merge 2 commits into ModelEngine-Group:develop from flesher0813:develop

3 participants
