[Feat] Add opt-in reshape cache event sync by dante159753 · Pull Request #1025 · ModelEngine-Group/unified-cache-management

dante159753 · 2026-06-12T08:53:29Z

Summary

This PR splits the cache dump reshape-event synchronization optimization into its own branch, without the dump queue pipeline refactor from cache-dump-event-optimize.

Changes:

Add opt-in enable_reshape_cache_event_sync config, defaulting to false to preserve current behavior.
When enabled, layerwise dump first tries to reuse vLLM/vLLM-Ascend reshape_cache_event from attn_metadata[layer_name], then falls back to attn_metadata.reshape_cache_event.
If no usable reshape event exists, UCM keeps the existing current-stream event fallback; if event creation fails, it falls back to device synchronization.
Preserve borrowed torch event references in Device so raw event handles remain valid until UCM clears event handles.
Add focused metrics for event-source selection:
- dump_event_reshape_cache_direct_used_total
- dump_event_reshape_cache_layer_used_total
- dump_event_current_stream_used_total
- dump_event_sync_fallback_used_total
Update config example and docs.

Why

For layerwise cache dump, vLLM-Ascend may already record a reshape_cache_event immediately after KV cache writes are ready. Waiting on that event lets UCM's C++ cache copy stream synchronize at the KV-ready point instead of recording a later current-stream event in Python.

The switch is opt-in because model/version behavior differs. The default remains the pre-existing path.

Scope

This PR intentionally does not include dump queue optimization or DumpQueue changes. The diff is limited to connector/device Python code plus config, metric config, and docs.

Verification

Ran locally on Windows worktree cache-dump-event-sync-only:

git diff --check origin/develop...HEAD
python -m py_compile ucm/integration/vllm/device.py ucm/integration/vllm/ucm_connector.py
python -m black --check ucm/integration/vllm/device.py ucm/integration/vllm/ucm_connector.py
python -m isort --check-only --profile=black ucm/integration/vllm/device.py ucm/integration/vllm/ucm_connector.py
codespell docs/source/user-guide/metrics/metrics.md docs/source/user-guide/prefix-cache/pipeline_store.md examples/metrics/metrics_configs.yaml examples/ucm_config_example.yaml ucm/integration/vllm/device.py ucm/integration/vllm/ucm_connector.py --skip ucm/csrc/**,ucm.egg-info/**,.github/**,ucm/sparse/gsa_on_device/csrc/** -L CANN,cann,NNAL,nnal,ASCEND,ascend,EnQue,CopyIn,Collet,DElt,re-use
Parsed examples/metrics/metrics_configs.yaml and examples/ucm_config_example.yaml with PyYAML.
Confirmed PR diff does not include dump queue files.

Note: full local pre-commit run --files ... could not complete on this machine because the actionlint hook attempted to download Go from go.dev and hit a local SSL/network failure. The changed files do not include GitHub Actions files; black/isort/codespell were run directly with matching configured versions.

ygwpz · 2026-06-13T06:10:33Z

+                return 0
+            self.borrowed_events.append(event)
+            return int(handle)
+        except Exception as e:


⚠️ Warning - Exception handling too broad

Using except Exception catches all exceptions including system exceptions like KeyboardInterrupt, SystemExit. This could mask critical errors.

Recommended specific exceptions:

except (AttributeError, TypeError, ValueError) as e:

This applies to both CudaDevice (line 128) and NpuDevice (line 279) implementations.

ygwpz · 2026-06-13T06:10:35Z

+                return 0
+            self.borrowed_events.append(event)
+            return int(handle)
+        except Exception as e:


⚠️ Warning - Exception handling too broad

Same issue as CudaDevice - using except Exception is too broad. Should catch specific exceptions like AttributeError, TypeError, ValueError.

ygwpz · 2026-06-13T06:10:36Z

        self.events.clear()
+        self.borrowed_events.clear()

    def destroy_event_handle(self, handle: int):


⚠️ Warning - Inconsistent cleanup in destroy_event_handle

The destroy_event_handle method only removes events from self.events dict but does not remove corresponding entries from self.borrowed_events.

If individual event handles are destroyed (rather than batch clearing via destroy_event_handles), borrowed event references could accumulate, potentially causing memory growth over time.

Consider either:

Adding logic to track handle-to-event mapping for borrowed events

Documenting that destroy_event_handle is only for UCM-created events, not borrowed ones

Using a different structure for borrowed_events with proper tracking

ygwpz · 2026-06-13T06:10:38Z

 class Device(ABC):
    def __init__(self):
        self.events = {}
+        self.borrowed_events = []


💡 Suggestion - Thread safety consideration

The borrowed_events list is accessed without synchronization. If this device is used in a multi-threaded context (e.g., multiple workers sharing the same device instance), there could be race conditions when:

Multiple threads call get_event_handle_from_event concurrently (append)

One thread clears while another is appending

Consider adding:

A lock for borrowed_events operations

Documentation about thread-safety assumptions (single-threaded per device)

A thread-safe collection if multi-threaded access is expected

ygwpz · 2026-06-13T06:10:39Z

        self.tp_rank = self._vllm_config.parallel_config.rank
        self.block_size = self._vllm_config.cache_config.block_size
        self.is_mla = self._vllm_config.model_config.is_deepseek_mla
+        self.use_mla = getattr(self._vllm_config.model_config, "use_mla", self.is_mla)


💡 Suggestion - Unused variable

The variable self.use_mla is set here but never used in the visible code changes. If this is intentional (for future use), consider adding a comment explaining the purpose. If not needed, it should be removed to avoid confusion.

Note: self.is_mla is already used elsewhere in the code.

ygwpz · 2026-06-13T06:10:40Z

+    def _get_reshape_cache_event(
+        self, layer_name: Optional[str], attn_metadata: Optional["AttentionMetadata"]
+    ) -> tuple[Optional[Any], str]:
+        if attn_metadata is None:
+            return None, "none"
+
+        if layer_name:
+            try:
+                layer_metadata = attn_metadata[layer_name]
+                event = getattr(layer_metadata, "reshape_cache_event", None)
+                if event is not None:
+                    return event, "layer"
+            except (KeyError, TypeError, AttributeError):
+                pass
+
+        event = getattr(attn_metadata, "reshape_cache_event", None)
+        if event is not None:
+            return event, "direct"
+        return None, "direct_missing"


⚠️ Warning - Metric gap for missing reshape cache events

When enable_reshape_cache_event_sync is True but the reshape cache event is missing (event_source = "none" or "direct_missing"), and we fall back to current stream or device sync, there is no specific metric tracking this "expected-but-missing" case.

This creates an observability gap - operators cannot distinguish between:

Intentional fallback (feature disabled)

Expected reshape event missing (vLLM version/model mismatch)

Consider adding metrics like:

dump_event_reshape_cache_none_total (attn_metadata is None)

dump_event_reshape_cache_missing_total (event_source = "direct_missing")

ygwpz · 2026-06-13T06:10:41Z

+        if self.enable_reshape_cache_event_sync:
+            reshape_cache_event, event_source = self._get_reshape_cache_event(
+                layer_name, attn_metadata
+            )
+            event_handle = (
+                self.device.get_event_handle_from_event(reshape_cache_event)
+                if reshape_cache_event is not None
+                else 0
+            )
+            if event_handle != 0:
+                if event_source == "direct":
+                    ucmmetrics.update_stats(
+                        "dump_event_reshape_cache_direct_used_total", 1.0
+                    )
+                elif event_source == "layer":
+                    ucmmetrics.update_stats(
+                        "dump_event_reshape_cache_layer_used_total", 1.0
+                    )
+                return event_handle


⚠️ Warning - Metric inconsistency in handle extraction failure

When enable_reshape_cache_event_sync is True and we successfully get a reshape_cache_event but get_event_handle_from_event returns 0 (handle extraction failed), we fall back to the current stream path and track "dump_event_current_stream_used_total".

The metric doesn't reflect that we attempted reshape cache sync first but failed at handle extraction. Consider:

Adding a metric for "dump_event_reshape_cache_handle_failed_total"

Or updating the existing metric logic to distinguish this fallback path

dante159753 requested review from FangRun2, Infinite666, Tarrei, flesher0813, harrisonyhq, mag1c-h, qyh111 and ygwpz as code owners June 12, 2026 08:53

dante159753 added 4 commits June 12, 2026 17:10

Use reshape cache event for dump sync

74d5fc7

Add opt-in reshape cache event sync

bdfc4db

Prefer layer reshape cache event

5100ee1

Fix reshape event sync formatting

8167f53

dante159753 force-pushed the cache-dump-event-sync-only branch from a37355f to 8167f53 Compare June 12, 2026 09:11

ygwpz reviewed Jun 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feat] Add opt-in reshape cache event sync#1025

[Feat] Add opt-in reshape cache event sync#1025
dante159753 wants to merge 4 commits into
ModelEngine-Group:developfrom
dante159753:cache-dump-event-sync-only

dante159753 commented Jun 12, 2026

Uh oh!

ygwpz Jun 13, 2026

Uh oh!

ygwpz Jun 13, 2026

Uh oh!

ygwpz Jun 13, 2026

Uh oh!

ygwpz Jun 13, 2026

Uh oh!

ygwpz Jun 13, 2026

Uh oh!

ygwpz Jun 13, 2026

Uh oh!

ygwpz Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dante159753 commented Jun 12, 2026

Summary

Why

Scope

Verification

Uh oh!

ygwpz Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

ygwpz Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

ygwpz Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

ygwpz Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

ygwpz Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

ygwpz Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

ygwpz Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants