prototype for negative caching in StoreCache #4042

Open
espg wants to merge 9 commits into zarr-developers:main from espg:feat/cache-store-negative-caching

espg commented Jun 5, 2026

Adds ~~opt-in~~ **opt-out** negative caching to zarr.experimental.cache_store.CacheStore: when enabled, a full-key read that finds the key absent in the source store is remembered, so subsequent reads of that absent key return None immediately without a source round-trip. The remembered miss is evicted when the key is later written. ~~Default off; no behavior change unless cache_missing=True.~~ **Default on (cache_missing=True, opt-out). Negative caching affects only full-key reads of keys absent in the source; results for keys that exist, byte-range reads, and exists() are unchanged. Pass cache_missing=False to disable.** Follows from discussion on #4028.

Note

Revised since first draft. Per @d-v-b's review, present and absent keys are now tracked in a single structure (a key can never be both cached and marked missing), and negative markers share the max_size budget with cached values instead of being bounded by TTL alone. Edits below are shown with strike-through for the prior text and bold for the update.

Motivation

CacheStore caches present values only. On a full-key miss it deletes any stale entry and stores nothing, so a key absent in the source is a permanent cache miss — every read re-pays a source round-trip. This is the dominant cost when reading sparse arrays through a CacheStore: most chunks are empty, and the positive cache structurally cannot help (there is no value to store, and "not in cache" is indistinguishable from "not cached yet"). Negative caching closes that gap.

It is intentionally narrow: it benefits the stock arr[:] path (which probes every chunk) read repeatedly through a CacheStore. Code using the #4028 discovery primitives (zarr.shards_initialized / zarr.read_regions) never issues the empty-chunk reads in the first place and does not need this.

API

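```python
from zarr.experimental.cache_store import CacheStore
from zarr.storage import MemoryStore

# Any zarr stores work here; MemoryStore only makes the snippet self-contained.
source_store = MemoryStore()
cache_backend = MemoryStore()

cached = CacheStore(
    source_store,
    cache_store=cache_backend,
    # cache_missing=True is the default; pass False to disable
    max_age_seconds=300,  # recommended: bound staleness of remembered misses
)
```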
  • cache_missing: bool = True — remember full-key misses (opt-out).
  • cache_stats() gains negative_hits; cache_info() gains cache_missing and missing_keys.

~~No new bounding parameter is introduced: remembered misses are bounded by the existing max_age_seconds, mirroring how the positive cache is bounded by max_size.~~ **No new bounding parameter is introduced: negative markers share the existing max_size budget with cached values (each charged a small flat overhead for its index slot), so a single max_size bounds total cache memory. Under memory pressure miss-markers are evicted before any cached value, and a marker never displaces cached data. Markers also respect max_age_seconds. When max_size is None both caches are unbounded (as today), so set a finite max_size and/or max_age_seconds for scans over very large sparse key spaces.**
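The shared-budget rule can be illustrated with a toy index. This is a minimal sketch, not the PR's implementation: the `Budget` class, its method names, and the 64-byte `MARKER_OVERHEAD` are all assumptions chosen for illustration, and only the eviction priorities follow the description above.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional

MARKER_OVERHEAD = 64  # hypothetical flat charge per negative marker, in bytes


@dataclass
class Entry:
    size: int      # bytes charged against max_size
    present: bool  # True: cached value, False: known-absent marker


class Budget:
    """Toy index: one slot per key, one byte budget for values and markers."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.current_size = 0
        self.entries: "OrderedDict[str, Entry]" = OrderedDict()

    def _drop(self, key: str) -> None:
        self.current_size -= self.entries.pop(key).size

    def _oldest(self, present: bool) -> Optional[str]:
        # Oldest key of the requested kind, or None if there is none.
        return next((k for k, e in self.entries.items() if e.present is present), None)

    def record_value(self, key: str, size: int) -> bool:
        if size > self.max_size:
            return False  # value can never fit
        if key in self.entries:
            self._drop(key)
        while self.current_size + size > self.max_size:
            victim = self._oldest(present=False)  # markers are evicted first
            if victim is None:
                victim = self._oldest(present=True)
            assert victim is not None
            self._drop(victim)
        self.entries[key] = Entry(size, True)
        self.current_size += size
        return True

    def record_missing(self, key: str) -> bool:
        if key in self.entries:
            self._drop(key)
        while self.current_size + MARKER_OVERHEAD > self.max_size:
            victim = self._oldest(present=False)
            if victim is None:
                return False  # only real values remain: skip the marker
            self._drop(victim)
        self.entries[key] = Entry(MARKER_OVERHEAD, False)
        self.current_size += MARKER_OVERHEAD
        return True
```

With `Budget(max_size=200)`, three markers fit (192 bytes), a fourth evicts the oldest marker, a 150-byte value then evicts every marker, and a further `record_missing` call returns False rather than touching the cached value.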

Note

cache_info()["current_size"] (and __repr__) now include the small flat overhead charged per negative marker, since markers share the max_size budget.

Design

  • Store-level, key-based~~.~~**, single index.** CacheStore wraps a whole store and sees opaque keys (no chunk-grid knowledge), so negative knowledge is tracked per full key ~~in a small dict[str, float] (key → insert time)~~. ~~Negative entries carry no bytes and are kept out of the max_size byte budget, so they never evict real cached data.~~ **Present and absent keys live in one OrderedDict[key, _Entry] (_Entry = insert_time, size, present), so a key occupies exactly one slot and can never be simultaneously cached and marked missing; the invariant is structural, not maintained by eviction bookkeeping. Negative markers are charged a small flat overhead against max_size but are strictly lower priority than cached data: they are evicted first under pressure (LRU), only ever displace other markers when recording, and never evict a cached value.**
  • TTL'd. Remembered misses respect max_age_seconds, so a key written to the source out-of-band becomes visible again after expiry. ~~Like the positive cache (unbounded when max_size is None), the negative cache is bounded only by max_age_seconds; with an infinite TTL a scan over a very large sparse key space accumulates one small entry per absent key, so set a finite TTL (or cache_missing=False) for such workloads.~~ **With max_size set, negatives are additionally bounded by the shared byte budget (see API). When max_size is None and the TTL is infinite, a scan over a very large sparse key space accumulates one small entry per absent key, so set a finite max_size/TTL (or cache_missing=False) for such workloads.** This is called out in the docstring.
  • Write-eviction. set and an overridden set_if_not_exists drop any remembered miss for the key (reclaiming its charged bytes). delete does not create one (a delete is a mutation, not a checked-absence read).
  • Scope. Full-key reads only — byte-range misses and exists() are unchanged. exists() deliberately does not consult the negative cache (the default set_if_not_exists calls exists then set; a stale "missing" there could overwrite present data).
  • Stats. A negative hit is reported separately as negative_hits and counts as neither a hit nor a miss, so the positive hit_rate is unaffected.
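The write-eviction, scope, and stats rules in the list above can be sketched with a toy synchronous reader over a plain dict. Everything here is illustrative: `NegativeCachingReader` and its attributes are hypothetical names, the real CacheStore is asynchronous, and TTL and size bounds are left out.

```python
from typing import Dict, Optional


class NegativeCachingReader:
    """Toy stand-in for the get/set/exists behaviour described above."""

    def __init__(self, source: Dict[str, bytes]) -> None:
        self.source = source
        # One slot per key; a None value marks a remembered miss.
        self.cache: Dict[str, Optional[bytes]] = {}
        self.hits = self.misses = self.negative_hits = 0

    def get(self, key: str) -> Optional[bytes]:
        if key in self.cache:
            cached = self.cache[key]
            if cached is None:
                self.negative_hits += 1  # counted as neither a hit nor a miss
            else:
                self.hits += 1
            return cached
        self.misses += 1
        value = self.source.get(key)  # the only source round-trip
        self.cache[key] = value       # records the value, or the miss
        return value

    def set(self, key: str, value: bytes) -> None:
        self.source[key] = value
        self.cache.pop(key, None)  # drop any remembered miss or stale value

    def exists(self, key: str) -> bool:
        return key in self.source  # never consults remembered misses

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In this sketch an out-of-band write to `source` stays invisible to `get` until the key is written through `set`, while `exists` reports it immediately, which is the asymmetry the Scope bullet describes.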

Correctness notes

  • TTL staleness: with the default max_age_seconds="infinity" a remembered miss never expires, so a key written by another process stays invisible through the cache until eviction-on-write. Pair cache_missing=True with a finite max_age_seconds when the source may be written concurrently.
  • TOCTOU window: the source get runs outside the state lock, so a concurrent set can land between the source returning None and the miss being recorded. This is the same window the positive cache already has; it is TTL-bounded and self-heals. Documented as a known limitation rather than over-engineered away.
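The TTL staleness note comes down to a freshness check along these lines. This is a sketch under assumptions: `is_fresh` is a hypothetical helper, and the strict `<` comparison is a guess at the boundary behaviour rather than something the PR states.

```python
from typing import Union

MaxAge = Union[float, str]  # seconds, or the string "infinity"


def is_fresh(insert_time: float, max_age_seconds: MaxAge, now: float) -> bool:
    """Return True while an entry recorded at insert_time may still be served."""
    if max_age_seconds == "infinity":
        return True  # never expires: a remembered miss is served indefinitely
    return now - insert_time < float(max_age_seconds)


# A miss recorded at t=0 with a 300 s TTL is served at t=299 and re-probed at t=301.
print(is_fresh(0.0, 300, now=299.0), is_fresh(0.0, 300, now=301.0))
# Under "infinity" the same marker is still served arbitrarily far in the future.
print(is_fresh(0.0, "infinity", now=1e12))
```

A finite TTL is what turns an out-of-band write from permanently invisible into invisible for at most max_age_seconds.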

Testing

tests/test_experimental/test_cache_store.py adds a new TestCacheStoreNegativeCaching: enabled-by-default and cache_missing=False disable, basic negative hit (asserts the source is hit exactly once via monkeypatch), eviction on set and set_if_not_exists, TTL expiry with an out-of-band source write, byte-range reads unaffected, stats/info surfacing, and delete does not record. Additional tests cover the shared-budget behaviour: a negative marker is charged against max_size, misses are bounded by the budget (LRU eviction of the oldest markers), markers are evicted before cached values, and the marker charge is reclaimed on set/set_if_not_exists (no current_size leak). The existing test_cache_info key-set assertion is updated for the two new info keys. Full suite: ~~54 passed~~ **59 passed**; ruff, mypy (strict), and numpydoc clean.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/ (changes/4042.feature.md)
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

codecov Bot commented Jun 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.57%. Comparing base (036ede7) to head (bb0a346).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4042      +/-   ##
==========================================
+ Coverage   93.50%   93.57%   +0.06%     
==========================================
  Files          90       90              
  Lines       11981    12030      +49     
==========================================
+ Hits        11203    11257      +54     
+ Misses        778      773       -5     
Files with missing lines Coverage Δ
src/zarr/experimental/cache_store.py 92.62% <100.00%> (+4.41%) ⬆️

d-v-b commented Jun 8, 2026

Contributor

IMO it might be better to use a single key: value mapping, where "present" and "absent" are two possible values, instead of completely separate mappings for present vs missing keys. Two mappings open the possibility that a key is cached AND marked as missing, which we should prevent structurally if we can.

d-v-b commented Jun 8, 2026

Contributor

(this is very cool btw, thank you for working on this)

espg commented Jun 26, 2026

Author

Thanks @d-v-b — I switched to the single key mapping, and I think it came out much cleaner.

There's now a single entries: OrderedDict[key, _Entry] where each _Entry carries (insert_time, size, present). present=True is a cached value, present=False is a known-absent marker. So a key maps to exactly one slot, and "cached" and "missing" are mutually exclusive by construction.

Since both kinds of entry are now tracked the same way, present and absent entries share one max_size budget, with each absent marker charged a small flat overhead. Eviction is absent-first, and a miss-marker never displaces cached data: it only evicts older markers, and is skipped entirely if the cache is full of real values. That stops a flood of empty cells from growing memory without bound, without ever letting a miss cost you a cached chunk.

I switched the default to cache_missing=True on this, which seemed appropriate since this lives in experimental. For the dense case with no misses this should operate very similarly to what we had before, and the negative caching is pretty lightweight (since we don't have a value to cache). Let me know if you'd rather this be opt-in (or always on with no flag).

espg changed the title from "[WIP] prototype for negative caching in StoreCache" to "prototype for negative caching in StoreCache" on Jun 26, 2026

d-v-b commented Jun 26, 2026

Contributor

hi @espg, that's great. claude had some concerns about the new defaults + default TTL of infinity:

  1. cache_store.py:165 (+ get fast-path 509–514): defaulting cache_missing=True under the default max_age_seconds="infinity" is a silent cache-coherency regression. Before this PR an absent key was re-probed on every read, so an out-of-band write became visible on the next read. Now, with no extra args, the first absent read records a marker that never expires (_is_fresh returns True unconditionally under "infinity"), so the key stays invisible through the cache forever until written through this instance. Scenario: CacheStore(store, cache_store=...), reader does arr[:] (chunk c/0 absent → fill value); another process writes c/0 directly to the source; reader re-runs arr[:] → still sees the fill value, permanently. Documented as a limitation, but it's now the out-of-the-box default. Consider defaulting cache_missing=False, or warning when combined with "infinity".
  2. cache_store.py:164 (+ _record_missing 242): unbounded negative-cache growth under the default max_size=None. The flat-overhead budgeting only bounds anything when max_size is set; the marker eviction loop is gated by if self.max_size is not None. With the defaults, a single arr[:] scan over a large sparse array leaves one permanent _Entry + key string per absent chunk (a 10M-chunk scan → ~10M retained markers), where previously the scan left nothing. This is precisely the workload the feature targets, so the default config is the worst case.
  3. cache_store.py:407 / _cache_miss + _record_missing: a marker recorded after the out-of-lock _cache.delete can shadow a concurrently-written present value, and under the default infinity TTL never self-heals. _cache_miss does await self._cache.delete(key) before taking the lock, then records the marker under the lock; _record_missing never touches the backing _cache. Scenario: thread B sees the source momentarily absent, runs _cache.delete("k") and yields; thread A completes set("k", val) (writes _cache, marks present); thread B resumes and overwrites the slot with an absent marker while val still sits in _cache. Subsequent get("k") hits the fast-path and returns None for a key that has a cached value, violating the docstring's "present and absent are mutually exclusive" invariant. The PR's "TOCTOU… self-heals" note assumes a finite TTL; with the infinity default it does not self-heal.
  4. cache_store.py:540–542: set() with cache_set_data=True and len(value) > max_size leaves an untracked orphan in the backing cache. _cache.set(key, value) writes, but _track_entry returns False (value too large) and the result is discarded, unlike the byte-range path in _cache_miss, which rolls back. The value is then served from _cache on later reads, never counted or evicted. Pre-existing, but it directly contradicts this PR's new docstring claim that "a single max_size bounds total cache memory."
  5. cache_store.py:405–416: a full-key miss does not invalidate cached byte-range entries for the same key. After a full get("k") records absence, a previously cached (k, range) entry survives in range_cache, so get("k") returns None while get("k", RangeByteRequest(0,3)) returns stale bytes. Pre-existing, but negative caching makes the divergence stable under infinity TTL. (Note the PR did add test_set_if_not_exists_invalidates_stale_byte_range for the analogous case; the miss path has the same gap.)
  6. cache_store.py:509–514: negative-cache hits never refresh LRU recency. The fast-path increments negative_hits and returns without move_to_end(key), unlike positive hits (_update_access_order). So eviction is least-recently-inserted, not least-recently-used as the class / _next_eviction_candidate / _accommodate_value docstrings claim: a frequently-read absent key is evicted before a cold but newer one. Low severity; fix is a one-line move_to_end or a docstring correction.
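
Point 6 is easy to see with a bare OrderedDict. The following is only an illustration of the recency difference, not code from cache_store.py; `evict_order` and the key names are made up for the example.

```python
from collections import OrderedDict
from typing import List


def evict_order(refresh_on_read: bool) -> List[str]:
    """Return keys in the order a front-first eviction loop would drop them."""
    markers: "OrderedDict[str, None]" = OrderedDict()
    for key in ("hot", "cold"):  # "hot" is inserted first, "cold" second
        markers[key] = None
    # Simulate a later read of "hot"; only an LRU policy refreshes its position.
    if refresh_on_read:
        markers.move_to_end("hot")
    return list(markers)


print(evict_order(refresh_on_read=False))  # insertion order: "hot" is evicted first
print(evict_order(refresh_on_read=True))   # LRU order: "cold" is evicted first
```

Without the `move_to_end` call the frequently read key sits at the front and is the first eviction candidate, which is the least-recently-inserted behaviour described in point 6.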

Some of these concerns are about the change in behavior. Since this is marked as experimental, I think that's not as big a deal, but worth an impact assessment I think

d-v-b commented Jun 26, 2026

Contributor

I think a finite default max age is probably a good idea in any case, just to keep things bounded.
