feat(cache): support batch get-or-insert single-flight#7523
Conversation
Add CacheBackend::get_or_insert_many and typed LanceCache / WeakLanceCache batch get-or-insert APIs. Implement true per-key single-flight for MokaCacheBackend using a shared in-flight registry across single-key and batch paths. Batch loaders only receive keys owned by the current caller, results preserve input order, duplicate input keys are rejected, and loader output must exactly match the owned keys. Add tests for overlapping batches, mixed single/batch callers, owner failure retry, validation errors, WeakLanceCache fallback behavior, and cache stats. Add Criterion microbenchmarks for cache-level batch get-or-insert tradeoffs.
wjones127
left a comment
There was a problem hiding this comment.
calling single-key get_or_insert in a loop preserves single-flight, but loses batch/coalesced loading
I don't understand this part. Why can't you call each key with get_or_insert() concurrently? Moka already handles the multiple in-flight requests?
My impression is this PR adds a lot of complexity to the cache API and implementation. So we'll need a good reason to add it.
I think the actual benchmark comparison I'd like to see is get_or_insert_with_key_batch vs get_or_insert_with_key concurrently for each key. Comparing vs a single threaded loop or the manual get insert feels like a straw man.
| })?; | ||
| Ok((loaded.entry, loaded.size_bytes)) | ||
| }); | ||
| let (entry, was_cached) = self.get_or_insert(&key, single_loader, codec).await?; |
There was a problem hiding this comment.
suggestion: get_or_insert is async, so there's no reason you couldn't do this for all keys concurrently. Unless course, there's some reason that you don't want to call loader beyond a certain concurrency, but I suppose it's on the caller to call with a small enough batch.
Thanks for the review. The original motivation for this PR came from #6759. That PR tried to coalesce BTree After looking more carefully at the scalar index code, I don’t think this is a good generic cache abstraction. The concrete need is BTree-specific, and I also closed #6759 because I don’t yet have enough evidence that broad cold BTree range reads are common enough to justify this complexity. Bitmap and FTS/FST do not seem to need this generic batch primitive: bitmap caches per value, and FTS already has grouped posting-list cache keys. So I’m going to close this PR. If this comes back, I think it should start with BTree-specific metrics and a narrower page-materialization design. |
Feature
What is the new feature?
This adds backend-level batch get-or-insert support to Lance cache primitives.
The new API allows callers to provide one batch loader for multiple missing keys while preserving backend-owned per-key single-flight semantics. This is intended as cache-layer groundwork for callers such as coalesced BTree page reads, where we need both:
Related to #6759.
Why do we need this feature?
CacheBackend::get_or_insertcurrently supports one key at a time. That works for single-key cache misses because the backend can deduplicate concurrent loaders for the same key.For batch-oriented callers, neither existing pattern provides both desired properties:
get_or_insertin a loop preserves single-flight, but loses batch/coalesced loadingget+ batch load +insertpreserves batch loading, but bypasses backend single-flightUnder concurrent cold overlapping range reads, the manual batch pattern can duplicate page IO and materialization for the same logical cache keys.
How does it work?
CacheLoadedEntryCacheBatchEntryCacheBatchLoaderCacheBackend::get_or_insert_manyLanceCache::get_or_insert_with_key_batchWeakLanceCache::get_or_insert_with_key_batchCacheBatchValue<T>MokaCacheBackend.InvalidInput.InvalidInputInvalidInputInvalidInputwas_cachedsemantics:truemeans this call did not run the loader for that keyget_or_insertone key at a time.Compatibility
This is an additive cache API change.
Existing single-key
get_or_insertbehavior remains compatible. The defaultget_or_insert_manyimplementation preserves existing backend single-key semantics for custom backends, but does not preserve coalesced batch loading unless a backend overrides it.This PR does not change BTree query logic directly. It only adds the cache primitive needed by future multi-page miss paths.
Testing
Local verification run on commit
50dd286d2:cargo fmt --all -- --checkcargo test -p lance-core cachecargo test -p lance-core --doc cachecargo test -p lance-core --bench cache_batch -- --testSuccesscargo clippy -p lance-core --all-targets -- -D warningscargo bench -p lance-core --bench cache_batch -- --noplotBenchmarks
These are cache primitive microbenchmarks, not end-to-end BTree or object-store benchmarks. They intentionally use small
usizevalues and cheap / yielding loaders so the cache coordination overhead and duplicate-load behavior are visible.Benchmark environment:
cargo bench -p lance-core --bench cache_batch -- --noplotEach row runs multiple concurrent tasks against the same cache.
Batch sizeis the number of keys requested by each task.
Concurrent tasksis the numberof simultaneous cache callers.
Key overlapcontrols how many keys are sharedacross those concurrent callers.
Actual loadscounts how many entries were actually produced by loaders acrossall concurrent tasks. Lower load counts indicate less duplicate cold-miss work.
Interpretation:
get+ load +insertpattern and is faster in this microbenchmark.