feat(cache): support batch get-or-insert single-flight by zouhuajian · Pull Request #7523 · lance-format/lance

zouhuajian · 2026-06-30T09:15:20Z

Feature

What is the new feature?

This adds backend-level batch get-or-insert support to Lance cache primitives.

The new API allows callers to provide one batch loader for multiple missing keys while preserving backend-owned per-key single-flight semantics. This is intended as cache-layer groundwork for callers such as coalesced BTree page reads, where we need both:

coalesced loading: one loader can load multiple missing pages together
single-flight: concurrent cold overlapping requests do not duplicate loading for the same key

Related to #6759.

Why do we need this feature?

CacheBackend::get_or_insert currently supports one key at a time. That works for single-key cache misses because the backend can deduplicate concurrent loaders for the same key.

For batch-oriented callers, neither existing pattern provides both desired properties:

calling single-key get_or_insert in a loop preserves single-flight, but loses batch/coalesced loading
doing manual get + batch load + insert preserves batch loading, but bypasses backend single-flight

Under concurrent cold overlapping range reads, the manual batch pattern can duplicate page IO and materialization for the same logical cache keys.

How does it work?

Adds low-level batch cache types and API:
- CacheLoadedEntry
- CacheBatchEntry
- CacheBatchLoader
- CacheBackend::get_or_insert_many
Adds typed cache APIs:
- LanceCache::get_or_insert_with_key_batch
- WeakLanceCache::get_or_insert_with_key_batch
- CacheBatchValue<T>
Implements true batch single-flight in MokaCacheBackend.
Uses one shared in-flight registry for both single-key and batch get-or-insert paths, so mixed callers still coalesce on the same key.
Ensures the batch loader only receives missing keys currently owned by this caller.
Preserves input key order in returned values.
Rejects duplicate input keys with InvalidInput.
Validates loader output exactly:
- missing returned key => InvalidInput
- extra returned key => InvalidInput
- duplicate returned key => InvalidInput
Clarifies was_cached semantics:
- true means this call did not run the loader for that key
- this includes ordinary cache hits and values produced by another in-flight owner
Keeps the default trait implementation as a compatibility fallback that calls single-key get_or_insert one key at a time.

Compatibility

This is an additive cache API change.

Existing single-key get_or_insert behavior remains compatible. The default get_or_insert_many implementation preserves existing backend single-key semantics for custom backends, but does not preserve coalesced batch loading unless a backend overrides it.

This PR does not change BTree query logic directly. It only adds the cache primitive needed by future multi-page miss paths.

Testing

Local verification run on commit 50dd286d2:

cargo fmt --all -- --check
cargo test -p lance-core cache
- 38 passed
cargo test -p lance-core --doc cache
- 3 passed, 2 ignored
cargo test -p lance-core --bench cache_batch -- --test
- all benchmark scenarios reported Success
cargo clippy -p lance-core --all-targets -- -D warnings
cargo bench -p lance-core --bench cache_batch -- --noplot

Benchmarks

These are cache primitive microbenchmarks, not end-to-end BTree or object-store benchmarks. They intentionally use small usize values and cheap / yielding loaders so the cache coordination overhead and duplicate-load behavior are visible.

Benchmark environment:

MacBook Pro, Apple M1 Pro, 16 GiB memory
macOS 15.6, arm64
Rust 1.94.0
Command: cargo bench -p lance-core --bench cache_batch -- --noplot
Criterion config: sample size 10, 100 ms warmup, 250 ms measurement time

Each row runs multiple concurrent tasks against the same cache. Batch size
is the number of keys requested by each task. Concurrent tasks is the number
of simultaneous cache callers. Key overlap controls how many keys are shared
across those concurrent callers.

Actual loads counts how many entries were actually produced by loaders across
all concurrent tasks. Lower load counts indicate less duplicate cold-miss work.

Batch size	Concurrent tasks	Key overlap	Loader	Actual loads: single / manual / new batch	Single-key loop	Manual get + batch load + insert	Batch get_or_insert_many	New vs manual
64	8	50%	cheap	288 / 512 / 288	0.961 ms	0.843 ms	1.258 ms	1.49x
64	8	100%	cheap	64 / 512 / 64	0.395 ms	0.851 ms	0.702 ms	0.82x
64	8	50%	yield_once	288 / 512 / 288	1.269 ms	0.827 ms	1.335 ms	1.61x
64	8	100%	yield_once	64 / 512 / 64	0.660 ms	0.936 ms	0.718 ms	0.77x
8	32	50%	cheap	132 / 256 / 132	0.523 ms	0.532 ms	0.625 ms	1.18x
64	8	0%	cheap	n/a / 512 / 512	n/a	0.671 ms	1.815 ms	2.71x
64	8	0%	yield_once	n/a / 512 / 512	n/a	0.700 ms	1.894 ms	2.71x
512	32	0%	cheap	n/a / 16384 / 16384	n/a	20.414 ms	61.045 ms	2.99x

Interpretation:

The new batch API is not intended to be a universal faster path for cheap disjoint misses.
In no-overlap scenarios, it has expected overhead from per-key flight coordination, validation, and result ordering.
The value is semantic: overlapping cold batch requests preserve batch/coalesced loading while avoiding duplicate loads for shared keys.
In full-overlap scenarios, the new batch API avoids the duplicate work in the manual get + load + insert pattern and is faster in this microbenchmark.
The single-key loop still has strong single-flight behavior, but it cannot give a multi-key loader the full missing-key set for coalesced IO.

Add CacheBackend::get_or_insert_many and typed LanceCache / WeakLanceCache batch get-or-insert APIs. Implement true per-key single-flight for MokaCacheBackend using a shared in-flight registry across single-key and batch paths. Batch loaders only receive keys owned by the current caller, results preserve input order, duplicate input keys are rejected, and loader output must exactly match the owned keys. Add tests for overlapping batches, mixed single/batch callers, owner failure retry, validation errors, WeakLanceCache fallback behavior, and cache stats. Add Criterion microbenchmarks for cache-level batch get-or-insert tradeoffs.

wjones127

calling single-key get_or_insert in a loop preserves single-flight, but loses batch/coalesced loading

I don't understand this part. Why can't you call each key with get_or_insert() concurrently? Moka already handles the multiple in-flight requests?

My impression is this PR adds a lot of complexity to the cache API and implementation. So we'll need a good reason to add it.

I think the actual benchmark comparison I'd like to see is get_or_insert_with_key_batch vs get_or_insert_with_key concurrently for each key. Comparing vs a single threaded loop or the manual get insert feels like a straw man.

wjones127 · 2026-06-30T20:01:38Z

+                })?;
+                Ok((loaded.entry, loaded.size_bytes))
+            });
+            let (entry, was_cached) = self.get_or_insert(&key, single_loader, codec).await?;


suggestion: get_or_insert is async, so there's no reason you couldn't do this for all keys concurrently. Unless course, there's some reason that you don't want to call loader beyond a certain concurrency, but I suppose it's on the caller to call with a small enough batch.

zouhuajian · 2026-07-04T03:55:00Z

I don't understand this part. Why can't you call each key with get_or_insert() concurrently? Moka already handles the multiple in-flight requests?

My impression is this PR adds a lot of complexity to the cache API and implementation. So we'll need a good reason to add it.

I think the actual benchmark comparison I'd like to see is get_or_insert_with_key_batch vs get_or_insert_with_key concurrently for each key. Comparing vs a single threaded loop or the manual get insert feels like a straw man.

Thanks for the review.

The original motivation for this PR came from #6759. That PR tried to coalesce BTree page_data reads by physical file/range, but the manual get + batch load + insert path would bypass the existing per-key single-flight behavior in get_or_insert.

After looking more carefully at the scalar index code, I don’t think this is a good generic cache abstraction. The concrete need is BTree-specific, and I also closed #6759 because I don’t yet have enough evidence that broad cold BTree range reads are common enough to justify this complexity.

Bitmap and FTS/FST do not seem to need this generic batch primitive: bitmap caches per value, and FTS already has grouped posting-list cache keys.

So I’m going to close this PR. If this comes back, I think it should start with BTree-specific metrics and a narrower page-materialization design.

github-actions Bot added A-deps Dependency updates enhancement New feature or request labels Jun 30, 2026

zouhuajian mentioned this pull request Jun 30, 2026

feat(index): coalesce btree scalar index page-data reads #6759

Closed

wjones127 self-requested a review June 30, 2026 15:03

wjones127 requested changes Jun 30, 2026

View reviewed changes

zouhuajian closed this Jul 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(cache): support batch get-or-insert single-flight#7523

feat(cache): support batch get-or-insert single-flight#7523
zouhuajian wants to merge 1 commit into
lance-format:mainfrom
zouhuajian:feat/cache-get-or-insert-batch

zouhuajian commented Jun 30, 2026 •

edited

Loading

Uh oh!

wjones127 left a comment

Uh oh!

wjones127 Jun 30, 2026

Uh oh!

zouhuajian commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

zouhuajian commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Feature

What is the new feature?

Why do we need this feature?

How does it work?

Compatibility

Testing

Benchmarks

Uh oh!

wjones127 left a comment

Choose a reason for hiding this comment

Uh oh!

wjones127 Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

zouhuajian commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zouhuajian commented Jun 30, 2026 •

edited

Loading