fix(index): serialize FTS position prewarm to avoid an IO-scheduler deadlock#7623

Open
BubbleCal wants to merge 1 commit into main from yang/fts-position-prewarm-serial
@BubbleCal

Contributor

Problem

prewarm_index(..., with_position=True) deadlocks on position-bearing inverted indexes: all tokio workers park and the prewarm never completes. Reproduced on a 50M-doc index: more than 14 min with zero progress before this fix, versus ~19 min to fully prewarm 162G after it.

Position streams are dominated by a few huge hot-token rows, so prewarm chunks sized for the 128MB average routinely span hundreds of MBs to GBs. Two or more such read_ranges calls in flight on the store's shared ScanScheduler can exhaust its byte backpressure window while every request still has undelivered pages. A request's later pages never pass the min_in_flight priority bypass, so no request can complete and nothing frees the window.
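The wedge described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not lance-io's actual scheduler: `simulate` is a hypothetical function, pages are admitted round-robin, a page is admitted only if it fits in the remaining byte window, and a request's bytes are released only once all of its pages have been admitted. The min_in_flight bypass is not modeled.

```rust
/// Toy model of a byte-budget backpressure window (NOT the real ScanScheduler).
/// Each request is a list of page sizes in bytes.
/// Assumption: a request holds the bytes of every admitted page until all of
/// its pages have been admitted, and only then releases them.
/// Returns `true` if every request completes, `false` if the window wedges.
fn simulate(capacity: u64, requests: &[Vec<u64>]) -> bool {
    let mut next_page = vec![0usize; requests.len()]; // next page to admit, per request
    let mut held = vec![0u64; requests.len()]; // bytes currently held, per request
    let mut in_use = 0u64; // bytes charged against the window

    loop {
        let mut progressed = false;
        // Round-robin: try to admit one page from each unfinished request.
        for (i, pages) in requests.iter().enumerate() {
            if next_page[i] == pages.len() {
                continue; // request already complete
            }
            let size = pages[next_page[i]];
            if in_use + size > capacity {
                continue; // blocked by backpressure
            }
            in_use += size;
            held[i] += size;
            next_page[i] += 1;
            progressed = true;
            if next_page[i] == pages.len() {
                // Request complete: its bytes go back to the window.
                in_use -= held[i];
                held[i] = 0;
            }
        }
        let all_done = next_page
            .iter()
            .zip(requests)
            .all(|(&n, pages)| n == pages.len());
        if all_done {
            return true;
        }
        if !progressed {
            return false; // every unfinished request is blocked: wedged
        }
    }
}
```

In this model, a 10-byte window with two concurrent requests of four 2-byte pages each wedges: the window fills after five pages while both requests still have pages outstanding. The same two requests run one at a time both complete, since each fits in the window on its own.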

Fix

A single request in flight always delivers in order and recycles the window, so position prewarm now runs its chunks serially. Position-less prewarm keeps the concurrent path (its chunks are bounded by the 128MB target and never wedge).
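The dispatch policy can be sketched as follows. This is an illustration rather than the code in this PR: `prewarm_chunks`, `read_chunk`, and `max_concurrency` are hypothetical names, and scoped OS threads from the standard library stand in for the async runtime. The only point it shows is that the position path caps in-flight reads at one while the position-less path keeps a wider limit.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical dispatch policy: position-bearing prewarm reads one chunk at a
/// time, position-less prewarm reads up to `max_concurrency` chunks at once.
/// `read_chunk` stands in for a real range read and returns the bytes it read.
fn prewarm_chunks<F>(
    chunks: &[Range<u64>],
    with_position: bool,
    max_concurrency: usize,
    read_chunk: F,
) -> u64
where
    F: Fn(Range<u64>) -> u64 + Sync,
{
    // Serial for positions, bounded concurrency otherwise.
    let limit = if with_position { 1 } else { max_concurrency.max(1) };
    let mut total = 0u64;
    for batch in chunks.chunks(limit) {
        // Every read in a batch finishes before the next batch starts.
        total += thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|chunk| {
                    let read_chunk = &read_chunk;
                    s.spawn(move || read_chunk(chunk.clone()))
                })
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("reader thread panicked"))
                .sum::<u64>()
        });
    }
    total
}
```

Calling `prewarm_chunks(&chunks, true, 4, read)` never has more than one `read` call running at a time, whatever `max_concurrency` is set to.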

The scheduler-side wedge (concurrent large read_ranges on one ScanScheduler) is a separate latent lance-io issue; this change removes the only known trigger.

Verification

  • Repro before/after on a 50M-doc 162G positions index: with_position prewarm hung (>840s timeout) → completes in ~19 min; position-less prewarm unchanged (~30s).
  • cargo test -p lance-index (inverted suite), cargo clippy -- -D warnings, cargo fmt --check all clean.

🤖 Generated with Claude Code

…backpressure wedge

Position streams are dominated by a few huge hot-token rows, so
position-bearing prewarm chunks routinely span hundreds of MBs. Two or
more such read_ranges in flight on the store's shared ScanScheduler can
exhaust its backpressure window while every request still has
undelivered pages (later pages of a request never pass the
min_in_flight priority bypass), deadlocking the prewarm. A single
request in flight always delivers in order and recycles the window, so
run position prewarm chunks serially.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions github-actions Bot added the A-index (Vector index, linalg, tokenizer) and bug (Something isn't working) labels and removed the A-index (Vector index, linalg, tokenizer) label on Jul 4, 2026
@codecov

codecov Bot commented Jul 4, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
