perf(fts): bulk MAXSCORE search path for top-k disjunctions#7603

Open: BubbleCal wants to merge 1 commit into yang/lan2-88-impact-skip-data from yang/lan2-88-fts-maxscore-search

@BubbleCal

Contributor

Stacked on the impact skip data PR.

Port of Lucene's MaxScoreBulkScorer, opt-in via LANCE_FTS_MAXSCORE=1.
For each outer window (bounded by the essential clauses' blocks, with
adaptive growth), clauses are split into a non-essential prefix and an
essential rest by comparing each clause's window max score against the
running threshold. Essential clauses bulk-stream decompressed blocks
(single-essential windows stream with no accumulator); non-essential
clauses are only probed for candidates that can still beat the threshold.
Dead ranges with one live clause are skipped by scanning the baked
per-block bound slab. Candidate emission matches the classic path, so
results are score-identical (verified over a 40-case A/B snapshot).
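
The essential / non-essential split described above can be sketched roughly as follows. This is only an illustration of the general MAXSCORE idea, not the code in this PR: `ClauseBound`, `partition_clauses`, and `worth_probing` are hypothetical names, and the sketch assumes scores are non-negative and that a candidate must strictly beat the threshold to be emitted.

```rust
/// Hypothetical per-clause summary for one window: the largest score the
/// clause can contribute to any document inside that window.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ClauseBound {
    clause_id: usize,
    window_max_score: f32,
}

/// Sorts clauses by ascending window max score and returns the index of the
/// first essential clause. Clauses before that index form the non-essential
/// prefix: their bounds sum to at most `threshold`, so a document matching
/// only those clauses cannot strictly beat the threshold.
fn partition_clauses(clauses: &mut [ClauseBound], threshold: f32) -> usize {
    clauses.sort_by(|a, b| a.window_max_score.total_cmp(&b.window_max_score));
    let mut prefix_sum = 0.0f32;
    let mut first_essential = 0;
    for clause in clauses.iter() {
        // Stop as soon as adding this clause could push a document over
        // the threshold; it and everything after it must be streamed.
        if prefix_sum + clause.window_max_score > threshold {
            break;
        }
        prefix_sum += clause.window_max_score;
        first_essential += 1;
    }
    first_essential
}

/// Decides whether a candidate found via the essential clauses is worth
/// probing against the non-essential ones: its score so far plus the best
/// the non-essential prefix could add must strictly beat the threshold.
fn worth_probing(essential_score: f32, non_essential_bound: f32, threshold: f32) -> bool {
    essential_score + non_essential_bound > threshold
}
```

With window bounds 2.0, 0.5, and 1.0 and a threshold of 1.5, the two weakest clauses (0.5 + 1.0 = 1.5) cannot beat the threshold on their own, so only the 2.0 clause is essential. Before the top-k heap is full, passing `f32::NEG_INFINITY` as the threshold makes every clause essential.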

Measured on a 200M-doc warm benchmark at 24 partitions. 3-word OR match:
k=10 0.137s -> 0.035s, k=100 0.250s -> 0.064s (both about 3.9x faster).
Hot single-term: 250ms -> 3ms (about 83x faster).

🤖 Generated with Claude Code

Port of Lucene's MaxScoreBulkScorer, opt-in via LANCE_FTS_MAXSCORE=1: per
outer window (bounded by the essential clauses' blocks with adaptive
growth), clauses split into a non-essential prefix and essential rest by
window max score vs the running threshold. Essential clauses bulk-stream
decompressed blocks (single-essential windows stream with no accumulator);
non-essential clauses are only probed for candidates that can still beat
the threshold. Dead ranges with one live clause skip by scanning the baked
per-block bound slab. Candidate emission matches the classic path (must
beat the running threshold), so results are score-identical.
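
As a rough illustration of the last two points (skipping dead ranges via per-block bounds, and emitting only candidates that beat the running threshold), here is a minimal sketch. The names `Hit`, `TopK`, and `next_live_block` are hypothetical and not taken from this PR; the tie-break on doc id and the flat `&[f32]` layout of the bound slab are assumptions made for the example.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Hypothetical scored hit, ordered by score and then doc id (assumed tie-break).
#[derive(Debug, Clone, Copy)]
struct Hit {
    score: f32,
    doc_id: u64,
}

impl Ord for Hit {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score
            .total_cmp(&other.score)
            .then(self.doc_id.cmp(&other.doc_id))
    }
}
impl PartialOrd for Hit {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Hit {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Hit {}

/// Top-k collector backed by a min-heap: the weakest retained hit is on top.
struct TopK {
    k: usize,
    heap: BinaryHeap<Reverse<Hit>>,
}

impl TopK {
    fn new(k: usize) -> Self {
        Self { k, heap: BinaryHeap::with_capacity(k) }
    }

    /// Running threshold: the weakest retained score once k hits are held,
    /// negative infinity before that.
    fn threshold(&self) -> f32 {
        if self.heap.len() < self.k {
            return f32::NEG_INFINITY;
        }
        self.heap.peek().map_or(f32::NEG_INFINITY, |h| h.0.score)
    }

    /// Keeps the hit only if it strictly beats the running threshold.
    fn offer(&mut self, hit: Hit) -> bool {
        if self.k == 0 || hit.score <= self.threshold() {
            return false;
        }
        if self.heap.len() == self.k {
            self.heap.pop(); // evict the weakest retained hit
        }
        self.heap.push(Reverse(hit));
        true
    }
}

/// Scans per-block score upper bounds and returns the first block at or
/// after `start` whose bound can strictly beat `threshold`, if any.
fn next_live_block(block_bounds: &[f32], start: usize, threshold: f32) -> Option<usize> {
    block_bounds
        .get(start..)?
        .iter()
        .position(|&bound| bound > threshold)
        .map(|offset| start + offset)
}
```

Because a tie with the threshold is rejected, a pruned path that uses the same strict comparison when skipping blocks never drops a document the exhaustive path would have kept, which is the intuition behind the score-identical claim.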

Measured on a 200M-doc warm benchmark at 24 partitions: 3-word OR match
k10 0.137s -> 0.035s, k100 0.250s -> 0.064s; hot single-term 250ms -> 3ms.

BubbleCal force-pushed the yang/lan2-88-fts-maxscore-search branch from 0781681 to acac9d8 on July 4, 2026 18:17
BubbleCal force-pushed the yang/lan2-88-impact-skip-data branch from 0ec1da3 to 96b172d on July 4, 2026 18:17

@BubbleCal

Contributor Author

Rebased on the updated train: the maxscore paths fetch scoring doc lengths through DocSet::scoring_num_tokens (quantized on V3 partitions per #7466), and the ImpactScoreCache machinery this rework makes dead is now removed here (it previously tripped clippy -D warnings downstream). No behavior change beyond the rebase.

Labels: A-index (Vector index, linalg, tokenizer), performance
