perf(fts): bulk MAXSCORE search path for top-k disjunctions #7603
Open
BubbleCal wants to merge 1 commit into
This was referenced Jul 3, 2026
Port of Lucene's MaxScoreBulkScorer, opt-in via LANCE_FTS_MAXSCORE=1: per outer window (bounded by the essential clauses' blocks with adaptive growth), clauses split into a non-essential prefix and essential rest by window max score vs the running threshold. Essential clauses bulk-stream decompressed blocks (single-essential windows stream with no accumulator); non-essential clauses are only probed for candidates that can still beat the threshold. Dead ranges with one live clause skip by scanning the baked per-block bound slab. Candidate emission matches the classic path (must beat the running threshold), so results are score-identical. Measured on a 200M-doc warm benchmark at 24 partitions: 3-word OR match k10 0.137s -> 0.035s, k100 0.250s -> 0.064s; hot single-term 250ms -> 3ms.
0781681 to acac9d8
0ec1da3 to 96b172d
Rebased on the updated train: the maxscore paths fetch scoring doc lengths through DocSet::scoring_num_tokens (quantized on V3 partitions per #7466), and the ImpactScoreCache machinery this rework makes dead is now removed here (it previously tripped clippy -D warnings downstream). No behavior change beyond the rebase.
Stacked on the impact skip data PR.
Port of Lucene's MaxScoreBulkScorer, opt-in via
LANCE_FTS_MAXSCORE=1: per outer window (bounded by the essential clauses' blocks with adaptive
growth), clauses split into a non-essential prefix and essential rest by
window max score vs the running threshold. Essential clauses bulk-stream
decompressed blocks (single-essential windows stream with no accumulator);
non-essential clauses are only probed for candidates that can still beat
the threshold. Dead ranges with one live clause skip by scanning the baked
per-block bound slab. Candidate emission matches the classic path, so
results are score-identical (verified over a 40-case A/B snapshot).
Measured on a 200M-doc warm benchmark at 24 partitions, 3-word OR match:
k10 0.137s -> 0.035s, k100 0.250s -> 0.064s; hot single-term 250ms -> 3ms.
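
For reviewers unfamiliar with MAXSCORE, the essential/non-essential split described above can be sketched roughly as follows. This is an illustration of the general technique, not the code in this PR: `first_essential` and `worth_probing` are hypothetical names, and the sketch assumes the clauses are already sorted ascending by their window max score and that a candidate must strictly beat the running threshold.

```rust
/// Hypothetical helper (not from this PR): given per-clause max scores for the
/// current window, sorted ascending, return the index of the first essential
/// clause. Clauses before that index form the non-essential prefix: even if a
/// document matched all of them, their summed max scores could not beat the
/// running threshold, so they never need to drive iteration.
fn first_essential(window_max_scores: &[f32], threshold: f32) -> usize {
    let mut prefix_sum = 0.0f32;
    for (i, &max_score) in window_max_scores.iter().enumerate() {
        prefix_sum += max_score;
        // Assumption: emission requires strictly beating the threshold.
        if prefix_sum > threshold {
            return i;
        }
    }
    // No combination of clauses can beat the threshold in this window,
    // so the whole window could be skipped.
    window_max_scores.len()
}

/// Hypothetical helper (not from this PR): after scoring a candidate on the
/// essential clauses only, decide whether probing the non-essential clauses
/// could still lift it above the threshold.
fn worth_probing(essential_score: f32, non_essential_max_scores: &[f32], threshold: f32) -> bool {
    let upper_bound = essential_score + non_essential_max_scores.iter().sum::<f32>();
    upper_bound > threshold
}
```

With window max scores `[0.5, 1.0, 2.0]` and a threshold of `1.5`, the first two clauses sum to exactly `1.5` and stay non-essential, so only the last clause is streamed; a candidate scoring `0.25` on it is dropped without probing when the threshold is `2.0`, since `0.25 + 1.5` cannot beat it.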
🤖 Generated with Claude Code