perf(fts): bulk MAXSCORE search path for top-k disjunctions by BubbleCal · Pull Request #7603 · lance-format/lance

BubbleCal · 2026-07-03T05:21:11Z

Stacked on the impact skip data PR.

Port of Lucene's MaxScoreBulkScorer, opt-in via LANCE_FTS_MAXSCORE=1.
For each outer window (bounded by the essential clauses' blocks, with
adaptive growth), clauses are split into a non-essential prefix and an
essential rest by comparing window max scores against the running threshold.
Essential clauses bulk-stream decompressed blocks (single-essential windows
stream with no accumulator); non-essential clauses are probed only for
candidates that can still beat the threshold. Dead ranges with one live
clause are skipped by scanning the baked per-block bound slab. Candidate
emission matches the classic path (a candidate must beat the running
threshold), so results are score-identical (verified over a 40-case A/B
snapshot).
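
For readers less familiar with MAXSCORE, the clause split can be sketched roughly as below. This is an illustration only, not code from this PR: `ClauseBound`, `partition_clauses`, and `can_still_beat` are hypothetical names, and the strict `>` comparison assumes a candidate must strictly beat the running threshold, as stated for the classic path.

```rust
/// Hypothetical per-clause summary for one window (not a type from the PR).
#[derive(Debug, Clone, PartialEq)]
pub struct ClauseBound {
    pub term: &'static str,
    /// Upper bound on this clause's score contribution inside the window.
    pub window_max_score: f32,
}

/// Sorts clauses by ascending window max score and returns the length of the
/// non-essential prefix: the longest prefix whose summed max scores cannot
/// beat `threshold` on their own. Clauses at or after the returned index are
/// essential and must be streamed.
pub fn partition_clauses(clauses: &mut [ClauseBound], threshold: f32) -> usize {
    clauses.sort_by(|a, b| a.window_max_score.total_cmp(&b.window_max_score));
    let mut prefix_sum = 0.0f32;
    let mut non_essential = 0;
    for clause in clauses.iter() {
        let next = prefix_sum + clause.window_max_score;
        if next > threshold {
            // Adding this clause could produce a competitive score,
            // so it and every later clause are essential.
            break;
        }
        prefix_sum = next;
        non_essential += 1;
    }
    non_essential
}

/// Returns true if a candidate with `partial_score` (from essential clauses)
/// could still beat `threshold` once the remaining non-essential clauses,
/// bounded by `remaining_max`, are probed.
pub fn can_still_beat(partial_score: f32, remaining_max: f32, threshold: f32) -> bool {
    partial_score + remaining_max > threshold
}
```

If `partition_clauses` returns the full clause count, no document in the window can beat the threshold under these assumptions, which is the situation where a scorer can skip ahead rather than stream.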

Measured on a 200M-doc warm benchmark at 24 partitions, 3-word OR match:
k=10 0.137s -> 0.035s, k=100 0.250s -> 0.064s; hot single-term 250ms -> 3ms.

🤖 Generated with Claude Code

BubbleCal · 2026-07-04T18:20:34Z

Rebased on the updated train: the maxscore paths fetch scoring doc lengths through DocSet::scoring_num_tokens (quantized on V3 partitions per #7466), and the ImpactScoreCache machinery this rework makes dead is now removed here (it previously tripped clippy -D warnings downstream). No behavior change beyond the rebase.

github-actions bot added the A-index (Vector index, linalg, tokenizer) and performance labels Jul 3, 2026

This was referenced Jul 3, 2026

test(fts): benchmark new FTS algo #7605

Closed

perf(index): bulk conjunction path for FTS AND and phrase queries #7624

Open

BubbleCal force-pushed the yang/lan2-88-fts-maxscore-search branch from 0781681 to acac9d8 Compare July 4, 2026 18:17

BubbleCal force-pushed the yang/lan2-88-impact-skip-data branch from 0ec1da3 to 96b172d Compare July 4, 2026 18:17

BubbleCal mentioned this pull request Jul 4, 2026

feat(fts)!: add configurable posting block size #7466

Open

BubbleCal wants to merge 1 commit into yang/lan2-88-impact-skip-data from yang/lan2-88-fts-maxscore-search
