feat(query): use cluster ordering for top-k sort #20031

Open
dbsid wants to merge 1 commit into databendlabs:main from dbsid:codex/cluster-key-presorted-limit

dbsid commented Jun 20, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Add a physical Sort(PresortedMerge) path for ORDER BY ... LIMIT when a Fuse Parquet scan can prove linear cluster-key ordering matches the requested sort order.
  • Support equality-fixed cluster-key prefixes plus extra filters, for example CLUSTER BY LINEAR(a, b, -c, d) with a = ..., b = ..., additional e/f filters, and ORDER BY c DESC, d ASC.
  • Preserve ordered scan streams after block pruning, including bounded overlapping block ranges controlled by a session setting.
  • Add Web3 Lake Q1/Q2 benchmark driver, table schema, prior Lake-side reports, and a PR-specific benchmark plan.
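The ordering proof described in the first two bullets can be pictured with a minimal sketch. It is not the planner code from this PR: `Dir`, `Key`, `key`, and `order_matches` are hypothetical names, a cluster-key term such as `-c` is modeled as `(c, Desc)`, and NULL ordering and type restrictions are ignored.

```rust
/// Sort direction of a cluster-key term or an ORDER BY item.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Dir {
    Asc,
    Desc,
}

/// A column plus direction. Hypothetical model: `-c` in CLUSTER BY
/// is represented as `(c, Desc)`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Key {
    pub column: &'static str,
    pub dir: Dir,
}

pub fn key(column: &'static str, dir: Dir) -> Key {
    Key { column, dir }
}

/// Returns true when rows read in cluster-key order are already in
/// `order_by` order, given the columns pinned by equality filters.
/// Columns in `fixed` are constant, so they are skipped on both sides.
pub fn order_matches(cluster: &[Key], fixed: &[&str], order_by: &[Key]) -> bool {
    // Cluster-key terms that can still vary after the equality filters.
    let mut free_cluster = cluster.iter().filter(|k| !fixed.contains(&k.column));
    // Every non-constant ORDER BY item must match the next free term exactly.
    for want in order_by.iter().filter(|k| !fixed.contains(&k.column)) {
        match free_cluster.next() {
            Some(have) if have == want => {}
            _ => return false,
        }
    }
    true
}
```

Under this model, `CLUSTER BY LINEAR(a, b, -c, d)` with `a` and `b` fixed matches `ORDER BY c DESC, d ASC`, while leaving `b` unfixed does not, because `b` becomes the leading free term.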

Implementation

  1. Proves cluster-key/order-by compatibility in physical sort planning for Fuse Parquet linear cluster keys.
  2. Skips cluster-key expressions fixed by equality filters, while leaving unrelated filters as row predicates.
  3. Adds settings:
    • enable_cluster_key_ordered_topk
    • max_cluster_key_ordered_topk_overlap
  4. Reorders pruned block partitions by block cluster statistics and assigns non-overlapping per-stream sequences for PartitionsShuffleKind::PreserveOrder.
  5. Allows bounded overlap by using row-by-row multi-stream merge after per-block partial sort.
  6. Keeps ordered Fuse scans on planner-proven part lists and avoids async pruning receiver delivery for this path.
  7. Adds Decimal unary-minus ordering support for Web3 Q2 CLUSTER BY (..., -balance) / ORDER BY balance DESC.
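Step 4 above can be illustrated with a greedy interval-partitioning sketch. This is an assumption about the general technique, not the code in this PR: `BlockRange` and `assign_streams` are hypothetical names, the cluster statistics are reduced to a single `i64` min/max pair per block, and boundary ties are treated conservatively as overlap.

```rust
/// Hypothetical per-block cluster statistics: the smallest and largest
/// sort-key value in the block (reduced to one i64 for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockRange {
    pub id: usize,
    pub min: i64,
    pub max: i64,
}

/// Orders blocks by their minimum key and assigns each one to the first
/// stream whose last block ends strictly before it starts, so every
/// stream is a sequence of non-overlapping, ascending blocks.
/// Returns None when more than `max_streams` streams would be needed.
pub fn assign_streams(
    mut blocks: Vec<BlockRange>,
    max_streams: usize,
) -> Option<Vec<Vec<BlockRange>>> {
    blocks.sort_by_key(|b| (b.min, b.max));
    let mut streams: Vec<Vec<BlockRange>> = Vec::new();
    for block in blocks {
        // A stream can take the block only if its tail does not overlap it.
        let slot = streams
            .iter()
            .position(|s| s.last().map_or(true, |last| last.max < block.min));
        match slot {
            Some(i) => streams[i].push(block),
            None if streams.len() < max_streams => streams.push(vec![block]),
            None => return None, // caller would fall back to a full sort
        }
    }
    Some(streams)
}
```

Each resulting stream is sorted on its own, so a row-by-row merge across the streams (step 5) is enough to produce the global order. A `None` result corresponds to the case where stream assignment cannot be proven within the allowed overlap.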

Notes

  • The equivalence between a -column cluster key and a descending sort on that column is applied conservatively: it requires NULLS LAST, and Int64/UInt64 and Float columns still fall back to Sort(Single).
  • If cluster statistics are missing/stale, stream assignment cannot be proven, or active overlap exceeds max_cluster_key_ordered_topk_overlap + 1, the plan falls back to Sort(Single).
  • max_cluster_key_ordered_topk_overlap = 0 means no extra overlapping range is allowed.
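The fallback rule in these notes can be sketched as a sweep over block ranges. This is a simplified reading of the rule as stated here, not the actual planner logic: `SortPlan`, `max_active_overlap`, and `choose_plan` are hypothetical names, ranges are closed `(min, max)` pairs with `min <= max`, and missing or stale statistics are modeled as `None`.

```rust
/// The two physical sort strategies named in this PR description.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SortPlan {
    PresortedMerge,
    Single,
}

/// Largest number of block ranges that are active at the same key value.
/// Assumes closed ranges with min <= max; touching ranges count as overlapping.
pub fn max_active_overlap(ranges: &[(i64, i64)]) -> usize {
    let mut events: Vec<(i64, u8)> = Vec::with_capacity(ranges.len() * 2);
    for &(min, max) in ranges {
        events.push((min, 0)); // 0 = range opens
        events.push((max, 1)); // 1 = range closes
    }
    // Tuple ordering puts opens before closes at the same position.
    events.sort();
    let (mut active, mut peak) = (0usize, 0usize);
    for (_, kind) in events {
        if kind == 0 {
            active += 1;
            peak = peak.max(active);
        } else {
            active -= 1;
        }
    }
    peak
}

/// Picks the plan: fall back to Sort(Single) when the feature is off,
/// statistics are unavailable, or the peak overlap exceeds `max_overlap + 1`.
pub fn choose_plan(
    enabled: bool,
    stats: Option<&[(i64, i64)]>,
    max_overlap: usize,
) -> SortPlan {
    match stats {
        Some(ranges)
            if enabled && max_active_overlap(ranges) <= max_overlap.saturating_add(1) =>
        {
            SortPlan::PresortedMerge
        }
        _ => SortPlan::Single,
    }
}
```

With `max_overlap = 0`, only fully disjoint block ranges keep the presorted path, which matches the note that a value of 0 allows no extra overlapping range.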

Web3 Benchmark Artifacts

Added under benchmark/web3_lake/:

  • schema.sql: final two serving-table DDLs for Q1/Q2.
  • cmd/web3lake/main.go: benchmark driver for Lake and Databend MySQL protocol.
  • report.md: first Lake schema benchmark report.
  • report_v2_filter_conditions.md: second-round filter-condition exploration.
  • report_pr20031_ordered_topk.md: ordered top-k benchmark matrix and commands for PR #20031.

The authoritative Lake-side Q1/Q2 results in the reports come from separate serial query-group runs. Earlier parallel Q1+Q2 batches are retained only as stress/interference notes.

The requested us-west-2 EC2/S3 PR-specific benchmark is still pending because the STS credentials in ~/Downloads/aws.txt currently return ExpiredToken. The report file records the exact benchmark matrix to run once fresh AWS credentials are available.

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test (driver/reports added; PR-specific EC2 run pending on AWS credentials)
  • No Test - Pair with the reviewer to explain why

Validation run locally:

cargo fmt --all
cargo fmt --all --check
git diff --check
CARGO_BUILD_JOBS=4 cargo test -p databend-common-pipeline-transforms row_by_row_ -- --nocapture
CARGO_BUILD_JOBS=2 cargo build -p databend-binaries --bin databend-query
target/debug/databend-sqllogictests --handlers mysql --run_file eliminate_sort_cluster_key.test --enable_sandbox --parallel 1
cd benchmark/web3_lake && go test ./... && go build ./cmd/web3lake

Latest targeted sqllogic result: 69 tests passed.

The sqllogictest covers:

  • CLUSTER BY LINEAR(a, b, -c, d) with fixed a/b, extra filters, optional fixed d, and ORDER BY c DESC, d ASC.
  • Multiple blocks remaining after pruning with Sort(PresortedMerge).
  • Bounded overlap positive path and max_cluster_key_ordered_topk_overlap = 0 fallback.
  • Feature switch fallback.
  • NULLS FIRST, missing fixed prefix, and unsafe Int64 fallback cases.
  • Decimal -balance positive case.

Type of change

  • New feature (non-breaking change which adds functionality)

github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) on Jun 20, 2026
dbsid force-pushed the codex/cluster-key-presorted-limit branch from f4e287b to c398b18 on June 21, 2026 02:25