feat(query): use cluster ordering for top-k sort #20031
Open
dbsid wants to merge 1 commit into
f4e287b to c398b18
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
- Add a `Sort(PresortedMerge)` path for `ORDER BY ... LIMIT` when a Fuse Parquet scan can prove that linear cluster-key ordering matches the requested sort order.
- Example: `CLUSTER BY LINEAR(a, b, -c, d)` with `a = ...`, `b = ...`, additional `e`/`f` filters, and `ORDER BY c DESC, d ASC`.
Implementation
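The eligibility condition described in the summary can be illustrated with a small model. This is a hypothetical Python sketch, not the planner code: `ClusterKey`, `OrderItem`, `cluster_order_satisfies`, `UNSAFE_NEGATION_TYPES`, and the concrete type names are all assumptions. It drops columns pinned by `col = const` filters, then requires each remaining `ORDER BY` item to line up with the next unfixed cluster key. A `-c` key only counts as `c DESC` with `NULLS LAST` and a type outside Int64/UInt64/Float, mirroring the conservative rule in the notes; NULLS placement for plain ascending keys is not modeled.

```python
from dataclasses import dataclass

# Assumed type names whose negation is treated as unsafe, following the
# notes (Int64/UInt64 and Float fall back to Sort(Single)).
UNSAFE_NEGATION_TYPES = {"Int64", "UInt64", "Float32", "Float64"}


@dataclass(frozen=True)
class ClusterKey:
    column: str
    negated: bool  # True for a `-c` entry in CLUSTER BY LINEAR(...)
    data_type: str


@dataclass(frozen=True)
class OrderItem:
    column: str
    descending: bool
    nulls_last: bool


def cluster_order_satisfies(keys, fixed_columns, order_by):
    """Return True if data ordered by `keys` is also ordered by `order_by`,
    ignoring columns pinned to a constant by `col = const` filters."""
    # Constant columns cannot change the relative order of rows.
    remaining_keys = [k for k in keys if k.column not in fixed_columns]
    remaining_order = [o for o in order_by if o.column not in fixed_columns]
    if len(remaining_order) > len(remaining_keys):
        return False
    # Each ORDER BY item must line up with the next unfixed cluster key.
    for key, item in zip(remaining_keys, remaining_order):
        if key.column != item.column:
            return False  # e.g. a missing fixed prefix
        if key.negated:
            # `-c` ascending is accepted as `c DESC` only with NULLS LAST
            # and a type whose negation is considered safe.
            if not item.descending or not item.nulls_last:
                return False
            if key.data_type in UNSAFE_NEGATION_TYPES:
                return False
        elif item.descending:
            return False  # plain keys only give ascending order
    return True


# CLUSTER BY LINEAR(a, b, -c, d) with a = ..., b = ...,
# ORDER BY c DESC, d ASC (column types are illustrative).
keys = [
    ClusterKey("a", False, "String"),
    ClusterKey("b", False, "String"),
    ClusterKey("c", True, "Int32"),
    ClusterKey("d", False, "Int32"),
]
order = [OrderItem("c", True, True), OrderItem("d", False, True)]
print(cluster_order_satisfies(keys, {"a", "b"}, order))  # True
print(cluster_order_satisfies(keys, {"b"}, order))       # False: `a` not fixed
```

In this model, switching `c` to `Int64`, requesting `NULLS FIRST` on `c`, or leaving `a` unfixed all return `False`, which lines up with the fallback cases listed under Tests. Dropping fixed columns from both sides is what lets an optional fixed `d` still qualify.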
- `enable_cluster_key_ordered_topk`
- `max_cluster_key_ordered_topk_overlap`
- `PartitionsShuffleKind::PreserveOrder`.
- `CLUSTER BY (..., -balance)` / `ORDER BY balance DESC`.
Notes
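The overlap limit in these notes can be read as a bound on how many partition key ranges intersect at a single point. The sketch below assumes that reading; the real accounting may count blocks or partitions differently. `max_overlap_depth` and `choose_sort` are hypothetical names, ranges are closed `(min, max)` pairs, and touching endpoints count as overlapping, which is the conservative choice.

```python
def max_overlap_depth(ranges):
    """Largest number of closed (lo, hi) ranges that share a common point."""
    events = []
    for lo, hi in ranges:
        events.append((lo, 0))  # 0 = range opens; sorts before a close at the same value
        events.append((hi, 1))  # 1 = range closes
    depth = best = 0
    for _, kind in sorted(events):
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


def choose_sort(ranges, max_overlap):
    """Pick a plan under the assumed rule: more than max_overlap + 1
    intersecting ranges falls back to Sort(Single)."""
    if max_overlap_depth(ranges) > max_overlap + 1:
        return "Sort(Single)"
    return "Sort(PresortedMerge)"


# Hypothetical per-partition (min, max) key ranges.
print(choose_sort([(1, 3), (4, 6), (7, 9)], 0))  # Sort(PresortedMerge)
print(choose_sort([(1, 5), (4, 6), (7, 9)], 0))  # Sort(Single)
print(choose_sort([(1, 5), (4, 6), (7, 9)], 1))  # Sort(PresortedMerge)
```

With `max_overlap = 0`, only pairwise-disjoint ranges keep the presorted path, which matches "no extra overlapping range is allowed".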
- `-column` equivalence is conservative: it requires `NULLS LAST`; Int64/UInt64 and Float still fall back to `Sort(Single)`.
- If the overlap exceeds `max_cluster_key_ordered_topk_overlap + 1`, the plan falls back to `Sort(Single)`.
- `max_cluster_key_ordered_topk_overlap = 0` means no extra overlapping range is allowed.
Web3 Benchmark Artifacts
Added under `benchmark/web3_lake/`:
- `schema.sql`: final two serving-table DDLs for Q1/Q2.
- `cmd/web3lake/main.go`: benchmark driver for Lake and Databend (MySQL protocol).
- `report.md`: first Lake schema benchmark report.
- `report_v2_filter_conditions.md`: second-round filter-condition exploration.
- `report_pr20031_ordered_topk.md`: ordered top-k benchmark matrix and commands for PR #20031.
The authoritative Lake-side Q1/Q2 results in the reports come from separate serial query-group runs. Earlier parallel Q1+Q2 batches are retained only as stress/interference notes.
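The serial methodology behind the authoritative Q1/Q2 numbers can be sketched as follows. This is not the Go driver in `cmd/web3lake/main.go`, whose internals are not described here; it is a hypothetical Python illustration in which `run_serial_groups` and the injected `execute` callable are assumptions. Each group runs alone, one query at a time, so groups cannot interfere with each other the way concurrent Q1+Q2 batches can.

```python
import statistics
import time


def run_serial_groups(groups, execute, repeats=3):
    """Time each query group on its own and return the median wall-clock
    time per group in milliseconds.

    `groups` maps a group name (e.g. "Q1") to a list of SQL strings;
    `execute` runs one SQL string to completion (in a real driver it would
    wrap a MySQL-protocol client call)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = {}
    for name, queries in groups.items():
        totals_ms = []
        for _ in range(repeats):
            start = time.perf_counter()
            for sql in queries:
                execute(sql)  # one query at a time, no concurrency
            totals_ms.append((time.perf_counter() - start) * 1000.0)
        results[name] = statistics.median(totals_ms)
    return results


# Usage with a stand-in executor that just records the SQL it receives.
seen = []
timings = run_serial_groups({"Q1": ["SELECT 1"], "Q2": ["SELECT 2"]}, seen.append, repeats=2)
print(seen)  # ['SELECT 1', 'SELECT 1', 'SELECT 2', 'SELECT 2']
```

Running groups serially costs more total wall time, but each group's latency is measured without a competing workload on the same cluster.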
The requested PR-specific benchmark on us-west-2 EC2/S3 is still pending because the STS credentials in `~/Downloads/aws.txt` currently return `ExpiredToken`. The PR report (`report_pr20031_ordered_topk.md`) records the exact benchmark matrix to run once fresh AWS credentials are available.
Tests
Validation run locally:
Latest targeted sqllogic result: 69 tests passed.
The sqllogictest covers:
- `CLUSTER BY LINEAR(a, b, -c, d)` with fixed `a`/`b`, extra filters, optional fixed `d`, and `ORDER BY c DESC, d ASC`.
- `Sort(PresortedMerge)`.
- `max_cluster_key_ordered_topk_overlap = 0` fallback.
- `NULLS FIRST`, missing fixed prefix, and unsafe `Int64` fallback cases.
- `-balance` positive case.
Type of change