
[Sort Pushdown · Future B] Page-level dynamic prune at RG boundary — refresh PagePruningPredicate using runtime DynamicFilter (follow-up #22450) #23216

Description

@zhuqi-lucas

Summary

Today, page-level pruning in Parquet (opener/mod.rs:1314, PagePruningPredicate::prune_plan_with_page_index_and_metrics) runs only once, at file open, using the static query predicate. #22450 added dynamic RG-level pruning at every RG boundary (should_prune in push_decoder.rs:183), but that rebuild path never re-evaluates the page-level predicate.

This issue extends #22450's "refresh at RG boundary" pattern to the PagePruningPredicate as well, so that the page-level RowSelection of upcoming RGs is tightened by the latest TopK threshold.

Current state (source-confirmed)

| Prune type | Where | Data | Dynamic? |
|---|---|---|---|
| RG-level (#22450) | push_decoder.rs:183 should_prune (RG boundary) | RG metadata min/max | ✅ rebuilt at every RG boundary |
| Page-level | opener/mod.rs:1314 (file open only) | page index | ❌ snapshot at file open |
| Row-level (RowFilter) | per batch | filter column values | ✅ reads latest threshold |

Gap: after #22450, RG-level pruning is dynamic but page-level pruning is still static. If the TopK heap tightens after file open, surviving RGs still carry their initial (loose) page-level RowSelection, so pages whose min/max can no longer pass the new threshold are still fetched, decompressed, and decoded for filter-column evaluation.
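To make the gap concrete, here is a minimal std-only model of one row group's pages evaluated against a snapshot threshold versus a refreshed one. PageStats, Order, page_may_match and rows_to_decode are hypothetical names, not DataFusion types, and reducing the TopK dynamic filter to `c > t` (DESC) or `c < t` (ASC) is a simplifying assumption.

```rust
/// Hypothetical per-page statistics for the filter (sort) column, standing in
/// for one entry of the Parquet column index + offset index.
#[derive(Debug, Clone, Copy)]
struct PageStats {
    min: i64,
    max: i64,
    row_count: usize,
}

/// Sort direction of the TopK query (ORDER BY c ASC|DESC LIMIT k).
#[derive(Debug, Clone, Copy)]
enum Order {
    Asc,
    Desc,
}

/// Can this page still hold a row that beats the current TopK threshold?
/// DESC keeps `c > t`, so the page needs max > t; ASC keeps `c < t`, so it needs min < t.
fn page_may_match(page: &PageStats, order: Order, threshold: Option<i64>) -> bool {
    match (threshold, order) {
        (None, _) => true, // heap not full yet: no dynamic bound exists
        (Some(t), Order::Desc) => page.max > t,
        (Some(t), Order::Asc) => page.min < t,
    }
}

/// Rows of one row group that would still be fetched, decompressed and
/// decoded for filter-column evaluation under the given threshold.
fn rows_to_decode(pages: &[PageStats], order: Order, threshold: Option<i64>) -> usize {
    pages
        .iter()
        .filter(|p| page_may_match(p, order, threshold))
        .map(|p| p.row_count)
        .sum()
}

fn main() {
    let pages = [
        PageStats { min: 0, max: 90, row_count: 1000 },
        PageStats { min: 91, max: 180, row_count: 1000 },
        PageStats { min: 181, max: 270, row_count: 1000 },
    ];
    // Page selection fixed at file open, before the TopK heap is full.
    let at_open = rows_to_decode(&pages, Order::Desc, None);
    // Same row group, evaluated against a threshold that tightened to 200 mid-scan.
    let refreshed = rows_to_decode(&pages, Order::Desc, Some(200));
    // prints "at open: 3000 rows, after refresh: 1000 rows"
    println!("at open: {at_open} rows, after refresh: {refreshed} rows");
}
```

In this toy case two of the three pages are dead after the threshold moves, yet a file-open snapshot would still decode them.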

Proposal

At every RG boundary (PushDecoderStreamState::transition):

  1. Check tracker.changed(), the same single atomic load that #22450 (feat(parquet): intra-file early stopping via statistics + dynamic filters) uses
  2. If it changed: rebuild a fresh PagePruningPredicate from the latest filter
  3. Walk the remaining RGs in the access plan and refine each RG's RowSelection via prune_plan_with_page_index_and_metrics
  4. Apply the result via the existing into_builder() → with_row_groups(...) → build() path

On any error, fall back to keeping the current selection (mirrors should_prune).
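The four proposal steps can be sketched as a std-only model. This is an illustration under stated assumptions, not DataFusion or arrow-rs code: SharedFilter, Tracker, RowGroupPlan, rebuild_predicate and on_rg_boundary are hypothetical stand-ins, the page predicate is reduced to `max > threshold` for an ORDER BY c DESC TopK, and a per-page bool vector replaces the real RowSelection. Step 4 is only noted in a comment.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the TopK DynamicFilter: the current threshold
/// plus a generation counter that is bumped on every update.
struct SharedFilter {
    generation: AtomicU64,
    threshold: Mutex<Option<i64>>,
}

impl SharedFilter {
    fn update(&self, t: i64) {
        *self.threshold.lock().unwrap() = Some(t);
        self.generation.fetch_add(1, Ordering::Release);
    }
}

/// Hypothetical tracker: `changed()` costs a single atomic load.
struct Tracker {
    filter: Arc<SharedFilter>,
    last_seen: u64,
}

impl Tracker {
    fn changed(&mut self) -> bool {
        let g = self.filter.generation.load(Ordering::Acquire);
        if g != self.last_seen {
            self.last_seen = g;
            true
        } else {
            false
        }
    }
}

/// Simplified page-level plan for one row group: per-page max of the sort
/// column and whether each page is still selected (a coarse RowSelection).
struct RowGroupPlan {
    page_max: Vec<i64>,
    selected: Vec<bool>,
}

/// Rebuild the page predicate from the latest filter; Err models a failed rebuild.
fn rebuild_predicate(filter: &SharedFilter) -> Result<i64, String> {
    let guard = filter.threshold.lock().map_err(|e| e.to_string())?;
    let value = *guard;
    value.ok_or_else(|| "no threshold yet".to_string())
}

/// Called at each RG boundary, before decoding row group `next_rg`.
fn on_rg_boundary(tracker: &mut Tracker, plan: &mut [RowGroupPlan], next_rg: usize) {
    // Step 1: cheap check; skip all work if the filter has not moved.
    if !tracker.changed() {
        return;
    }
    // Step 2: rebuild; on error keep the current selection (like should_prune).
    let threshold = match rebuild_predicate(&tracker.filter) {
        Ok(t) => t,
        Err(_) => return,
    };
    // Step 3: refine only the remaining row groups; selections can only shrink.
    for rg in plan.iter_mut().skip(next_rg) {
        for (sel, max) in rg.selected.iter_mut().zip(&rg.page_max) {
            *sel &= *max > threshold;
        }
    }
    // Step 4 (not modeled): hand the refined plan back to the decoder,
    // e.g. via into_builder() -> with_row_groups(...) -> build().
}

fn main() {
    let filter = Arc::new(SharedFilter {
        generation: AtomicU64::new(0),
        threshold: Mutex::new(None),
    });
    let mut tracker = Tracker { filter: Arc::clone(&filter), last_seen: 0 };
    let mut plan = vec![
        RowGroupPlan { page_max: vec![50, 120], selected: vec![true, true] },
        RowGroupPlan { page_max: vec![80, 300], selected: vec![true, true] },
    ];
    on_rg_boundary(&mut tracker, &mut plan, 0); // filter unchanged: no-op
    filter.update(100); // TopK heap tightened while RG 0 was being decoded
    on_rg_boundary(&mut tracker, &mut plan, 1);
    println!("{:?}", plan[1].selected); // prints "[false, true]"
}
```

Because the refinement only ANDs pages out, a skipped or failed refresh can never drop rows the file-open selection kept; it just forgoes the savings, which is the same safety property as the fall-back rule above.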

Expected wins

Saves filter-column IO, decompression, and decoding for individual dead pages; this extends #22450's "chip away Layer B residue" philosophy from RG to page granularity.

Most useful when:

  • RGs are large (many pages each)
  • The threshold tightens significantly mid-scan (e.g. after the first few RGs fill the heap)
  • The page index is enabled (prerequisite; without it this is a no-op)

Prerequisites

  • datafusion.execution.parquet.enable_page_index = true
  • Filter column present in file schema
  • Predicate chain contains a DynamicFilter (TopK source)

Open design questions

  1. Refresh frequency: every RG boundary, or only when tracker.changed() returns true?
  2. Granularity: refresh access plan for all surviving RGs, or only the next one to be touched?
  3. arrow-rs API gap: does the existing with_row_groups(...) path accept an updated per-RG RowSelection, or do we need a new arrow-rs API hook? (May overlap with arrow-rs#10158 territory.)
  4. Stretch goal · mid-RG refresh: refresh between pages of the same RG, not just at RG boundary. Needs a brand-new arrow-rs "mid-RG predicate adapt" callback hook.
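Whichever API ends up carrying the update (question 3), the refreshed page-level selection presumably has to be intersected with the RG's existing selection rather than replace it, since the existing one may already exclude rows for other reasons. A minimal run-length intersection, modeled loosely on the (row_count, skip) shape of arrow-rs's RowSelector, could look like this; Selector, push_run and intersect are hypothetical names, and arrow-rs's RowSelection may already provide an equivalent operation.

```rust
/// Hypothetical run-length selector, loosely modeled on arrow-rs's RowSelector:
/// `row_count` consecutive rows that are either skipped or selected.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Append a run, merging it into the previous run when both have the same `skip`.
fn push_run(out: &mut Vec<Selector>, row_count: usize, skip: bool) {
    if row_count == 0 {
        return;
    }
    if let Some(last) = out.last_mut() {
        if last.skip == skip {
            last.row_count += row_count;
            return;
        }
    }
    out.push(Selector { row_count, skip });
}

/// Intersect two selections over the same rows: a row stays selected only if
/// both inputs select it. Assumes both cover the same total row count.
fn intersect(a: &[Selector], b: &[Selector]) -> Vec<Selector> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    // Rows remaining in the current run of each input.
    let mut ra = a.first().map_or(0, |s| s.row_count);
    let mut rb = b.first().map_or(0, |s| s.row_count);
    while i < a.len() && j < b.len() {
        let n = ra.min(rb);
        push_run(&mut out, n, a[i].skip || b[j].skip);
        ra -= n;
        rb -= n;
        if ra == 0 {
            i += 1;
            ra = a.get(i).map_or(0, |s| s.row_count);
        }
        if rb == 0 {
            j += 1;
            rb = b.get(j).map_or(0, |s| s.row_count);
        }
    }
    out
}

fn main() {
    let sel = |row_count| Selector { row_count, skip: false };
    let skip = |row_count| Selector { row_count, skip: true };
    // Selection computed at file open with the static predicate.
    let current = vec![sel(500), skip(500), sel(2000)];
    // Page-level selection rebuilt from the tightened TopK threshold.
    let refreshed = vec![skip(2000), sel(1000)];
    let merged = intersect(&current, &refreshed);
    assert_eq!(merged, vec![skip(2000), sel(1000)]);
    println!("{merged:?}");
}
```

Merging adjacent runs with the same skip flag keeps the result compact, which matters if the refresh runs at every RG boundary.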

Related

Part of the Sort Pushdown EPIC #23036, future direction.
