[python] Fix filter on data evolution table not working issue #7211

Open
XiaoHongbo-Hope wants to merge 2 commits into apache:master from XiaoHongbo-Hope:normal_filter_support

Conversation

XiaoHongbo-Hope (Contributor) commented Feb 4, 2026

Purpose/Problem

The filter is not applied on data evolution reads: when a predicate is provided, all rows are returned instead of only the matching ones.

Tests

API and Format

Documentation

XiaoHongbo-Hope marked this pull request as ready for review February 4, 2026 10:12
XiaoHongbo-Hope marked this pull request as draft February 4, 2026 11:06
XiaoHongbo-Hope marked this pull request as ready for review February 8, 2026 07:54
XiaoHongbo-Hope marked this pull request as draft February 8, 2026 09:25
XiaoHongbo-Hope force-pushed the normal_filter_support branch 2 times, most recently from 4354588 to b8e709d, February 10, 2026 11:33
XiaoHongbo-Hope marked this pull request as ready for review February 10, 2026 14:12
simple_null = self._filter_batch_simple_null(batch)
if simple_null is not None:
return simple_null
if not self.predicate.has_null_check():
Contributor commented:
We can just use Predicate.to_arrow; it is the same as in the other table modes.
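A minimal, stdlib-only sketch of the pattern the reviewer suggests. In pypaimon, Predicate.to_arrow converts a predicate into an Arrow compute expression; the `Predicate` class and dict-of-lists batch below are simplified stand-ins, not the real API:

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Predicate:
    """Stand-in for a single-column comparison predicate (illustrative only)."""
    field: str
    op: str  # "=", ">", "<"
    literal: Any

    def to_mask(self, batch: Dict[str, List[Any]]) -> List[bool]:
        # The real Predicate.to_arrow builds an Arrow expression instead;
        # here we compute a boolean mask directly over a plain column.
        column = batch[self.field]
        if self.op == "=":
            return [v == self.literal for v in column]
        if self.op == ">":
            return [v is not None and v > self.literal for v in column]
        if self.op == "<":
            return [v is not None and v < self.literal for v in column]
        raise ValueError(f"unsupported op: {self.op}")


def filter_batch(batch: Dict[str, List[Any]], predicate: Predicate) -> Dict[str, List[Any]]:
    """Apply the predicate's mask uniformly to every column of the batch."""
    mask = predicate.to_mask(batch)
    return {name: [v for v, keep in zip(col, mask) if keep]
            for name, col in batch.items()}


batch = {"id": [1, 2, 3], "score": [10, 25, 40]}
print(filter_batch(batch, Predicate("score", ">", 20)))
# {'id': [2, 3], 'score': [25, 40]}
```

The point of reusing one conversion path for every table mode is that null handling and operator semantics stay in a single place instead of being re-implemented per reader.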

schema=result.schema,
)
except (TypeError, ValueError, pa.ArrowInvalid) as e:
logger.debug(
Contributor commented:
What exception here?

return False
if self.partition_key_predicate and not self.partition_key_predicate.test(entry.partition):
return False
if self.deletion_vectors_enabled and entry.file.level == 0: # do not read level 0 file
Contributor commented:
We can refactor this: this if should be inside is_primary_key_table.
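One way to read this suggestion, sketched with hypothetical names (`FileEntry`, `should_read`, and the flattened `level` field are simplified stand-ins, not pypaimon's actual classes): the level-0 skip only makes sense for primary key tables with deletion vectors enabled, so it belongs under that branch rather than being tested for every table mode:

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Stand-in for a manifest entry; the real entry nests the level under entry.file."""
    level: int


def should_read(entry: FileEntry, is_primary_key_table: bool,
                deletion_vectors_enabled: bool) -> bool:
    """Decide whether a manifest entry's file should be read."""
    if is_primary_key_table:
        # With deletion vectors enabled, level-0 files are not read
        # directly; their rows are covered by compacted higher levels.
        if deletion_vectors_enabled and entry.level == 0:
            return False
    # Append-only / data evolution tables never skip by level.
    return True


print(should_read(FileEntry(level=0), True, True))   # False
print(should_read(FileEntry(level=0), False, True))  # True
```

Scoping the check this way also documents the invariant: a non-primary-key table reaching the level-0 skip would be a logic error rather than a silent no-op.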

if not self.predicate:
return True
if self.predicate_for_stats is None:
if self.predicate_for_stats is None or self.data_evolution:
Contributor commented:
Create a separate if, and add a comment to it explaining why no filtering is done here.
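A sketch of the shape the reviewer asks for, with each condition split out and documented. The function name and the stated reason for skipping stats in data evolution mode are assumptions drawn from this PR's context, not pypaimon's actual code:

```python
def needs_stats_filtering(predicate_for_stats, data_evolution: bool) -> bool:
    """Sketch: split the combined `A is None or B` condition into two
    separately commented ifs, as suggested in review."""
    if predicate_for_stats is None:
        # No stats-compatible predicate could be derived, so there is
        # nothing to evaluate against the per-file statistics.
        return False
    if data_evolution:
        # Assumed rationale: in data evolution mode a logical row may span
        # several files, so per-file stats cannot safely prune; the
        # predicate is applied later on the assembled rows instead.
        return False
    return True


print(needs_stats_filtering(None, False))    # False
print(needs_stats_filtering("pred", True))   # False
print(needs_stats_filtering("pred", False))  # True
```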

super().__init__(table, predicate, read_type, actual_split, row_tracking_enabled)

def _push_down_predicate(self) -> Optional[Predicate]:
# Do not push predicate to file readers;
Contributor commented:
Add detailed comments explaining why the predicate is not pushed down.
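A sketch of what such a comment could say. The class name is hypothetical, and the explanation in the comment is an assumption inferred from how data evolution reads stitch rows together, not wording from the actual patch:

```python
from typing import Optional


class DataEvolutionSplitReadSketch:
    """Illustrative only; not pypaimon's actual SplitRead class."""

    def _push_down_predicate(self) -> Optional["Predicate"]:
        # Do not push the predicate down to individual file readers.
        #
        # Assumed rationale: in data evolution mode a logical row is
        # assembled from several files, each holding only a subset of the
        # fields. If each file reader filtered independently, readers
        # would drop different row positions and the per-file batches
        # could no longer be stitched back together by position. The
        # predicate is therefore applied once, after complete rows have
        # been assembled.
        return None
```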

return ConcatBatchReader(suppliers)
merge_reader = ConcatBatchReader(suppliers)
if self.predicate is not None:
# Only apply filter when all predicate columns are in read_type (e.g. projected schema).
JingsongLi (Contributor) commented Feb 11, 2026:
What we are returning here is the complete row, right? So this check should apply to all table modes; it shouldn't be added here, it should be verified in SplitRead.init.
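A stdlib-only sketch of the kind of early validation the reviewer proposes, done once at construction time instead of per batch. The function name and error message are illustrative, not pypaimon's API:

```python
from typing import Iterable


def validate_predicate_columns(predicate_fields: Iterable[str],
                               read_fields: Iterable[str]) -> None:
    """Fail fast (e.g. from SplitRead.__init__) if the predicate references
    columns that the projected read schema does not contain."""
    missing = set(predicate_fields) - set(read_fields)
    if missing:
        raise ValueError(
            f"predicate references columns not in read schema: {sorted(missing)}"
        )


# Passes silently: all predicate columns are projected.
validate_predicate_columns(["score"], ["id", "score"])
```

Validating in the constructor means every table mode gets the same guarantee, and the per-batch filter can then assume its columns exist rather than silently skipping the filter.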
