Skip to content

[4.0-dev] fix data branch no-op update diff#25159

Open
gouhongshen wants to merge 2 commits into
matrixorigin:4.0-devfrom
gouhongshen:codex/fix-data-branch-noop-diff-4.0
Open

[4.0-dev] fix data branch no-op update diff#25159
gouhongshen wants to merge 2 commits into
matrixorigin:4.0-devfrom
gouhongshen:codex/fix-data-branch-noop-diff-4.0

Conversation

@gouhongshen

Copy link
Copy Markdown
Contributor

Summary

  • Port fix data branch no-op update diff #25149 to 4.0-dev.
  • Suppress data branch no-op UPDATE diffs when the final visible row equals the LCA row.
  • Keep 4.0 Decimal256 compare support; add the no-op diff BVT as diff_15 because diff_14 already exists on 4.0-dev.
  • Close migrated fake-PK hashmaps inside diffDataHelper.

Related

Refs #23751
Port of #25149

Tests

  • git diff --check upstream/4.0-dev...HEAD
  • go test ./pkg/frontend -run 'TestCompareTupleWithBatchRow|TestCompareTupleValueWithVectorNormalizesValues|TestDataBranchCompareValueErrors|TestNormalizeCompareValueSupportedForms|TestHashDiff_HasLCANoopUpdateFiltering|TestHashDiff_HasLCAFakePKUpdateIsNotNoop|TestHashDiff_FakePKClosesMigratedHashmaps|TestLCAProbeJoinCastType'

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-checked this PR from a first-principles / unhappy-path angle. The no-op diff suppression is moving in the right direction, but I still see one blocker in the current head.

Blocker: reverted same-PK update chains can still emit a false UPDATE

The current handleDelsOnLCA path still decides “on LCA” from PK existence only. That is not strong enough for a repeated same-PK chain like:

  • LCA row = A
  • intermediate update = B
  • later revert = A

In that shape, both tombstones probe the same LCA row by PK, so the intermediate tombstone can be removed from tombBat before the rowid-based local-cancel path runs. That leaves the intermediate inserted row in dataHashmap, and the later compare logic only suppresses the final A == A no-op row — it can still emit the intermediate B as an UPDATE.

Why I think this is real

  • The LCA join is built on PK only (right join (values ...) ... on lca.pk = pks.pk ... order by pks.__idx_).
  • Every joined row is copied into dBat, so duplicate PK probes remain duplicated.
  • The rowid-based local insert/delete cancel-out logic only runs on rows that are still left in tombBat after the LCA classification.
  • The update detector then processes each dBat row independently.

Suggested fix

  • Do not decide “on LCA” from PK existence alone.
  • Reconcile tombstones against local dataHashmap by rowid before (or independently of) the PK-only LCA probe, or otherwise ensure only the tombstone that actually deletes the LCA version reaches the update/no-op comparison path.
  • Please also update the unit coverage so duplicate PK probes from the LCA join are represented explicitly, not just a single LCA hit.

Given that the PR is specifically trying to make repeated-update / revert chains no-op, this remaining path still looks like a blocker to me.

@gouhongshen

gouhongshen commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

The current handleDelsOnLCA path still decides “on LCA” from PK existence only.
In that shape, both tombstones probe the same LCA row by PK, so the intermediate tombstone can be removed from tombBat before the rowid-based local-cancel path runs.

I checked this path against the current PR head.

Classification: not applicable as a blocker for the current code path.

The key premise is that both same-PK tombstones can reach handleDelsOnLCA and produce duplicate PK probes. That is not reachable through the real hashDiff path: buildHashmapForTable compacts tombstones by encoded PK before findDeleteAndUpdateBat, keeps only the latest tombstone, and removes stale data rows older than that tombstone. So in the A -> B -> A case the raw changes contain two tombstones, but the LCA probe receives one PK row, not two.

I verified this two ways:

  • Forcing the test mock to return duplicate LCA rows does produce a false diff, which confirms duplicate dBat rows would be wrong.
  • The real current path does not create that shape. I added commit 3eeec9dd87 to assert this explicitly in TestHashDiff_HasLCANoopUpdateFiltering: the input still has two same-PK tombstones, but the LCA SQL contains exactly one row(...) probe after hashmap compaction.

Verification:
go test ./pkg/frontend -run 'TestCompareTupleWithBatchRow|TestCompareTupleValueWithVectorNormalizesValues|TestDataBranchCompareValueErrors|TestNormalizeCompareValueSupportedForms|TestHashDiff_HasLCANoopUpdateFiltering|TestHashDiff_HasLCAFakePKUpdateIsNotNoop|TestHashDiff_FakePKClosesMigratedHashmaps|TestLCAProbeJoinCastType' -count=1

No production fallback or special-case logic was added.

Update after rebase: conflict with latest 4.0-dev was resolved by keeping upstream errors.Join close-error aggregation together with this PR's migrated-hashmap close path.

@gouhongshen gouhongshen force-pushed the codex/fix-data-branch-noop-diff-4.0 branch from 2cb21c3 to 3eeec9d Compare June 26, 2026 05:24

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed current head (3eeec9d) from a correctness / unhappy-path angle.\n\nI re-checked the previous same-PK A -> B -> A concern against the real hashDiff path. The current buildHashmapForTable path compacts tombstones by PK before LCA probing: it keeps the latest tombstone, removes stale tombstones, and removes stale data rows older than the latest tombstone. That means the LCA probe receives one candidate for this chain, and the new tuple-vs-LCA-row comparison suppresses the final no-op when the visible row matches the LCA row.\n\nI also checked the fake-PK path, migrated hashmap close path, compare normalization, batch ownership around update emission, and the reader/SQL LCA result post-processing. I do not see a remaining blocker.\n\nLocal verification:\n- git diff --check origin/4.0-dev...HEAD\n- go test ./pkg/frontend -run 'TestCompareTupleWithBatchRow|TestCompareTupleValueWithVectorNormalizesValues|TestDataBranchCompareValueErrors|TestNormalizeCompareValueSupportedForms|TestHashDiff_HasLCANoopUpdateFiltering|TestHashDiff_HasLCAFakePKUpdateIsNotNoop|TestHashDiff_FakePKClosesMigratedHashmaps|TestLCAProbeJoinCastType' -count=1\n- go test ./pkg/frontend -run 'TestHashDiff|TestRunLCAProbe|TestCompareRowInWrappedBatches|TestCompareTuple|TestDataBranchCompare|TestNormalizeCompare|TestLCAProbeJoinCastType' -count=1\n- go test ./pkg/frontend/databranchutils -count=1\n\nApproved.

@mergify

mergify Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Tick the box to add this pull request to the merge queue (same as @mergifyio queue).

  • Queue this pull request

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Denotes a PR that changes [1000, 1999] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants