fix: cross-CN shuffle join INSERT...SELECT hang on multi-CN cluster (#24919) by ck89119 · Pull Request #25158 · matrixorigin/matrixone

ck89119 · 2026-06-25T08:36:07Z

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

Fixes a silent deadlock where INSERT ... SELECT with cross-CN shuffle hash join (shuffle: range(...)) over large tables (>=5M rows/table) hangs forever on multi-CN clusters. The same statement completes in ~3-6s on a single-node deployment.

Root cause

newShuffleJoinScopeList leaves a CN's dop join buckets as independent RemoteRun trees, while the shuffle dispatch attaches to only the first bucket. When compileMultiUpdate (used by large INSERT...SELECT) sends each bucket individually via newMergeScope:

checkPipelineStandaloneExecutableAtRemote detects the dispatch's LocalRegs pointing to out-of-tree sibling buckets → correctly returns false
RemoteRun converts the pipeline to local on the coordinator
The dispatch runs on the coordinator instead of its compile-time CN, mispaired with the cross-CN receiver's FromAddr
Remote receivers GetProcByUuid endlessly busy-spin (300s timeout) → idle WaitingEnd deadlock (uncancellable context.TODO())

Fix

Add groupShuffleBucketsByCNIfNeeded (gated: multi-CN + cross-CN dispatch subtree present) which reuses mergeScopesByCN to group same-CN shuffle buckets — together with their nested dispatch — into one per-CN send unit. When the whole group is sent via a single RemoteRun to its target CN:

checkPipelineStandaloneExecutableAtRemote returns true (all dispatch LocalRegs are within the same tree)
The dispatch runs on the correct CN, completing the cross-CN receiver handshake
No changes to the RemoteRun protocol, dispatch operator, or checkPipeline logic

The grouping is a no-op for single-CN / non-shuffle inserts (confirmed by gating UT).

Changes (2 files, +171 lines)

pkg/sql/compile/compile.go (+62): groupShuffleBucketsByCNIfNeeded + scopeTreeHasCrossCNDispatch helpers, wired into compileInsert and compileMultiUpdate (4 call sites: toWriteS3 and !toWriteS3 paths)
pkg/sql/compile/remoterun_test.go (+109): UT reproducing the checkPipeline=false pre-fix + verifying per-CN containers return true post-fix + gating test (no regressions for single-CN / non-shuffle)

Verification

2-CN docker cluster (etc/docker-multi-cn-local-disk): 5M rows × 2 consecutive runs complete in ~17s each, count(*)=5,000,000, data fully correct (distinct_id=5,000,000, bad_pad=0, bad_k=0). Previously hung forever.
Unit tests: -race pass; new helper coverage 100%; go vet clean; go build ./pkg/sql/... passes
Gating: single-CN and non-shuffle paths unaffected (UT confirms no-op when no cross-CN dispatch)

🤖 Generated with Claude Code

…atrixorigin#24919) Root cause: newShuffleJoinScopeList leaves a CNs dop join buckets in separate RemoteRun trees while the shuffle dispatch attaches to only the first one. In compileMultiUpdate (used by large INSERT...SELECT), newMergeScope sends each bucket individually -> checkPipeline detects dispatch.LocalRegs pointing to out-of-tree sibling buckets -> converts to local on coordinator -> dispatch runs on wrong CN, mispaired with compile-time cross-CN receiver FromAddr -> remote GetProcByUuid spins / merge WaitingEnd waits forever -> 5M+ rows hang. Fix: add groupShuffleBucketsByCNIfNeeded (gating: multi-CN + cross-CN dispatch present) which merges same-CN shuffle buckets and their nested dispatch into one per-CN send unit via mergeScopesByCN. When the whole group is sent as one RemoteRun unit to its target CN, the dispatch runs on the correct CN, checkPipeline returns true, and the cross-CN receiver handshake completes normally. Noop for single-CN / non-shuffle inserts. Wired into compileInsert and compileMultiUpdate (toWriteS3 + !toWriteS3, 4 call sites). Verified on 2-CN docker cluster: 5Mx2 consecutive runs complete in ~17s with count(*)=5,000,000 and correct data; previously hung forever. Co-Authored-By: Claude <noreply@anthropic.com>

qodo-code-review · 2026-06-25T08:36:12Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

- document scopeTreeHasCrossCNDispatch precondition (dispatch is always RootOp) - document that grouping preserves callers existing root operator via AppendChild - extend grouping to the compileInsert S3 sink-scan path: dataScope.MergeRun still sends each bucket individually, so a cross-CN shuffle dispatch there would hit the same convert-to-local hang; group same-CN buckets first Co-Authored-By: Claude <noreply@anthropic.com>

ck89119 requested review from aunjgr and ouyuanning as code owners June 25, 2026 08:36

ck89119 temporarily deployed to ci June 25, 2026 08:36 — with GitHub Actions Inactive

matrix-meow added the size/M Denotes a PR that changes [100,499] lines label Jun 25, 2026

mergify Bot requested a review from XuPeng-SH June 25, 2026 08:36

mergify Bot added the kind/bug Something isn't working label Jun 25, 2026

ck89119 temporarily deployed to ci June 25, 2026 10:27 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: cross-CN shuffle join INSERT...SELECT hang on multi-CN cluster (#24919)#25158

fix: cross-CN shuffle join INSERT...SELECT hang on multi-CN cluster (#24919)#25158
ck89119 wants to merge 2 commits into
matrixorigin:3.0-devfrom
ck89119:fix-cross-cn-shuffle-hang-3.0-dev

ck89119 commented Jun 25, 2026

Uh oh!

qodo-code-review Bot commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ck89119 commented Jun 25, 2026

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

Root cause

Fix

Changes (2 files, +171 lines)

Verification

Uh oh!

qodo-code-review Bot commented Jun 25, 2026

Qodo reviews are paused for this user.

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants