fix(rscthrottler): add mpool-based physical memory gate for S3 writes (4.0-dev)#25039
fix(rscthrottler): add mpool-based physical memory gate for S3 writes (4.0-dev)#25039aunjgr wants to merge 1 commit into
Conversation
Replace the removed RSS gate with a check on mpool.GlobalStats().NumCurrBytes. mpool tracks all mmap'd off-heap allocations (vectors, hash tables, S3 buffers, cache) and is real-time accurate — no stale window, no RSS/s3-buffer double-count. The pool check (limit - reserved) bounds S3 write concurrency. The mpool gate (mpoolBytes + ask > total * limitRate) bounds physical memory. Rejection triggers flushS3WriterOnMemoryPressure, freeing mpool + reserved. Fixes OOM on TPCC regression caused by matrixorigin#24892 removing the RSS gate.
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
XuPeng-SH
left a comment
There was a problem hiding this comment.
Requesting changes for a blocker in the new mpool gate.
acquireWithinRSSLimit now checks only mpoolBytes + ask before the reserved.CompareAndSwap. That is not a linearizable physical-memory admission check: once one caller succeeds, its ask is in reserved but may not be reflected in mpool.GlobalStats() until the subsequent CNS3Writer.Write allocations run. In that window, other callers can still see the same mpoolBytes and pass the same mpoolBytes + ask check.
Concrete case: total=100, limitRate=0.90, mpool=85, reserved=0, ask=4. The first acquire passes and CASes reserved to 4. Before its write allocation updates mpool, a second acquire still checks 85+4 <= 90 and also passes. After both writes allocate, mpool can exceed the intended 90-byte physical cap. With high non-S3 mpool usage and parallel S3 writers, this can admit a burst of S3 reservations on top of existing mpool pressure, so the PR can still OOM in the unhappy path it is meant to guard.
Please make the physical gate account for granted-but-not-yet-materialized S3 reservations without reintroducing the old double-count for already materialized S3 buffers, and add a regression test that keeps mpool constant across multiple granted acquires or exercises concurrent acquires under high non-S3 mpool pressure.
Local checks run:
CGO_CFLAGS="-I/home/xupeng/matrixone/thirdparties/install/include" CGO_LDFLAGS="-L/home/xupeng/matrixone/thirdparties/install/lib" go test ./pkg/common/rscthrottlerCGO_CFLAGS="-I/home/xupeng/matrixone/thirdparties/install/include" CGO_LDFLAGS="-L/home/xupeng/matrixone/thirdparties/install/lib" go test ./pkg/sql/colexec/insert -run "TestInsert.*S3|TestAcquireFlushSlot" -count=1
What type of PR is this?
Which issue(s) this PR fixes:
Fixes #24881
What this PR does / why we need it:
Cherry-pick of #25034 to 4.0-dev.
#24892/#24931 removed the RSS gate from acquireWithinRSSLimit because it double-counted S3 write buffers, causing permanent false-rejects (#24881). But removing it left no physical memory guard, causing the 4.0-dev TPCC nightly to OOM on large REPLACE INTO operations.
This PR adds a physical memory gate using mpool.GlobalStats().NumCurrBytes — the total off-heap mmap'd memory. This covers vectors, hash tables, S3 buffers, and caches. It is real-time accurate (no stale window) and does not double-count S3 buffers.
The two gates now:
Rejection triggers flushS3WriterOnMemoryPressure, freeing mpool + reserved for retry.
Checklist:
🤖 Generated with Claude Code