Skip to content

fix(rscthrottler): add mpool-based physical memory gate for S3 writes (4.0-dev)#25039

Open
aunjgr wants to merge 1 commit into
matrixorigin:4.0-devfrom
aunjgr:fix/s3-physical-memory-gate-4.0-dev
Open

fix(rscthrottler): add mpool-based physical memory gate for S3 writes (4.0-dev)#25039
aunjgr wants to merge 1 commit into
matrixorigin:4.0-devfrom
aunjgr:fix/s3-physical-memory-gate-4.0-dev

Conversation

@aunjgr

@aunjgr aunjgr commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

Fixes #24881

What this PR does / why we need it:

Cherry-pick of #25034 to 4.0-dev.

#24892/#24931 removed the RSS gate from acquireWithinRSSLimit because it double-counted S3 write buffers, causing permanent false-rejects (#24881). But removing it left no physical memory guard, causing the 4.0-dev TPCC nightly to OOM on large REPLACE INTO operations.

This PR adds a physical memory gate using mpool.GlobalStats().NumCurrBytes — the total off-heap mmap'd memory. This covers vectors, hash tables, S3 buffers, and caches. It is real-time accurate (no stale window) and does not double-count S3 buffers.

The two gates now:

  • Pool gate (limit - reserved >= ask): bounds S3 write concurrency
  • Mpool gate (mpoolBytes + ask > total * limitRate): bounds physical memory

Rejection triggers flushS3WriterOnMemoryPressure, freeing mpool + reserved for retry.

Checklist:

🤖 Generated with Claude Code

Replace the removed RSS gate with a check on mpool.GlobalStats().NumCurrBytes.
mpool tracks all mmap'd off-heap allocations (vectors, hash tables, S3 buffers,
cache) and is real-time accurate — no stale window, no RSS/s3-buffer double-count.

The pool check (limit - reserved) bounds S3 write concurrency.
The mpool gate (mpoolBytes + ask > total * limitRate) bounds physical memory.
Rejection triggers flushS3WriterOnMemoryPressure, freeing mpool + reserved.

Fixes OOM on TPCC regression caused by matrixorigin#24892 removing the RSS gate.
@aunjgr aunjgr requested a review from XuPeng-SH as a code owner June 17, 2026 10:44
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes for a blocker in the new mpool gate.

acquireWithinRSSLimit now checks only mpoolBytes + ask before the reserved.CompareAndSwap. That is not a linearizable physical-memory admission check: once one caller succeeds, its ask is in reserved but may not be reflected in mpool.GlobalStats() until the subsequent CNS3Writer.Write allocations run. In that window, other callers can still see the same mpoolBytes and pass the same mpoolBytes + ask check.

Concrete case: total=100, limitRate=0.90, mpool=85, reserved=0, ask=4. The first acquire passes and CASes reserved to 4. Before its write allocation updates mpool, a second acquire still checks 85+4 <= 90 and also passes. After both writes allocate, mpool can exceed the intended 90-byte physical cap. With high non-S3 mpool usage and parallel S3 writers, this can admit a burst of S3 reservations on top of existing mpool pressure, so the PR can still OOM in the unhappy path it is meant to guard.

Please make the physical gate account for granted-but-not-yet-materialized S3 reservations without reintroducing the old double-count for already materialized S3 buffers, and add a regression test that keeps mpool constant across multiple granted acquires or exercises concurrent acquires under high non-S3 mpool pressure.

Local checks run:

  • CGO_CFLAGS="-I/home/xupeng/matrixone/thirdparties/install/include" CGO_LDFLAGS="-L/home/xupeng/matrixone/thirdparties/install/lib" go test ./pkg/common/rscthrottler
  • CGO_CFLAGS="-I/home/xupeng/matrixone/thirdparties/install/include" CGO_LDFLAGS="-L/home/xupeng/matrixone/thirdparties/install/lib" go test ./pkg/sql/colexec/insert -run "TestInsert.*S3|TestAcquireFlushSlot" -count=1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working kind/test-ci size/S Denotes a PR that changes [10,99] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants