Skip to content

fix(rscthrottler): add mpool-based physical memory gate for S3 writes#25034

Closed
aunjgr wants to merge 1 commit into
matrixorigin:mainfrom
aunjgr:fix/s3-physical-memory-gate
Closed

fix(rscthrottler): add mpool-based physical memory gate for S3 writes#25034
aunjgr wants to merge 1 commit into
matrixorigin:mainfrom
aunjgr:fix/s3-physical-memory-gate

Conversation

@aunjgr

@aunjgr aunjgr commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

  • BUG

Which issue(s) this PR fixes:

Fixes #24881

What this PR does / why we need it:

#24892 removed the RSS gate (rss + reserved + ask > total * limitRate) from acquireWithinRSSLimit because it double-counted S3 write buffers (in both RSS and reserved), causing permanent false-rejects (#24881). But removing it left no physical memory guard, causing TPCC to OOM on large REPLACE INTO operations.

This PR adds a physical memory gate using mpool.GlobalStats().NumCurrBytes — the total off-heap mmap'd memory. This covers vectors, hash tables, S3 buffers, and caches. It is real-time accurate (no stale window) and does not double-count S3 buffers.

The two gates now:

  • Pool gate (limit - reserved >= ask): bounds S3 write concurrency
  • Mpool gate (mpoolBytes + ask > total * limitRate): bounds physical memory

Rejection from the mpool gate triggers flushS3WriterOnMemoryPressure in the caller, which syncs buffered data to S3 and frees both reserved and mpool memory, creating headroom for retry.

🤖 Generated with Claude Code

Replace the removed RSS gate with a check on mpool.GlobalStats().NumCurrBytes.
mpool tracks all mmap'd off-heap allocations (vectors, hash tables, S3 buffers,
cache) and is real-time accurate — no stale window, no RSS/s3-buffer double-count.

The pool check (limit - reserved) bounds S3 write concurrency.
The mpool gate (mpoolBytes + ask > total * limitRate) bounds physical memory.
Rejection triggers flushS3WriterOnMemoryPressure, freeing mpool + reserved.

Fixes OOM on TPCC regression caused by matrixorigin#24892 removing the RSS gate.
@aunjgr aunjgr requested a review from XuPeng-SH as a code owner June 17, 2026 09:31
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working size/S Denotes a PR that changes [10,99] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Failed to load vector data during regression testing, but it works when isolation

2 participants