Skip to content

fix(health): gate memory_critical on real memory pressure#735

Open
MarvinFS wants to merge 2 commits into
rohitg00:mainfrom
MarvinFS:fix/memory-critical-real-pressure
Open

fix(health): gate memory_critical on real memory pressure#735
MarvinFS wants to merge 2 commits into
rohitg00:mainfrom
MarvinFS:fix/memory-critical-real-pressure

Conversation

@MarvinFS
Copy link
Copy Markdown

@MarvinFS MarvinFS commented May 30, 2026

memory_critical fires whenever heapUsed/heapTotal > 95% && rss > 512MB. A busy Node process keeps its heap near-full by design, so that condition is effectively always true under normal load, producing chronic false-positive critical alerts that drown out real signals.

This adds a real-pressure gate: memory_critical now additionally requires a genuine pressure signal — either this process's absolute RSS has crossed a high ceiling (default 4 GB), or the host as a whole has run low on free RAM (default: < 10% of total free). The warn-level threshold is unchanged.

All bounds are env-overridable via resolveThresholdConfig(), which monitor.ts reads once at startup:

  • AGENTMEMORY_MEMORY_CRITICAL_RSS_MB (default 4096)
  • AGENTMEMORY_MEMORY_SYSTEM_FREE_FLOOR_RATIO (default 0.1; set to 0 to disable the system-memory check)
  • existing knobs: AGENTMEMORY_MEMORY_CRITICAL_PERCENT, AGENTMEMORY_MEMORY_WARN_PERCENT, AGENTMEMORY_MEMORY_RSS_FLOOR_MB

Uses only node:os; no new dependencies. test/health-thresholds.test.ts is updated for the new gate. Verified npm run build clean and npm test green on a fresh main checkout.

Summary by CodeRabbit

  • New Features

    • Environment-variable overrides for memory threshold configuration (warning/critical levels) and system-free-memory ratio.
    • Health monitor now loads threshold config at startup and applies it when evaluating snapshots.
  • Bug Fixes

    • More accurate critical memory detection by combining process memory and host free-memory checks.
  • Tests

    • Threshold tests updated to cover stricter RSS/system-memory gating and mock restoration.

Review Change Stack

memory_critical fired whenever heapUsed/heapTotal > 95% and rss > 512MB. A busy Node process keeps its heap near-full by design, so that condition is effectively always true under normal load, producing chronic false-positive critical alerts that drown out real signals.

Add a real-pressure gate: memory_critical now additionally requires a genuine pressure signal - either this process's absolute RSS has crossed a high ceiling (default 4GB), or the host as a whole has run low on free RAM (default: less than 10% of total free). Both bounds, plus the existing percent and RSS-floor knobs, are env-overridable (AGENTMEMORY_MEMORY_CRITICAL_RSS_MB, AGENTMEMORY_MEMORY_SYSTEM_FREE_FLOOR_RATIO, AGENTMEMORY_MEMORY_CRITICAL_PERCENT, AGENTMEMORY_MEMORY_WARN_PERCENT, AGENTMEMORY_MEMORY_RSS_FLOOR_MB) via resolveThresholdConfig(), which monitor.ts reads once at startup. The warn-level threshold is unchanged.

Uses only node:os; no new dependencies.

Signed-off-by: MarvinFS <MarvinFS@users.noreply.github.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 30, 2026

@MarvinFS is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 38019284-95e5-4d78-90c4-686843947270

📥 Commits

Reviewing files that changed from the base of the PR and between e58c3cd and d23ee8f.

📒 Files selected for processing (2)
  • src/health/thresholds.ts
  • test/health-thresholds.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/health-thresholds.test.ts
  • src/health/thresholds.ts

📝 Walkthrough

Walkthrough

The PR adds environment-configurable memory critical gating to the health monitor. New ThresholdConfig fields define an absolute process RSS ceiling and a system free-memory floor ratio. A resolveThresholdConfig() function reads these from environment variables at startup. The health evaluation logic now requires "real pressure"—either process RSS exceeding the critical threshold or system free memory falling below the configured ratio—before escalating to critical status.

Changes

Memory Critical Threshold Configuration and Evaluation

Layer / File(s) Summary
Memory threshold configuration and resolution
src/health/thresholds.ts
ThresholdConfig extends with memoryCriticalRssBytes and memorySystemFreeFloorRatio; DEFAULTS updated; resolveThresholdConfig() and env parsing helpers added.
Real-memory-pressure helper
src/health/thresholds.ts
Adds isUnderRealMemoryPressure(rssBytes, cfg) which checks absolute RSS ≥ memoryCriticalRssBytes or host free RAM ratio ≤ memorySystemFreeFloorRatio via os.freemem()/os.totalmem().
Memory critical evaluation with real-pressure gating
src/health/thresholds.ts
evaluateHealth's critical branch now requires the real-pressure gate in addition to heap percent and RSS-floor checks.
Monitor startup and health evaluation with config
src/health/monitor.ts
registerHealthMonitor resolves threshold config once at startup and passes it into each evaluateHealth(snapshot, thresholdConfig) call.
Test updates for memory critical gating
test/health-thresholds.test.ts
Tests restore Vitest mocks after each test and pass memoryCriticalRssBytes (and RSS-floor overrides where needed) to deterministically exercise the new gating.
sequenceDiagram
  participant registerHealthMonitor
  participant resolveThresholdConfig
  participant evaluateHealth
  participant HostOS
  registerHealthMonitor->>resolveThresholdConfig: read env vars
  resolveThresholdConfig-->>registerHealthMonitor: thresholdConfig
  registerHealthMonitor->>evaluateHealth: (healthSnapshot, thresholdConfig)
  evaluateHealth->>HostOS: os.freemem()/os.totalmem() (real-pressure)
  evaluateHealth-->>registerHealthMonitor: status/alerts/notes
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit hops through memory lanes,
Where thresholds guard the critical plains.
Environment whispers what's too high,
Real pressure signals: "Now we cry!"
Config resolved once, then tests align—
Health monitoring logic, refined and fine! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(health): gate memory_critical on real memory pressure' directly and clearly reflects the main change: adding a real memory pressure gate to prevent false positive memory_critical alerts.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
test/health-thresholds.test.ts (1)

47-52: ⚡ Quick win

Add direct coverage for the host-memory gate as well.

These updates only exercise the new absolute-RSS path. The memorySystemFreeFloorRatio branch, and the documented 0 disable case, still have no direct test, so a regression in the new host-memory gate would slip through.

Also applies to: 97-105

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/health-thresholds.test.ts` around lines 47 - 52, The test only exercises
the absolute-RSS branch of evaluateHealth via memoryCriticalRssBytes; add direct
tests for the host-memory gate by calling evaluateHealth with
memorySystemFreeFloorRatio set to a non-zero value and with system-free bytes in
the synthetic stats (s) low enough to trip the gate, asserting that
status/alerts reflect memory_critical, and also add a test for the documented
disable case by passing memorySystemFreeFloorRatio: 0 to ensure the host-memory
check is skipped; update or add tests near the existing blocks (the
evaluateHealth calls around lines 47 and 97-105) to supply appropriate
s.systemFreeBytes and corresponding assertions for both the tripping and
disabled behaviors.
src/health/thresholds.ts (1)

12-23: ⚡ Quick win

Prefer self-describing names/helpers over the new explanatory comments.

Most of these comments restate behavior that the field names and gate logic can carry directly. Extracting the pressure check into a helper such as isUnderRealMemoryPressure(...) would let you drop most of the prose.

As per coding guidelines, src/**/*.{ts,js}: No code comments explaining WHAT — use clear naming instead.

Also applies to: 52-55, 73-75, 137-141

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/health/thresholds.ts` around lines 12 - 23, Replace the long explanatory
comments in thresholds.ts by using self-describing names and a small helper:
rename/ensure the fields memoryCriticalRssBytes and memorySystemFreeFloorRatio
remain clearly named, remove the prose above them, and implement an exported
helper function isUnderRealMemoryPressure(totalRamBytes, freeRamBytes,
processRssBytes, processHeapUsedBytes, memorySystemFreeFloorRatio,
memoryCriticalRssBytes) that encapsulates the existing gate logic (system
free-ratio escape hatch plus the RSS+heap real-pressure check) and use that
helper wherever the gate is evaluated (e.g., the memory critical gate logic
referenced around lines ~52-55, ~73-75, ~137-141) so the comments can be removed
and behavior is expressed by the function and field names.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/health/thresholds.ts`:
- Around line 38-50: Environment-parsed thresholds can be set to impossible
values; update parseIntEnv and parseFloatEnv and the caller
resolveThresholdConfig to validate and clamp overrides for specific keys (e.g.
AGENTMEMORY_MEMORY_SYSTEM_FREE_FLOOR_RATIO, AGENTMEMORY_MEMORY_CRITICAL_RSS_MB)
before they affect underRealPressure logic: enforce non-negative integers for MB
thresholds, enforce ratio values to be within [0,1] (or other sensible min/max
per threshold), and fall back to the default if the parsed value is out of
bounds; reference the parseIntEnv, parseFloatEnv helpers and
resolveThresholdConfig so you add clamping/validation there (reject or clamp
invalid values and optionally log a warning) to prevent impossible env values
from enabling false positives.

---

Nitpick comments:
In `@src/health/thresholds.ts`:
- Around line 12-23: Replace the long explanatory comments in thresholds.ts by
using self-describing names and a small helper: rename/ensure the fields
memoryCriticalRssBytes and memorySystemFreeFloorRatio remain clearly named,
remove the prose above them, and implement an exported helper function
isUnderRealMemoryPressure(totalRamBytes, freeRamBytes, processRssBytes,
processHeapUsedBytes, memorySystemFreeFloorRatio, memoryCriticalRssBytes) that
encapsulates the existing gate logic (system free-ratio escape hatch plus the
RSS+heap real-pressure check) and use that helper wherever the gate is evaluated
(e.g., the memory critical gate logic referenced around lines ~52-55, ~73-75,
~137-141) so the comments can be removed and behavior is expressed by the
function and field names.

In `@test/health-thresholds.test.ts`:
- Around line 47-52: The test only exercises the absolute-RSS branch of
evaluateHealth via memoryCriticalRssBytes; add direct tests for the host-memory
gate by calling evaluateHealth with memorySystemFreeFloorRatio set to a non-zero
value and with system-free bytes in the synthetic stats (s) low enough to trip
the gate, asserting that status/alerts reflect memory_critical, and also add a
test for the documented disable case by passing memorySystemFreeFloorRatio: 0 to
ensure the host-memory check is skipped; update or add tests near the existing
blocks (the evaluateHealth calls around lines 47 and 97-105) to supply
appropriate s.systemFreeBytes and corresponding assertions for both the tripping
and disabled behaviors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e6ba365-14f2-493f-a08a-43022801a5d6

📥 Commits

Reviewing files that changed from the base of the PR and between fd9e3bd and e58c3cd.

📒 Files selected for processing (3)
  • src/health/monitor.ts
  • src/health/thresholds.ts
  • test/health-thresholds.test.ts

Comment thread src/health/thresholds.ts Outdated
…e helper

Addresses CodeRabbit review feedback on the memory_critical gate:

- resolveThresholdConfig() validated nothing, so a typo such as AGENTMEMORY_MEMORY_SYSTEM_FREE_FLOOR_RATIO=2 or AGENTMEMORY_MEMORY_CRITICAL_RSS_MB=-1 made the real-pressure gate effectively always true, reintroducing the false-positive memory_critical alerts this change is meant to remove. parseIntEnv/parseFloatEnv now take min/max bounds and fall back to the default when an override is out of range (percent 0-100, ratio 0-1, MB >= 0).
- Extract isUnderRealMemoryPressure() so the gate is expressed by a name rather than a comment, and trim the explanatory prose.
- Add direct coverage for the host-free-RAM gate (both the tripping path and the ratio=0 disable case, via mocked os.totalmem/os.freemem) and for the env-bounds rejection.

Signed-off-by: MarvinFS <MarvinFS@users.noreply.github.com>
@MarvinFS
Copy link
Copy Markdown
Author

Thanks for the review — addressed in d23ee8f:

  • Env override validation (the actionable item): parseIntEnv/parseFloatEnv now take min/max bounds and fall back to the default when an override is out of range (percent 0–100, ratio 0–1, MB ≥ 0), so a typo like ...FREE_FLOOR_RATIO=2 or ...CRITICAL_RSS_MB=-1 can no longer make underRealPressure always true.
  • Helper extraction: pulled the gate into isUnderRealMemoryPressure(...) and trimmed the explanatory comments.
  • Coverage: added direct tests for the host-free-RAM gate (tripping path and the ratio=0 disable case, via mocked os.totalmem/os.freemem) and for the env-bounds rejection.

Full suite green (1294 tests) on a fresh main checkout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant