
feat(security): #2149 ADR-131 P1 — ToolOutputGuardrail + OWASP 2026 mapping (#2154)

Open
ruvnet wants to merge 1 commit into main from feat/2149-tool-output-guardrail


ruvnet (Owner) commented May 26, 2026

Summary

What's in this PR (P1 of ADR-131)

| File | Purpose |
| --- | --- |
| `v3/@claude-flow/security/src/tool-output-guardrail.ts` | Class + 8 pattern categories + default policy |
| `v3/@claude-flow/security/__tests__/tool-output-guardrail.test.ts` | 24 tests, 9 ms total |
| `v3/@claude-flow/security/src/index.ts` | Public API exports |
| `v3/docs/adr/ADR-131-tool-output-guardrail.md` | Architecture decision record |
| `v3/docs/security/owasp-agents-2026-mapping.md` | ASI01–ASI10 control matrix |
| `scripts/smoke-tool-output-guardrail.mjs` | CI smoke against built dist |
| `.github/workflows/v3-ci.yml` | New `tool-output-guardrail-smoke` job |

Pattern coverage

| Category | Severity | Default action |
| --- | --- | --- |
| instruction-override ("ignore previous instructions") | critical | reject |
| embedded-system (ChatML, `[INST]`, `<system>` tags) | critical/high | reject/redact |
| exfiltration ("leak the api key") | critical | reject |
| role-hijack ("you are now …") | high/medium | redact/flag |
| jailbreak (DAN, "do anything now") | high | redact |
| hidden-unicode (bidi overrides, zero-width) | high/low | redact/allow |
| tool-spoofing (`tool_call:` payloads) | medium | flag |
| truncation (content > 1 MiB) | medium | flag |
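
To make the table concrete, here is a minimal sketch of a pattern-based scan over a few of the categories whose severity is unambiguous above. Everything in it is an assumption for illustration: the `PatternRule` and `Finding` shapes, the `scan` function, and the regexes themselves are hypothetical and are not taken from `tool-output-guardrail.ts`. The size check uses string length as a rough stand-in for the 1 MiB byte limit.

```typescript
// Hypothetical sketch: names, shapes, and regexes are illustrative only,
// not the actual ToolOutputGuardrail implementation.
type Severity = "critical" | "high" | "medium" | "low";

interface PatternRule {
  category: string;
  severity: Severity;
  pattern: RegExp;
}

interface Finding {
  category: string;
  severity: Severity;
  match: string;
}

// One illustrative rule per category; a real rule set would be broader.
const RULES: PatternRule[] = [
  { category: "instruction-override", severity: "critical", pattern: /ignore (all )?previous instructions/i },
  { category: "exfiltration", severity: "critical", pattern: /leak (the )?api key/i },
  { category: "jailbreak", severity: "high", pattern: /\bDAN\b|\b[Dd]o anything now\b/ },
  { category: "tool-spoofing", severity: "medium", pattern: /\btool_call\s*:/ },
];

// Approximation: counts UTF-16 code units rather than encoded bytes.
const MAX_LENGTH = 1024 * 1024;

// Pure and synchronous: returns at most one finding per rule, in rule order.
function scan(content: string): Finding[] {
  const findings: Finding[] = [];
  for (const rule of RULES) {
    const m = rule.pattern.exec(content);
    if (m !== null) {
      findings.push({ category: rule.category, severity: rule.severity, match: m[0] });
    }
  }
  if (content.length > MAX_LENGTH) {
    findings.push({ category: "truncation", severity: "medium", match: "" });
  }
  return findings;
}
```

Under these assumptions, `scan("Please ignore previous instructions")` yields a single `instruction-override` finding, and ordinary prose yields an empty array.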

What's NOT in this PR (P2-P5, tracked in ADR-131)

  • P2: MCP tool dispatch integration
  • P3: Memory read path integration
  • P4: Raft consensus payload validator (swarm-layer ASI01)
  • P5: Per-tool policy overrides + structured telemetry

Shipping the class + tests + CI smoke first so callers (third-party plugins, integration tests) can exercise it before deeper wiring.

Performance

Scanning 32 KB of safe content takes 0.1 ms locally (target: <1 ms p99). The scanner is a pure function: no I/O, no async, no model dependency.
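
A rough way to reproduce a timing like this is sketched below. It is not the PR's benchmark: `scan` here is a hypothetical one-regex stand-in for the real guardrail, and the harness reports a mean over many iterations rather than a p99, so it only gives an order-of-magnitude check.

```typescript
// Hypothetical stand-in for the guardrail's scan; the real class has more patterns.
const OVERRIDE = /ignore (all )?previous instructions/i;

function scan(text: string): boolean {
  return OVERRIDE.test(text);
}

// Build ASCII-only safe content of exactly `bytes` characters (1 char = 1 byte here).
function makeSafeContent(bytes: number): string {
  const chunk = "The quick brown fox jumps over the lazy dog. ";
  return chunk.repeat(Math.ceil(bytes / chunk.length)).slice(0, bytes);
}

// Mean wall-clock milliseconds per call, measured over many iterations
// because a single 0.1 ms call is below Date.now() resolution.
function meanScanMs(fn: (s: string) => unknown, text: string, iterations: number): number {
  for (let i = 0; i < 100; i++) fn(text); // warm-up so the JIT settles
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn(text);
  return (Date.now() - start) / iterations;
}

const sample = makeSafeContent(32 * 1024);
const meanMs = meanScanMs(scan, sample, 2000);
```

To compare against a p99 target, per-call samples from a higher-resolution timer would be needed; the mean above can hide tail latency.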

Test plan

  • cd v3/@claude-flow/security && npm test -- tool-output-guardrail — 24/24 pass in 9ms
  • node scripts/smoke-tool-output-guardrail.mjs — 11/11 pass
  • All four canonical ASI01 attack shapes detected as critical
  • Default policy verified: critical→reject, high→redact, medium→flag, low/safe→allow
  • CI tool-output-guardrail-smoke green
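
The default policy in the test plan can be sketched as a severity-to-action lookup. The mapping itself comes from the list above; the names `DEFAULT_POLICY`, `worstSeverity`, and `decide`, and the choice to act on the worst severity among all findings, are assumptions made for illustration and may not match the actual class.

```typescript
// Hypothetical sketch of the default policy; only the mapping is from the PR text.
type Severity = "safe" | "low" | "medium" | "high" | "critical";
type Action = "allow" | "flag" | "redact" | "reject";

const DEFAULT_POLICY: Record<Severity, Action> = {
  safe: "allow",
  low: "allow",
  medium: "flag",
  high: "redact",
  critical: "reject",
};

// Severities ordered from least to most severe.
const ORDER: Severity[] = ["safe", "low", "medium", "high", "critical"];

// Assumption: the overall verdict is driven by the most severe finding.
function worstSeverity(severities: Severity[]): Severity {
  let worst: Severity = "safe";
  for (const s of severities) {
    if (ORDER.indexOf(s) > ORDER.indexOf(worst)) worst = s;
  }
  return worst;
}

// Map a set of finding severities to a single action under the given policy.
function decide(
  severities: Severity[],
  policy: Record<Severity, Action> = DEFAULT_POLICY,
): Action {
  return policy[worstSeverity(severities)];
}
```

With this shape, a per-tool override (planned for P5) could be expressed by passing a different `policy` object, though the PR does not say how overrides will actually be modelled.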

🤖 Generated with RuFlo

…apping

Closes the OWASP ASI01 (Agent Goal Hijacking) gap identified in the
2026-05-26 dream cycle. Adds a pure, synchronous, pattern-based screener
for content crossing the agent's content boundary (MCP tool results,
memory reads, external API responses) before it enters reasoning.

Design (full rationale in ADR-131):
- Eight built-in pattern categories: instruction-override, embedded-system
  (ChatML / Llama [INST] / <system>), exfiltration, role-hijack, jailbreak,
  hidden-unicode (bidi + zero-width), tool-spoofing, truncation
- Default policy: critical→reject, high→redact, medium→flag, low→allow
- Pure-function shape — safe to invoke in every tool result / memory read
- 32KB content scans in 0.1ms locally (target: <1ms p99)

Scope:
- P1 (this PR): class + tests + exports + ADR + OWASP matrix + CI smoke
- P2-P5 (follow-on): MCP dispatch, memory read, swarm Raft payload,
  per-tool policy + telemetry — tracked in ADR-131

Validation:
- 24/24 vitest tests pass (9ms)
- 11/11 smoke checks against built dist/index.js
- Tested against the four canonical ASI01 attack shapes
- CI guard `tool-output-guardrail-smoke` wired into v3-ci.yml

OWASP Top 10 for Agentic Applications 2026 control matrix added:
v3/docs/security/owasp-agents-2026-mapping.md — flags ASI01/06/09/10 as
highest-priority remaining gaps.

Co-Authored-By: RuFlo <ruv@ruv.net>