feat(security): #2149 ADR-131 P1 — ToolOutputGuardrail + OWASP 2026 mapping by ruvnet · Pull Request #2154 · ruvnet/ruflo

ruvnet · 2026-05-26T12:48:36Z

Summary

Adds ToolOutputGuardrail to @claude-flow/security — pure, synchronous, pattern-based screener for content crossing the agent boundary
Closes the OWASP ASI01 (Agent Goal Hijacking) gap identified in dream cycle 2026-05-26 ([Dream Cycle 2026-05-26] security: Indirect prompt injection critical gap vs OWASP ASI01 + intelligence,swarm scan #2149)
New CI smoke tool-output-guardrail-smoke exercises the built dist + the four canonical ASI01 attack shapes
OWASP Top 10 for Agentic Applications 2026 control matrix → v3/docs/security/owasp-agents-2026-mapping.md

What's in this PR (P1 of ADR-131)

File	Purpose
`v3/@claude-flow/security/src/tool-output-guardrail.ts`	Class + 8 pattern categories + default policy
`v3/@claude-flow/security/__tests__/tool-output-guardrail.test.ts`	24 tests, 9ms total
`v3/@claude-flow/security/src/index.ts`	Public API exports
`v3/docs/adr/ADR-131-tool-output-guardrail.md`	Architecture decision record
`v3/docs/security/owasp-agents-2026-mapping.md`	ASI01–ASI10 control matrix
`scripts/smoke-tool-output-guardrail.mjs`	CI smoke against built dist
`.github/workflows/v3-ci.yml`	New `tool-output-guardrail-smoke` job

Pattern coverage

Category	Severity	Default action
`instruction-override` ("ignore previous instructions")	critical	reject
`embedded-system` (ChatML, `[INST]`, `<system>` tags)	critical/high	reject/redact
`exfiltration` ("leak the api key")	critical	reject
`role-hijack` ("you are now …")	high/medium	redact/flag
`jailbreak` (DAN, "do anything now")	high	redact
`hidden-unicode` (bidi overrides, zero-width)	high/low	redact/allow
`tool-spoofing` (`tool_call:` payloads)	medium	flag
`truncation` (content > 1 MiB)	medium	flag

What's NOT in this PR (P2-P5, tracked in ADR-131)

P2: MCP tool dispatch integration
P3: Memory read path integration
P4: Raft consensus payload validator (swarm-layer ASI01)
P5: Per-tool policy overrides + structured telemetry

Shipping the class + tests + CI smoke first so callers (third-party plugins, integration tests) can exercise it before deeper wiring.

Performance

32KB safe content scans in 0.1 ms locally (target: <1 ms p99). Pure-function shape — no I/O, no async, no model dependency.

Test plan

cd v3/@claude-flow/security && npm test -- tool-output-guardrail — 24/24 pass in 9ms
node scripts/smoke-tool-output-guardrail.mjs — 11/11 pass
All four canonical ASI01 attack shapes detected as critical
Default policy verified: critical→reject, high→redact, medium→flag, low/safe→allow
CI tool-output-guardrail-smoke green

🤖 Generated with RuFlo

…apping Closes the OWASP ASI01 (Agent Goal Hijacking) gap identified in the 2026-05-26 dream cycle. Adds a pure, synchronous, pattern-based screener for content crossing the agent's content boundary (MCP tool results, memory reads, external API responses) before it enters reasoning. Design (full rationale in ADR-131): - Eight built-in pattern categories: instruction-override, embedded-system (ChatML / Llama [INST] / <system>), exfiltration, role-hijack, jailbreak, hidden-unicode (bidi + zero-width), tool-spoofing, truncation - Default policy: critical→reject, high→redact, medium→flag, low→allow - Pure-function shape — safe to invoke in every tool result / memory read - 32KB content scans in 0.1ms locally (target: <1ms p99) Scope: - P1 (this PR): class + tests + exports + ADR + OWASP matrix + CI smoke - P2-P5 (follow-on): MCP dispatch, memory read, swarm Raft payload, per-tool policy + telemetry — tracked in ADR-131 Validation: - 24/24 vitest tests pass (9ms) - 11/11 smoke checks against built dist/index.js - Tested against the four canonical ASI01 attack shapes - CI guard `tool-output-guardrail-smoke` wired into v3-ci.yml OWASP Top 10 for Agentic Applications 2026 control matrix added: v3/docs/security/owasp-agents-2026-mapping.md — flags ASI01/06/09/10 as highest-priority remaining gaps. Co-Authored-By: RuFlo <ruv@ruv.net>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(security): #2149 ADR-131 P1 — ToolOutputGuardrail + OWASP 2026 mapping#2154

feat(security): #2149 ADR-131 P1 — ToolOutputGuardrail + OWASP 2026 mapping#2154
ruvnet wants to merge 1 commit into
mainfrom
feat/2149-tool-output-guardrail

ruvnet commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ruvnet commented May 26, 2026

Summary

What's in this PR (P1 of ADR-131)

Pattern coverage

What's NOT in this PR (P2-P5, tracked in ADR-131)

Performance

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant