feat(security): #2149 ADR-131 P1: ToolOutputGuardrail + OWASP 2026 mapping #2154
Open
ruvnet wants to merge 1 commit into
…apping

Closes the OWASP ASI01 (Agent Goal Hijacking) gap identified in the 2026-05-26 dream cycle. Adds a pure, synchronous, pattern-based screener for content crossing the agent's content boundary (MCP tool results, memory reads, external API responses) before it enters reasoning.

Design (full rationale in ADR-131):
- Eight built-in pattern categories: instruction-override, embedded-system (ChatML / Llama [INST] / <system>), exfiltration, role-hijack, jailbreak, hidden-unicode (bidi + zero-width), tool-spoofing, truncation
- Default policy: critical→reject, high→redact, medium→flag, low→allow
- Pure-function shape: safe to invoke in every tool result / memory read
- 32KB content scans in 0.1ms locally (target: <1ms p99)

Scope:
- P1 (this PR): class + tests + exports + ADR + OWASP matrix + CI smoke
- P2-P5 (follow-on): MCP dispatch, memory read, swarm Raft payload, per-tool policy + telemetry (tracked in ADR-131)

Validation:
- 24/24 vitest tests pass (9ms)
- 11/11 smoke checks against built dist/index.js
- Tested against the four canonical ASI01 attack shapes
- CI guard `tool-output-guardrail-smoke` wired into v3-ci.yml

OWASP Top 10 for Agentic Applications 2026 control matrix added: v3/docs/security/owasp-agents-2026-mapping.md, which flags ASI01/06/09/10 as highest-priority remaining gaps.

Co-Authored-By: RuFlo <ruv@ruv.net>
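The design in the commit message (pattern categories feeding a severity-based default policy) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the type names, the `scan` and `decide` functions, the simplified regexes, and the severity assigned to each category are all assumptions. Only the policy mapping itself (critical→reject, high→redact, medium→flag, low→allow) comes from the description.

```typescript
// Hypothetical sketch: names, regexes, and per-category severities are
// illustrative assumptions, not the actual ToolOutputGuardrail API.
type Severity = "low" | "medium" | "high" | "critical";
type Action = "allow" | "flag" | "redact" | "reject";

interface Rule {
  category: string;
  severity: Severity;
  pattern: RegExp;
}

interface Finding {
  category: string;
  severity: Severity;
}

// Three of the eight categories, with assumed severities and simplified regexes.
// No "g" flag is used, so RegExp.test() keeps no state between calls.
const RULES: Rule[] = [
  {
    category: "instruction-override",
    severity: "critical",
    pattern: /ignore\s+(all\s+)?previous\s+instructions/i,
  },
  {
    category: "embedded-system",
    severity: "high",
    pattern: /<\|im_start\|>|\[INST\]|<system>/i,
  },
  {
    category: "role-hijack",
    severity: "medium",
    pattern: /\byou\s+are\s+now\b/i,
  },
];

// Default policy as stated in the PR description.
const DEFAULT_POLICY: Record<Severity, Action> = {
  critical: "reject",
  high: "redact",
  medium: "flag",
  low: "allow",
};

const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Pure and synchronous: returns one finding per rule that matches.
function scan(content: string): Finding[] {
  return RULES.filter((rule) => rule.pattern.test(content)).map((rule) => ({
    category: rule.category,
    severity: rule.severity,
  }));
}

// The most severe finding determines the action; no findings means allow.
function decide(findings: Finding[]): Action {
  if (findings.length === 0) return "allow";
  const worst = findings.reduce((a, b) => (RANK[b.severity] > RANK[a.severity] ? b : a));
  return DEFAULT_POLICY[worst.severity];
}
```

Under these assumptions, `decide(scan(text))` returns `"reject"` for text containing "ignore previous instructions" and `"allow"` for text that matches no rule. Letting the single worst finding pick the action is one plausible reading of the policy; the ADR may combine findings differently.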
Summary
- Adds `ToolOutputGuardrail` to `@claude-flow/security`: a pure, synchronous, pattern-based screener for content crossing the agent boundary
- CI guard `tool-output-guardrail-smoke` exercises the built dist + the four canonical ASI01 attack shapes
- OWASP Top 10 for Agentic Applications 2026 control matrix added at `v3/docs/security/owasp-agents-2026-mapping.md`

What's in this PR (P1 of ADR-131)
- `v3/@claude-flow/security/src/tool-output-guardrail.ts`
- `v3/@claude-flow/security/__tests__/tool-output-guardrail.test.ts`
- `v3/@claude-flow/security/src/index.ts`
- `v3/docs/adr/ADR-131-tool-output-guardrail.md`
- `v3/docs/security/owasp-agents-2026-mapping.md`
- `scripts/smoke-tool-output-guardrail.mjs`
- `.github/workflows/v3-ci.yml`: `tool-output-guardrail-smoke` job

Pattern coverage
- `instruction-override` ("ignore previous instructions")
- `embedded-system` (ChatML, `[INST]`, `<system>` tags)
- `exfiltration` ("leak the api key")
- `role-hijack` ("you are now …")
- `jailbreak` (DAN, "do anything now")
- `hidden-unicode` (bidi overrides, zero-width)
- `tool-spoofing` (`tool_call:` payloads)
- `truncation` (content > 1 MiB)

What's NOT in this PR (P2-P5, tracked in ADR-131)

- MCP dispatch, memory read, swarm Raft payload, per-tool policy + telemetry
Shipping the class + tests + CI smoke first so callers (third-party plugins, integration tests) can exercise it before deeper wiring.
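As an illustration of how a caller might exercise a screener ahead of the deeper wiring, the sketch below applies a verdict to a raw tool result. Everything here is hypothetical: `Verdict`, `Screen`, `applyVerdict`, the `[REDACTED]` mask, and the stand-in `stubScreen` are invented for the example and do not reflect the class's real method names or return shape. Redaction spans are assumed not to overlap.

```typescript
// Hypothetical caller-side wrapper; the real class's API may differ.
type Action = "allow" | "flag" | "redact" | "reject";

interface Verdict {
  action: Action;
  // Character ranges [start, end) to mask when action is "redact".
  // Assumed non-overlapping for this sketch.
  spans: Array<[number, number]>;
}

type Screen = (content: string) => Verdict;

function applyVerdict(
  content: string,
  screen: Screen,
  onFlag: (flagged: string) => void = () => {},
): string {
  const verdict = screen(content);
  switch (verdict.action) {
    case "reject":
      // Rejected content never reaches the agent's reasoning step.
      throw new Error("tool output rejected by guardrail");
    case "redact": {
      // Mask spans from right to left so earlier offsets stay valid.
      const sorted = [...verdict.spans].sort((a, b) => b[0] - a[0]);
      let out = content;
      for (const [start, end] of sorted) {
        out = out.slice(0, start) + "[REDACTED]" + out.slice(end);
      }
      return out;
    }
    case "flag":
      // Flagged content passes through unchanged but is reported.
      onFlag(content);
      return content;
    case "allow":
      return content;
  }
}

// Stand-in screener for the example only: redacts the first "[INST]" marker.
const stubScreen: Screen = (content) => {
  const i = content.indexOf("[INST]");
  return i === -1
    ? { action: "allow", spans: [] }
    : { action: "redact", spans: [[i, i + "[INST]".length]] };
};
```

Taking the screener as a parameter keeps the wrapper independent of any particular guardrail implementation, which is convenient for integration tests that want to swap in a stub.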
Performance
A 32 KB payload of safe content scans in 0.1 ms locally (target: <1 ms p99). Pure-function shape: no I/O, no async, no model dependency.
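A rough way to check a latency figure like this on your own machine is to time many scans of a 32 KB string and average. The sketch below is an assumption-laden stand-in, not the PR's benchmark: `hasHiddenUnicode` covers only a hidden-unicode check (bidi controls and zero-width characters), `benchmark` is a hypothetical helper, and `Date.now()` is used for portability, so it reports a mean rather than a p99.

```typescript
// Hypothetical micro-benchmark. The regex covers bidi controls and
// zero-width characters only; it is not the PR's full pattern set.
const HIDDEN_UNICODE = /[\u200B-\u200F\u202A-\u202E\u2060\u2066-\u2069\uFEFF]/;

// Pure and synchronous: no I/O, no async, no model dependency.
function hasHiddenUnicode(content: string): boolean {
  return HIDDEN_UNICODE.test(content);
}

interface BenchmarkResult {
  meanMs: number; // mean wall-clock time per scan, in milliseconds
  hits: number; // number of scans that detected hidden characters
}

// Date.now() has roughly millisecond resolution, so many iterations are
// averaged to estimate a sub-millisecond per-scan cost.
function benchmark(content: string, iterations: number): BenchmarkResult {
  let hits = 0;
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    if (hasHiddenUnicode(content)) hits++;
  }
  const elapsed = Date.now() - start;
  return { meanMs: elapsed / iterations, hits };
}

// 32 * 1024 ASCII characters, i.e. 32 KiB when encoded as UTF-8.
function makeSafePayload(): string {
  return "a".repeat(32 * 1024);
}
```

Calling `benchmark(makeSafePayload(), 10000)` gives a mean per-scan time to compare against the <1 ms target. Safe content is the worst case for a single regex like this one, because the whole string is scanned without an early match.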
Test plan
- `cd v3/@claude-flow/security && npm test -- tool-output-guardrail`: 24/24 pass in 9ms
- `node scripts/smoke-tool-output-guardrail.mjs`: 11/11 pass
- … `critical`
- Default policy: `critical→reject`, `high→redact`, `medium→flag`, `low/safe→allow`
- CI guard `tool-output-guardrail-smoke` green

🤖 Generated with RuFlo