
[Enhancement]: ATR detection plugin over managed tool-call events #324

Description

@eeee2345

Affected area

Middleware or guardrails, Plugins

Problem or opportunity

NeMo Relay observes and controls managed tool and LLM execution, and the first-party nemo_guardrails and pii_redaction plugins already sit in the security/privacy lane around that boundary. What isn't covered today is a deterministic, local content-detection pass over tool-call payloads: flagging agent-attack patterns (prompt injection, tool poisoning, exfiltration) in tool args and results without a model call or a network round-trip.

Proposed enhancement

A NeMo Relay detection plugin that runs a deterministic ruleset over managed tool-call payloads at the plugin/middleware layer. It would subscribe to tool lifecycle events (or register a tool-execution intercept via PluginContext) and emit findings for the JSON-projected tool args and results. It is advisory by design: it reports findings and does not block execution. The ruleset is ATR (Agent Threat Rules), an open, MIT-licensed ruleset that needs no LLM; its engine (pyatr) runs in-process with no network calls.
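To make "deterministic ruleset over JSON-projected args/results" concrete, here is a minimal sketch of the advisory pass, assuming the payload has already been projected to JSON-compatible Python values. The three regex rules are hypothetical placeholders, not actual ATR rules, and `scan_payload` / `Finding` are illustrative names, not the pyatr API.

```python
import re
from dataclasses import dataclass

# Hypothetical placeholder rules; these are NOT real ATR rules or pyatr's API.
RULES = {
    "prompt-injection": re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    "tool-poisoning": re.compile(r"<\s*important\s*>", re.IGNORECASE),
    "exfiltration": re.compile(r"\b(api[_-]?key|password|secret)\b.{0,40}https?://", re.IGNORECASE),
}


@dataclass(frozen=True)
class Finding:
    rule_id: str
    path: str      # JSON path of the string that matched, e.g. "$.args.query"
    excerpt: str   # the text the rule matched


def _walk(value, path):
    """Yield (path, string) for every string leaf of a JSON-like value."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from _walk(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from _walk(child, f"{path}[{index}]")
    # Numbers, booleans and None carry no text, so they are skipped.


def scan_payload(payload, root="$"):
    """Apply every rule to every string leaf. Local, deterministic, no network.

    Returns findings only; the payload itself is never modified (advisory).
    """
    findings = []
    for path, text in _walk(payload, root):
        for rule_id, pattern in RULES.items():
            match = pattern.search(text)
            if match:
                findings.append(Finding(rule_id, path, match.group(0)))
    return findings
```

Because the pass only reads the payload and returns a list, the same input always yields the same findings in the same order, and calling it cannot change what the tool receives or returns.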

Runtime contract and binding impact

The plugin runs in-process through the plugin system, via validate() and register(config, context: PluginContext), with its behavior registered through PluginContext at the managed tool boundary. Because detection is deterministic and local, it maps onto the same builtin/local backend mode that the existing nemo_guardrails and pii_redaction plugins use. The primary surface is the Python binding; the Rust core, Node.js, and other bindings are N/A unless the maintainers want the detector in core rather than as a plugin.
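The wiring could look roughly like the sketch below. Everything here is an assumption for illustration: `StubPluginContext` stands in for Relay's real PluginContext, the `subscribe`/`dispatch` methods and the `"tool.start"`/`"tool.end"` event names are hypothetical, and the exact argument lists of `validate` and `register` may differ from Relay's plugin contract. What it does show is the advisory shape: handlers record findings and never alter or block the event.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StubPluginContext:
    """Hypothetical stand-in for Relay's PluginContext; the real API may differ."""
    handlers: dict = field(default_factory=dict)

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        self.handlers.setdefault(event, []).append(handler)

    def dispatch(self, event: str, payload: dict) -> dict:
        for handler in self.handlers.get(event, []):
            handler(payload)
        return payload  # handlers only observe; the payload passes through unchanged


class AtrDetectionPlugin:
    """Hypothetical advisory plugin: records findings, never blocks or rewrites."""

    def __init__(self):
        self.patterns = []
        self.findings = []

    def validate(self, config: dict) -> None:
        # Assumed signature; this sketch only supports report-only mode.
        if config.get("mode", "advisory") != "advisory":
            raise ValueError("only advisory mode is supported")

    def register(self, config: dict, context) -> None:
        self.patterns = [re.compile(p, re.IGNORECASE) for p in config.get("patterns", [])]
        # Hypothetical lifecycle event names for the managed tool boundary.
        context.subscribe("tool.start", lambda event: self._scan("args", event))
        context.subscribe("tool.end", lambda event: self._scan("result", event))

    def _scan(self, key: str, event: dict) -> None:
        # Scan the JSON projection of the tool args or result.
        text = json.dumps(event.get(key), ensure_ascii=False)
        for pattern in self.patterns:
            if pattern.search(text):
                self.findings.append((event.get("tool"), key, pattern.pattern))
```

Keeping the detector behind `register` means the same plugin object could be hosted in-repo or shipped as an external package; only the context it is handed changes.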

Open question (the reason for filing): should this live as an in-repo first-party component alongside nemo_guardrails, or as an external package built on the nemo-relay-plugin worker SDK? I'm happy to follow whichever direction you prefer, and will wait for that decision before writing any code.

Alternatives considered

  • Wire detection outside Relay entirely — loses the managed-tool-boundary vantage point Relay is built to provide.
  • Rely on the existing pii_redaction / nemo_guardrails plugins — different scope (PII redaction / policy), not agent-attack content detection.
  • Put the detector in the Rust core instead of a plugin — heavier; the plugin system already models security components this way.

Acceptance criteria

  • A plugin can run a deterministic ruleset over managed tool-call args/results and emit advisory findings without blocking execution.
  • It registers through the documented plugin system (PluginContext), in the same lane as the existing security/privacy plugins.
  • Direction is agreed (in-repo first-party component vs external nemo-relay-plugin package) before any code lands.
  • Tests and a docs example show how the detection plugin is wired.

Disclosure: I maintain ATR.
