The first firewall for AI agents.
A network firewall inspects the packets reaching a machine. AEGIS inspects the
untrusted content reaching your agent — and blocks what's trying to hijack it.
A message arrives that looks like routine ops automation. AEGIS reads it —
patterns flag the phrasing, the local judge sees it wants the agent to act without you —
and it's blocked before the agent ever sees it. Genuine status messages pass straight through.
Your AI coding agent reads a lot of text it didn't write — peer messages, skill files,
memory notes, CLAUDE.md, MCP configs, web results, bash output. Any of it can carry
an attack on the agent itself. The dangerous ones don't look like attacks:
No "ignore previous instructions." No keywords. Just an official-sounding message that tells your agent it's pre-authorized to act without you. Keyword filters and prompt-injection classifiers wave it through. AEGIS catches it — and quarantines it before your agent ever reads it.
Scope: AEGIS guards the agent, not the host. It's not antivirus. Its one job is to stop your agent from being talked into doing something by untrusted text.
It doesn't cry wolf. On a labeled held-out set it had never seen:
| Recall | Precision | FP rate | F1 |
|---|---|---|---|
| 90% | 95% | 10% | 92% |
1 false positive on 10 benign files — a security doc that contains live shell exfil
examples in a code block (the 1.7B judge can't distinguish "educational" from "active"
at that granularity). All other benign files — code calling subprocess/eval, skills
full of kubectl/gcloud, MCP configs, monitoring alerts — pass cleanly.
Reproduce it yourself: tests/held_out_eval/.
brew install pilot-protocol/tap/aegis # brings llama.cpp automatically
aegis install-models # one-time judge model (~1.8 GB)
aegis init # protect your agent surfaces
aegis daemon # (or: brew services start aegis)No model? No llama.cpp? It still runs as the L1 pattern layer — instant, no RAM overhead.
Build from source / other platforms
git clone https://github.com/pilot-protocol/aegis && cd aegis
cargo build --release
cp target/release/aegis ~/.local/bin/Prebuilt macOS + Linux (x86_64/arm64) binaries are on the releases page.
AEGIS can act as a blocking hook inside Claude Code — scanning every Bash command before it runs, and scanning every tool result (web fetches, MCP responses) after it arrives.
aegis install-hooks # writes hooks into ~/.claude/settings.jsonThis installs two hooks:
| Hook | Trigger | Action |
|---|---|---|
PreToolUse/Bash |
every bash command | L1 scan, exit 2 = block if match |
PostToolUse/WebFetch,Bash,mcp__* |
tool result arrives | L1 scan, WARN to stdout if match |
The pre-tool hook is L1-only (microseconds — no model round-trip on the blocking path). The post-tool hook is non-blocking: it warns without stopping execution.
When AEGIS blocks a command it prints the exact approval command:
AEGIS blocked this command (T1:~/.ssh/).
To approve this exact command once, run:
aegis approve 'cp ~/.ssh/id_rsa /tmp/debug && cat /tmp/debug'
approve is a one-time, hash-based bypass — the exact command string is SHA-256
hashed and stored in ~/.aegis/approved_cmds.txt. On the next attempt the approval is
consumed and AEGIS blocks again. Use aegis revoke to cancel a pending approval.
aegis approve '<cmd>' # allow this exact command once
aegis revoke '<cmd>' # cancel a pending approvalAEGIS provides aegis scan-pipe — a stdin scanner that exits 0 (allow) or 2 (block) — so any harness can integrate in a few lines.
In ~/.openclaw/plugins/claw-pilot/src/inbound.ts, before dispatching a message:
import { spawnSync } from "node:child_process";
const scan = spawnSync("aegis", ["scan-pipe"], {
input: message.text, encoding: "utf8", timeout: 500,
});
if (scan.status === 2) {
logger.warn("AEGIS blocked inbound message", { rule: scan.stdout.trim() });
return; // drop
}In the tick loop, before behavior.reply():
import { spawnSync } from "child_process";
const check = spawnSync("aegis", ["scan-pipe"], {
input: msg.data ?? "", encoding: "utf8", timeout: 500,
});
if (check.status === 2) {
console.warn(`[aegis] BLOCKED — ${check.stdout.trim()}`);
continue;
}Copy integrations/hermes/plugin.py to ~/.hermes/plugins/aegis-security/plugin.py. The pre_llm_call hook scans all message content and returns None to block:
from aegis_hermes_plugin import pre_llm_call
messages = pre_llm_call(messages)
if messages is None:
raise RuntimeError("AEGIS blocked this LLM call")The plugin fails open on timeout or if aegis is not on $PATH, so a misconfigured binary never kills your harness.
echo "$CONTENT" | aegis scan-pipe
# exits 0 → safe, exits 2 → blockedflowchart LR
A[New / changed file<br/>on an agent surface] --> B{Pilot inbox?}
B -- yes --> C[Intercept:<br/>rename before<br/>the agent reads it]
B -- no --> D[Read in place]
C --> E
D --> E[Extract the payload text]
E --> F[L1 · Aho-Corasick patterns<br/><i>microseconds · pure Rust</i>]
F --> G[L2 · Unified judge<br/><i>local LLM, two passes</i>]
G --> H{Attack?}
H -- no --> I[Allow → file released<br/>to the agent, untouched]
H -- yes --> J[Quarantine →<br/>~/.aegis/quarantine/<br/>+ desktop notification<br/>+ audit log]
Two layers. A fast universal one, and a smart one.
-
L1 — Aho-Corasick patterns. Pure Rust, microseconds, kilobytes. ~120 known injection/IoC strings plus decode passes: base64, hex, rot13, homoglyphs, zero-width characters, leetspeak. Sliding-window scan (4096-byte window, 512-byte stride) covers full documents so middle-buried payloads can't hide. Runs on anything — a Pi, a router, a CI box.
-
L2 — the judge. A local Qwen3-1.7B (via llama.cpp, fully offline). Two passes: "is this content attacking the agent?" (injection, jailbreak, spoofing, exfil — and crucially, describing an attack ≠ performing one) OR "is it pushing the agent to act without the user?" (the infra-impersonation question). A safe verdict vetoes L1's keyword hits — that's why a security doc that quotes an injection isn't flagged. The judge sees both the head and tail of large documents to defeat truncation-based burial attacks.
If the judge can't run (tiny device, no model, server down), AEGIS degrades to L1 patterns alone — lower recall, but an instant, dependency-free floor.
Credential taint detection. If a command co-occurs a credential read ($(cat ~/.ssh,
security find-generic-password, .aws/credentials, etc.) with a network exfil sink
(/dev/tcp, nc, openssl s_client, etc.), AEGIS flags it as CRED_TAINT — even
if no individual keyword fires. The judge can still veto for educational content.
- Quarantine =
~/.aegis/quarantine/(amv, not a delete — you can inspect it). Inbox messages are intercepted (claimed before the agent can read them); skills and memory are moved out of the agent's path.CLAUDE.md/ MCP config are alerted but not moved (moving them would break your setup). - Notified three ways: a native desktop notification, the terminal, and an
HMAC-chained audit log at
~/.aegis/audit.jsonl(aegis statusto tail it).
~/.aegis/config.toml (created by aegis init):
[judge]
enabled = true # false = L1 patterns only, any host
model = "" # pin a model path; "" = auto
[watch]
defaults = true # protect standard agent surfacesCustom watch targets go in ~/.aegis/watch.toml.
| Layer | Latency | RAM | Runs on |
|---|---|---|---|
| L1 patterns | microseconds | KB | anywhere |
| L2 judge | ~260 ms/pass (warm) | ~2.2 GB | macOS / Linux |
Binary 880 KB. Judge model loads once; clean traffic stays cheap. Nothing ever leaves the machine.
aegis init Write default config; show what's protected
aegis daemon Watch & protect all agent surfaces
aegis scan <path>... One-shot scan (agent hook / CI step)
aegis install-hooks Wire AEGIS into Claude Code as a blocking hook
aegis install-models Download the judge model (~1.8 GB)
aegis approve <cmd> Allow a blocked command once (one-time bypass)
aegis revoke <cmd> Cancel a pending one-time approval
aegis status Tail the audit log
aegis targets List protected surfaces
aegis config Show effective configuration
tests/held_out_eval/ is the honest held-out benchmark —
labeled files AEGIS never saw during tuning. Start the judge, then:
python3 tests/held_out_eval/run_held_out.pyThe runner exits 1 if precision < 90% or recall < 80%, so it's usable in CI.
| Vector | Covered by |
|---|---|
| Prompt injection / override keywords | L1 T1 patterns |
| Jailbreak tokens (DAN, developer mode, etc.) | L1 T1 patterns |
| Context-sensitive manipulation (skill/memory/CLAUDE.md) | L1 T2 patterns |
| Infrastructure impersonation ("pre-approved", "no confirmation") | L1 T2 + judge pass 2 |
| Middle-buried payloads | Sliding-window scan |
| Obfuscated payloads (base64, hex, rot13, zero-width, homoglyphs) | Decode passes |
| Credential exfiltration (source + sink co-occurrence) | Credential taint |
| Injected bash commands (Claude Code) | PreToolUse hook |
| Injected content in tool results / web fetches | PostToolUse hook |
| MCP tool description injection | PostToolUse/mcp__* hook |
| Inbound overlay messages (OpenClaw, PicoClaw) | scan-pipe in TypeScript/Node dispatch |
| Pre-LLM messages (Hermes) | pre_llm_call Python plugin |
| Generic CI / pipeline content | scan-pipe stdin |
MIT — see LICENSE.