07 Jun 06:05

thehoff

a1e32ee

ContextCrawler 0.4.0 — the library pivot

ContextCrawler is now a proper lib + bin. The binary is a thin shim over the
library, so the CLI dogfoods the exact code path downstream Rust tools embed.
This completes the community request in #185.

Added

Curated public embedder API (experimental, not yet semver-guaranteed):
summarize_command_output, filter_output(name, raw), auto_filter_output(raw),
available_filters(), no_bloat. Apply the filters/summariser to captured
command output without spawning the CLI.
Crate-level rustdoc + a comprehensive docs overhaul (README, command/filter
reference, library + architecture guides, security-gate docs).

Changed

One compile tree (no duplicate bin/lib): dead-code warnings dropped from 473 to 0.
main.rs is a thin shim over contextcrawler::run(); all CLI logic in the lib.

Use as a library (no crates.io needed)

contextcrawler = { git = "https://github.com/thehoff/contextcrawler", tag = "contextcrawler-v0.4.0" }

Install the CLI (from source only)

cargo install --git https://github.com/thehoff/contextcrawler --tag contextcrawler-v0.4.0

No pre-built binaries are distributed: build from source. Downstream of
rtk-ai/rtk, with contextzip + Tirith folded in. Experimental, single-maintainer,
fast-moving: build it, test it against your own workflow, verify before depending.


26 May 04:10

thehoff

v0.1.11

3942088

v0.1.11 — security + supply-chain hardening + perf


Heaviest release since v0.1.6. 7 issues closed, 200+ tests green, 11 commits.

🔒 Security

#180 — Credentials no longer captured in audit logs. Both downgrades.jsonl and supply_chain.jsonl previously serialised the raw shell command verbatim, capturing inline tokens (curl Authorization headers, env-var assignments to *_TOKEN/*_KEY/*_SECRET, GitHub PATs, URL basic-auth, git-credential-helper, CLI flag values). Empirical 24h scan of one user's log: 27 unique 40-hex tokens + 15 Authorization: token <hex> headers in cleartext. Fixed at the log-write boundary in both gates plus the tirith blob (which echoes cmd back in findings). Ships contextcrawler security --scrub-logs [--dry-run] to retroactively clean existing on-disk logs.
#181 — CONTEXTCRAWLER_SUPPLY_CHAIN=off now works as an inline prefix. The gate's own error messages tell users to "rerun with CONTEXTCRAWLER_SUPPLY_CHAIN=off" — but the gate only checked the shell environment, not the inline form. Users followed the documented bypass and were still blocked. Now parses leading POSIX-style assignments off the cmd. Must be a prefix; mid-cmd && FOO=bar does not bypass.
#182 — Retry on transient HTTP errors. A single registry/OSV blip previously turned the whole install into Verdict::Unavailable (no retry, first error wins). Now: 1 retry, 5s per-attempt, 250ms backoff. Retried only on transport-level failures and 5xx. 4xx is terminal. Worst-case 10.25s/call, well under CHECK_WALL_BUDGET = 25s.
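
The inline-prefix handling described in #181 can be illustrated with a minimal sketch. This is not the project's parser: `split_leading_assignments`, `is_valid_name`, and `supply_chain_gate_disabled` are hypothetical names, and the sketch splits on whitespace only, so quoted values are out of scope.

```rust
/// Splits leading `NAME=value` assignments off the front of a command line.
/// Hypothetical helper: whitespace-split only, quoted values are not handled.
fn split_leading_assignments(cmd: &str) -> (Vec<(String, String)>, String) {
    let mut assignments = Vec::new();
    let mut rest = cmd.trim_start();
    loop {
        let token_end = rest.find(char::is_whitespace).unwrap_or(rest.len());
        let token = &rest[..token_end];
        match token.split_once('=') {
            // Only a valid shell variable name counts as an assignment.
            Some((name, value)) if is_valid_name(name) => {
                assignments.push((name.to_string(), value.to_string()));
                rest = rest[token_end..].trim_start();
            }
            // First non-assignment token ends the prefix.
            _ => break,
        }
    }
    (assignments, rest.to_string())
}

/// POSIX-style name: a letter or underscore, then letters, digits, underscores.
fn is_valid_name(name: &str) -> bool {
    let mut chars = name.chars();
    let starts_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    starts_ok && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// True only when the bypass appears in the leading assignment prefix.
fn supply_chain_gate_disabled(cmd: &str) -> bool {
    split_leading_assignments(cmd)
        .0
        .iter()
        .any(|(name, value)| name == "CONTEXTCRAWLER_SUPPLY_CHAIN" && value == "off")
}
```

Because scanning stops at the first token that is not an assignment, a later `&& CONTEXTCRAWLER_SUPPLY_CHAIN=off` in the same command line is never inspected, which matches the "must be a prefix" rule above.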

🛠 Supply-chain hardening

#145 — parse_package_args robustness. Three orthogonal false-positives fixed: celery[redis] extras now stripped before PyPI lookup; inline # comments terminate args; full pip value-consuming flag set (--platform, --python-version, --trusted-host, --extra-index-url, --find-links, --no-binary/--only-binary, --prefix, --root, --upgrade-strategy, --proxy, --cert, …) is now known.
#147 — Ecosystem::Unknown variant. Depth-cap synthetic was hardcoded Ecosystem::Npm — deep-nested PyPI/mixed-shell payloads were mislabelled [npm]. Verdict was always correct; label is now too. Cascade arms added in 3 existing matches.
#149 — O(n²) → O(n) dedup + O(log n) claimed-span check. Mechanical refactor, no behaviour change. dedup_installs uses HashSet keyed on canonical signature; claimed-span overlap-check uses BTreeMap::range. Stress test: 5,000 dedup in single-digit ms; 200-segment install detect in <250ms.
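
As a rough illustration of the two data structures named in #149, the sketch below dedups with a `HashSet` and checks span overlap with `BTreeMap::range`. The types and names (`dedup_by_signature`, `ClaimedSpans`) are assumptions for the example, and the "canonical signature" is simplified to the string itself.

```rust
use std::collections::{BTreeMap, HashSet};

/// Order-preserving dedup in O(n): keep an item only the first time its
/// signature is seen. Here the signature is just the string (an assumption).
fn dedup_by_signature(items: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    items.into_iter().filter(|s| seen.insert(s.clone())).collect()
}

/// Disjoint half-open spans [start, end), keyed by start offset.
#[derive(Default)]
struct ClaimedSpans {
    by_start: BTreeMap<usize, usize>,
}

impl ClaimedSpans {
    /// O(log n): stored spans are disjoint, so only the last span that
    /// starts before `end` can overlap the candidate.
    fn overlaps(&self, start: usize, end: usize) -> bool {
        self.by_start
            .range(..end)
            .next_back()
            .map_or(false, |(_, &claimed_end)| claimed_end > start)
    }

    /// Claims the span if it is non-empty and free; returns whether it was claimed.
    fn claim(&mut self, start: usize, end: usize) -> bool {
        if start >= end || self.overlaps(start, end) {
            return false;
        }
        self.by_start.insert(start, end);
        true
    }
}
```

The single-neighbour check is only valid because `claim` never stores overlapping spans; with overlapping spans allowed, an interval tree would be needed instead.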

🧹 CI + process

#183 — branding_lint red on develop. Four pre-existing leaks in src/hooks/init.rs were blocking the safety net. Lint is green again.
#78 (closes #75) — clap-aware doc-comment lint. Catches RTK savings/adoption/equivalent/artifacts/etc. in clap docs.
#80 (addresses #77) — merge-driver design + multi-model peer review docs. Pure documentation.

Test coverage

| Module | Tests |
|---|---|
| `core::secret_redact` | 16 |
| `hooks::tirith_gate` | 6 |
| `hooks::supply_chain_gate` | 173 |
| `tests::branding_lint` | 5 |

200+ green across the touched stack. No regressions.

Upgrade notes

No breaking changes.
New CLI: contextcrawler security --scrub-logs [--dry-run].
The #182 retry change reduces per-attempt HTTP timeout from 8s → 5s.
Existing audit logs are NOT automatically scrubbed on upgrade. Run contextcrawler security --scrub-logs once after install if you've used inline credentials in shell commands (curl headers, env-var literals, gh auth login --with-token, etc.).

Empirical proof — real-world 24h scan

| Pattern | Before | After |
|---|---|---|
| `Authorization: token <hex>` in audit logs | 15 | 0 |
| 40-hex tokens captured | 27 | 0 |
| GitHub PAT prefixes | 0 | 0 |


23 May 05:53

thehoff

v0.1.10

779a853

v0.1.10 — supply-chain hardening cluster + correctness pass

Caps the #141 plaintext-FP-guard cluster: install-shaped substrings printed
as data through echo/grep/jq/base64/… no longer trigger the
supply-chain gate, while rg/awk/gawk/sed are kept OUT of the
allowlist because they can spawn sub-processes (rg --pre,
awk 'BEGIN { system(...) }', GNU sed s///e).

Both Codex and agy returned GREEN on rounds 2 + 3.

Supply-chain hardening (#139 → #148)

#148 — plaintext-FP guard with data-utility allowlist
(echo, printf, cat, tac, tee, grep, egrep, fgrep, head, tail, nl,
base64, xxd, od, hexdump, jq) + explicit exclusions for execution-capable
utilities (rg/awk/gawk/sed). 2668 tests, every retained util pinned
by a dedicated masking regression test.
#146 — quote-aware tokeniser closes wrapped-install bypass (#140).
#143 — value-flag + Windows-launcher edge cases, yarn diagnostic
flag exclusion, legacy-Mac \r-only line continuation.
#142 — newline-chain, Windows launcher, bare poetry/uv bypasses.
#139 — absolute / relative path install bypass closed.

Security gate (#101 → #122, #128 → #130)

#130 — integrity binary-hook check, supply-chain budget/cap, ISO-8601
+ downgrades hardening (SEC-I1..I4).
#129 — token-aware permission matching + substitution-aware
decomposition (SEC-C2/C3); skip arithmetic expansion, normalise glob
patterns.
#128 — dotnet forbidden-arg checker (block MSBuild
Custom*Targets / runsettings RCE, SEC-C1) + semicolon-batch / case-fold
/ @response-file bypasses closed.
#101 → #122 — the #100/#111 audit gating cluster (g1–g8 fixes,
hooks dispatch, gate wiring, security core).

Filters + correctness

#127 — accept standard grep flags (62% drop in parse failures);
route -h/-G to passthrough.
#126 — restore SIGPIPE default disposition (eliminates startup
SIGABRT).
#125 — recognise CRITICAL/FATAL severities, wc stdin support,
detached-HEAD sha handling, .env visibility (port upstream 62fc0e0);
init refuses malformed CLAUDE.md instead of silent exit-0.
#124 — preserve user content in copilot-instructions.md
(port upstream d108165).
#134 — honest savings claims for read / git log filters
(READ-I1, GIT-I1).

Performance + analytics

#133 — take VACUUM off the record() hot path (PERF-I1).
#132 — weighted by-command avg, quota horizon, char-based
token estimate, computed ccusage date (AN-I1..I4).
#135 — new: gain --weak-filters — rank tools by leaked tokens.

Features

#138 — Pi (pi.dev) added as the 11th supported harness.
#136 — unified agent guidance into one canonical source.
#94 — curl JSON minify filter.
#93 — ServiceNow SDK build filter.

Verification

2,668 / 2,668 unit tests passing
cargo deny check — advisories, bans, licenses, sources ok
scripts/build-release.sh --verify — zero builder paths in binary
PR #148 peer review: Codex GREEN, agy GREEN (rounds 2 + 3)

Known follow-ups (not blocking)

README install pins still reference v0.1.6 (stale since v0.1.7).
CHANGELOG.md top entry is [0.1.7]; v0.1.8/9/10 narratives live on
the GH release pages only.
scripts/bump-version.sh looks for a ContextCrawler X.Y.Z literal
no longer present in src/main.rs (version flows from Cargo.toml via
clap). Manual bump used for this release as for v0.1.8/9.

🤖 Generated with Claude Code

Post-release patch (force-tag, same v0.1.10)

scripts/build-release.sh --install shipped with a macOS Apple-Silicon
AMFI bug — plain cp of the ad-hoc-linker-signed binary to
~/.local/bin/contextcrawler produced a destination that AMFI rejected
at exec (load code signature error 2, SIGKILL before main()). Found
within minutes of cutting the release. Patched by codesign --force --sign - on the destination after copy, macOS-gated. Linux path
unchanged. cargo install --git … --tag v0.1.10 was never affected —
cargo handles signing correctly. Only the RELEASING.md §7 manual
install hit it.

Tag force-moved to include the fix (release was minutes old; no
precompiled artifacts attached, so no SHA-mismatch surface for users).


18 May 14:45

thehoff

v0.1.9

0ff327f

v0.1.9 — Claude Code hook fix + zero-trust hardening + bench harness Tier 2

Highlights

Claude Code hook actually works now (#62)

Pre-v0.1.9, contextcrawler rewrite "git status" returned "rtk git status" and the Claude Code hook substituted that into Claude Code's updatedInput.command. Anyone with raw-command Claude usage + the hook installed got rtk: command not found on every rewrite. The bug had been live since the rebrand — only masked in our session because most traffic flows via codex (which calls contextcrawler X directly per the AGENTS.md template).

Fixed in PR #72 with full backcompat: the classifier and rewriter now accept BOTH contextcrawler X and rtk X as already-wrapped inputs, and the dashboard's display_cmd strips both prefixes so historical SQLite rows bucket cleanly with new rows.
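
The backcompat behaviour can be sketched as follows. This is an illustrative reconstruction, not the code from PR #72: `strip_wrapper_prefix` and `rewrite` are hypothetical, and `display_cmd` here only borrows the name mentioned above.

```rust
/// Prefixes that mark a command as already wrapped (current and legacy name).
const WRAPPER_PREFIXES: [&str; 2] = ["contextcrawler", "rtk"];

/// Returns the inner command if `cmd` starts with a known wrapper prefix.
fn strip_wrapper_prefix(cmd: &str) -> Option<&str> {
    let trimmed = cmd.trim_start();
    WRAPPER_PREFIXES.iter().find_map(|prefix| {
        let rest = trimmed.strip_prefix(*prefix)?;
        // Require a word boundary so "rtkit status" is not treated as wrapped.
        if rest.is_empty() {
            Some("")
        } else if rest.starts_with(char::is_whitespace) {
            Some(rest.trim_start())
        } else {
            None
        }
    })
}

/// Already-wrapped commands pass through; everything else gets the new prefix.
fn rewrite(cmd: &str) -> String {
    match strip_wrapper_prefix(cmd) {
        Some(_) => cmd.to_string(),
        None => format!("contextcrawler {}", cmd.trim_start()),
    }
}

/// Strips either prefix so old and new history rows group under one key.
fn display_cmd(cmd: &str) -> &str {
    strip_wrapper_prefix(cmd).unwrap_or(cmd)
}
```

With this shape, `rtk git status` and `contextcrawler git status` both display as `git status`, which is what lets historical rows bucket with new ones.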

Other notable shipments

Zero-trust hardening for gh / glab / gt (PR #58) — env strip + arg deny + passthrough fix
Pytest -p bypass (PR #59) — close bare-relative + glued-form -pVALUE / -p=VALUE evasions
$CODEX_HOME escape warning (PR #66) — defence-in-depth when env points outside $HOME
Tier 2 bench harness (PR #70) — end-to-end Claude Code hook measurement, 10.4% measured savings on 8 fixtures
Allowlist-based branding lint (PR #71) — broad scan with 17 named allowlist rules; catalogues ~50 known-debt RTK strings; future leaks fail at test time instead of user-report time

Cleanup

0 cargo warnings on develop tip (was 80+); deleted 6 orphan modules (-2953 lines), wired 2 latent legacy hook bugs (PR #64)
uninstall_codex_at ordering fix (PR #65) — AGENTS.md patched before file deletion so failure modes don't leave inconsistent state
git_hardening test fixture (PR #60) — hermetic against host commit.gpgsign=true so 1Password/gpg-agent unavailability doesn't break the suite
GLOBAL_ENV_LOCK consolidation (PR #68) — shared lock for env-mutating integration tests

Branding sweep (3 rounds; user-reported)

PR #61, #63 — clap help text + error prefixes + init messages
PR #71 — allowlist-style lint replaces regression-style FORBIDDEN_TOKENS

Quality bar

cargo build clean: 0 warnings
cargo test --bin contextcrawler — 2119 pass / 0 fail
10 integration suites green: branding_lint, git_hardening, cargo_hardening, node_hardening, runtime_hardening, cloud_hardening, harness_standalone, proxy_nudge, gh_glab_gt_hardening, harness_claude_code (new)

PRs included

#54 #56 #58 #59 #60 #61 #63 #64 #65 #66 #68 #70 #71 #72 (14 PRs)

Deferred to v0.1.10+

#28 codex compliance 80→95% lift (needs 24h post-#54 dashboard sample)
#29 Tier 3 Codex bench harness
#40 trusted-PATH binary resolution
#67-debt ~50 catalogued RTK strings in dashboard / uninstall / etc. (lint-pinned, ready for cleanup PR)
#69 unit-test src/ env race follow-up to #48

Install

cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.9
contextcrawler --version    # → contextcrawler 0.1.9

If you're using the Claude Code hook installed via contextcrawler init -g --claude-md: strongly recommend updating so the hook's rewrite output matches the binary name. Pre-v0.1.9 the hook returned rtk X substitutions that Claude Code couldn't execute.


18 May 10:02

thehoff

v0.1.8

1e967af

v0.1.8 — zero-trust CLI wrapper hardening

Highlights

This release lands the zero-trust CLI wrapper layer across 15 secure_*_command helpers + universal env-strip + per-tool RCE-vector deny-lists for rg/find/git/cargo/node/python/ruby/jvm/dotnet/go/k8s/docker/aws/psql/curl/wget.

Threat model framing: the LLM is the caller, and we want defense in depth against hostile env or arg injection inherited from upstream processes — this is hardening, not a panic patch. (See docs/security/zero-trust-wrapped-cli.md)

Confirmed RCE blocked

RIPGREP_CONFIG_PATH → loaded config can carry --pre=<script> → rg executes the script per file. Empirically reproduced 4/4 attack paths firing the preprocessor as the user's uid, then blocked via secure_rg_command() (env strip) + check_forbidden_rg_args() (arg deny + bundle-form detection).
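
A minimal sketch of the two layers (env strip plus arg deny), using only `std::process::Command`. The helper names and the three-variable list are placeholders chosen for the example; the real `secure_rg_command()` and `check_forbidden_rg_args()` cover more variables and also detect bundled flag forms, which this sketch does not attempt.

```rust
use std::process::Command;

/// A small, illustrative subset of variables to strip before spawning a tool.
const STRIPPED_ENV: [&str; 3] = ["RIPGREP_CONFIG_PATH", "LD_PRELOAD", "NODE_OPTIONS"];

/// Builds a Command with the listed variables removed from the child's
/// environment, regardless of what the parent process inherited.
fn hardened_command(program: &str) -> Command {
    let mut cmd = Command::new(program);
    for name in STRIPPED_ENV {
        cmd.env_remove(name);
    }
    cmd
}

/// Returns the first argument that would enable ripgrep's preprocessor.
/// Only the plain `--pre` and `--pre=VALUE` spellings are checked here.
fn first_forbidden_rg_arg<'a>(args: &[&'a str]) -> Option<&'a str> {
    args.iter()
        .copied()
        .find(|arg| *arg == "--pre" || arg.starts_with("--pre="))
}
```

Stripping the variable closes the config-file route, while the argument check closes the direct route; either one alone would leave the other path open.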

What's new

15 secure_*_command helpers wrapping every spawned tool — LD_PRELOAD, DYLD_*, BASH_FUNC_*, PROMPT_COMMAND, PERL5OPT, LUA_INIT, RUBYOPT, JAVA_TOOL_OPTIONS, DOTNET_STARTUP_HOOKS, RUSTC_WRAPPER, NODE_OPTIONS, GIT_EXTERNAL_DIFF, KUBECONFIG (k8s exec credential plugins), AWS_CONFIG_FILE (credential_process), RIPGREP_CONFIG_PATH, etc.
Real contextcrawler security subcommand — Tirith dashboard with binary resolution, gate state, downgrade log tail, --json mode. (Was previously a doc-only feature; invocations fell through to macOS /usr/bin/security.)
Token-savings improvements — symmetric 80/80 head/tail cap (was 80/20), passthrough_extensions allowlist, two-line marker with contextcrawler proxy cat <path> escape hatch, grep -c/-L/-o/-Z long-form routed through rg.
Codex compliance lifted 0% → 80% via strengthened AGENTS.md template with MUST + WRONG/RIGHT examples.
Bench harness Tier 1 — isolated DB tempfile, JSON/MD reports keyed by git sha.
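
To make the symmetric head/tail cap concrete, here is a small sketch. The function name and the single-line omission marker are assumptions; the release describes a two-line marker with a `contextcrawler proxy cat <path>` hint, which is not reproduced here.

```rust
/// Keeps the first `head` and last `tail` lines and replaces the middle with
/// a marker. Inputs that already fit are returned unchanged.
fn cap_head_tail(lines: &[&str], head: usize, tail: usize) -> Vec<String> {
    if lines.len() <= head + tail {
        return lines.iter().map(|line| line.to_string()).collect();
    }
    let omitted = lines.len() - head - tail;
    let mut out: Vec<String> = lines[..head].iter().map(|line| line.to_string()).collect();
    // Hypothetical marker text; the real marker format differs.
    out.push(format!("[... {} lines omitted ...]", omitted));
    out.extend(lines[lines.len() - tail..].iter().map(|line| line.to_string()));
    out
}
```

With an 80/80 cap, a 200-line output keeps lines 0 to 79 and 120 to 199, so errors printed at the end of a long build log survive as reliably as the header at the start.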

Rebrand insurance

--version now prints contextcrawler 0.1.8 (was leaking rtk 0.39.0).
47 [rtk] → [contextcrawler] print-string sweep.
4 regression-pinning tests (branding_lint, RTK_MD constant pin, CLI name pin, config-file canonical name).

Tests

2116 bin tests + 7 integration suites green.

PRs included

#10 #14 #15 #16 #17 #18 #21 #24 #25 #30 #31 #33 #41 #42 #43 #44 #46 #47 #51

Install

cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.8
contextcrawler --version    # → contextcrawler 0.1.8


15 May 12:31

thehoff

v0.1.6

d723137

v0.1.6 — security & maintenance

[0.1.6] — 2026-05-15

Security and maintenance release. Closes 12 audit findings from the
2026-05-15 review (extending the three GHSAs from v0.1.5 plus
downstream-only findings on the web command, supply-chain integration,
filter trust model, and tirith gate). Adds the long-term-maintenance
framework: threat model, release runbook, upstream-rebase strategy,
quality baselines, three per-module security audits, and a roadmap.

Security

Build-host metadata stripped from release binaries. Previously
the release binary embedded ~284 /Users/<builder>/.cargo/registry/...
paths used by Rust's panic-backtrace metadata, leaking the builder's
username and directory layout. scripts/build-release.sh now sets
--remap-path-prefix for $CARGO_HOME and the workspace; --verify
mode asserts zero builder paths in the produced binary.
strip_ansi extended + raw-emit sweep. strip_ansi already
covered CSI; v0.1.5 added OSC / OSC 8 hyperlinks / DCS / SOS / PM /
APC / private DEC modes. v0.1.6 sweeps 58 raw eprint!/println!
sites across 9 files (cmds/git/, cmds/cloud/, cmds/js/,
cmds/python/, cmds/dotnet/, cmds/system/grep_cmd.rs,
cmds/go/, core/runner.rs) so failure-path tool output goes
through the sanitiser before reaching the agent.
Global TOML filter trust gate (H-3). ~/.config/rtk/filters.toml
was previously loaded with no integrity check while the project-local
.rtk/filters.toml was SHA-256-pinned. Closed: same trust store,
same content-change-revokes semantics. New CLI: contextcrawler trust --global / untrust --global. Plus a TOCTOU fix
(check_trust_bytes works on the already-read buffer instead of
re-opening the path between hash and parse).
CI trust-override now requires platform-injected token (H-2).
RTK_TRUST_PROJECT_FILTERS=1 previously trusted any env that set
CI=true (settable by a hostile Makefile). Tightened to also require
a platform-injected token (GITHUB_TOKEN, CI_JOB_TOKEN,
BUILDKITE_AGENT_ACCESS_TOKEN, JENKINS_NODE_COOKIE/BUILD_TAG,
CIRCLE_TOKEN/CIRCLE_BUILD_NUM, DRONE_BUILD_NUMBER). An in-repo
Makefile can't fake these.
Tirith subprocess hardening (F-01 / F-02 / F-04 / F-05).
- wait_timeout(8s) so a hung tirith check no longer freezes the
  agent's PreToolUse hook (was indefinite).
- 4 MiB stdout cap.
- Stdio::null() on stdin and stderr — the stderr pipe was never
  drained, so a noisy tirith could fill the 64 KiB kernel buffer
  and stall the wait_timeout until it fired.
- JSON re-canonicalisation in log_downgrade before embedding in
  downgrades.jsonl — closes a log-injection vector where a
  hostile tirith could emit literal newlines to forge a top-level
  log record. Sentinel-on-parse-failure keeps the line valid JSON.
- Same subprocess pattern applied to the security_cmd dashboard
  (fetch_audit_stats, fetch_doctor_status).
- New dep: wait-timeout = "0.2".
Web command hardening (F-01 / F-02 / F-03 / F-04 / F-07).
contextcrawler web now:
- parses the URL with the url crate, rejects non-http(s)
  schemes (closes file:///etc/passwd local-read);
- resolves the host and refuses if any resolved IP is in a blocked
  range (loopback / link-local / RFC1918 / ULA / CGN / multicast /
  unspecified / 0.0.0.0/8 / 198.18/15 benchmark / 240/4 future-use,
  plus IPv4-mapped-private-in-IPv6, plus Azure metadata
  168.63.129.16, plus AWS metadata 169.254.169.254 via link-local);
- pins the validated IPs into curl via --resolve so curl can't
  independently re-resolve to a private IP between our check and
  the fetch (DNS-rebinding defence);
- caps curl at --max-time 30, --max-filesize 64 MiB,
  --max-redirs 10;
- uses -- to terminate flag parsing before the URL;
- wraps stderr in strip_ansi.
- New dep: url = "2".
- Residual: multi-host-redirect (other.example after a redirect
  re-resolves DNS) tracked for v0.2.0.
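
The blocked-range check in the web command hardening can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the code in `web_cmd.rs`: the function names are invented, and the DNS resolution and `--resolve` pinning steps are not shown.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// True if an IPv4 address falls in one of the ranges listed above.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    ip.is_loopback()
        || ip.is_link_local() // 169.254.0.0/16, includes 169.254.169.254
        || ip.is_private() // RFC 1918
        || ip.is_multicast()
        || ip.is_unspecified()
        || o[0] == 0 // 0.0.0.0/8
        || (o[0] == 100 && (o[1] & 0xC0) == 64) // 100.64.0.0/10 (CGN)
        || (o[0] == 198 && (o[1] & 0xFE) == 18) // 198.18.0.0/15 (benchmark)
        || o[0] >= 240 // 240.0.0.0/4 (future use)
        || ip == Ipv4Addr::new(168, 63, 129, 16) // Azure metadata
}

/// IPv6: unwrap IPv4-mapped addresses first, then check native ranges.
fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_blocked_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || ip.is_multicast()
        || (first & 0xFE00) == 0xFC00 // fc00::/7 (ULA)
        || (first & 0xFFC0) == 0xFE80 // fe80::/10 (link-local)
}

/// A fetch should be refused if this is true for ANY resolved address.
fn is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}
```

A check like this only helps if the same validated addresses are then the ones actually connected to, which is why the release pairs it with pinning the IPs into curl.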

Process & docs

Threat model: new docs/security/THREAT_MODEL.md. Documents
assets, attack surfaces, threat actors, mitigations matrix,
accepted limitations.
Module audits: per-file security audits for supply_chain_gate.rs
(6 findings, no High/Critical), tirith_gate.rs (5 findings,
closed), Commands::Web dispatch + web_cmd.rs (6 findings,
closed), and combined jsonl_rewriter + session_compact_cmd
+ security_cmd (3 Mediums, 6 LOW/INFO). Subprocess-timeout
class-audit conclusion in AUDIT_subprocess_timeout_class.md.
Quality baselines: docs/quality/BASELINE.md snapshots test
count, clippy state, cargo audit result, unsafe blocks, unwrap
distribution. deny.toml covers advisories, licenses, bans,
sources (passes cargo deny check).
Release & rebase docs: docs/contributing/RELEASING.md
(end-to-end runbook) + docs/contributing/UPSTREAM_REBASE.md
(rtk-ai/rtk tracking strategy, what-to-take-vs-skip matrix,
conflict resolution for hardened paths).
Roadmap: docs/ROADMAP.md — v0.1.x line, v0.2.0 candidates
organised into security/process/capability buckets, tracking
model.
Session record: docs/sessions/2026-05-15-overnight.md —
branch-by-branch summary with Codex round results and merge order.

Build & infrastructure

rust-version = "1.80" MSRV declared in Cargo.toml (covers
Ipv6Addr::to_ipv4_mapped used by the SSRF block check).
New scripts: scripts/build-release.sh (with --verify and
--install modes), scripts/bump-version.sh.
Proposed CI jobs documented in docs/quality/CI_JOBS_PROPOSED.md
(release-leak gate + cargo deny check). Wire in when the
.github/ gitignore situation is resolved.

Tests

1845+ passed across the merged tree (was 1828 at v0.1.5). 32 new
regression tests for argv-mode guard / OSC stripping / scrub /
SSRF block / CI trust check / JSONL canonicalisation.

Acknowledgements

Three rounds of Codex peer review on each fix branch. Every
finding tracked, every fix verified.


14 May 16:30

thehoff

v0.1.5

4268cde

v0.1.5 — security release (3 downstream-only GHSAs)

Security release. Three downstream-only fixes covering attack surfaces upstream rtk-ai/rtk has declined to address (rtk-ai/rtk#640 — "by design / tracking"). Each landed on its own feature branch with three rounds of Codex peer review. All three are tracked as draft GitHub Security Advisories pending coordinated publication.

Security fixes

GHSA-3mmh-86cm-g6w4 — shell-execution trust boundary

`contextcrawler err / test / summary` now parse the trailing command as argv and exec without a shell by default.

Shell metacharacters (`|` `;` `&` `<` `>` backtick `$` newline) cause rejection.
The first token is refused if it is a known shell (sh, bash, zsh, dash, ksh, fish, tcsh, csh, ash and their `.exe` variants; cmd, powershell, pwsh; busybox, toybox).
The first token is refused if it is an exec wrapper (env, nice, nohup, time, timeout, gtimeout, ionice, chroot, setpriv, unshare, taskset, stdbuf, script, xargs, watch, sudo, doas, su, runuser, pkexec) — these replace the process image with arg[1+], reintroducing the attack surface.
`--shell` is the documented escape hatch for users who actually want `sh -c` semantics.

Closes a prompt-injection → shell-injection chain where an agent could append a shell payload to a build-triage command and have it auto-execute.
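
The argv-mode guard can be sketched roughly as below. This is a simplified illustration under stated assumptions: `check_argv` and `Rejection` are hypothetical names, the refused-program list is a short subset of the shells and wrappers listed above, and comparing on the lowercased basename is a choice made for the example.

```rust
/// The metacharacters listed above: | ; & < > backtick $ newline.
const SHELL_META: [char; 8] = ['|', ';', '&', '<', '>', '`', '$', '\n'];

/// Illustrative subset of refused first tokens (shells and exec wrappers).
const REFUSED_FIRST_TOKENS: [&str; 8] =
    ["sh", "bash", "zsh", "env", "sudo", "xargs", "timeout", "busybox"];

#[derive(Debug, PartialEq)]
enum Rejection {
    Empty,
    ShellMetacharacter(char),
    RefusedProgram(String),
}

/// Validates an argv vector before it is executed without a shell.
fn check_argv(argv: &[&str]) -> Result<(), Rejection> {
    let first: &str = argv.first().copied().ok_or(Rejection::Empty)?;
    for arg in argv {
        if let Some(c) = arg.chars().find(|c| SHELL_META.contains(c)) {
            return Err(Rejection::ShellMetacharacter(c));
        }
    }
    // Assumption: compare on the lowercased basename without a `.exe` suffix.
    let base = first
        .rsplit(|c| c == '/' || c == '\\')
        .next()
        .unwrap_or(first)
        .to_ascii_lowercase();
    let base = base.strip_suffix(".exe").unwrap_or(&base).to_string();
    if REFUSED_FIRST_TOKENS.contains(&base.as_str()) {
        return Err(Rejection::RefusedProgram(base));
    }
    Ok(())
}
```

Rejecting wrappers such as `env` or `xargs` matters as much as rejecting shells, since they would otherwise run an arbitrary later argument as the real program.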

GHSA-wjx4-ffxm-fxxp — terminal-escape stripping

`strip_ansi` previously matched CSI only. Extended to cover the full set of escapes that survive into LLM context:

CSI including DEC private modes (`ESC [ ? ...`)
OSC (window titles, palette changes, notifications)
OSC 8 terminal hyperlinks — visible text preserved, URL payload dropped (hyperlinks are a smuggling channel for instructions or exfil URLs)
DCS, SOS, PM, APC (`ESC P|X|^|_ ... ESC \`)
Standalone Fe/Fp/Fs escapes used by some pagers

Prisma command paths (`run_generate` / `run_migrate` / `run_db_push`) now wrap their failure-fallback `eprint!` calls in `strip_ansi`. A broader audit of remaining raw-emit paths (git / container / dotnet / python / pnpm / grep) is tracked as follow-up.
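
For readers unfamiliar with the escape families listed above, a compact state-machine sketch follows. It is not the project's `strip_ansi`: the name `strip_escapes` is hypothetical, and an unterminated string sequence simply swallows the rest of the input, which a production sanitiser may want to handle differently.

```rust
/// Removes CSI sequences, string-type sequences (OSC, DCS, SOS, PM, APC),
/// and other two-character escapes, keeping all visible text.
fn strip_escapes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\x1b' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // CSI: skip parameter/intermediate bytes up to one final byte
            // in 0x40..=0x7E. Covers DEC private modes such as ESC [ ? 25 l.
            Some('[') => {
                for n in chars.by_ref() {
                    if ('\x40'..='\x7e').contains(&n) {
                        break;
                    }
                }
            }
            // String sequences end at ST (ESC \). OSC also ends at BEL.
            Some(kind @ (']' | 'P' | 'X' | '^' | '_')) => {
                while let Some(n) = chars.next() {
                    if n == '\x07' && kind == ']' {
                        break;
                    }
                    if n == '\x1b' && chars.peek() == Some(&'\\') {
                        chars.next();
                        break;
                    }
                }
            }
            // Any other two-character escape, or a trailing lone ESC, is dropped.
            _ => {}
        }
    }
    out
}
```

An OSC 8 hyperlink is two OSC sequences wrapped around ordinary text, so dropping both sequences keeps the visible label and discards the URL payload.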

GHSA-2cwv-rr7c-2p4c — credential scrubbing before tracking-db insert

`scrub_secrets` runs at the INSERT boundary in `tracking.db` and redacts:

Credential-bearing flags: `--password`, `--token`, `--api-key`, `--secret`, `--access-key`, `--auth-token`, `--client-secret` (`=value`, space-value, and escape-aware quoted-value forms).
HTTP `Authorization: Bearer|Basic|Token|ApiKey ` headers.
URL-embedded credentials: `scheme://user:password@host`.
AWS access key IDs (`AKIA…`, `ASIA…`).
GitHub tokens: classic / OAuth / user-to-server / server / refresh PATs (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`) and fine-grained PATs (`github_pat_…`).
Slack tokens (`xox[abprs]-…`).
mysql / mariadb `-p` (scoped to mysql / mariadb invocations only — `curl -p3000` and similar are not rewritten).

Without scrubbing, `gain --history` would feed credentials back into agent context on every read.
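
A heavily simplified sketch of scrubbing at the write boundary is shown below. It is not `scrub_secrets`: the function name `scrub`, the short flag and prefix lists, and the `[REDACTED]` placeholder are assumptions, and the sketch splits on whitespace, so quoted values, headers, and URL credentials are not covered.

```rust
/// Illustrative subset of credential-bearing flags.
const SECRET_FLAGS: [&str; 3] = ["--password", "--token", "--api-key"];
/// GitHub token prefixes named above.
const TOKEN_PREFIXES: [&str; 6] = ["ghp_", "gho_", "ghu_", "ghs_", "ghr_", "github_pat_"];
/// Hypothetical placeholder text.
const REDACTED: &str = "[REDACTED]";

/// Redacts flag values (`--flag value` and `--flag=value`) and bare tokens
/// with a known prefix. Runs of whitespace collapse to single spaces.
fn scrub(cmd: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut redact_next = false;
    for word in cmd.split_whitespace() {
        if redact_next {
            out.push(REDACTED.to_string());
            redact_next = false;
        } else if let Some((flag, _)) = word
            .split_once('=')
            .filter(|(flag, _)| SECRET_FLAGS.contains(flag))
        {
            out.push(format!("{}={}", flag, REDACTED));
        } else if SECRET_FLAGS.contains(&word) {
            out.push(word.to_string());
            redact_next = true;
        } else if TOKEN_PREFIXES.iter().any(|prefix| word.starts_with(*prefix)) {
            out.push(REDACTED.to_string());
        } else {
            out.push(word.to_string());
        }
    }
    out.join(" ")
}
```

Scrubbing just before the insert, rather than at display time, means the secret never reaches disk, so later reads of the history cannot leak it.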

Tests

1828 passed, 0 failed across all three branches and the merged `develop`. Each fix landed with a dedicated regression-test block.

Install

```sh
cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.5 --locked
```

Full changelog: `CHANGELOG.md`


14 May 14:57

thehoff

v0.1.4

bfe32f1

v0.1.4 — discover report rebrand + gate-design comment

Mop-up release covering two surfaces v0.1.3 didn't touch.

Fixed

contextcrawler discover output still printed RTK — banner, stats line, empty-state hint, section header, column header, and per-row "Equivalent" cells all said RTK … / rtk git. Fixed by widening the existing display_rtk helper from the rewrite path to the discover report path (made pub, applied at the print site in src/discover/report.rs). Internal rtk_cmd: rule literals in rules.rs intentionally untouched — kept as lookup keys aligned with upstream rtk. (#7)

Documented

Added a design-intent comment to process_claude_payload clarifying that the Tirith and supply-chain gates only fire on the PermissionVerdict::Allow path. The gate is a safety net for explicitly-allowlisted command shapes, not a universal filter. Future investigators won't repeat the false alarm of "fresh probes don't appear in downgrades.jsonl". (#7)

Verified

cargo test — 1785 passed / 0 failed / 6 ignored (unchanged from v0.1.3)

Upgrade

cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.4 --force



14 May 13:17

thehoff

v0.1.3

f3bdc35

v0.1.3 — finish rebrand polish

Polish release rolling up the post-v0.1.2 rebrand work, surfaced via fresh-install devel-testing on macOS and Ubuntu.

Fixed (correctness)

Hook rewrite prefix — every rewrite emitted rtk <subcmd>, which failed with command not found: rtk on the documented install (only contextcrawler on PATH). Now emits contextcrawler <subcmd>. Legacy rtk prefix still recognized as already-rewritten passthrough — existing Bash(rtk:*) allowlist entries keep working. (#1)
contextcrawler -v (flag-only invocations) — CLI fallback path tried to exec args[0] when clap parsing failed; with args[0]=-v it produced [rtk: No such file or directory (os error 2)]. Now shows clap's "subcommand required" error; fallback prefix is [contextcrawler: ...]. (#4)

Fixed (cosmetic)

gain dashboard header: RTK Token Savings → ContextCrawler Token Savings (#1)
init -g success output, codex config listing, agent hook output (cline / windsurf / kilocode / antigravity), uninstall messages, and init -g usage text — all consistently branded ContextCrawler / CONTEXTCRAWLER.md / @CONTEXTCRAWLER.md. (#3)
Sourced from RTK_MD / RTK_MD_REF constants so future renames stay centralized.

Added

Tirith gate status in contextcrawler init -g — reports whether the URL-security defense-in-depth gate is armed. Detect-only — does NOT modify ~/.bashrc / ~/.zshrc / fish config. The gate operates exclusively at the Claude Code PreToolUse hook layer via subprocess invocation of tirith check. (#2 superseded by #5)

Internal

rtk source-level identifiers (mod / struct / field names, rtk_cmd: rule values, rtk_equivalent classification keys) intentionally retained to keep upstream rebases against rtk-ai/rtk small.

Verified

cargo test — 1785 passed / 0 failed / 6 ignored
End-to-end smoke test of the rewrite hook on both macOS and Ubuntu
Empirical 61.4% token savings observed on Ubuntu after install (cargo test + find + git log mix)

Upgrade

cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.3 --force
contextcrawler init -g     # re-register hook + report Tirith gate status

After upgrading, fully /exit + relaunch any running Claude Code sessions so they pick up the new binary on PATH.



14 May 06:28

thehoff

v0.1.2

423e5cc

v0.1.2 — first release where init -g actually works


Upgrade strongly recommended for anyone on v0.1.0 / v0.1.1.

TL;DR

The hook command was hardcoded to rtk hook claude, a binary that doesn't exist in this distribution. Every contextcrawler init -g since the rebrand has been writing a broken entry into settings.json — the hook fired, the rtk binary wasn't found, the bash shim gracefully degraded, and Claude Code received raw, un-filtered command output. ContextCrawler was effectively a no-op on every install. This release fixes that and migrates existing broken installs automatically.

A second-pass dual-agent code review (Codex + Claude) also surfaced two real security issues, both patched.

Fixed (critical)

Hook command rename. CLAUDE_HOOK_COMMAND, CURSOR_HOOK_COMMAND, and the Gemini wrapper script + Copilot hook JSON all referenced the non-existent rtk binary. Now write contextcrawler hook <agent>. Install-time matchers recognize the legacy string so existing broken entries get cleaned up on next init -g.

Fixed (security — from a second Codex + Claude review pass)

Session compactor path traversal in resolve_session_path. A bare session id like ../foo was joined under each project directory and the resulting candidate was returned if it resolved to a file. Now rejects ids containing /, \, or two consecutive dots (..).
Supply-chain cooldown bypass on future-dated publishes. The age guard age > -1d let packages "published" up to 24h ahead of now pass through both bounds entirely. Now clamps negative ages to zero — future dates are treated as just-published, which always trips the cooldown.
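
Both security fixes come down to small guards, sketched here under stated assumptions. The function names are hypothetical, the empty-id rejection is an addition for the example, and the cooldown length is a caller-supplied parameter because the actual value is not given above.

```rust
use std::time::{Duration, SystemTime};

/// Rejects ids that could escape the project directory when joined to it.
fn is_safe_session_id(id: &str) -> bool {
    !id.is_empty() && !id.contains('/') && !id.contains('\\') && !id.contains("..")
}

/// Age of a package, with future publish dates clamped to zero.
/// `duration_since` returns an error when `published` is later than `now`.
fn package_age(published: SystemTime, now: SystemTime) -> Duration {
    now.duration_since(published).unwrap_or(Duration::ZERO)
}

/// A package younger than the cooldown should trip the gate.
fn within_cooldown(published: SystemTime, now: SystemTime, cooldown: Duration) -> bool {
    package_age(published, now) < cooldown
}
```

Clamping to zero makes a future-dated publish the youngest possible package, so it cannot slip past a lower bound the way a negative age could.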

Fixed (UX)

Every [rtk] ... / run `rtk ...` warning string now reads contextcrawler so pasted commands actually work.
~/.claude/RTK.md and @RTK.md reference renamed to CONTEXTCRAWLER.md / @CONTEXTCRAWLER.md. Legacy files auto-migrate on next install.
contextcrawler gain no longer prefixes every row with the redundant rtk string in the by-command and recent-commands tables.
README/MIGRATING gained a pre-install callout warning users who previously ran upstream rtk or jee599/contextzip to clean out stale hook entries.

Other

CodeQL py/insecure-temporary-file fixed (tempfile.mktemp → NamedTemporaryFile).
CodeQL rust/cleartext-logging false-positive in rtk trust --list defused.
Unused-variable compiler warning silenced.

Install

cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.2 --locked --force
contextcrawler init -g

If you're already on v0.1.0 / v0.1.1, the --force is important — cargo install will skip the upgrade otherwise. Run init -g again after install to migrate RTK.md → CONTEXTCRAWLER.md and replace the broken rtk hook claude entry in your ~/.claude/settings.json. Then restart Claude Code.

Full changelog: CHANGELOG.md.

