Releases: thehoff/contextcrawler
ContextCrawler 0.4.0 — the library pivot
ContextCrawler is now a proper lib + bin. The binary is a thin shim over the
library, so the CLI dogfoods the exact code path downstream Rust tools embed.
This completes the community request in #185.
Added
- Curated public embedder API (experimental, not yet semver-guaranteed):
`summarize_command_output`, `filter_output(name, raw)`, `auto_filter_output(raw)`,
`available_filters()`, `no_bloat`. Apply the filters/summariser to captured
command output without spawning the CLI.
- Crate-level rustdoc + a comprehensive docs overhaul (README, command/filter
reference, library + architecture guides, security-gate docs).
Changed
- One compile tree (no duplicate bin/lib): dead-code warnings 473 to 0.
- main.rs is a thin shim over contextcrawler::run(); all CLI logic in the lib.
Use as a library (no crates.io needed)
contextcrawler = { git = "https://github.com/thehoff/contextcrawler", tag = "contextcrawler-v0.4.0" }
Install the CLI (from source only)
cargo install --git https://github.com/thehoff/contextcrawler --tag contextcrawler-v0.4.0
No pre-built binaries are distributed: build from source. Downstream of
rtk-ai/rtk, with contextzip + Tirith folded in. Experimental, single-maintainer,
fast-moving: build it, test it against your own workflow, verify before depending.
v0.1.11 — security + supply-chain hardening + perf
Security + supply-chain hardening + perf
Heaviest release since v0.1.6. 7 issues closed, 200+ tests green, 11 commits.
🔒 Security
- #180: Credentials no longer captured in audit logs. Both `downgrades.jsonl` and `supply_chain.jsonl` previously serialised the raw shell command verbatim, capturing inline tokens (curl `Authorization` headers, env-var assignments to `*_TOKEN` / `*_KEY` / `*_SECRET`, GitHub PATs, URL basic-auth, git-credential-helper, CLI flag values). Empirical 24h scan of one user's log: 27 unique 40-hex tokens + 15 `Authorization: token <hex>` headers in cleartext. Fixed at the log-write boundary in both gates plus the tirith blob (which echoes cmd back in findings). Ships `contextcrawler security --scrub-logs [--dry-run]` to retroactively clean existing on-disk logs.
- #181: `CONTEXTCRAWLER_SUPPLY_CHAIN=off` now works as an inline prefix. The gate's own error messages tell users to "rerun with CONTEXTCRAWLER_SUPPLY_CHAIN=off", but the gate only checked the shell environment, not the inline form. Users followed the documented bypass and were still blocked. Now parses leading POSIX-style assignments off the cmd. Must be a prefix; mid-cmd `&& FOO=bar` does not bypass.
- #182: Retry on transient HTTP errors. A single registry/OSV blip previously turned the whole install into `Verdict::Unavailable` (no retry, first error wins). Now: 1 retry, 5s per-attempt, 250ms backoff. Retried only on transport-level failures and `5xx`. `4xx` is terminal. Worst-case 10.25s/call (5s + 0.25s + 5s), well under `CHECK_WALL_BUDGET = 25s`.
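The inline-prefix rule from #181 can be sketched as follows. This is a minimal illustration, not the project's parser: the function names are hypothetical, and tokenising on whitespace only (no quoting support) is a simplifying assumption.

```rust
/// Splits leading `NAME=value` assignments off a command line.
/// Hypothetical helper: splits on whitespace only and ignores shell quoting.
fn split_leading_assignments(cmd: &str) -> (Vec<(String, String)>, String) {
    let mut assignments = Vec::new();
    let mut rest = cmd.trim_start();
    loop {
        let token_end = rest.find(char::is_whitespace).unwrap_or(rest.len());
        let token = &rest[..token_end];
        match token.split_once('=') {
            // Only a valid variable name before `=` counts as an assignment.
            Some((name, value)) if is_valid_name(name) => {
                assignments.push((name.to_string(), value.to_string()));
                rest = rest[token_end..].trim_start();
            }
            // The first non-assignment token ends the prefix.
            _ => break,
        }
    }
    (assignments, rest.to_string())
}

/// POSIX-style name: a letter or underscore, then letters, digits, underscores.
fn is_valid_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// True only when the bypass appears in the leading assignment prefix.
fn supply_chain_gate_disabled(cmd: &str) -> bool {
    let (assignments, _) = split_leading_assignments(cmd);
    assignments
        .iter()
        .any(|(name, value)| name == "CONTEXTCRAWLER_SUPPLY_CHAIN" && value == "off")
}
```

Because parsing stops at the first token that is not an assignment, a `CONTEXTCRAWLER_SUPPLY_CHAIN=off` that appears after `&&` later in the command is never seen as a bypass.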
🛠 Supply-chain hardening
- #145: `parse_package_args` robustness. Three orthogonal false-positives fixed: `celery[redis]` extras now stripped before PyPI lookup; inline `#` comments terminate args; full pip value-consuming flag set (`--platform`, `--python-version`, `--trusted-host`, `--extra-index-url`, `--find-links`, `--no-binary`/`--only-binary`, `--prefix`, `--root`, `--upgrade-strategy`, `--proxy`, `--cert`, …) is now known.
- #147: `Ecosystem::Unknown` variant. Depth-cap synthetic was hardcoded `Ecosystem::Npm`, so deep-nested PyPI/mixed-shell payloads were mislabelled `[npm]`. Verdict was always correct; label is now too. Cascade arms added in 3 existing matches.
- #149: O(n²) → O(n) dedup + O(log n) claimed-span check. Mechanical refactor, no behaviour change. `dedup_installs` uses HashSet keyed on canonical signature; claimed-span overlap-check uses `BTreeMap::range`. Stress test: 5,000 dedup in single-digit ms; 200-segment install detect in <250ms.
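For readers unfamiliar with the `BTreeMap::range` technique mentioned in #149, here is one way an O(log n) overlap check can look. It is a sketch under stated assumptions: `ClaimedSpans` and its methods are hypothetical names, and spans are assumed to be half-open byte ranges that never overlap once claimed.

```rust
use std::collections::BTreeMap;

/// Claimed half-open spans `[start, end)`, keyed by start offset.
/// Hypothetical type for illustration only.
#[derive(Default)]
struct ClaimedSpans {
    spans: BTreeMap<usize, usize>, // start -> end
}

impl ClaimedSpans {
    /// Returns true if `[start, end)` overlaps any claimed span.
    fn overlaps(&self, start: usize, end: usize) -> bool {
        // Claimed spans never overlap each other, so only the span with the
        // greatest start below `end` can intersect the query.
        self.spans
            .range(..end)
            .next_back()
            .map_or(false, |(_, &claimed_end)| claimed_end > start)
    }

    /// Claims the span if it is non-empty and free; returns whether it was claimed.
    fn claim(&mut self, start: usize, end: usize) -> bool {
        if start >= end || self.overlaps(start, end) {
            return false;
        }
        self.spans.insert(start, end);
        true
    }
}
```

The single `range(..end).next_back()` lookup replaces a linear scan over all previously claimed spans, which is where the logarithmic bound comes from.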
🧹 CI + process
- #183: `branding_lint` red on develop. Four pre-existing leaks in `src/hooks/init.rs` were blocking the safety net. Lint is green again.
- #78 (closes #75): clap-aware doc-comment lint. Catches `RTK savings/adoption/equivalent/artifacts/etc.` in clap docs.
- #80 (addresses #77): merge-driver design + multi-model peer review docs. Pure documentation.
Test coverage
| Module | Tests |
|---|---|
| `core::secret_redact` | 16 |
| `hooks::tirith_gate` | 6 |
| `hooks::supply_chain_gate` | 173 |
| `tests::branding_lint` | 5 |
200+ green across the touched stack. No regressions.
Upgrade notes
- No breaking changes.
- New CLI: `contextcrawler security --scrub-logs [--dry-run]`.
- The #182 retry change reduces per-attempt HTTP timeout from 8s → 5s.
- Existing audit logs are NOT automatically scrubbed on upgrade. Run `contextcrawler security --scrub-logs` once after install if you've used inline credentials in shell commands (curl headers, env-var literals, `gh auth login --with-token`, etc.).
Empirical proof — real-world 24h scan
| Pattern | Before | After |
|---|---|---|
| `Authorization: token <hex>` in audit logs | 15 | 0 |
| 40-hex tokens captured | 27 | 0 |
| GitHub PAT prefixes | 0 | 0 |
v0.1.10 — supply-chain hardening cluster + correctness pass
Caps the #141 plaintext-FP-guard cluster: install-shaped substrings printed
as data through echo/grep/jq/base64/… no longer trigger the
supply-chain gate, while rg/awk/gawk/sed are kept OUT of the
allowlist because they can spawn sub-processes (rg --pre,
awk 'BEGIN { system(...) }', GNU sed s///e).
Both Codex and agy returned GREEN on rounds 2 + 3.
Supply-chain hardening (#139 → #148)
- #148: plaintext-FP guard with data-utility allowlist
(echo, printf, cat, tac, tee, grep, egrep, fgrep, head, tail, nl,
base64, xxd, od, hexdump, jq) + explicit exclusions for execution-capable
utilities (rg/awk/gawk/sed). 2668 tests, every retained util pinned
by a dedicated masking regression test.
- #146: quote-aware tokeniser closes wrapped-install bypass (#140).
- #143: value-flag + Windows-launcher edge cases, yarn diagnostic
flag exclusion, legacy-Mac `\r`-only line continuation.
- #142: newline-chain, Windows launcher, bare poetry/uv bypasses.
- #139: absolute / relative path install bypass closed.
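The allowlist-plus-exclusions idea behind #148 can be illustrated with a small sketch. The utility lists are copied from the note above; the function name and the exact-match-on-first-token rule are assumptions for illustration, and the real guard is quote-aware and considerably stricter.

```rust
/// Utilities that only print or transform data (list from the release note).
const DATA_UTILITIES: &[&str] = &[
    "echo", "printf", "cat", "tac", "tee", "grep", "egrep", "fgrep", "head", "tail", "nl",
    "base64", "xxd", "od", "hexdump", "jq",
];

/// Utilities deliberately kept out because they can spawn sub-processes.
const EXECUTION_CAPABLE: &[&str] = &["rg", "awk", "gawk", "sed"];

/// Hypothetical check: may install-shaped text in this segment be treated as data?
/// Matches the first whitespace-separated token exactly; paths such as `./echo`
/// are intentionally not normalised, so they never qualify.
fn is_plaintext_segment(segment: &str) -> bool {
    let Some(name) = segment.split_whitespace().next() else {
        return false;
    };
    !EXECUTION_CAPABLE.contains(&name) && DATA_UTILITIES.contains(&name)
}
```

Keeping the exclusion list explicit, even though those names are already absent from the allowlist, documents the decision and guards against someone adding them back later.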
Security gate (#101 → #122, #128 → #130)
- #130 — integrity binary-hook check, supply-chain budget/cap, ISO-8601
- downgrades hardening (SEC-I1..I4).
- #129: token-aware permission matching + substitution-aware
decomposition (SEC-C2/C3); skip arithmetic expansion, normalise glob
patterns.
- #128: dotnet forbidden-arg checker (block MSBuild
Custom*Targets / runsettings RCE, SEC-C1) + semicolon-batch / case-fold
/ @response-file bypasses closed.
- #101 → #122: the #100/#111 audit gating cluster (g1–g8 fixes,
hooks dispatch, gate wiring, security core).
Filters + correctness
- #127: accept standard grep flags (62% drop in parse failures);
route `-h`/`-G` to passthrough.
- #126: restore SIGPIPE default disposition (eliminates startup
SIGABRT).
- #125: recognise CRITICAL/FATAL severities, wc stdin support,
detached-HEAD sha handling, .env visibility (port upstream 62fc0e0);
init refuses malformed CLAUDE.md instead of silent exit-0.
- #124: preserve user content in copilot-instructions.md
(port upstream d108165).
- #134: honest savings claims for read / git log filters
(READ-I1, GIT-I1).
Performance + analytics
- #133: take VACUUM off the `record()` hot path (PERF-I1).
- #132: weighted by-command avg, quota horizon, char-based
token estimate, computed ccusage date (AN-I1..I4).
- #135: new `gain --weak-filters`, which ranks tools by leaked tokens.
Features
- #138 — Pi (pi.dev) added as the 11th supported harness.
- #136 — unified agent guidance into one canonical source.
- #94 — curl JSON minify filter.
- #93 — ServiceNow SDK build filter.
Verification
- 2,668 / 2,668 unit tests passing
- `cargo deny check`: advisories, bans, licenses, sources ok
- `scripts/build-release.sh --verify`: zero builder paths in binary
- PR #148 peer review: Codex GREEN, agy GREEN (rounds 2 + 3)
Known follow-ups (not blocking)
- README install pins still reference v0.1.6 (stale since v0.1.7).
- CHANGELOG.md top entry is [0.1.7]; v0.1.8/9/10 narratives live on
the GH release pages only.
- `scripts/bump-version.sh` looks for a `ContextCrawler X.Y.Z` literal
no longer present in `src/main.rs` (version flows from Cargo.toml via
clap). Manual bump used for this release as for v0.1.8/9.
🤖 Generated with Claude Code
Post-release patch (force-tag, same v0.1.10)
scripts/build-release.sh --install shipped with a macOS Apple-Silicon
AMFI bug — plain cp of the ad-hoc-linker-signed binary to
~/.local/bin/contextcrawler produced a destination that AMFI rejected
at exec (load code signature error 2, SIGKILL before main()). Found
within minutes of cutting the release. Patched by codesign --force --sign - on the destination after copy, macOS-gated. Linux path
unchanged. cargo install --git … --tag v0.1.10 was never affected —
cargo handles signing correctly. Only the RELEASING.md §7 manual
install hit it.
Tag force-moved to include the fix (release was minutes old; no
precompiled artifacts attached, so no SHA-mismatch surface for users).
v0.1.9 — Claude Code hook fix + zero-trust hardening + bench harness Tier 2
Highlights
Claude Code hook actually works now (#62)
Pre-v0.1.9, contextcrawler rewrite "git status" returned "rtk git status" and the Claude Code hook substituted that into Claude Code's updatedInput.command. Anyone with raw-command Claude usage + the hook installed got rtk: command not found on every rewrite. The bug had been live since the rebrand — only masked in our session because most traffic flows via codex (which calls contextcrawler X directly per the AGENTS.md template).
Fixed in PR #72 with full backcompat: the classifier and rewriter now accept BOTH contextcrawler X and rtk X as already-wrapped inputs, and the dashboard's display_cmd strips both prefixes so historical SQLite rows bucket cleanly with new rows.
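The dual-prefix handling described above might look roughly like this. It is a hedged sketch: `display_cmd` is named in the note, but its signature here, along with `strip_wrapper` and `is_already_wrapped`, is assumed for illustration.

```rust
/// Wrapper names treated as "already wrapped" (both taken from the release note).
const WRAPPER_PREFIXES: &[&str] = &["contextcrawler", "rtk"];

/// Returns the command without a leading wrapper name, or `None` if it is not wrapped.
fn strip_wrapper(cmd: &str) -> Option<&str> {
    let trimmed = cmd.trim_start();
    WRAPPER_PREFIXES.iter().find_map(|prefix| {
        let rest = trimmed.strip_prefix(*prefix)?;
        // Require a word boundary so `rtkit status` is not mistaken for `rtk`.
        if rest.is_empty() {
            Some("")
        } else if rest.starts_with(char::is_whitespace) {
            Some(rest.trim_start())
        } else {
            None
        }
    })
}

/// Classifier side: both spellings count as already rewritten.
fn is_already_wrapped(cmd: &str) -> bool {
    strip_wrapper(cmd).is_some()
}

/// Dashboard side: historical `rtk X` rows and new `contextcrawler X` rows
/// both display (and bucket) as `X`.
fn display_cmd(cmd: &str) -> &str {
    strip_wrapper(cmd).unwrap_or(cmd)
}
```

Sharing one stripping helper between the classifier and the dashboard keeps the two views of "wrapped" from drifting apart.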
Other notable shipments
- Zero-trust hardening for `gh`/`glab`/`gt` (PR #58): env strip + arg deny + passthrough fix
- Pytest `-p` bypass (PR #59): close bare-relative + glued-form `-pVALUE`/`-p=VALUE` evasions
- `$CODEX_HOME` escape warning (PR #66): defence-in-depth when env points outside `$HOME`
- Tier 2 bench harness (PR #70): end-to-end Claude Code hook measurement, 10.4% measured savings on 8 fixtures
- Allowlist-based branding lint (PR #71): broad scan with 17 named allowlist rules; catalogues ~50 known-debt RTK strings; future leaks fail at test time instead of user-report time
Cleanup
- 0 cargo warnings on develop tip (was 80+); deleted 6 orphan modules (-2953 lines), wired 2 latent legacy hook bugs (PR #64)
- `uninstall_codex_at` ordering fix (PR #65): AGENTS.md patched before file deletion so failure modes don't leave inconsistent state
- `git_hardening` test fixture (PR #60): hermetic against host `commit.gpgsign=true` so 1Password/gpg-agent unavailability doesn't break the suite
- `GLOBAL_ENV_LOCK` consolidation (PR #68): shared lock for env-mutating integration tests
Branding sweep (3 rounds; user-reported)
- PR #61, #63 — clap help text + error prefixes + init messages
- PR #71 — allowlist-style lint replaces regression-style FORBIDDEN_TOKENS
Quality bar
- `cargo build` clean: 0 warnings
- `cargo test --bin contextcrawler`: 2119 pass / 0 fail
- 10 integration suites green: branding_lint, git_hardening, cargo_hardening, node_hardening, runtime_hardening, cloud_hardening, harness_standalone, proxy_nudge, gh_glab_gt_hardening, harness_claude_code (new)
PRs included
#54 #56 #58 #59 #60 #61 #63 #64 #65 #66 #68 #70 #71 #72 (14 PRs)
Deferred to v0.1.10+
- #28 codex compliance 80→95% lift (needs 24h post-#54 dashboard sample)
- #29 Tier 3 Codex bench harness
- #40 trusted-PATH binary resolution
- #67-debt ~50 catalogued RTK strings in dashboard / uninstall / etc. (lint-pinned, ready for cleanup PR)
- #69 unit-test `src/env` race follow-up to #48
Install
cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.9
contextcrawler --version # → contextcrawler 0.1.9
If you're using the Claude Code hook installed via contextcrawler init -g --claude-md: strongly recommend updating so the hook's rewrite output matches the binary name. Pre-v0.1.9 the hook returned rtk X substitutions that Claude Code couldn't execute.
v0.1.8 — zero-trust CLI wrapper hardening
Highlights
This release lands the zero-trust CLI wrapper layer across 15 secure_*_command helpers + universal env-strip + per-tool RCE-vector deny-lists for rg/find/git/cargo/node/python/ruby/jvm/dotnet/go/k8s/docker/aws/psql/curl/wget.
Threat model framing: the LLM is the caller, and we want defense in depth against hostile env or arg injection inherited from upstream processes — this is hardening, not a panic patch. (See docs/security/zero-trust-wrapped-cli.md)
Confirmed RCE blocked
- `RIPGREP_CONFIG_PATH` → loaded config can carry `--pre=<script>` → rg executes the script per file. Empirically reproduced 4/4 attack paths firing the preprocessor as the user's uid, then blocked via `secure_rg_command()` (env strip) + `check_forbidden_rg_args()` (arg deny + bundle-form detection).
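The env-strip plus arg-deny pattern can be sketched with the standard library alone. The two function names below carry a `_sketch` suffix because their signatures and bodies are assumptions, not the project's actual helpers; the env list is a small subset of the variables named in these notes, and bundle-form detection is not attempted.

```rust
use std::process::Command;

/// Env vars removed before spawning (a subset of those named in the notes).
const STRIPPED_ENV: &[&str] = &["RIPGREP_CONFIG_PATH", "LD_PRELOAD"];

/// Rejects rg's preprocessor flag in its separate (`--pre X`) and glued (`--pre=X`) forms.
fn check_forbidden_rg_args_sketch(args: &[&str]) -> Result<(), String> {
    for arg in args {
        if *arg == "--pre" || arg.starts_with("--pre=") {
            return Err(format!("forbidden rg argument: {arg}"));
        }
    }
    Ok(())
}

/// Builds an `rg` command with hostile env removed; nothing is spawned here.
fn secure_rg_command_sketch(args: &[&str]) -> Result<Command, String> {
    check_forbidden_rg_args_sketch(args)?;
    let mut cmd = Command::new("rg");
    for name in STRIPPED_ENV {
        // `env_remove` drops the variable even if the parent process has it set.
        cmd.env_remove(*name);
    }
    cmd.args(args);
    Ok(cmd)
}
```

Both layers matter: stripping the env closes the config-file route to `--pre`, while the arg check closes the direct route.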
What's new
- 15 `secure_*_command` helpers wrapping every spawned tool, with env strip of `LD_PRELOAD`, `DYLD_*`, `BASH_FUNC_*`, `PROMPT_COMMAND`, `PERL5OPT`, `LUA_INIT`, `RUBYOPT`, `JAVA_TOOL_OPTIONS`, `DOTNET_STARTUP_HOOKS`, `RUSTC_WRAPPER`, `NODE_OPTIONS`, `GIT_EXTERNAL_DIFF`, `KUBECONFIG` (k8s exec credential plugins), `AWS_CONFIG_FILE` (credential_process), `RIPGREP_CONFIG_PATH`, etc.
- Real `contextcrawler security` subcommand: Tirith dashboard with binary resolution, gate state, downgrade log tail, `--json` mode. (Was previously a doc-only feature; invocations fell through to macOS `/usr/bin/security`.)
- Token-savings improvements: symmetric 80/80 head/tail cap (was 80/20), `passthrough_extensions` allowlist, two-line marker with `contextcrawler proxy cat <path>` escape hatch, `grep -c/-L/-o/-Z` long-form routed through rg.
- Codex compliance lifted 0% → 80% via strengthened AGENTS.md template with MUST + WRONG/RIGHT examples.
- Bench harness Tier 1: isolated DB tempfile, JSON/MD reports keyed by git sha.
Rebrand insurance
- `--version` now prints `contextcrawler 0.1.8` (was leaking `rtk 0.39.0`).
- 47 `[rtk]` → `[contextcrawler]` print-string sweep.
- 4 regression-pinning tests (branding_lint, `RTK_MD` constant pin, CLI name pin, config-file canonical name).
Tests
- 2116 bin tests + 7 integration suites green.
PRs included
#10 #14 #15 #16 #17 #18 #21 #24 #25 #30 #31 #33 #41 #42 #43 #44 #46 #47 #51
Install
cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.8
contextcrawler --version # → contextcrawler 0.1.8
v0.1.6 — security & maintenance
[0.1.6] — 2026-05-15
Security and maintenance release. Closes 12 audit findings from the
2026-05-15 review (extending the three GHSAs from v0.1.5 plus
downstream-only findings on the web command, supply-chain integration,
filter trust model, and tirith gate). Adds the long-term-maintenance
framework: threat model, release runbook, upstream-rebase strategy,
quality baselines, three per-module security audits, and a roadmap.
Security
- Build-host metadata stripped from release binaries. Previously
the release binary embedded ~284 `/Users/<builder>/.cargo/registry/...`
paths used by Rust's panic-backtrace metadata, leaking the builder's
username and directory layout. `scripts/build-release.sh` now sets
`--remap-path-prefix` for `$CARGO_HOME` and the workspace; `--verify`
mode asserts zero builder paths in the produced binary.
- `strip_ansi` extended + raw-emit sweep. `strip_ansi` already
covered CSI; v0.1.5 added OSC / OSC 8 hyperlinks / DCS / SOS / PM /
APC / private DEC modes. v0.1.6 sweeps 58 raw `eprint!`/`println!`
sites across 9 files (`cmds/git/`, `cmds/cloud/`, `cmds/js/`,
`cmds/python/`, `cmds/dotnet/`, `cmds/system/grep_cmd.rs`,
`cmds/go/`, `core/runner.rs`) so failure-path tool output goes
through the sanitiser before reaching the agent.
- Global TOML filter trust gate (H-3). `~/.config/rtk/filters.toml`
was previously loaded with no integrity check while the project-local
`.rtk/filters.toml` was SHA-256-pinned. Closed: same trust store,
same content-change-revokes semantics. New CLI:
`contextcrawler trust --global` / `untrust --global`. Plus a TOCTOU fix
(`check_trust_bytes` works on the already-read buffer instead of
re-opening the path between hash and parse).
- CI trust-override now requires platform-injected token (H-2).
`RTK_TRUST_PROJECT_FILTERS=1` previously trusted any env that set
`CI=true` (settable by a hostile Makefile). Tightened to also require
a platform-injected token (`GITHUB_TOKEN`, `CI_JOB_TOKEN`,
`BUILDKITE_AGENT_ACCESS_TOKEN`, `JENKINS_NODE_COOKIE`/`BUILD_TAG`,
`CIRCLE_TOKEN`/`CIRCLE_BUILD_NUM`, `DRONE_BUILD_NUMBER`). An in-repo
Makefile can't fake these.
- Tirith subprocess hardening (F-01 / F-02 / F-04 / F-05).
  - `wait_timeout(8s)` so a hung `tirith check` no longer freezes the
    agent's PreToolUse hook (was indefinite).
  - 4 MiB stdout cap.
  - `Stdio::null()` on stdin and stderr: the stderr pipe was never
    drained, so a noisy tirith could fill the 64 KiB kernel buffer
    and stall the wait_timeout until it fired.
  - JSON re-canonicalisation in `log_downgrade` before embedding in
    `downgrades.jsonl`: closes a log-injection vector where a
    hostile tirith could emit literal newlines to forge a top-level
    log record. Sentinel-on-parse-failure keeps the line valid JSON.
  - Same subprocess pattern applied to the `security_cmd` dashboard
    (`fetch_audit_stats`, `fetch_doctor_status`).
  - New dep: `wait-timeout = "0.2"`.
- Web command hardening (F-01 / F-02 / F-03 / F-04 / F-07).
`contextcrawler web` now:
  - parses the URL with the `url` crate, rejects non-http(s)
    schemes (closes `file:///etc/passwd` local-read);
  - resolves the host and refuses if any resolved IP is in a blocked
    range (loopback / link-local / RFC1918 / ULA / CGN / multicast /
    unspecified / 0.0.0.0/8 / 198.18/15 benchmark / 240/4 future-use,
    plus IPv4-mapped-private-in-IPv6, plus Azure metadata
    168.63.129.16, plus AWS metadata 169.254.169.254 via link-local);
  - pins the validated IPs into curl via `--resolve` so curl can't
    independently re-resolve to a private IP between our check and
    the fetch (DNS-rebinding defence);
  - caps curl at `--max-time 30`, `--max-filesize 64 MiB`,
    `--max-redirs 10`;
  - uses `--` to terminate flag parsing before the URL;
  - wraps stderr in `strip_ansi`.
  - New dep: `url = "2"`.
  - Residual: multi-host-redirect (`other.example` after a redirect
    re-resolves DNS) tracked for v0.2.0.
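As an illustration of the blocked-range check in the web command hardening, the following sketch classifies an address using only `std::net`. The function names are hypothetical and the range list follows the note above; it is not the project's implementation and does not cover resolution or pinning.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical IPv4 classifier covering the ranges listed in the note.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    ip.is_loopback()                            // 127.0.0.0/8
        || ip.is_link_local()                   // 169.254.0.0/16, incl. 169.254.169.254
        || ip.is_private()                      // RFC1918
        || ip.is_multicast()                    // 224.0.0.0/4
        || o[0] == 0                            // 0.0.0.0/8, incl. unspecified
        || (o[0] == 100 && (o[1] & 0xC0) == 64) // 100.64.0.0/10 (CGN)
        || (o[0] == 198 && (o[1] & 0xFE) == 18) // 198.18.0.0/15 (benchmark)
        || o[0] >= 240                          // 240.0.0.0/4 (future use)
        || ip == Ipv4Addr::new(168, 63, 129, 16) // Azure metadata
}

/// Hypothetical IPv6 classifier; IPv4-mapped addresses defer to the IPv4 rules.
fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_blocked_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || ip.is_multicast()
        || (first & 0xFE00) == 0xFC00 // ULA fc00::/7
        || (first & 0xFFC0) == 0xFE80 // link-local fe80::/10
}

fn is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}

/// Refuse if ANY resolved address is blocked (and if nothing resolved at all).
fn all_resolved_ips_allowed(ips: &[IpAddr]) -> bool {
    !ips.is_empty() && ips.iter().all(|ip| !is_blocked(*ip))
}
```

Checking every resolved address, rather than only the first, matters because a hostile DNS answer can mix one public record with one private record.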
Process & docs
- Threat model: new `docs/security/THREAT_MODEL.md`. Documents
assets, attack surfaces, threat actors, mitigations matrix,
accepted limitations.
- Module audits: per-file security audits for `supply_chain_gate.rs`
(6 findings, no High/Critical), `tirith_gate.rs` (5 findings,
closed), `Commands::Web` dispatch + `web_cmd.rs` (6 findings,
closed), and combined `jsonl_rewriter` + `session_compact_cmd` +
`security_cmd` (3 Mediums, 6 LOW/INFO). Subprocess-timeout
class-audit conclusion in `AUDIT_subprocess_timeout_class.md`.
- Quality baselines: `docs/quality/BASELINE.md` snapshots test
count, clippy state, `cargo audit` result, unsafe blocks, unwrap
distribution. `deny.toml` covers advisories, licenses, bans,
sources (passes `cargo deny check`).
- Release & rebase docs: `docs/contributing/RELEASING.md`
(end-to-end runbook) + `docs/contributing/UPSTREAM_REBASE.md`
(rtk-ai/rtk tracking strategy, what-to-take-vs-skip matrix,
conflict resolution for hardened paths).
- Roadmap: `docs/ROADMAP.md` (v0.1.x line, v0.2.0 candidates
organised into security/process/capability buckets, tracking
model).
- Session record: `docs/sessions/2026-05-15-overnight.md`
(branch-by-branch summary with Codex round results and merge order).
Build & infrastructure
- `rust-version = "1.80"` MSRV declared in Cargo.toml (covers
`Ipv6Addr::to_ipv4_mapped` used by the SSRF block check).
- New scripts: `scripts/build-release.sh` (with `--verify` and
`--install` modes), `scripts/bump-version.sh`.
- Proposed CI jobs documented in `docs/quality/CI_JOBS_PROPOSED.md`
(release-leak gate + `cargo deny check`). Wire in when the
`.github/` gitignore situation is resolved.
Tests
1845+ passed across the merged tree (was 1828 at v0.1.5). 32 new
regression tests for argv-mode guard / OSC stripping / scrub /
SSRF block / CI trust check / JSONL canonicalisation.
Acknowledgements
Three rounds of Codex peer review on each fix branch. Every
finding tracked, every fix verified.
v0.1.5 — security release (3 downstream-only GHSAs)
Security release. Three downstream-only fixes covering attack surfaces upstream rtk-ai/rtk has declined to address (rtk-ai/rtk#640 — "by design / tracking"). Each landed on its own feature branch with three rounds of Codex peer review. All three are tracked as draft GitHub Security Advisories pending coordinated publication.
Security fixes
GHSA-3mmh-86cm-g6w4 — shell-execution trust boundary
`contextcrawler err / test / summary` now parse the trailing command as argv and exec without a shell by default.
- Shell metacharacters (`|` `;` `&` `<` `>` backtick `$` newline) cause rejection.
- The first token is refused if it is a known shell (sh, bash, zsh, dash, ksh, fish, tcsh, csh, ash and their `.exe` variants; cmd, powershell, pwsh; busybox, toybox).
- The first token is refused if it is an exec wrapper (env, nice, nohup, time, timeout, gtimeout, ionice, chroot, setpriv, unshare, taskset, stdbuf, script, xargs, watch, sudo, doas, su, runuser, pkexec) — these replace the process image with arg[1+], reintroducing the attack surface.
- `--shell` is the documented escape hatch for users who actually want `sh -c` semantics.
Closes a prompt-injection → shell-injection chain where an agent could append a shell payload to a build-triage command and have it auto-execute.
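A minimal sketch of the argv guard described above, assuming the command has already been tokenised. The lists are copied from the bullets; `guard_argv` itself, the basename normalisation, and the error strings are hypothetical.

```rust
const SHELL_METACHARS: &[char] = &['|', ';', '&', '<', '>', '`', '$', '\n'];

const KNOWN_SHELLS: &[&str] = &[
    "sh", "bash", "zsh", "dash", "ksh", "fish", "tcsh", "csh", "ash",
    "cmd", "powershell", "pwsh", "busybox", "toybox",
];

const EXEC_WRAPPERS: &[&str] = &[
    "env", "nice", "nohup", "time", "timeout", "gtimeout", "ionice", "chroot",
    "setpriv", "unshare", "taskset", "stdbuf", "script", "xargs", "watch",
    "sudo", "doas", "su", "runuser", "pkexec",
];

/// Hypothetical guard: Ok(()) means the argv may be exec'd without a shell.
fn guard_argv(argv: &[&str]) -> Result<(), String> {
    let first: &str = argv
        .first()
        .copied()
        .ok_or_else(|| "empty command".to_string())?;
    // Reject any argument that carries a shell metacharacter.
    if let Some(arg) = argv.iter().find(|a| a.contains(SHELL_METACHARS)) {
        return Err(format!("shell metacharacter in argument: {arg:?}"));
    }
    // Assumption: compare on the basename, with a trailing `.exe` removed.
    let base = first.rsplit(|c| c == '/' || c == '\\').next().unwrap_or(first);
    let name = base.strip_suffix(".exe").unwrap_or(base);
    if KNOWN_SHELLS.contains(&name) {
        return Err(format!("refusing to run a shell: {name}"));
    }
    if EXEC_WRAPPERS.contains(&name) {
        return Err(format!("refusing to run an exec wrapper: {name}"));
    }
    Ok(())
}
```

The wrapper list is what keeps the guard honest: without it, `env sh -c …` would pass the shell check while still handing control to a shell.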
GHSA-wjx4-ffxm-fxxp — terminal-escape stripping
`strip_ansi` previously matched CSI only. Extended to cover the full set of escapes that survive into LLM context:
- CSI including DEC private modes (`ESC [ ? ...`)
- OSC (window titles, palette changes, notifications)
- OSC 8 terminal hyperlinks — visible text preserved, URL payload dropped (hyperlinks are a smuggling channel for instructions or exfil URLs)
- DCS, SOS, PM, APC (`ESC P|X|^|_ ... ESC \`)
- Standalone Fe/Fp/Fs escapes used by some pagers
Prisma command paths (`run_generate` / `run_migrate` / `run_db_push`) now wrap their failure-fallback `eprint!` calls in `strip_ansi`. A broader audit of remaining raw-emit paths (git / container / dotnet / python / pnpm / grep) is tracked as follow-up.
GHSA-2cwv-rr7c-2p4c — credential scrubbing before tracking-db insert
`scrub_secrets` runs at the INSERT boundary in `tracking.db` and redacts:
- Credential-bearing flags: `--password`, `--token`, `--api-key`, `--secret`, `--access-key`, `--auth-token`, `--client-secret` (`=value`, space-value, and escape-aware quoted-value forms).
- HTTP `Authorization: Bearer|Basic|Token|ApiKey ` headers.
- URL-embedded credentials: `scheme://user:password@host`.
- AWS access key IDs (`AKIA…`, `ASIA…`).
- GitHub tokens: classic / OAuth / user-to-server / server / refresh PATs (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`) and fine-grained PATs (`github_pat_…`).
- Slack tokens (`xox[abprs]-…`).
- mysql / mariadb `-p` (scoped to mysql / mariadb invocations only — `curl -p3000` and similar are not rewritten).
Without scrubbing, `gain --history` would feed credentials back into agent context on every read.
Tests
1828 passed, 0 failed across all three branches and the merged `develop`. Each fix landed with a dedicated regression-test block.
Install
```sh
cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.5 --locked
```
Full changelog: `CHANGELOG.md`
v0.1.4 — discover report rebrand + gate-design comment
Mop-up release covering two surfaces v0.1.3 didn't touch.
Fixed
- `contextcrawler discover` output still printed RTK: banner, stats line, empty-state hint, section header, column header, and per-row "Equivalent" cells all said `RTK …` / `rtk git`. Fixed by widening the existing `display_rtk` helper from the rewrite path to the discover report path (made `pub`, applied at the print site in `src/discover/report.rs`). Internal `rtk_cmd:` rule literals in `rules.rs` intentionally untouched, kept as lookup keys aligned with upstream rtk. (#7)
Documented
- Added a design-intent comment to `process_claude_payload` clarifying that the Tirith and supply-chain gates only fire on the `PermissionVerdict::Allow` path. The gate is a safety net for explicitly-allowlisted command shapes, not a universal filter. Future investigators won't repeat the false alarm of "fresh probes don't appear in `downgrades.jsonl`". (#7)
Verified
- `cargo test`: 1785 passed / 0 failed / 6 ignored (unchanged from v0.1.3)
Upgrade
cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.4 --force
v0.1.3 — finish rebrand polish
Polish release rolling up the post-v0.1.2 rebrand work, surfaced via fresh-install devel-testing on macOS and Ubuntu.
Fixed (correctness)
- Hook rewrite prefix: every rewrite emitted `rtk <subcmd>`, which failed with `command not found: rtk` on the documented install (only `contextcrawler` on PATH). Now emits `contextcrawler <subcmd>`. Legacy `rtk` prefix still recognized as already-rewritten passthrough, so existing `Bash(rtk:*)` allowlist entries keep working. (#1)
- `contextcrawler -v` (flag-only invocations): CLI fallback path tried to exec `args[0]` when clap parsing failed; with `args[0]=-v` it produced `[rtk: No such file or directory (os error 2)]`. Now shows clap's "subcommand required" error; fallback prefix is `[contextcrawler: ...]`. (#4)
Fixed (cosmetic)
- `gain` dashboard header: `RTK Token Savings` → `ContextCrawler Token Savings` (#1)
- `init -g` success output, codex config listing, agent hook output (cline / windsurf / kilocode / antigravity), uninstall messages, and `init -g` usage text: all consistently branded ContextCrawler / `CONTEXTCRAWLER.md` / `@CONTEXTCRAWLER.md`. (#3)
- Sourced from `RTK_MD` / `RTK_MD_REF` constants so future renames stay centralized.
Added
- Tirith gate status in `contextcrawler init -g`: reports whether the URL-security defense-in-depth gate is armed. Detect-only: does NOT modify `~/.bashrc` / `~/.zshrc` / fish config. The gate operates exclusively at the Claude Code PreToolUse hook layer via subprocess invocation of `tirith check`. (#2 superseded by #5)
Internal
- `rtk` source-level identifiers (mod / struct / field names, `rtk_cmd:` rule values, `rtk_equivalent` classification keys) intentionally retained to keep upstream rebases against rtk-ai/rtk small.
Verified
- `cargo test`: 1785 passed / 0 failed / 6 ignored
- End-to-end smoke test of the rewrite hook on both macOS and Ubuntu
- Empirical 61.4% token savings observed on Ubuntu after install (cargo test + find + git log mix)
Upgrade
cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.3 --force
contextcrawler init -g # re-register hook + report Tirith gate status
After upgrading, fully /exit + relaunch any running Claude Code sessions so they pick up the new binary on PATH.
v0.1.2 — first release where init -g actually works
ContextCrawler v0.1.2 — first release where init -g actually works
Upgrade strongly recommended for anyone on v0.1.0 / v0.1.1.
TL;DR
The hook command was hardcoded to rtk hook claude, a binary that doesn't exist in this distribution. Every contextcrawler init -g since the rebrand has been writing a broken entry into settings.json — the hook fired, the rtk binary wasn't found, the bash shim gracefully degraded, and Claude Code received raw, un-filtered command output. ContextCrawler was effectively a no-op on every install. This release fixes that and migrates existing broken installs automatically.
A second-pass dual-agent code review (Codex + Claude) also surfaced two real security issues, both patched.
Fixed (critical)
- Hook command rename. `CLAUDE_HOOK_COMMAND`, `CURSOR_HOOK_COMMAND`, and the Gemini wrapper script + Copilot hook JSON all referenced the non-existent `rtk` binary. Now write `contextcrawler hook <agent>`. Install-time matchers recognize the legacy string so existing broken entries get cleaned up on next `init -g`.
Fixed (security — from a second Codex + Claude review pass)
- Session compactor path traversal in `resolve_session_path`. A bare session id like `../foo` was joined under each project directory and the resulting candidate was returned if it resolved to a file. Now rejects ids containing `/`, `\`, or `..`.
- Supply-chain cooldown bypass on future-dated publishes. The age guard `age > -1d` let packages "published" up to 24h ahead of now pass through both bounds entirely. Now clamps negative ages to zero: future dates are treated as just-published, which always trips the cooldown.
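The future-date clamp can be shown in a few lines. This is a sketch only: the 7-day `COOLDOWN` is a made-up value (the notes do not state the real window), and both function names are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical cooldown window; the real value is not stated in these notes.
const COOLDOWN: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Age of a publish in seconds, with future timestamps clamped to zero.
fn clamped_age_secs(now_unix: i64, published_unix: i64) -> u64 {
    // A negative difference means "published in the future"; treat it as age 0.
    now_unix.saturating_sub(published_unix).max(0) as u64
}

/// A package trips the cooldown while its clamped age is below the window.
fn trips_cooldown(now_unix: i64, published_unix: i64) -> bool {
    clamped_age_secs(now_unix, published_unix) < COOLDOWN.as_secs()
}
```

With the clamp in place there is no negative-age branch left to reason about: a future-dated publish has age zero, and age zero is always inside the window.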
Fixed (UX)
- Every `[rtk] ...` / "run `rtk ...`" warning string now reads `contextcrawler` so pasted commands actually work.
- `~/.claude/RTK.md` and `@RTK.md` reference renamed to `CONTEXTCRAWLER.md` / `@CONTEXTCRAWLER.md`. Legacy files auto-migrate on next install.
- `contextcrawler gain` no longer prefixes every row with the redundant `rtk` string in the by-command and recent-commands tables.
- README/MIGRATING gained a pre-install callout warning users who previously ran upstream `rtk` or `jee599/contextzip` to clean out stale hook entries.
Other
- CodeQL `py/insecure-temporary-file` fixed (`tempfile.mktemp` → `NamedTemporaryFile`).
- CodeQL `rust/cleartext-logging` false-positive in `rtk trust --list` defused.
- Unused-variable compiler warning silenced.
Install
cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.2 --locked --force
contextcrawler init -g
If you're already on v0.1.0 / v0.1.1, the `--force` is important: `cargo install` will skip the upgrade otherwise. Run `init -g` again after install to migrate `RTK.md` → `CONTEXTCRAWLER.md` and replace the broken `rtk hook claude` entry in your `~/.claude/settings.json`. Then restart Claude Code.
Full changelog: CHANGELOG.md.