Releases: thehoff/contextcrawler

ContextCrawler 0.4.0 — the library pivot

07 Jun 06:05

ContextCrawler is now a proper lib + bin. The binary is a thin shim over the
library, so the CLI dogfoods the exact code path downstream Rust tools embed.
This completes the community request in #185.

Added

  • Curated public embedder API (experimental, not yet semver-guaranteed):
    summarize_command_output, filter_output(name, raw), auto_filter_output(raw),
    available_filters(), no_bloat. These apply the filters and the summariser to
    already-captured command output without spawning the CLI.
  • Crate-level rustdoc + a comprehensive docs overhaul (README, command/filter
    reference, library + architecture guides, security-gate docs).

Changed

  • One compile tree (no duplicate bin/lib): dead-code warnings drop from 473 to 0.
  • main.rs is a thin shim over contextcrawler::run(); all CLI logic in the lib.

Use as a library (no crates.io needed; add under [dependencies] in Cargo.toml)

contextcrawler = { git = "https://github.com/thehoff/contextcrawler", tag = "contextcrawler-v0.4.0" }

Install the CLI (from source only)

cargo install --git https://github.com/thehoff/contextcrawler --tag contextcrawler-v0.4.0

No pre-built binaries are distributed; build from source. ContextCrawler is
downstream of rtk-ai/rtk, with contextzip and Tirith folded in. It is experimental,
single-maintainer, and fast-moving: build it, test it against your own workflow,
and verify it before depending on it.

v0.1.11 — security + supply-chain hardening + perf

26 May 04:10
3942088

Heaviest release since v0.1.6. 7 issues closed, 200+ tests green, 11 commits.

🔒 Security

  • #180 — Credentials no longer captured in audit logs. Both downgrades.jsonl and supply_chain.jsonl previously serialised the raw shell command verbatim, capturing inline tokens (curl Authorization headers, env-var assignments to *_TOKEN/*_KEY/*_SECRET, GitHub PATs, URL basic-auth, git-credential-helper, CLI flag values). Empirical 24h scan of one user's log: 27 unique 40-hex tokens + 15 Authorization: token <hex> headers in cleartext. Fixed at the log-write boundary in both gates plus the tirith blob (which echoes cmd back in findings). Ships contextcrawler security --scrub-logs [--dry-run] to retroactively clean existing on-disk logs.

  • #181CONTEXTCRAWLER_SUPPLY_CHAIN=off now works as an inline prefix. The gate's own error messages tell users to "rerun with CONTEXTCRAWLER_SUPPLY_CHAIN=off" — but the gate only checked the shell environment, not the inline form. Users followed the documented bypass and were still blocked. Now parses leading POSIX-style assignments off the cmd. Must be a prefix; mid-cmd && FOO=bar does not bypass.

  • #182 — Retry on transient HTTP errors. A single registry/OSV blip previously turned the whole install into Verdict::Unavailable (no retry, first error wins). Now: 1 retry, 5s per-attempt, 250ms backoff. Retried only on transport-level failures and 5xx. 4xx is terminal. Worst-case 10.25s/call, well under CHECK_WALL_BUDGET = 25s.
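A minimal sketch of the #181 and #182 rules, assuming the command is split on whitespace with no quote handling. The names is_env_name, leading_assignments, supply_chain_bypassed, Attempt, MAX_RETRIES, PER_ATTEMPT and BACKOFF are hypothetical; only CHECK_WALL_BUDGET mirrors a name from the notes. It shows that only leading assignments count, that only transport failures and 5xx responses are retried, and how one retry at 5 s per attempt plus a 250 ms backoff yields the 10.25 s worst case.

```rust
use std::time::Duration;

// #181 (hypothetical helpers): leading POSIX-style `NAME=value` assignments.
// Splits on whitespace only; a real parser must also handle quoting.
fn is_env_name(name: &str) -> bool {
    let mut chars = name.chars();
    matches!(chars.next(), Some(c) if c == '_' || c.is_ascii_alphabetic())
        && chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
}

fn leading_assignments(cmd: &str) -> Vec<(&str, &str)> {
    let mut out = Vec::new();
    for tok in cmd.split_whitespace() {
        match tok.split_once('=') {
            Some((name, value)) if is_env_name(name) => out.push((name, value)),
            _ => break, // the first non-assignment token starts the command
        }
    }
    out
}

fn supply_chain_bypassed(cmd: &str) -> bool {
    leading_assignments(cmd)
        .iter()
        .any(|&(name, value)| name == "CONTEXTCRAWLER_SUPPLY_CHAIN" && value == "off")
}

// #182 (hypothetical types): one retry, only for transport failures and 5xx.
enum Attempt {
    Transport,
    Status(u16),
}

const MAX_RETRIES: u32 = 1;
const PER_ATTEMPT: Duration = Duration::from_secs(5);
const BACKOFF: Duration = Duration::from_millis(250);
const CHECK_WALL_BUDGET: Duration = Duration::from_secs(25);

fn is_retryable(attempt: &Attempt) -> bool {
    match attempt {
        Attempt::Transport => true,
        Attempt::Status(code) => (500..600).contains(code), // 4xx is terminal
    }
}

fn worst_case() -> Duration {
    // Every attempt times out, with one backoff between consecutive attempts.
    PER_ATTEMPT * (MAX_RETRIES + 1) + BACKOFF * MAX_RETRIES
}
```

A mid-command assignment such as `true && CONTEXTCRAWLER_SUPPLY_CHAIN=off npm install x` stops at `true`, so it does not count as a bypass.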

🛠 Supply-chain hardening

  • #145parse_package_args robustness. Three orthogonal false-positives fixed: celery[redis] extras now stripped before PyPI lookup; inline # comments terminate args; full pip value-consuming flag set (--platform, --python-version, --trusted-host, --extra-index-url, --find-links, --no-binary/--only-binary, --prefix, --root, --upgrade-strategy, --proxy, --cert, …) is now known.

  • #147 — Ecosystem::Unknown variant. The synthetic entry emitted when the depth cap is hit was hardcoded to Ecosystem::Npm, so deep-nested PyPI or mixed-shell payloads were mislabelled [npm]. The verdict was always correct; now the label is too. Cascade arms were added to 3 existing matches.

  • #149 — O(n²) → O(n) dedup + O(log n) claimed-span check. Mechanical refactor, no behaviour change. dedup_installs uses a HashSet keyed on the canonical signature; the claimed-span overlap check uses BTreeMap::range. Stress test: deduplicating 5,000 installs takes single-digit milliseconds, and install detection across 200 segments completes in under 250 ms.
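A sketch of the #145 argument fixes and the #149 data structures, assuming argv is already tokenised. package_names, ClaimedSpans and VALUE_FLAGS are hypothetical names (VALUE_FLAGS holds only the subset of flags named above), and only dedup_installs shares a name with the notes. The overlap check relies on claimed spans being disjoint: then the last span starting before `end` is the only one that can reach into `[start, end)`, so a single `range(..end).next_back()` lookup suffices.

```rust
use std::collections::{BTreeMap, HashSet};

// #145: a subset of pip's value-consuming flags (the release lists more).
const VALUE_FLAGS: &[&str] = &[
    "--platform", "--python-version", "--trusted-host", "--extra-index-url",
    "--find-links", "--no-binary", "--only-binary", "--prefix", "--root",
    "--upgrade-strategy", "--proxy", "--cert",
];

fn package_names(args: &[&str]) -> Vec<String> {
    let mut names = Vec::new();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        if arg.starts_with('#') {
            break; // an inline comment ends the argument list
        }
        if VALUE_FLAGS.contains(&arg) {
            iter.next(); // skip the flag's value so it is not read as a package
            continue;
        }
        if arg.starts_with('-') {
            continue; // other flags, including glued `--flag=value` forms
        }
        // `celery[redis]` becomes `celery` before the PyPI lookup
        names.push(arg.split('[').next().unwrap_or(arg).to_string());
    }
    names
}

// #149: O(n) expected dedup keyed on a canonical signature string.
fn dedup_installs(signatures: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    signatures.into_iter().filter(|s| seen.insert(s.clone())).collect()
}

// #149: disjoint half-open spans keyed by start offset.
struct ClaimedSpans {
    spans: BTreeMap<usize, usize>,
}

impl ClaimedSpans {
    fn overlaps(&self, start: usize, end: usize) -> bool {
        // Only the last span that starts before `end` can overlap [start, end).
        self.spans
            .range(..end)
            .next_back()
            .map_or(false, |(_, &span_end)| span_end > start)
    }

    fn claim(&mut self, start: usize, end: usize) -> bool {
        if self.overlaps(start, end) {
            return false;
        }
        self.spans.insert(start, end);
        true
    }
}
```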

🧹 CI + process

  • #183branding_lint red on develop. Four pre-existing leaks in src/hooks/init.rs were blocking the safety net. Lint is green again.
  • #78 (closes #75) — clap-aware doc-comment lint. Catches leftover RTK wording (savings/adoption/equivalent/artifacts, etc.) in clap doc comments.
  • #80 (addresses #77) — merge-driver design + multi-model peer review docs. Pure documentation.

Test coverage

| Module | Tests |
|---|---|
| core::secret_redact | 16 |
| hooks::tirith_gate | 6 |
| hooks::supply_chain_gate | 173 |
| tests::branding_lint | 5 |

200+ green across the touched stack. No regressions.

Upgrade notes

  • No breaking changes.
  • New CLI: contextcrawler security --scrub-logs [--dry-run].
  • The #182 retry change reduces per-attempt HTTP timeout from 8s → 5s.
  • Existing audit logs are NOT automatically scrubbed on upgrade. Run contextcrawler security --scrub-logs once after install if you've used inline credentials in shell commands (curl headers, env-var literals, gh auth login --with-token, etc.).

Empirical proof — real-world 24h scan

| Pattern | Before | After |
|---|---|---|
| Authorization: token <hex> in audit logs | 15 | 0 |
| 40-hex tokens captured | 27 | 0 |
| GitHub PAT prefixes | 0 | 0 |
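The two leaking patterns counted in the scan can be caught with plain character scanning. This is a minimal sketch assuming single-line input; redact_hex40 and redact_auth_header are hypothetical names, not the project's core::secret_redact API, and the header match here is case-sensitive even though HTTP header names are not.

```rust
/// Replaces every maximal run of exactly 40 ASCII hex digits.
fn redact_hex40(line: &str) -> String {
    fn flush(run: &mut String, out: &mut String) {
        if run.len() == 40 {
            out.push_str("[REDACTED]");
        } else {
            out.push_str(run);
        }
        run.clear();
    }
    let mut out = String::with_capacity(line.len());
    let mut run = String::new();
    for c in line.chars() {
        if c.is_ascii_hexdigit() {
            run.push(c);
        } else {
            flush(&mut run, &mut out);
            out.push(c);
        }
    }
    flush(&mut run, &mut out);
    out
}

/// Redacts the value after the first `Authorization: token ` marker, up to
/// the next whitespace or quote character.
fn redact_auth_header(line: &str) -> String {
    const MARKER: &str = "Authorization: token ";
    let Some(i) = line.find(MARKER) else {
        return line.to_string();
    };
    let start = i + MARKER.len();
    let end = line[start..]
        .find(|c: char| c.is_whitespace() || c == '\'' || c == '"')
        .map_or(line.len(), |j| start + j);
    format!("{}[REDACTED]{}", &line[..start], &line[end..])
}
```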

v0.1.10 — supply-chain hardening cluster + correctness pass

23 May 05:53
779a853

Completes the #141 plaintext false-positive-guard cluster: install-shaped
substrings printed as data through echo/grep/jq/base64/… no longer trigger the
supply-chain gate, while rg/awk/gawk/sed are kept OUT of the
allowlist because they can spawn sub-processes (rg --pre,
awk 'BEGIN { system(...) }', GNU sed s///e).
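The allowlist rule above amounts to a membership test on the segment's program name. A minimal sketch, assuming the command has already been split into segments; is_plaintext_segment is a hypothetical name, and the list is the one given for #148. The execution-capable utilities are excluded simply by being absent.

```rust
/// The data-utility allowlist from #148. rg/awk/gawk/sed are deliberately
/// absent because they can spawn sub-processes.
const DATA_UTILITIES: &[&str] = &[
    "echo", "printf", "cat", "tac", "tee", "grep", "egrep", "fgrep", "head",
    "tail", "nl", "base64", "xxd", "od", "hexdump", "jq",
];

/// True if the segment's program is on the allowlist, so install-shaped text
/// in its arguments is printed or filtered rather than executed.
fn is_plaintext_segment(segment: &str) -> bool {
    segment
        .split_whitespace()
        .next()
        .is_some_and(|program| DATA_UTILITIES.contains(&program))
}
```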

Both Codex and agy returned GREEN on rounds 2 + 3.

Supply-chain hardening (#139#148)

  • #148 — plaintext-FP guard with data-utility allowlist
    (echo, printf, cat, tac, tee, grep, egrep, fgrep, head, tail, nl,
    base64, xxd, od, hexdump, jq) + explicit exclusions for execution-capable
    utilities (rg/awk/gawk/sed). 2668 tests, every retained util pinned
    by a dedicated masking regression test.
  • #146 — quote-aware tokeniser closes wrapped-install bypass (#140).
  • #143 — value-flag + Windows-launcher edge cases, yarn diagnostic
    flag exclusion, legacy-Mac \r-only line continuation.
  • #142 — newline-chain, Windows launcher, bare poetry/uv bypasses.
  • #139 — absolute / relative path install bypass closed.

Security gate (#101#122, #128#130)

  • #130 — integrity binary-hook check, supply-chain budget/cap, ISO-8601 +
    downgrades hardening (SEC-I1..I4).
  • #129 — token-aware permission matching + substitution-aware
    decomposition (SEC-C2/C3); skip arithmetic expansion, normalise glob
    patterns.
  • #128 — dotnet forbidden-arg checker (block MSBuild
    Custom*Targets / runsettings RCE, SEC-C1) + semicolon-batch / case-fold
    / @response-file bypasses closed.
  • #101#122 — the #100/#111 audit gating cluster (g1–g8 fixes,
    hooks dispatch, gate wiring, security core).

Filters + correctness

  • #127 — accept standard grep flags (62% drop in parse failures);
    route -h/-G to passthrough.
  • #126 — restore SIGPIPE default disposition (eliminates startup
    SIGABRT).
  • #125 — recognise CRITICAL/FATAL severities, wc stdin support,
    detached-HEAD sha handling, .env visibility (ported from upstream 62fc0e0);
    init now refuses a malformed CLAUDE.md instead of silently exiting 0.
  • #124 — preserve user content in copilot-instructions.md
    (ported from upstream d108165).
  • #134 — honest savings claims for read / git log filters
    (READ-I1, GIT-I1).

Performance + analytics

  • #133 — take VACUUM off the record() hot path (PERF-I1).
  • #132 — weighted by-command avg, quota horizon, char-based
    token estimate, computed ccusage date (AN-I1..I4).
  • #135 — new gain --weak-filters flag: ranks tools by leaked tokens.

Features

  • #138 — Pi (pi.dev) added as the 11th supported harness.
  • #136 — unified agent guidance into one canonical source.
  • #94 — curl JSON minify filter.
  • #93 — ServiceNow SDK build filter.

Verification

  • 2,668 / 2,668 unit tests passing
  • cargo deny check — advisories, bans, licenses, sources ok
  • scripts/build-release.sh --verify — zero builder paths in binary
  • PR #148 peer review: Codex GREEN, agy GREEN (rounds 2 + 3)

Known follow-ups (not blocking)

  • README install pins still reference v0.1.6 (stale since v0.1.7).
  • CHANGELOG.md top entry is [0.1.7]; v0.1.8/9/10 narratives live on
    the GH release pages only.
  • scripts/bump-version.sh looks for a ContextCrawler X.Y.Z literal
    no longer present in src/main.rs (version flows from Cargo.toml via
    clap). Manual bump used for this release as for v0.1.8/9.

🤖 Generated with Claude Code


Post-release patch (force-tag, same v0.1.10)

scripts/build-release.sh --install shipped with a macOS Apple-Silicon
AMFI bug — plain cp of the ad-hoc-linker-signed binary to
~/.local/bin/contextcrawler produced a destination that AMFI rejected
at exec (load code signature error 2, SIGKILL before main()). Found
within minutes of cutting the release. Patched by codesign --force --sign - on the destination after copy, macOS-gated. Linux path
unchanged. cargo install --git … --tag v0.1.10 was never affected —
cargo handles signing correctly. Only the RELEASING.md §7 manual
install hit it.

Tag force-moved to include the fix (release was minutes old; no
precompiled artifacts attached, so no SHA-mismatch surface for users).

v0.1.9 — Claude Code hook fix + zero-trust hardening + bench harness Tier 2

18 May 14:45
0ff327f

Highlights

Claude Code hook actually works now (#62)

Pre-v0.1.9, contextcrawler rewrite "git status" returned "rtk git status", and the Claude Code hook substituted that into Claude Code's updatedInput.command. Anyone running raw commands through Claude Code with the hook installed got rtk: command not found on every rewrite. The bug had been live since the rebrand; it was masked in our own sessions only because most traffic flows via codex (which calls contextcrawler X directly per the AGENTS.md template).

Fixed in PR #72 with full backcompat: the classifier and rewriter now accept BOTH contextcrawler X and rtk X as already-wrapped inputs, and the dashboard's display_cmd strips both prefixes so historical SQLite rows bucket cleanly with new rows.
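The backcompat rule can be pictured as a shared prefix list consulted by both the classifier and the display path. A minimal sketch; WRAPPER_PREFIXES, already_wrapped and this display_cmd are hypothetical stand-ins, not the code from PR #72.

```rust
const WRAPPER_PREFIXES: &[&str] = &["contextcrawler ", "rtk "];

/// Both the current and the legacy binary name count as already wrapped.
fn already_wrapped(cmd: &str) -> bool {
    let cmd = cmd.trim_start();
    WRAPPER_PREFIXES.iter().any(|p| cmd.starts_with(*p))
}

/// Strips either prefix so old and new tracking rows bucket together.
fn display_cmd(cmd: &str) -> &str {
    let cmd = cmd.trim_start();
    WRAPPER_PREFIXES
        .iter()
        .find_map(|p| cmd.strip_prefix(*p))
        .unwrap_or(cmd)
}
```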

Other notable shipments

  • Zero-trust hardening for gh / glab / gt (PR #58) — env strip + arg deny + passthrough fix
  • Pytest -p bypass (PR #59) — close bare-relative + glued-form -pVALUE / -p=VALUE evasions
  • $CODEX_HOME escape warning (PR #66) — defence-in-depth when env points outside $HOME
  • Tier 2 bench harness (PR #70) — end-to-end Claude Code hook measurement, 10.4% measured savings on 8 fixtures
  • Allowlist-based branding lint (PR #71) — broad scan with 17 named allowlist rules; catalogues ~50 known-debt RTK strings; future leaks fail at test time instead of user-report time

Cleanup

  • 0 cargo warnings on develop tip (was 80+); deleted 6 orphan modules (-2953 lines), wired 2 latent legacy hook bugs (PR #64)
  • uninstall_codex_at ordering fix (PR #65) — AGENTS.md patched before file deletion so failure modes don't leave inconsistent state
  • git_hardening test fixture (PR #60) — hermetic against host commit.gpgsign=true so 1Password/gpg-agent unavailability doesn't break the suite
  • GLOBAL_ENV_LOCK consolidation (PR #68) — shared lock for env-mutating integration tests

Branding sweep (3 rounds; user-reported)

  • PR #61, #63 — clap help text + error prefixes + init messages
  • PR #71 — allowlist-style lint replaces regression-style FORBIDDEN_TOKENS

Quality bar

  • cargo build clean: 0 warnings
  • cargo test --bin contextcrawler2119 pass / 0 fail
  • 10 integration suites green: branding_lint, git_hardening, cargo_hardening, node_hardening, runtime_hardening, cloud_hardening, harness_standalone, proxy_nudge, gh_glab_gt_hardening, harness_claude_code (new)

PRs included

#54 #56 #58 #59 #60 #61 #63 #64 #65 #66 #68 #70 #71 #72 (14 PRs)

Deferred to v0.1.10+

  • #28 codex compliance 80→95% lift (needs 24h post-#54 dashboard sample)
  • #29 Tier 3 Codex bench harness
  • #40 trusted-PATH binary resolution
  • #67-debt ~50 catalogued RTK strings in dashboard / uninstall / etc. (lint-pinned, ready for cleanup PR)
  • #69 unit-test src/ env race follow-up to #48

Install

cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.9
contextcrawler --version    # → contextcrawler 0.1.9

If you're using the Claude Code hook installed via contextcrawler init -g --claude-md, we strongly recommend updating so the hook's rewrite output matches the binary name. Before v0.1.9 the hook returned rtk X substitutions that Claude Code couldn't execute.

v0.1.8 — zero-trust CLI wrapper hardening

18 May 10:02
1e967af

Highlights

This release lands the zero-trust CLI wrapper layer across 15 secure_*_command helpers + universal env-strip + per-tool RCE-vector deny-lists for rg/find/git/cargo/node/python/ruby/jvm/dotnet/go/k8s/docker/aws/psql/curl/wget.

Threat model framing: the LLM is the caller, and we want defense in depth against hostile env or arg injection inherited from upstream processes — this is hardening, not a panic patch. (See docs/security/zero-trust-wrapped-cli.md)

Confirmed RCE blocked

  • RIPGREP_CONFIG_PATH → loaded config can carry --pre=<script> → rg executes the script per file. Empirically reproduced 4/4 attack paths firing the preprocessor as the user's uid, then blocked via secure_rg_command() (env strip) + check_forbidden_rg_args() (arg deny + bundle-form detection).
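The two layers described above (env strip plus arg deny) can be sketched with std::process::Command. This is a minimal illustration assuming pre-tokenised arguments; forbidden_rg_arg and secure_rg_sketch are hypothetical names, not secure_rg_command() or check_forbidden_rg_args(), and the project's bundle-form detection is not reproduced. The command is only built, never spawned.

```rust
use std::process::Command;

/// Rejects `--pre` in both the separate (`--pre script`) and glued
/// (`--pre=script`) forms; `--pre-glob` alone executes nothing.
fn forbidden_rg_arg(args: &[&str]) -> Option<String> {
    for &arg in args {
        if arg == "--" {
            break; // after `--`, rg treats arguments as patterns or paths
        }
        if arg == "--pre" || arg.starts_with("--pre=") {
            return Some(arg.to_string());
        }
    }
    None
}

/// Builds (but does not spawn) an rg command with the config-file variable
/// removed, so a hostile config cannot smuggle `--pre` back in.
fn secure_rg_sketch(args: &[&str]) -> Result<Command, String> {
    if let Some(bad) = forbidden_rg_arg(args) {
        return Err(format!("refusing rg argument {bad}"));
    }
    let mut cmd = Command::new("rg");
    cmd.env_remove("RIPGREP_CONFIG_PATH").args(args);
    Ok(cmd)
}
```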

What's new

  • 15 secure_*_command helpers wrapping every spawned tool, stripping dangerous variables such as LD_PRELOAD, DYLD_*, BASH_FUNC_*, PROMPT_COMMAND, PERL5OPT, LUA_INIT, RUBYOPT, JAVA_TOOL_OPTIONS, DOTNET_STARTUP_HOOKS, RUSTC_WRAPPER, NODE_OPTIONS, GIT_EXTERNAL_DIFF, KUBECONFIG (k8s exec credential plugins), AWS_CONFIG_FILE (credential_process), RIPGREP_CONFIG_PATH, etc.
  • Real contextcrawler security subcommand — Tirith dashboard with binary resolution, gate state, downgrade log tail, --json mode. (Was previously a doc-only feature; invocations fell through to macOS /usr/bin/security.)
  • Token-savings improvements — symmetric 80/80 head/tail cap (was 80/20), passthrough_extensions allowlist, two-line marker with contextcrawler proxy cat <path> escape hatch, grep -c/-L/-o/-Z long-form routed through rg.
  • Codex compliance lifted 0% → 80% via strengthened AGENTS.md template with MUST + WRONG/RIGHT examples.
  • Bench harness Tier 1 — isolated DB tempfile, JSON/MD reports keyed by git sha.
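An env strip of this kind needs both exact names and wildcard families like DYLD_* and BASH_FUNC_*. A minimal sketch; should_strip and stripped_command are hypothetical helpers, the exact-name list is only a subset of the variables named above, and the per-tool split is not modelled. Names are compared via a lossy UTF-8 conversion so a non-UTF-8 suffix cannot hide a DYLD_ prefix.

```rust
use std::ffi::OsString;
use std::process::Command;

/// A subset of the exact names listed above.
const STRIP_EXACT: &[&str] = &[
    "LD_PRELOAD", "PROMPT_COMMAND", "PERL5OPT", "LUA_INIT", "RUBYOPT",
    "JAVA_TOOL_OPTIONS", "DOTNET_STARTUP_HOOKS", "RUSTC_WRAPPER",
    "NODE_OPTIONS", "GIT_EXTERNAL_DIFF", "RIPGREP_CONFIG_PATH",
];
/// Wildcard families such as DYLD_* and BASH_FUNC_*.
const STRIP_PREFIXES: &[&str] = &["DYLD_", "BASH_FUNC_"];

fn should_strip(name: &str) -> bool {
    STRIP_EXACT.contains(&name) || STRIP_PREFIXES.iter().any(|p| name.starts_with(*p))
}

/// Builds a Command whose child will not inherit flagged variables.
/// In real use `inherited` would come from `std::env::vars_os()`.
fn stripped_command(program: &str, inherited: &[(OsString, OsString)]) -> Command {
    let mut cmd = Command::new(program);
    for (name, _) in inherited {
        if should_strip(&name.to_string_lossy()) {
            cmd.env_remove(name);
        }
    }
    cmd
}
```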

Rebrand insurance

  • --version now prints contextcrawler 0.1.8 (was leaking rtk 0.39.0).
  • 47 print strings swept from [rtk] to [contextcrawler].
  • 4 regression-pinning tests (branding_lint, RTK_MD constant pin, CLI name pin, config-file canonical name).

Tests

  • 2116 bin tests + 7 integration suites green.

PRs included

#10 #14 #15 #16 #17 #18 #21 #24 #25 #30 #31 #33 #41 #42 #43 #44 #46 #47 #51

Install

cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.8
contextcrawler --version    # → contextcrawler 0.1.8

v0.1.6 — security & maintenance

15 May 12:31

[0.1.6] — 2026-05-15

Security and maintenance release. Closes 12 audit findings from the
2026-05-15 review (extending the three GHSAs from v0.1.5 plus
downstream-only findings on the web command, supply-chain integration,
filter trust model, and tirith gate). Adds the long-term-maintenance
framework: threat model, release runbook, upstream-rebase strategy,
quality baselines, three per-module security audits, and a roadmap.

Security

  • Build-host metadata stripped from release binaries. Previously
    the release binary embedded ~284 /Users/<builder>/.cargo/registry/...
    paths used by Rust's panic-backtrace metadata, leaking the builder's
    username and directory layout. scripts/build-release.sh now sets
    --remap-path-prefix for $CARGO_HOME and the workspace; --verify
    mode asserts zero builder paths in the produced binary.

  • strip_ansi extended + raw-emit sweep. strip_ansi already
    covered CSI; v0.1.5 added OSC / OSC 8 hyperlinks / DCS / SOS / PM /
    APC / private DEC modes. v0.1.6 sweeps 58 raw eprint!/println!
    sites across 9 files (cmds/git/, cmds/cloud/, cmds/js/,
    cmds/python/, cmds/dotnet/, cmds/system/grep_cmd.rs,
    cmds/go/, core/runner.rs) so failure-path tool output goes
    through the sanitiser before reaching the agent.

  • Global TOML filter trust gate (H-3). ~/.config/rtk/filters.toml
    was previously loaded with no integrity check while the project-local
    .rtk/filters.toml was SHA-256-pinned. Closed: the global file now uses the
    same trust store, and any content change revokes its trust, as for the
    project-local file. New CLI: contextcrawler trust --global / untrust --global. Plus a TOCTOU fix
    (check_trust_bytes works on the already-read buffer instead of
    re-opening the path between hash and parse).

  • CI trust-override now requires platform-injected token (H-2).
    RTK_TRUST_PROJECT_FILTERS=1 previously trusted any env that set
    CI=true (settable by a hostile Makefile). Tightened to also require
    a platform-injected token (GITHUB_TOKEN, CI_JOB_TOKEN,
    BUILDKITE_AGENT_ACCESS_TOKEN, JENKINS_NODE_COOKIE/BUILD_TAG,
    CIRCLE_TOKEN/CIRCLE_BUILD_NUM, DRONE_BUILD_NUMBER). An in-repo
    Makefile can't fake these.

  • Tirith subprocess hardening (F-01 / F-02 / F-04 / F-05).

    • wait_timeout(8s) so a hung tirith check no longer freezes the
      agent's PreToolUse hook (was indefinite).
    • 4 MiB stdout cap.
    • Stdio::null() on stdin and stderr — the stderr pipe was never
      drained, so a noisy tirith could fill the 64 KiB kernel buffer
      and stall the wait_timeout until it fired.
    • JSON re-canonicalisation in log_downgrade before embedding in
      downgrades.jsonl — closes a log-injection vector where a
      hostile tirith could emit literal newlines to forge a top-level
      log record. Sentinel-on-parse-failure keeps the line valid JSON.
    • Same subprocess pattern applied to the security_cmd dashboard
      (fetch_audit_stats, fetch_doctor_status).
    • New dep: wait-timeout = "0.2".
  • Web command hardening (F-01 / F-02 / F-03 / F-04 / F-07).
    contextcrawler web now:

    • parses the URL with the url crate, rejects non-http(s)
      schemes (closes file:///etc/passwd local-read);
    • resolves the host and refuses if any resolved IP is in a blocked
      range (loopback / link-local / RFC1918 / ULA / CGN / multicast /
      unspecified / 0.0.0.0/8 / 198.18/15 benchmark / 240/4 future-use,
      plus IPv4-mapped-private-in-IPv6, plus Azure metadata
      168.63.129.16, plus AWS metadata 169.254.169.254 via link-local);
    • pins the validated IPs into curl via --resolve so curl can't
      independently re-resolve to a private IP between our check and
      the fetch (DNS-rebinding defence);
    • caps curl at --max-time 30, --max-filesize 64 MiB,
      --max-redirs 10;
    • uses -- to terminate flag parsing before the URL;
    • wraps stderr in strip_ansi.
    • New dep: url = "2".
    • Residual: multi-host-redirect (other.example after a redirect
      re-resolves DNS) tracked for v0.2.0.

Process & docs

  • Threat model: new docs/security/THREAT_MODEL.md. Documents
    assets, attack surfaces, threat actors, mitigations matrix,
    accepted limitations.

  • Module audits: per-file security audits for supply_chain_gate.rs
    (6 findings, no High/Critical), tirith_gate.rs (5 findings,
    closed), Commands::Web dispatch + web_cmd.rs (6 findings,
    closed), and combined jsonl_rewriter + session_compact_cmd +
    security_cmd (3 Mediums, 6 LOW/INFO). Subprocess-timeout
    class-audit conclusion in AUDIT_subprocess_timeout_class.md.
  • Quality baselines: docs/quality/BASELINE.md snapshots test
    count, clippy state, cargo audit result, unsafe blocks, unwrap
    distribution. deny.toml covers advisories, licenses, bans,
    sources (passes cargo deny check).

  • Release & rebase docs: docs/contributing/RELEASING.md
    (end-to-end runbook) + docs/contributing/UPSTREAM_REBASE.md
    (rtk-ai/rtk tracking strategy, what-to-take-vs-skip matrix,
    conflict resolution for hardened paths).

  • Roadmap: docs/ROADMAP.md — v0.1.x line, v0.2.0 candidates
    organised into security/process/capability buckets, tracking
    model.

  • Session record: docs/sessions/2026-05-15-overnight.md
    branch-by-branch summary with Codex round results and merge order.

Build & infrastructure

  • rust-version = "1.80" MSRV declared in Cargo.toml (covers
    Ipv6Addr::to_ipv4_mapped used by the SSRF block check).
  • New scripts: scripts/build-release.sh (with --verify and
    --install modes), scripts/bump-version.sh.
  • Proposed CI jobs documented in docs/quality/CI_JOBS_PROPOSED.md
    (release-leak gate + cargo deny check). Wire in when the
    .github/ gitignore situation is resolved.

Tests

1845+ passed across the merged tree (was 1828 at v0.1.5). 32 new
regression tests for argv-mode guard / OSC stripping / scrub /
SSRF block / CI trust check / JSONL canonicalisation.

Acknowledgements

Three rounds of Codex peer review on each fix branch. Every
finding tracked, every fix verified.

v0.1.5 — security release (3 downstream-only GHSAs)

14 May 16:30
4268cde

Security release. Three downstream-only fixes covering attack surfaces upstream rtk-ai/rtk has declined to address (rtk-ai/rtk#640 — "by design / tracking"). Each landed on its own feature branch with three rounds of Codex peer review. All three are tracked as draft GitHub Security Advisories pending coordinated publication.

Security fixes

GHSA-3mmh-86cm-g6w4 — shell-execution trust boundary

`contextcrawler err / test / summary` now parse the trailing command as argv and exec without a shell by default.

  • Shell metacharacters (`|` `;` `&` `<` `>` backtick `$` newline) cause rejection.
  • The first token is refused if it is a known shell (sh, bash, zsh, dash, ksh, fish, tcsh, csh, ash and their `.exe` variants; cmd, powershell, pwsh; busybox, toybox).
  • The first token is refused if it is an exec wrapper (env, nice, nohup, time, timeout, gtimeout, ionice, chroot, setpriv, unshare, taskset, stdbuf, script, xargs, watch, sudo, doas, su, runuser, pkexec) — these replace the process image with arg[1+], reintroducing the attack surface.
  • `--shell` is the documented escape hatch for users who actually want `sh -c` semantics.
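The default (non-`--shell`) guard can be sketched as three checks on the argv. A minimal illustration assuming the trailing command is already split into tokens; check_argv and the constant names are hypothetical, while the character, shell and wrapper lists come from the bullets above. Comparing the lowercased basename without `.exe` also catches forms like `/bin/bash` and `PWSH.EXE`.

```rust
const SHELL_METACHARS: [char; 8] = ['|', ';', '&', '<', '>', '`', '$', '\n'];
const SHELLS: &[&str] = &[
    "sh", "bash", "zsh", "dash", "ksh", "fish", "tcsh", "csh", "ash",
    "cmd", "powershell", "pwsh", "busybox", "toybox",
];
const EXEC_WRAPPERS: &[&str] = &[
    "env", "nice", "nohup", "time", "timeout", "gtimeout", "ionice", "chroot",
    "setpriv", "unshare", "taskset", "stdbuf", "script", "xargs", "watch",
    "sudo", "doas", "su", "runuser", "pkexec",
];

fn check_argv(argv: &[&str]) -> Result<(), String> {
    if let Some(arg) = argv.iter().find(|a| a.contains(SHELL_METACHARS)) {
        return Err(format!("shell metacharacter in {arg:?}"));
    }
    let first: &str = argv.first().copied().ok_or("empty command")?;
    // Lowercased basename with any `.exe` suffix removed.
    let lower = first.rsplit(['/', '\\']).next().unwrap_or(first).to_ascii_lowercase();
    let base = lower.strip_suffix(".exe").unwrap_or(lower.as_str());
    if SHELLS.contains(&base) {
        return Err(format!("refusing shell {first}"));
    }
    if EXEC_WRAPPERS.contains(&base) {
        return Err(format!("refusing exec wrapper {first}"));
    }
    Ok(())
}
```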

Closes a prompt-injection → shell-injection chain where an agent could append a shell payload to a build-triage command and have it auto-execute.

GHSA-wjx4-ffxm-fxxp — terminal-escape stripping

`strip_ansi` previously matched CSI only. Extended to cover the full set of escapes that survive into LLM context:

  • CSI including DEC private modes (`ESC [ ? ...`)
  • OSC (window titles, palette changes, notifications)
  • OSC 8 terminal hyperlinks — visible text preserved, URL payload dropped (hyperlinks are a smuggling channel for instructions or exfil URLs)
  • DCS, SOS, PM, APC (`ESC P|X|^|_ ... ESC \`)
  • Standalone Fe/Fp/Fs escapes used by some pagers
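A small state machine covers the escape families listed above. This is a minimal sketch, not the project's strip_ansi: strip_ansi_sketch is a hypothetical name, and for "other" escapes it drops only ESC plus one following character, so multi-byte forms such as charset designations are not fully handled. Because the OSC 8 URL lives inside the OSC strings, stripping them keeps only the visible link text.

```rust
const ESC: char = '\u{1b}';
const BEL: char = '\u{07}';

/// Drops CSI sequences, OSC and DCS/SOS/PM/APC strings, and two-character
/// escapes, keeping ordinary text (including OSC 8 link text).
fn strip_ansi_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c != ESC {
            out.push(c);
            continue;
        }
        match chars.next() {
            // CSI: parameter/intermediate chars, then a final char in '@'..='~'.
            // DEC private modes (`ESC [ ? 25 l`) fit the same shape.
            Some('[') => {
                for c in chars.by_ref() {
                    if ('@'..='~').contains(&c) {
                        break;
                    }
                }
            }
            // OSC (`]`), DCS (`P`), SOS (`X`), PM (`^`), APC (`_`): skip to the
            // string terminator ESC \, or BEL for OSC.
            Some(kind @ (']' | 'P' | 'X' | '^' | '_')) => {
                while let Some(c) = chars.next() {
                    if kind == ']' && c == BEL {
                        break;
                    }
                    if c == ESC && chars.peek() == Some(&'\\') {
                        chars.next();
                        break;
                    }
                }
            }
            // Any other escape: drop ESC and the one character after it.
            _ => {}
        }
    }
    out
}
```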

Prisma command paths (`run_generate` / `run_migrate` / `run_db_push`) now wrap their failure-fallback `eprint!` calls in `strip_ansi`. A broader audit of remaining raw-emit paths (git / container / dotnet / python / pnpm / grep) is tracked as follow-up.

GHSA-2cwv-rr7c-2p4c — credential scrubbing before tracking-db insert

`scrub_secrets` runs at the INSERT boundary in `tracking.db` and redacts:

  • Credential-bearing flags: `--password`, `--token`, `--api-key`, `--secret`, `--access-key`, `--auth-token`, `--client-secret` (`=value`, space-value, and escape-aware quoted-value forms).
  • HTTP `Authorization: Bearer|Basic|Token|ApiKey ` headers.
  • URL-embedded credentials: `scheme://user:password@host`.
  • AWS access key IDs (`AKIA…`, `ASIA…`).
  • GitHub tokens: classic / OAuth / user-to-server / server / refresh PATs (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`) and fine-grained PATs (`github_pat_…`).
  • Slack tokens (`xox[abprs]-…`).
  • mysql / mariadb `-p` (scoped to mysql / mariadb invocations only — `curl -p3000` and similar are not rewritten).
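Two of the scrub rules, credential flags and URL-embedded credentials, can be sketched over pre-split argv. A minimal illustration; scrub_args and scrub_url_userinfo are hypothetical names, not scrub_secrets, and the quoted-value, header, token-prefix and mysql `-p` rules are omitted.

```rust
const SECRET_FLAGS: &[&str] = &[
    "--password", "--token", "--api-key", "--secret", "--access-key",
    "--auth-token", "--client-secret",
];

fn scrub_args(args: &[&str]) -> Vec<String> {
    let mut out = Vec::with_capacity(args.len());
    let mut redact_next = false;
    for &arg in args {
        if redact_next {
            out.push("[REDACTED]".to_string()); // value of `--flag value`
            redact_next = false;
        } else if SECRET_FLAGS.contains(&arg) {
            out.push(arg.to_string());
            redact_next = true;
        } else if let Some((flag, _)) = arg
            .split_once('=')
            .filter(|(flag, _)| SECRET_FLAGS.contains(flag))
        {
            out.push(format!("{flag}=[REDACTED]")); // `--flag=value`
        } else {
            out.push(scrub_url_userinfo(arg));
        }
    }
    out
}

/// `scheme://user:password@host/...` becomes `scheme://[REDACTED]@host/...`.
fn scrub_url_userinfo(arg: &str) -> String {
    if let Some(scheme_end) = arg.find("://") {
        let rest = &arg[scheme_end + 3..];
        let authority = &rest[..rest.find('/').unwrap_or(rest.len())];
        if let Some(at) = authority.rfind('@') {
            if authority[..at].contains(':') {
                return format!("{}://[REDACTED]@{}", &arg[..scheme_end], &rest[at + 1..]);
            }
        }
    }
    arg.to_string()
}
```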

Without scrubbing, `gain --history` would feed credentials back into agent context on every read.

Tests

1828 passed, 0 failed across all three branches and the merged `develop`. Each fix landed with a dedicated regression-test block.

Install

```sh
cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.5 --locked
```

Full changelog: `CHANGELOG.md`

v0.1.4 — discover report rebrand + gate-design comment

14 May 14:57
bfe32f1

Mop-up release covering two surfaces v0.1.3 didn't touch.

Fixed

  • contextcrawler discover output still printed RTK — banner, stats line, empty-state hint, section header, column header, and per-row "Equivalent" cells all said RTK … / rtk git. Fixed by widening the existing display_rtk helper from the rewrite path to the discover report path (made pub, applied at the print site in src/discover/report.rs). Internal rtk_cmd: rule literals in rules.rs intentionally untouched — kept as lookup keys aligned with upstream rtk. (#7)

Documented

  • Added a design-intent comment to process_claude_payload clarifying that the Tirith and supply-chain gates only fire on the PermissionVerdict::Allow path. The gate is a safety net for explicitly-allowlisted command shapes, not a universal filter. Future investigators won't repeat the false alarm of "fresh probes don't appear in downgrades.jsonl". (#7)

Verified

  • cargo test — 1785 passed / 0 failed / 6 ignored (unchanged from v0.1.3)

Upgrade

cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.4 --force

🤖 Generated with Claude Code

v0.1.3 — finish rebrand polish

14 May 13:17
f3bdc35

Polish release rolling up the post-v0.1.2 rebrand work, surfaced via fresh-install devel-testing on macOS and Ubuntu.

Fixed (correctness)

  • Hook rewrite prefix — every rewrite emitted rtk <subcmd>, which failed with command not found: rtk on the documented install (only contextcrawler on PATH). Now emits contextcrawler <subcmd>. Legacy rtk prefix still recognized as already-rewritten passthrough — existing Bash(rtk:*) allowlist entries keep working. (#1)
  • contextcrawler -v (flag-only invocations) — CLI fallback path tried to exec args[0] when clap parsing failed; with args[0]=-v it produced [rtk: No such file or directory (os error 2)]. Now shows clap's "subcommand required" error; fallback prefix is [contextcrawler: ...]. (#4)

Fixed (cosmetic)

  • gain dashboard header: RTK Token SavingsContextCrawler Token Savings (#1)
  • init -g success output, codex config listing, agent hook output (cline / windsurf / kilocode / antigravity), uninstall messages, and init -g usage text — all consistently branded ContextCrawler / CONTEXTCRAWLER.md / @CONTEXTCRAWLER.md. (#3)
  • These strings are sourced from the RTK_MD / RTK_MD_REF constants so future renames stay centralized.

Added

  • Tirith gate status in contextcrawler init -g — reports whether the URL-security defense-in-depth gate is armed. Detect-only — does NOT modify ~/.bashrc / ~/.zshrc / fish config. The gate operates exclusively at the Claude Code PreToolUse hook layer via subprocess invocation of tirith check. (#2 superseded by #5)

Internal

  • rtk source-level identifiers (mod / struct / field names, rtk_cmd: rule values, rtk_equivalent classification keys) intentionally retained to keep upstream rebases against rtk-ai/rtk small.

Verified

  • cargo test — 1785 passed / 0 failed / 6 ignored
  • End-to-end smoke test of the rewrite hook on both macOS and Ubuntu
  • Empirical 61.4% token savings observed on Ubuntu after install (cargo test + find + git log mix)

Upgrade

cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.3 --force
contextcrawler init -g     # re-register hook + report Tirith gate status

After upgrading, fully /exit + relaunch any running Claude Code sessions so they pick up the new binary on PATH.

🤖 Generated with Claude Code

v0.1.2 — first release where init -g actually works

14 May 06:28
423e5cc

Upgrade strongly recommended for anyone on v0.1.0 / v0.1.1.

TL;DR

The hook command was hardcoded to rtk hook claude, a binary that doesn't exist in this distribution. Every contextcrawler init -g since the rebrand has been writing a broken entry into settings.json — the hook fired, the rtk binary wasn't found, the bash shim gracefully degraded, and Claude Code received raw, un-filtered command output. ContextCrawler was effectively a no-op on every install. This release fixes that and migrates existing broken installs automatically.

A second-pass dual-agent code review (Codex + Claude) also surfaced two real security issues, both patched.

Fixed (critical)

  • Hook command rename. CLAUDE_HOOK_COMMAND, CURSOR_HOOK_COMMAND, and the Gemini wrapper script + Copilot hook JSON all referenced the non-existent rtk binary. They now write contextcrawler hook <agent>. Install-time matchers recognize the legacy string, so existing broken entries get cleaned up on the next init -g.

Fixed (security — from a second Codex + Claude review pass)

  • Session compactor path traversal in resolve_session_path. A bare session id like ../foo was joined under each project directory, and the resulting candidate was returned if it resolved to a file. Ids containing /, \, or .. are now rejected.
  • Supply-chain cooldown bypass on future-dated publishes. The age guard age > -1d let packages "published" up to 24h ahead of now pass through both bounds entirely. Now clamps negative ages to zero — future dates are treated as just-published, which always trips the cooldown.
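Both fixes reduce to small predicates. A minimal sketch; is_safe_session_id and in_cooldown are hypothetical names, ages are assumed to be signed seconds computed as now minus the publish time, and the empty-id check is an extra assumption not stated in the notes.

```rust
/// Rejects ids that could escape the project directory when joined.
fn is_safe_session_id(id: &str) -> bool {
    !id.is_empty() && !id.contains('/') && !id.contains('\\') && !id.contains("..")
}

/// Negative (future-dated) ages clamp to zero, i.e. "just published",
/// which always trips a positive cooldown.
fn in_cooldown(age_secs: i64, cooldown_secs: i64) -> bool {
    age_secs.max(0) < cooldown_secs
}
```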

Fixed (UX)

  • Every [rtk] ... / run rtk ... warning string now reads contextcrawler, so pasted commands actually work.
  • ~/.claude/RTK.md and @RTK.md reference renamed to CONTEXTCRAWLER.md / @CONTEXTCRAWLER.md. Legacy files auto-migrate on next install.
  • contextcrawler gain no longer prefixes every row with the redundant rtk string in the by-command and recent-commands tables.
  • README/MIGRATING gained a pre-install callout warning users who previously ran upstream rtk or jee599/contextzip to clean out stale hook entries.

Other

  • CodeQL py/insecure-temporary-file fixed (tempfile.mktempNamedTemporaryFile).
  • CodeQL rust/cleartext-logging false-positive in rtk trust --list defused.
  • Unused-variable compiler warning silenced.

Install

cargo install --git https://github.com/thehoff/contextcrawler --tag v0.1.2 --locked --force
contextcrawler init -g

If you're already on v0.1.0 / v0.1.1, the --force is important — cargo install will skip the upgrade otherwise. Run init -g again after install to migrate RTK.mdCONTEXTCRAWLER.md and replace the broken rtk hook claude entry in your ~/.claude/settings.json. Then restart Claude Code.

Full changelog: CHANGELOG.md.