diff --git a/CHANGELOG.md b/CHANGELOG.md index 418f4f0..7d1549b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,6 +11,22 @@ checklist. ## [Unreleased] +## [0.7.0] - 2026-06-12 + +hands becomes scriptable. v0.6.0 gave Claude Login mode real sessions, audit, and guardrails; v0.7.0 exposes that machinery to scripts, cron jobs, and orchestrators — `hands run` no longer has to end in an interactive prompt, and the now-dual-mode audit log is queryable. Minor-version bump for the new flags. + +PRs in this release: #81 (one-shot scripting mode), #82 (audit list filters). + +### Added — one-shot scripting mode: `--once`, `--json`, exit-code contract (#81) + +- **`hands run --once ""`** runs a single task and exits — no "What next?" loop. The session pointer is saved before returning, so `hands run -c --once ""` chains multi-step automation across invocations (and reboots). +- **`hands run --json ""`** (implies `--once`) emits exactly one machine-readable JSON line on stdout: `{ ok, mode, result, turns, costUsd, tokens: {input, output}, sessionId? }`. Decorative output is silenced (the spinner included), and failures still emit one JSON line (`{ ok: false, error }`) — a parsing script never sees pretty text. The field set is stable: fields get added, never renamed. SDK mode honors `--json` too, with a `dryRun: true` marker when `--dry-run` forced it. +- **Exit codes**, pinned in tests: `0` task completed, `1` setup/config error, `2` task did not complete cleanly (max-turns cutoff or execution error, surfaced from the stream's result envelope). + +### Added — audit list filters: `--mode`, `--tool`, `--failed`, `--json` (#82) + +`hands audit list` can now answer "what did Claude Login bash do?" (`--mode cli --tool bash`) and "what went wrong?" (`--failed` — including guardrail blocks) and emit the result as JSON. Filtering never renumbers: printed indexes stay positions in the full log, because they're what `audit show/replay ` accept. CLI-mode entries carry a `[cli]` marker in listings; pre-0.6 entries count as `sdk`. + ## [0.6.0] - 2026-06-11 Claude Login mode grows up. The default, $0 mode used to run blind: `--dangerously-skip-permissions` with nothing but prompt text for protection, no audit trail ("only SDK mode is covered"), stderr string-scraping for the action display, and "session memory" that was 200-char task summaries re-injected into the system prompt. All four are gone — replaced with the claude CLI's real primitives. Minor-version bump for the new `--continue` flag and audit-log `mode` field. diff --git a/README.md b/README.md index 06478e4..23b9307 100644 --- a/README.md +++ b/README.md @@ -317,6 +317,8 @@ hands voice-setup # download whisper.cpp + speech model for --voice hands run "" # interactive computer control session hands run --continue # resume the most recent session (works after exit/reboot) hands run -c "" # resume and immediately run a follow-up task +hands run --once "" # one task, no interactive loop — for scripts and cron +hands run --json "" # one JSON result line on stdout (implies --once) hands run "" --voice # voice input via local whisper hands run "" --dry-run # plan + audit-log without executing (SDK mode) hands run "" -m claude-opus-4-6 # override model (this run only) @@ -326,10 +328,26 @@ hands run "" --persona thorough # named system-prompt override (SDK hands run "" --no-dario # skip the dario auto-detect probe at startup ``` +### Scripting & automation + +`--once` + `--json` + `-c` turn hands into a composable building block — cron jobs, CI steps, and orchestrators get `spawn → parse → branch` semantics: + +```bash +hands run --once "open the spreadsheet and add June's numbers" +hands run -c --once "now export it as PDF to ~/reports" # same conversation, next step + +result=$(hands run --json "check if the nightly build passed") +echo "$result" | jq -r .result +``` + +The JSON line is `{ ok, mode, result, turns, costUsd, tokens, sessionId? }` — stable field set, fields get added but never renamed. Failures emit `{ ok: false, error }` so a parser never sees pretty text. Exit codes: `0` task completed, `1` setup/config error, `2` task didn't complete cleanly. + ### Audit ```bash hands audit list --last 20 # recent tool calls (both modes) with replay index +hands audit list --mode cli --tool bash # filter: Claude Login bash calls only +hands audit list --failed --json # everything that went wrong, as JSON hands audit show # full JSON detail for one entry hands audit replay # dry-run replay of one tool call; --execute fires it ``` @@ -464,7 +482,7 @@ Env wins over config: `ANTHROPIC_BASE_URL`, `ANTHROPIC_API_KEY` (for SDK + dario - **Security issues** — email **security@askalf.org**, not a public issue. See [SECURITY.md](SECURITY.md). - **PRs welcome.** See [CONTRIBUTING.md](CONTRIBUTING.md) for build / test flow. Code style matches dario / agent / deepdive: small TypeScript, pure decision functions where possible, `strict: true`, no `any`, no unused imports. -Run `npm install && npm run build && npm test` to get a working dev tree (192 tests across 22 test files). +Run `npm install && npm run build && npm test` to get a working dev tree (204 tests across 24 test files). --- diff --git a/package.json b/package.json index 836f92d..e47b007 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "@askalf/hands", - "version": "0.6.0", + "version": "0.7.0", "description": "Cross-platform computer-use agent. Your LLM on your mouse, keyboard, and screen. Windows (PowerShell), macOS (open + osascript), Linux (xdotool / ydotool). Voice optional, safety guardrails. Routes through dario or any Anthropic-compat endpoint.", "main": "./dist/cli.js", "types": "./dist/cli.d.ts",