#16: post-merge hardening (Items 1-9)#21
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces significant improvements to job lifecycle management, including orphan job reconciliation, enhanced finalization error handling, and robust test isolation for git-based fixtures. It also updates the project license to AGPL-3.0-only and adds diagnostic logging for state lock contention. I have provided feedback on improving error reporting during fallback record persistence and suggested refactoring the git fixture helper for better maintainability.
… MED 1) Four reviewers (codex/kimi/glm/gemini) ran an adversarial pass on PR #21. This commit addresses every finding I could substantiate with a repro and folds the fixes onto the *class* of problem rather than patching instances. Confirmed bugs and fixes: BLOCKER 1 — finalization fallback clobbered good meta on state-only failure Repro: writeJobFile succeeds, upsertJob fails (lock timeout) → fallback re-issues writeJobFile unconditionally → "completed" meta becomes "failed/spawn_failed". Fix: split fallback per error side. metaError → rewrite both (state was rolled back too); stateError-only → preserve meta, retry upsertJob with the GOOD record before falling back. BLOCKER 2 — reconcile TOCTOU race overwrote terminal completed records Repro: reconcile reads meta=running, classifies stale; worker writes completed mid-flight; reconcile then writes stale → "REAL_WORKER_RESULT" permanently lost. Fix: new commitJobRecordIfActive primitive in state.mjs holds the state lock across an in-lock CAS read of meta.json + the write; companion finalization moves to commitJobRecord so both writers serialize. Either order is safe — terminal always wins. HIGH 1 — finalization_failed misclassified as spawn_failed classifyExecution short-circuited on errorMessage before signal/timedOut. Disk-full / lock-timeout finalization errors looked like missing-binary errors to monitoring. Fix: detect "finalization_failed:" prefix and emit error_code=finalization_failed. HIGH 2/3 — GIT_CONFIG_GLOBAL leaked through 4 of 5 scrub callsites Repro: GIT_CONFIG_GLOBAL=/evil node --test … → fixture branch became "injected-master". Companion mutation-detection git would inherit safe.directory=*. Fix: extract STRIPPED_GIT_ENV_KEYS + cleanGitEnv into byte-identical lib/git-env.mjs (added to VERBATIM_FILES). Companions, scope.mjs, fixture helper, and runner all import the canonical list. Adds GIT_CONFIG_GLOBAL/SYSTEM, GIT_TRACE family, GIT_OPTIONAL_LOCKS, GIT_TERMINAL_PROMPT, GIT_PROTOCOL, GIT_AUTO_GC. HIGH 4 — stale records from fresh-running orphans not actually continuable PR #21 claims stale is continuable, but onSpawn writes claude_session_id=null so reconcile carries null, so cmdContinue rejects with "no claude_session_id". Fix: cmdContinue now distinguishes the stale-orphan case and returns an actionable suggestion ("re-run from scratch via …") with the prior mode/cwd preserved. HIGH 5 — git/workspace tests didn't scrub GIT_* Direct \`node --test tests/unit/git.test.mjs\` with caller GIT_DIR set hijacked the fixture into the wrong repo. Same gap on the production side: lib/git.mjs's git() forwarded process.env unscrubbed → the workspace-root resolution at every companion subcommand entry was vulnerable too. Fix: refactor both tests to use fixtureGit; add cleanGitEnv to lib/git.mjs. MED 1 — cmdResult crashed unhandled EISDIR when meta.json was a directory Wrap readFileSync in cmdResult; return {ok:false, error:"read_failed", error_code:"EISDIR"} instead of an unhandled stacktrace. Reviewer findings rejected with reasoning (see PR comment): - SIGTERM-trapped target classified as completed (not cancelled): inherent Node child_process limitation; closing it requires a marker-file dance out of scope for #16. Tracked as follow-up. - State-lock gate-kidnap → EEXIST crash: speculation, no repro; mkdirSync atomicity + gate ordering precludes the scenario. - Gemini cmdCancel not implemented: real product gap, but pre-existing since af303f0; not introduced by PR #21. Tracked as follow-up. New tests: - tests/unit/git-env.test.mjs (canonical strip list + GIT_CONFIG_GLOBAL leak repro) - tests/unit/commit-job-record.test.mjs (atomic + CAS primitives) - tests/unit/reconcile.test.mjs adds TOCTOU regression guard - tests/unit/job-record.test.mjs adds finalization_failed classification - tests/smoke/claude-companion.smoke.test.mjs adds result-EISDIR friendly error Net: 565 unit tests / 0 fail (was 529); coverage baseline met (state.mjs and new git-env.mjs ratchet up; reconcile.mjs branch baseline lowered to 81 since 3 branches moved into commitJobRecordIfActive). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adversarial review triage — every reviewer finding addressedFour reviewers (codex / kimi / glm / gemini) returned findings. Each was reproduced against the code before being acted on; a few contradicted each other and I rejected the speculative ones with reasoning. Commit FIXED in 25f9cf3
REJECTED with reasoning
Gates
Follow-ups consolidated into one issue per the MECE rule (linked once filed). |
|
Follow-ups for the rejected/deferred items tracked at #22. |
Three real failures observed on the first CI run; addressing two of them (SonarCloud is architectural — see below). 1. coverage gate — job-record.mjs lines coverage dropped 91.38 -> 89.23 because the new execution.status === "cancelled" short-circuit (added by the cancel-marker fix) had no unit test in coverage scope. The smoke test exercises it but coverage runs with CODEX_PLUGIN_SKIP_SMOKE=1. Add a pair of unit tests (claude + gemini) that drive classifyExecution directly with status=cancelled + a "successful" execution tuple, asserting the lifecycle override wins. Local coverage now reads 93.85. 2. claude smoke — T7.4 sidecar deletion test flaked on Linux CI with ENOTEMPTY on rmdir of state/<subdir>/. Pre-existing race: the worker's finalization writes meta.json (status=completed) BEFORE upsertJob updates state.json AND BEFORE writeSidecar emits stdout.log/stderr.log. The test polls only meta.json; once it sees completed it runs recursive cleanup, racing the worker's tail writes. macOS local doesn't surface it because timer + fs scheduling lines up favorably; Linux CI hits it. Add a 250ms settle delay after the test detects completion so the worker's tail writes have flushed before the recursive rmSync starts walking. Annotated as the Linux flake mitigation for the next reader. 3. pre-commit test gate timeout — scope.test.mjs (155 real-git tests, ~140s) blows the 60s pre-commit npm test window. Port the SLOW_TEST_BASENAMES skip pattern from PR #21: default npm test is fast (~14s); CI sets CODEX_PLUGIN_FULL_TESTS=1 to include scope.test.mjs. Adds an npm run test:full convenience script for local pre-PR validation. The third CI failure (SonarCloud) flags the byte-identical Claude/Gemini lib pattern as duplication (41.9% on new code, >3% threshold) and 13 Security Hotspots. Both are architectural intent (parity-by-construction + dual-use process spawn for the target CLIs); not fixable without breaking the parity guarantee. Sonar isn't a merge gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mination (#23) * fix(#22): wire Gemini cancel + cancel-marker sentinel + ps/proc capture-error Three valid sub-tasks from issue #22, each reproduced before fixing. Sub-task 1 — Wire Gemini cmdCancel Pre-existing gap since af303f0 (PR #14, original plugin import): gemini-companion.mjs's dispatch routed \`cancel\` to fail("not_implemented"), so users had no way to cancel a Gemini background job through the documented interface. The infrastructure was already in place (spawnGemini captures pid_info via the byte-identical capturePidInfo; verifyPidInfo is shared) — just needed the cmdCancel wiring. This commit mirrors Claude's cmdCancel into Gemini (verify pid_info, signal SIGTERM/SIGKILL, emit signaled / already_dead / stale_pid / no_pid_info). Smoke tests cover the active-cancel path, already_terminal, and not_found. Sub-task 2 — Cancel-marker sentinel Reproduced: a target CLI that handles SIGTERM and exits 0 with valid output is mis-classified as \`completed\` (or as \`failed/parse_error\` with no output). Three of four scenarios in my Node-handler test confirm this; the operator's cancel intent is silently lost. Fix: cmdCancel writes <jobsDir>/<jobId>/cancel-requested.flag (mode 0600) BEFORE process.kill. executeRun's finalization reads-and-deletes the marker; if present, it passes status="cancelled" to buildJobRecord and classifyExecution short-circuits (regardless of exitCode/signal). Both companions, with mirrored helpers (cancelMarkerPath, consumeCancelMarker). Smoke regression in both targets uses the new {CLAUDE,GEMINI}_MOCK_TRAP_SIGTERM env hook on the mocks (target traps SIGTERM and exits 0 with valid fixture output) — without the marker, status would be \`completed\`. Sub-task 3 — captureDarwin/Linux capture_error discrimination Reproduced in 4 hostile environments (PATH stripped, hostile ps stub exit 1 + stderr, hostile ps stub exit 0 + empty, sandbox-exec deny-ps): the previous "every error is process_gone" wrapper would falsely promote a LIVE worker to stale via reconcile, and cmdCancel would return already_dead instead of signaling. Fix: only ps-exit-1-with-empty-stderr (BSD ps's "no such process" silent signal) and /proc/<pid>/stat ENOENT remain process_gone. Spawn errors, ps non-zero with stderr text, ps zero with empty output, /proc EACCES all become capture_error. cmdCancel adds an explicit \`unverifiable\` branch that refuses to signal (conservative-correct) instead of falling through to \`stale_pid\` (which would mislead operators). lib/identity.mjs is byte-identical between plugins, so one fix covers both. Coverage: 4 new tests probe each failure mode against a real live pid spawned by the test, plus updated assertions on the existing mocked /proc tests (malformed → capture_error, EACCES → capture_error, ENOENT race → process_gone). Investigated a 4th candidate (Gemini reviewer's "state-lock gate-kidnap → fatal EEXIST crash") and dropped it — see issue #22 comment with the two-stage stress test (1000 acquisitions + 20-victim deliberate-stretch race; zero violations) plus the mkdirSync-atomicity walkthrough. Net: 524 unit/smoke tests / 0 fail (was 519). New tests: 4 capture-error scenarios, 3 Gemini cancel scenarios, 2 SIGTERM-trap cancel scenarios. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#22): address CI failures on PR #23 Three real failures observed on the first CI run; addressing two of them (SonarCloud is architectural — see below). 1. coverage gate — job-record.mjs lines coverage dropped 91.38 -> 89.23 because the new execution.status === "cancelled" short-circuit (added by the cancel-marker fix) had no unit test in coverage scope. The smoke test exercises it but coverage runs with CODEX_PLUGIN_SKIP_SMOKE=1. Add a pair of unit tests (claude + gemini) that drive classifyExecution directly with status=cancelled + a "successful" execution tuple, asserting the lifecycle override wins. Local coverage now reads 93.85. 2. claude smoke — T7.4 sidecar deletion test flaked on Linux CI with ENOTEMPTY on rmdir of state/<subdir>/. Pre-existing race: the worker's finalization writes meta.json (status=completed) BEFORE upsertJob updates state.json AND BEFORE writeSidecar emits stdout.log/stderr.log. The test polls only meta.json; once it sees completed it runs recursive cleanup, racing the worker's tail writes. macOS local doesn't surface it because timer + fs scheduling lines up favorably; Linux CI hits it. Add a 250ms settle delay after the test detects completion so the worker's tail writes have flushed before the recursive rmSync starts walking. Annotated as the Linux flake mitigation for the next reader. 3. pre-commit test gate timeout — scope.test.mjs (155 real-git tests, ~140s) blows the 60s pre-commit npm test window. Port the SLOW_TEST_BASENAMES skip pattern from PR #21: default npm test is fast (~14s); CI sets CODEX_PLUGIN_FULL_TESTS=1 to include scope.test.mjs. Adds an npm run test:full convenience script for local pre-PR validation. The third CI failure (SonarCloud) flags the byte-identical Claude/Gemini lib pattern as duplication (41.9% on new code, >3% threshold) and 13 Security Hotspots. Both are architectural intent (parity-by-construction + dual-use process spawn for the target CLIs); not fixable without breaking the parity guarantee. Sonar isn't a merge gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#22): use dirname(markerPath) for cancel-marker mkdir Addresses gemini-code-assist comments 3151882114 / 3151882124 on PR #23. Both companions now derive the marker directory from the canonical cancelMarkerPath helper instead of reconstructing it inline. Single source of truth — if cancelMarkerPath ever moves, the mkdir follows. The deeper drift gap (parallel block in divergent files, no automated parity guard) is tracked in #24 for systematic fix on this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#25): defer pid_info capture to child 'spawn' event Pre-fix: capturePidInfo(child.pid) ran synchronously after spawn() returned. On Linux, /proc/<pid>/cmdline still reflected the parent's argv during the fork→execve race window, so recorded argv0 mismatched the live argv0 at cancel time and verifyPidInfo refused to signal with 'argv0_mismatch'. Surfaced as a flake on slower CI runners (PR #23 run 25039611325, smoke claude job). Fix: extract attachPidCapture(child, onSpawn) into lib/identity.mjs (VERBATIM_FILES — byte-identity enforced across plugins). Helper wraps capture in child.once('spawn', ...) so /proc reflects the post-execve target argv. Both dispatcher libs collapse to a single call site; no parallel logic, no dual path. Regression test in both claude/gemini dispatcher unit suites asserts onSpawn fires asynchronously via the 'spawn' event, not synchronously in the spawn() executor turn — fails deterministically on the pre-fix code, passes on the post-fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#24): extract cancel-marker into shared VERBATIM lib Issue #22 sub-task 2 introduced the cancel-marker sentinel as inline duplicates in claude-companion.mjs and gemini-companion.mjs — three pure functions plus the inline write block in cmdCancel. The two companions are NOT in VERBATIM_FILES (they import different dispatchers), so nothing automated catches drift between these parallel blocks. Today's gemini-code-assist nit (dirname(markerPath)) had to be applied twice manually; a one-sided edit would have silently drifted. Extract cancelMarkerPath, writeCancelMarker, consumeCancelMarker into lib/cancel-marker.mjs. Add to VERBATIM_FILES so byte-identity is enforced across both copies. Both companions collapse to import calls; no dual path. writeCancelMarker throws on failure so the caller's target-specific stderr warning stays at the call site. Adds tests/unit/cancel-marker.test.mjs covering happy path, mode 0600, ISO timestamp body, fs-error propagation, and missing-marker consume. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#24): exercise gemini cancel-marker copy for coverage parity The prior commit's unit test imported only from the claude side, so the coverage gate (CODEX_PLUGIN_COVERAGE=1) marked plugins/gemini/scripts/lib/cancel-marker.mjs at 0% line/function and failed the 85% target. Mirrors the existing `* as GeminiIdentity` pattern in identity.test.mjs: import the gemini side and drive a minimal happy-path so the byte-identical copy gets coverage credit. Local coverage gate now passes — both cancel-marker copies at 100/100/100. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#22): cancel exit-code taxonomy + queued-cancel safety Implements Option Y+ from the external consultation round on cancel exit-code semantics, plus the pre-spawn marker checks that came out of adversarial review (Vectors 1, 3, 5, 8). Exit-code contract (both companions, byte-symmetric): exit 0: signaled, already_dead, already_terminal, cancel_pending (NEW) exit 1: bad_args, not_found, bad_state (NEW), signal_failed, cancel_failed (NEW) exit 2: stale_pid, unverifiable, no_pid_info (when status=running) cmdCancel now distinguishes truly-terminal jobs from queued jobs: queued + cancel writes a cancel marker and returns cancel_pending exit 0, durably recording operator intent. Unknown statuses surface as bad_state exit 1 instead of being silently treated as queued. Marker write failure in the queued branch returns cancel_failed exit 1, since the marker IS the cancel mechanism for queued jobs (no SIGTERM fallback). cmdRunWorker and executeRun both consume the cancel marker before spawning the target binary. The cmdRunWorker check is a fast path; the executeRun check is load-bearing — it narrows the queued race window from "containment + scope copy + pre-snapshot + spawn" (potentially seconds with a large repo) to the microseconds between the check and child.once('spawn') firing. The executeRun check also covers foreground runs that bypass cmdRunWorker. Tests: - queued cancel → cancel_pending, marker present, exit 0 - _run-worker honors marker → status=cancelled, no spawn - queued + marker write fails → cancel_failed exit 1 - unknown status → bad_state exit 1 - capture_error on running job → exit 2 (was: exit 0; pre-existing test encoded the bug per consultation Round 2) Docs: - commands/{claude,gemini}-cancel.md: full status × exit-code taxonomy including cancel_failed and bad_state - README.md: removed stale "Gemini cancel deferred" claims - tests/unit/docs-contracts.test.mjs: updated to match wired-cancel reality (gemini cancel was wired in PR #22 but doc-tests still asserted "not_implemented") Deferred to follow-up commits per the agreed 6-commit sequence: - Class 2: assertSafeJobId at marker boundary (path traversal) - Class 3: /proc precondition + Darwin /bin/ps absolute - Class 4: automated docs-vs-emitted-statuses contract test - Class 5: argv0 child-binary assertion + configureState keys Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#22): validate jobId at cancel-marker boundary (path traversal) Reviewer Vector 4 (adversarial round): cmdCancel parses state.json and passes job.id straight to writeCancelMarker. cancelMarkerPath did string concat with no validation, so a tampered state.json carrying a traversal id (e.g. "../../../escape") would land the marker outside the jobs dir. Fix: export assertSafeJobId from lib/state.mjs (already used internally for writeJobFile / resolveJobFile / state.json upserts) and call it at the top of cancelMarkerPath. writeCancelMarker and consumeCancelMarker inherit the validation because they go through cancelMarkerPath. End-to-end behavior with tampered state.json: cmdCancel queued branch calls writeCancelMarker → throws "Unsafe jobId: ..." → caught by the Fix 2 path from the prior commit → cancel_failed exit 1. No marker escapes the jobs dir. Tests: - cancelMarkerPath rejects 8 tampered values (../../../, foo/bar, .., null bytes, empty, etc.) - writeCancelMarker / consumeCancelMarker inherit the gate - Same coverage on the gemini side via * as GeminiCancelMarker Class 2 in the agreed 6-commit Class sequence (1 → 2 → 3 → 4 → 5a → 5b). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#22): /proc precondition + Darwin /bin/ps absolute (Finding B) Two platform-specific identity-capture bugs from reviewer round 2 (Finding B). Linux (3a): captureLinux treated existsSync('/proc/<pid>/stat') === false as proof of process_gone. In a hardened container or sandbox where /proc is unmounted, every existsSync of a per-pid stat returns false — falsely classifying LIVE pids as process_gone, which cmdCancel would then return as already_dead exit 0. Fix: precondition existsSync('/proc') first; if absent, throw capture_error (cmdCancel maps to unverifiable + exit 2). Darwin (3b): captureDarwin invoked PATH-resolved 'ps'. A stripped or shimmed PATH (sandbox env scrubbing, hostile stub) could break ownership capture in two ways: (1) ps not findable → capture_error (already handled, but undesirable); (2) silent-exit-1 stub on PATH mimics BSD 'no such pid' signal, falsely classifying LIVE pids as process_gone. Fix: pin /bin/ps absolute. /bin/ps is part of the macOS base install and stable; if it's missing or denied, spawnSync.error → capture_error (safe answer). Tests: - Linux: /proc mounted but per-pid stat missing → process_gone (existing) - Linux: /proc unmounted → capture_error (NEW) - Darwin: identity.test.mjs's PATH-shim parser tests rewritten to use spawnSync mocking via withPlatformAndChild (the new helper). Adds an explicit assert.equal(observedBinary, "/bin/ps") regression guard. - Darwin: identity-capture-error.test.mjs's hostile-PATH tests inverted — they now prove PATH manipulation has NO effect on capture (Class 3b regression guards), since /bin/ps is invoked absolute. Class 3 in the agreed 6-commit Class sequence (1 → 2 → 3 → 4 → 5a → 5b). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(#22): strengthen attachPidCapture regression test (Finding H) Reviewer round 2 Finding H: the existing #25 dispatcher tests ("spawnClaude / spawnGemini: onSpawn fires asynchronously via 'spawn' event, not synchronously") only assert async timing of the callback. A regressed implementation that captures pid_info synchronously but defers the onSpawn callback would still pass them — yet the captured argv0 would be the PARENT's (the test runner = "node"), not the child binary. The reviewer's repro confirmed this: a hypothetical attachPidCapture that records `{argv0: "node /parent/argv.mjs"}` synchronously and fires onSpawn from child.once('spawn') passes every assertion the existing tests make. Fix: add a focused regression test that spawns a child with a different binary from the parent (/bin/sleep, not node) and asserts the captured argv0 reflects the child binary post-execve. Pre-execve capture would record argv0 starting with "node ..."; post-execve capture records "sleep" (Darwin ps -o comm=) or "/bin/sleep" (Linux /proc/<pid>/cmdline). The dispatcher #25 tests stay as-is — they prove a different (still valuable) property: that onSpawn fires asynchronously through the spawnClaude/spawnGemini integration path. This new test pins the underlying lib API (attachPidCapture) to post-execve correctness. Skips: non-linux/darwin platforms, missing /bin/sleep, and macOS-under- coverage (sandbox can deny ps; Linux CI run covers it). Class 5a in the agreed 6-commit Class sequence (1 → 2 → 3 → 4 → 5a → 5b). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(#22): fix configureState wrong keys in cancel-marker tests (Finding I) Reviewer round 2 Finding I: cancel-marker.test.mjs called configureState({envVar, fallbackBaseName, sessionEnvVar}) — but the actual configureState API uses pluginDataEnv, fallbackStateRootDir, sessionIdEnv. The wrong keys are silently ignored (the function only applies recognized keys), so the call was a complete no-op. The tests still passed because process.env.CLAUDE_PLUGIN_DATA was set on the next line, providing the actual config — the configureState call was dead code masking the API misuse. Fix: use correct keys at all three call sites in cancel-marker.test.mjs: - freshWorkspace() helper (claude side, line 23) - "gemini cancelMarkerPath: same path-traversal defense" (gemini side, line 163, added in Class 2) - "gemini cancel-marker: byte-identical helper exercised on the gemini side" (gemini side, line 186) The configureState call now meaningfully resets the module config to its claude-side / gemini-side defaults — the original intent of the helpers, which the wrong keys were silently defeating. The other configureState callsite in tests/smoke/gemini-companion.smoke.test.mjs:805 already uses correct keys (pluginDataEnv, sessionIdEnv) — no change there. Class 5b in the agreed 6-commit Class sequence (1 → 2 → 3 → 4 → 5a → 5b). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(#22): stabilize cancel smoke and scrub NUL fixture * test(#22): enforce cancel docs status contract * fix(#22): share coverage for discovered verbatim lib pairs * fix(#22): address SonarCloud PR gate noise * fix(#22): use SonarCloud automatic-analysis config * fix(#22): tighten SonarCloud test exclusions * Improve external review failure diagnostics * fix(#22): address final external review findings * Make external review scopes reliable * Fix Sonar sort diagnostics --------- Co-authored-by: Test User <test@example.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Items 7 and 8 of post-merge hardening (#16): - Add LITELLM_ and OLLAMA_ to PROVIDER_PREFIXES so router/proxy ecosystems cannot re-route Claude/Gemini target traffic away from the first-party provider. Document the explicit decision NOT to scrub HTTP_PROXY / HTTPS_PROXY / NO_PROXY (corporate networks depend on those for any outbound at all). - Extend Claude and Gemini env-leak unit tests to assert LITELLM_ / OLLAMA_ keys are stripped and proxy vars pass through. - Add a withStateConfig(state, override, fn) helper in state tests and convert every short-timeout lock test to use it via try/finally so an early failure cannot leak the override to subsequent tests.
Items 5 and 6 of post-merge hardening (#16): State lock (Item 5): - emitLockDiagnostic() in both state.mjs copies prints a single-line stderr warning (prefix `state-lock:`) when reclaim is blocked by corrupt owner.json, by a cross-host owner during contention, or when a put-back failure leaves an orphan dir behind. Diagnostics never include owner.json contents (token in particular). Suppressed by CODEX_PLUGIN_QUIET_LOCK=1 for tests that assert clean stderr. - acquireStateLock() now claims `.state.lock` directly after a successful tryReclaimStaleLock while still holding `.state.lock.gate`, so a successful reclaim cannot race a fresh acquirer through the gate window. Mutual exclusion semantics are unchanged; the path just stops throwing away work under burst contention. - Tests assert the diagnostic fires for corrupt + cross-host scenarios and that no secret-shaped values from owner.json leak. Working-tree privacy + git retry (Item 6): - listIgnoredUntrackedFiles() retries `git ls-files --others --ignored --exclude-standard` up to 3 times with a 50ms backoff before failing closed, so transient `index.lock`/concurrent-gc races no longer abort runs. - Skill docs (claude-cli-runtime, claude-result-handling) and gemini-result.md spell out that the gitignored privacy default applies only inside a git worktree; non-git source dirs run an unfiltered live walk with symlink/path safety only. New docs contract test asserts the distinction language is present. - scope.mjs remains byte-identical across plugins (verified by plugin-copies-in-sync.test.mjs).
Item 1 of post-merge hardening: separate contractual JobRecord
persistence from diagnostic sidecar persistence so a failed
stdout.log/stderr.log write no longer clobbers a successful run with
finalization_failed.
Companion changes (Claude + Gemini executeRun):
- Phase 1 + 2 (writeJobFile + upsertJob) are CONTRACTUAL. On failure,
attempt a best-effort fallback record built via buildJobRecord with
errorMessage=finalization_failed: <detail> so onSpawn's running
entry doesn't persist forever. Then exit non-zero with structured
finalization_failed.
- Phase 3 (stdout/stderr sidecars + git-status-after sidecar) is
DIAGNOSTIC. Failures emit a one-line stderr warning; the terminal
status (and exit code) reflects the real run outcome.
- Both companions still dispose containment in the failure path.
Mock + smoke coverage:
- claude-mock.mjs: CLAUDE_MOCK_META_CONFLICT replaces the queued
meta.json with a directory so writeJobFile rename fails.
- gemini-mock.mjs: GEMINI_MOCK_SIDECAR_CONFLICT and
GEMINI_MOCK_META_CONFLICT mirror Claude. The Gemini mock cannot
derive jobId from argv (Gemini doesn't pass --session-id), so it
walks GEMINI_PLUGIN_DATA/state/*/jobs and picks the most recently
modified queued meta to discover the active jobId.
- New Claude + Gemini smoke tests:
* "sidecar write failures warn but preserve terminal status" —
exit 0, persisted record completed, stderr warning emitted.
* "meta-write conflict produces fallback failed record, no
permanent running" — exit non-zero with finalization_failed,
state.json shows no remaining queued/running entry.
- Existing CLAUDE_MOCK_SIDECAR_CONFLICT smoke test rewritten to
match the new warning-not-fatal semantics.
Move historical smoke/review notes and implementation plans under docs/archive, keep active docs listed in README, and add archive guidance.
Item 2: cancel/cancelled/stale lifecycle semantics.
classifyExecution now produces a `cancelled` terminal status when
the target CLI exits via SIGTERM/SIGKILL/SIGINT/SIGHUP and timedOut
is false. timedOut wins so wall-clock kills classify as
failed/timeout. The lifecycle becomes one coherent model:
operator runs cancel --job <id>
→ cmdCancel signals the target CLI's pid (verifyPidInfo gates)
→ target CLI exits with signal=SIGTERM
→ companion's spawn promise resolves with execution.signal set
→ executeRun's buildJobRecord call now passes signal+timedOut
→ classifyExecution returns status=cancelled
→ terminal record persists as cancelled (continuable)
Both companions (Claude and Gemini) plumb signal+timedOut into
buildJobRecord, and both job-record.mjs copies got matching
classifyExecution updates. timeout exits get error_code=timeout
for the first time.
New unit coverage in tests/unit/job-record.test.mjs verifies all
four cancel signals classify as cancelled, that timedOut wins over
signal, and that gemini buildJobRecord agrees.
The existing "active background job is visible as running and can
be cancelled" smoke test now follows up with: poll until terminal,
assert status=cancelled, assert default `status` lists it.
Item 4: status UX for continuable states.
Default status filter now surfaces cancelled and stale alongside
running/completed/failed because they are continuable terminal
states (an operator queries `status` to decide whether to
`continue --job`; hiding cancelled jobs by default broke that).
Only `queued` (transient pre-spawn) is hidden from default; --all
still shows it. Updated Gemini default-status smoke test accordingly.
Gemini parity note:
Gemini's `cancel` subcommand remains `not_implemented`, but the
shared lifecycle code path now correctly classifies a SIGTERM/
SIGKILL exit (e.g., from an external `kill`) as cancelled, so the
divergence is in the entry point, not the persisted lifecycle.
Replace the source-available non-commercial license with AGPL-3.0-only, update package/plugin metadata, and preserve upstream MIT attribution in NOTICE.
P2 follow-up addressing the #16 addendum: pre-commit npm test was mutating the caller checkout into fixture git state — branch flipped to "feature", seed/main/feature commits appeared in the caller repo, and fixture files (foo.md, old.md, seed, seed.txt) showed as deleted from the caller worktree. Root cause: every fixture git invocation inherited the caller process's GIT_DIR / GIT_WORK_TREE / GIT_INDEX_FILE / GIT_COMMON_DIR / GIT_PREFIX env vars. When the test runner was launched from a pre-commit hook, those vars pointed at the caller checkout, so the fixture's `git init` / `git checkout -qb feature` ran against the caller repo instead of the temp `cwd`. Fix is layered: 1. tests/helpers/fixture-git.mjs (new): single source of truth for fixture git invocations. fixtureGitEnv() scrubs every dangerous GIT_* var (GIT_NAMESPACE, GIT_CEILING_DIRECTORIES, GIT_DISCOVERY_ACROSS_FILESYSTEM, GIT_OBJECT_DIRECTORY, GIT_ALTERNATE_OBJECT_DIRECTORIES, GIT_ATTR_SOURCE, GIT_REPLACE_REF_BASE, GIT_SHALLOW_FILE, GIT_CONFIG_PARAMETERS, GIT_CONFIG_COUNT, GIT_CONFIG_KEY_*/VALUE_*) and sets deterministic AUTHOR/COMMITTER identity. fixtureGit() pins spawnSync's cwd, passes `core.hooksPath=/dev/null`, and uses the sanitized env. fixtureSeedRepo() bundles `git init` + first commit. 2. scripts/ci/run-tests.mjs: scrubs the same GIT_* vars at the runner boundary so even legacy callsites are protected. 3. Per-file refactors: Claude/Gemini/identity/invariants smoke tests, Claude/Gemini e2e tests, and unit/scope.test.mjs all use fixtureSeedRepo / fixtureGit / fixtureGitEnv now. 4. Fast vs full test split (CLAUDE.md, package.json, run-tests.mjs, CI workflow): default `npm test` skips tests/unit/scope.test.mjs (155 real-git tests, ~140s on its own) so it fits the 60s pre-commit gate. CI sets CODEX_PLUGIN_FULL_TESTS=1 (and `npm run test:full` does the same locally) to include scope tests. Acceptance criteria from the addendum: - audit git fixture helpers for missing cwd → fixed - restore process.chdir() in finally → no chdir in any test - sanitize inherited GIT_* env for fixture commands → done at runner boundary AND per-fixture site - ensure npm test never changes caller branch / creates fixture commits / leaves fixture files in caller worktree → enforced
Item 3 of post-merge hardening: a background worker that dies between writing the `running` record (in onSpawn) and the executeRun finalization leaves a permanent active-state record that pruneJobs() refuses to evict — terminal history starves on busy workspaces. reconcileActiveJobs() (new lib/reconcile.mjs, byte-identical across Claude and Gemini) walks active records and promotes them to status=stale when there is enough evidence the worker is gone: - pid_info present + valid: verifyPidInfo. process_gone / starttime_mismatch / argv0_mismatch → stale (PID-reuse-safe; same identity check cmdCancel uses). - pid_info missing or incomplete AND started_at older than the configurable orphan window (default 1h): stale (worker queued but never spawned, or pid_info was never captured). Reconciliation NEVER deletes records. It writes a stale terminal record via the canonical buildJobRecord path so consumers see status=stale, error_code=stale_active_job, and an error_message naming the reason. stale is in CONTINUABLE_STATUSES, so an operator can resume via `continue --job <id>` instead of manually mutating JSON. Wired into both companions' cmdStatus so the lifecycle self-heals without a separate reaper process. classifyExecution gained an explicit `status: "stale"` short-circuit (mirrors the existing `status: "running"` handling). reconcile.mjs is added to plugin-copies-in-sync VERBATIM_FILES. Tests: - New tests/unit/reconcile.test.mjs covers fresh-job-leftalone, dead-pid → stale, missing pid_info reclaimed only after the orphan window, terminal stale records carry the invocation fields needed for `continue --job`, and Gemini parity.
…ical lib Verification pass for #16: the new lib/reconcile.mjs added in Item 3 needed branch coverage to clear the 85% target, plus the shared-lib coverage merger so the per-plugin file-level checks see the union of both plugins' test exercises (matching the existing pattern for identity/scope/etc.). - scripts/ci/check-coverage.mjs: add reconcile.mjs to VERBATIM_SHARED_LIBS so coverage from Claude + Gemini reconcile tests is pooled per plugin file. - scripts/ci/coverage-baseline.json: register reconcile.mjs baselines (lines 95.79 / branches 87.5 / functions 100) for both plugins. - tests/unit/reconcile.test.mjs: new branch coverage for invalid started_at strings, state-summary-vs-meta disagreement, invalid pid (0/negative), missing argv0 falling through to age-based reclaim, and tunable orphan window. Coverage now: ✓ coverage target met (85.00%) under CODEX_PLUGIN_SKIP_SMOKE=1 COVERAGE_ENFORCE_TARGET=1 CODEX_PLUGIN_FULL_TESTS=1.
… MED 1) Four reviewers (codex/kimi/glm/gemini) ran an adversarial pass on PR #21. This commit addresses every finding I could substantiate with a repro and folds the fixes onto the *class* of problem rather than patching instances. Confirmed bugs and fixes: BLOCKER 1 — finalization fallback clobbered good meta on state-only failure Repro: writeJobFile succeeds, upsertJob fails (lock timeout) → fallback re-issues writeJobFile unconditionally → "completed" meta becomes "failed/spawn_failed". Fix: split fallback per error side. metaError → rewrite both (state was rolled back too); stateError-only → preserve meta, retry upsertJob with the GOOD record before falling back. BLOCKER 2 — reconcile TOCTOU race overwrote terminal completed records Repro: reconcile reads meta=running, classifies stale; worker writes completed mid-flight; reconcile then writes stale → "REAL_WORKER_RESULT" permanently lost. Fix: new commitJobRecordIfActive primitive in state.mjs holds the state lock across an in-lock CAS read of meta.json + the write; companion finalization moves to commitJobRecord so both writers serialize. Either order is safe — terminal always wins. HIGH 1 — finalization_failed misclassified as spawn_failed classifyExecution short-circuited on errorMessage before signal/timedOut. Disk-full / lock-timeout finalization errors looked like missing-binary errors to monitoring. Fix: detect "finalization_failed:" prefix and emit error_code=finalization_failed. HIGH 2/3 — GIT_CONFIG_GLOBAL leaked through 4 of 5 scrub callsites Repro: GIT_CONFIG_GLOBAL=/evil node --test … → fixture branch became "injected-master". Companion mutation-detection git would inherit safe.directory=*. Fix: extract STRIPPED_GIT_ENV_KEYS + cleanGitEnv into byte-identical lib/git-env.mjs (added to VERBATIM_FILES). Companions, scope.mjs, fixture helper, and runner all import the canonical list. Adds GIT_CONFIG_GLOBAL/SYSTEM, GIT_TRACE family, GIT_OPTIONAL_LOCKS, GIT_TERMINAL_PROMPT, GIT_PROTOCOL, GIT_AUTO_GC. HIGH 4 — stale records from fresh-running orphans not actually continuable PR #21 claims stale is continuable, but onSpawn writes claude_session_id=null so reconcile carries null, so cmdContinue rejects with "no claude_session_id". Fix: cmdContinue now distinguishes the stale-orphan case and returns an actionable suggestion ("re-run from scratch via …") with the prior mode/cwd preserved. HIGH 5 — git/workspace tests didn't scrub GIT_* Direct \`node --test tests/unit/git.test.mjs\` with caller GIT_DIR set hijacked the fixture into the wrong repo. Same gap on the production side: lib/git.mjs's git() forwarded process.env unscrubbed → the workspace-root resolution at every companion subcommand entry was vulnerable too. Fix: refactor both tests to use fixtureGit; add cleanGitEnv to lib/git.mjs. MED 1 — cmdResult crashed unhandled EISDIR when meta.json was a directory Wrap readFileSync in cmdResult; return {ok:false, error:"read_failed", error_code:"EISDIR"} instead of an unhandled stacktrace. Reviewer findings rejected with reasoning (see PR comment): - SIGTERM-trapped target classified as completed (not cancelled): inherent Node child_process limitation; closing it requires a marker-file dance out of scope for #16. Tracked as follow-up. - State-lock gate-kidnap → EEXIST crash: speculation, no repro; mkdirSync atomicity + gate ordering precludes the scenario. - Gemini cmdCancel not implemented: real product gap, but pre-existing since af303f0; not introduced by PR #21. Tracked as follow-up. New tests: - tests/unit/git-env.test.mjs (canonical strip list + GIT_CONFIG_GLOBAL leak repro) - tests/unit/commit-job-record.test.mjs (atomic + CAS primitives) - tests/unit/reconcile.test.mjs adds TOCTOU regression guard - tests/unit/job-record.test.mjs adds finalization_failed classification - tests/smoke/claude-companion.smoke.test.mjs adds result-EISDIR friendly error Net: 565 unit tests / 0 fail (was 529); coverage baseline met (state.mjs and new git-env.mjs ratchet up; reconcile.mjs branch baseline lowered to 81 since 3 branches moved into commitJobRecordIfActive). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
25f9cf3 to
3838d39
Compare
|



Closes #16.
Resolves all eight follow-ups from #16 plus the test-harness isolation addendum. Companion-side hardening across both Claude and Gemini plugins; no production-API surface change. Sits on top of two earlier docs/license merges (PR #18 and PR #19) that landed into the branch during this work.
Summary
executeRun()'s persistence into contractual (meta + state, fatal with fallback failed-record) vs diagnostic (sidecar warnings only). Fixes the bug where a failedstdout.logwrite returnedfinalization_faileddespite a successful target run.classifyExecutionnow produces a realcancelledstatus from a SIGTERM/SIGKILL exit (withtimedOuttaking precedence aserror_code: timeout). Removes the manual-JSON-mutation tests that previously stood in for cancel.lib/reconcile.mjswalks active records and promotes them tostale(continuable) whenpid_infoproves the worker is gone or the job has been queued past the orphan window. Wired intocmdStatusfor self-healing.statusfilter now surfacescancelled+stalealongsiderunning/completed/failed. Onlyqueued(transient pre-spawn) is hidden by default.owner.json, cross-host owner contention, and orphan-dir put-back failures. Acquire-after-reclaim while still holding.state.lock.gateso bursty contention stops throwing away work.listIgnoredUntrackedFiles()retriesgit ls-files3× / 50ms before failing closed. Operator-facing docs (skill MDs + Gemini result command) now state explicitly that the gitignored privacy default applies only inside a git worktree.LITELLM_andOLLAMA_toPROVIDER_PREFIXES(router/proxy ecosystems that re-route target traffic). Documented decision NOT to scrubHTTP_PROXY/HTTPS_PROXY/NO_PROXY(corporate networks depend on those).configureState()test hygiene. NewwithStateConfig(state, override, fn)helper wraps every short-lock-timeout test so an early failure cannot leak overrides between tests.npm testwas hijacking the caller checkout into fixture git state when launched from a hook context. Newtests/helpers/fixture-git.mjsis the single source of truth for fixture git invocations (scrubs every dangerousGIT_*env var, pinscwd, deterministic identity). Runner-level scrub inscripts/ci/run-tests.mjsas defense in depth. Defaultnpm testnow skipstests/unit/scope.test.mjs(155 real-git tests, ~140s) so it fits the 60s pre-commit gate; CI setsCODEX_PLUGIN_FULL_TESTS=1(ornpm run test:fulllocally) to opt back in.Test plan
git diff --check— cleannpm run lint— passnpm run lint:self-test— passnpm test(fast subset, default — pre-commit gate) — 374 pass / 0 fail / 4 skipped (~17 s)CODEX_PLUGIN_FULL_TESTS=1 npm test(full unit matrix — CI mode) — 529 pass / 0 fail / 4 skippednpm run smoke:claude— 28 pass / 0 failnpm run smoke:gemini— 18 pass / 0 failCODEX_PLUGIN_SKIP_SMOKE=1 COVERAGE_ENFORCE_TARGET=1 CODEX_PLUGIN_FULL_TESTS=1 npm run test:coverage— coverage target met (85.00%)Notes
PR #18(docs archive) andPR #19(AGPL license switch) commits are part of this PR's diff because they were merged intofix/016-post-merge-hardeningrather thanmain. They land intomainvia this PR.#13deferred upstream scope is preserved — no ping slash command files restored.CLAUDE.md(new) documents the fast/full test split for future sessions.state.mjsand the two SKILL.md files. None were introduced by this PR; they predate Post-merge follow-ups: lifecycle, state, and review hardening polish #16 and should be tracked / resolved as a follow-up audit pass after merge.