Skip to content

#22: gemini cmdCancel + cancel-marker sentinel + capture_error discrimination#23

Merged
seungpyoson merged 21 commits into
mainfrom
fix/022-followups
Apr 29, 2026
Merged

#22: gemini cmdCancel + cancel-marker sentinel + capture_error discrimination#23
seungpyoson merged 21 commits into
mainfrom
fix/022-followups

Conversation

@seungpyoson
Copy link
Copy Markdown
Owner

@seungpyoson seungpyoson commented Apr 28, 2026

Closes #22. Closes #24. Closes #25. Closes #26. Closes #28.

Summary

Three sub-tasks from #22, each reproduced before fixing. For reviewers in a hurry, the parts that need adversarial attention are #2 and #3. #1 is a mechanical mirror of existing Claude code.

# Sub-task Risk to reviewer Files
1 Wire Gemini cmdCancel Low — pure mirror. Replicates claude-companion.mjs's existing cmdCancel (verify pid_info → SIGTERM/SIGKILL → emit signaled/already_dead/stale_pid/no_pid_info/unverifiable). Any bug here also exists in the unmodified Claude original. plugins/gemini/scripts/gemini-companion.mjs (+~70 lines), 3 smoke tests
2 Cancel-marker sentinel HIGH — novel race-prone pattern. cmdCancel writes <jobsDir>/<jobId>/cancel-requested.flag (mode 0600) BEFORE process.kill. executeRun's finalization reads-and-deletes the marker and forces status=cancelled. New classifyExecution short-circuit on execution.status === "cancelled". Both companions, both lib/job-record.mjs, both mocks (MOCK_TRAP_SIGTERM), 2 smoke tests
3 capture_error/process_gone discrimination HIGH — semantic shift in verifyPidInfo. captureDarwin/captureLinux previously wrapped every failure as process_gone. New rule: only ps exit 1 + empty stderr and /proc/<pid>/stat ENOENT remain process_gone. Spawn errors, ps exit non-zero with stderr, ps exit 0 with empty output, /proc EACCES become capture_error. cmdCancel adds an unverifiable branch. lib/identity.mjs (byte-identical pair), 4 new unit tests, plus updated assertions on existing mocked /proc tests

Why each was filed

A 4th candidate from the original adversarial review on PR #21 (state-lock gate-kidnap → fatal EEXIST) was investigated and dropped — see issue #22 for the two-stage stress test (1000 + 20-victim race) and the mkdirSync-atomicity walkthrough.

Relationship to PR #21

PR #23 is branched from origin/main, not from PR #21. Practical implications for reviewers:

Test plan

🤖 Generated with Claude Code

…re-error

Three valid sub-tasks from issue #22, each reproduced before fixing.

Sub-task 1 — Wire Gemini cmdCancel
  Pre-existing gap since af303f0 (PR #14, original plugin import):
  gemini-companion.mjs's dispatch routed \`cancel\` to fail("not_implemented"),
  so users had no way to cancel a Gemini background job through the
  documented interface. The infrastructure was already in place (spawnGemini
  captures pid_info via the byte-identical capturePidInfo; verifyPidInfo
  is shared) — just needed the cmdCancel wiring. This commit mirrors
  Claude's cmdCancel into Gemini (verify pid_info, signal SIGTERM/SIGKILL,
  emit signaled / already_dead / stale_pid / no_pid_info). Smoke tests
  cover the active-cancel path, already_terminal, and not_found.

Sub-task 2 — Cancel-marker sentinel
  Reproduced: a target CLI that handles SIGTERM and exits 0 with valid
  output is mis-classified as \`completed\` (or as \`failed/parse_error\` with
  no output). Three of four scenarios in my Node-handler test confirm
  this; the operator's cancel intent is silently lost.
  Fix: cmdCancel writes <jobsDir>/<jobId>/cancel-requested.flag (mode 0600)
  BEFORE process.kill. executeRun's finalization reads-and-deletes the
  marker; if present, it passes status="cancelled" to buildJobRecord and
  classifyExecution short-circuits (regardless of exitCode/signal). Both
  companions, with mirrored helpers (cancelMarkerPath, consumeCancelMarker).
  Smoke regression in both targets uses the new {CLAUDE,GEMINI}_MOCK_TRAP_SIGTERM
  env hook on the mocks (target traps SIGTERM and exits 0 with valid
  fixture output) — without the marker, status would be \`completed\`.

Sub-task 3 — captureDarwin/Linux capture_error discrimination
  Reproduced in 4 hostile environments (PATH stripped, hostile ps stub
  exit 1 + stderr, hostile ps stub exit 0 + empty, sandbox-exec deny-ps):
  the previous "every error is process_gone" wrapper would falsely promote
  a LIVE worker to stale via reconcile, and cmdCancel would return
  already_dead instead of signaling.
  Fix: only ps-exit-1-with-empty-stderr (BSD ps's "no such process" silent
  signal) and /proc/<pid>/stat ENOENT remain process_gone. Spawn errors,
  ps non-zero with stderr text, ps zero with empty output, /proc EACCES
  all become capture_error. cmdCancel adds an explicit \`unverifiable\`
  branch that refuses to signal (conservative-correct) instead of falling
  through to \`stale_pid\` (which would mislead operators). lib/identity.mjs
  is byte-identical between plugins, so one fix covers both. Coverage:
  4 new tests probe each failure mode against a real live pid spawned by
  the test, plus updated assertions on the existing mocked /proc tests
  (malformed → capture_error, EACCES → capture_error, ENOENT race →
  process_gone).

Investigated a 4th candidate (Gemini reviewer's "state-lock gate-kidnap →
fatal EEXIST crash") and dropped it — see issue #22 comment with the
two-stage stress test (1000 acquisitions + 20-victim deliberate-stretch
race; zero violations) plus the mkdirSync-atomicity walkthrough.

Net: 524 unit/smoke tests / 0 fail (was 519). New tests: 4 capture-error
scenarios, 3 Gemini cancel scenarios, 2 SIGTERM-trap cancel scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the cancel subcommand for the Gemini companion and enhances the cancellation mechanism for both Claude and Gemini. It introduces a 'cancel-marker' file to ensure jobs are correctly classified as cancelled even if the target process traps SIGTERM and exits with code 0. Additionally, it refines process liveness detection by distinguishing between a process being genuinely gone and errors encountered while attempting to capture process info (e.g., sandbox restrictions). Review feedback suggests using path.dirname() when creating directories for marker files to improve maintainability.

Comment thread plugins/claude/scripts/claude-companion.mjs Outdated
Comment thread plugins/gemini/scripts/gemini-companion.mjs Outdated
Three real failures observed on the first CI run; addressing two of them
(SonarCloud is architectural — see below).

1. coverage gate — job-record.mjs lines coverage dropped 91.38 -> 89.23 because
   the new execution.status === "cancelled" short-circuit (added by the
   cancel-marker fix) had no unit test in coverage scope. The smoke test
   exercises it but coverage runs with CODEX_PLUGIN_SKIP_SMOKE=1. Add a
   pair of unit tests (claude + gemini) that drive classifyExecution
   directly with status=cancelled + a "successful" execution tuple,
   asserting the lifecycle override wins. Local coverage now reads 93.85.

2. claude smoke — T7.4 sidecar deletion test flaked on Linux CI with
   ENOTEMPTY on rmdir of state/<subdir>/. Pre-existing race: the worker's
   finalization writes meta.json (status=completed) BEFORE upsertJob
   updates state.json AND BEFORE writeSidecar emits stdout.log/stderr.log.
   The test polls only meta.json; once it sees completed it runs
   recursive cleanup, racing the worker's tail writes. macOS local
   doesn't surface it because timer + fs scheduling lines up favorably;
   Linux CI hits it. Add a 250ms settle delay after the test detects
   completion so the worker's tail writes have flushed before the
   recursive rmSync starts walking. Annotated as the Linux flake
   mitigation for the next reader.

3. pre-commit test gate timeout — scope.test.mjs (155 real-git tests, ~140s)
   blows the 60s pre-commit npm test window. Port the SLOW_TEST_BASENAMES
   skip pattern from PR #21: default npm test is fast (~14s); CI sets
   CODEX_PLUGIN_FULL_TESTS=1 to include scope.test.mjs. Adds an
   npm run test:full convenience script for local pre-PR validation.

The third CI failure (SonarCloud) flags the byte-identical Claude/Gemini
lib pattern as duplication (41.9% on new code, >3% threshold) and 13
Security Hotspots. Both are architectural intent (parity-by-construction
+ dual-use process spawn for the target CLIs); not fixable without
breaking the parity guarantee. Sonar isn't a merge gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses gemini-code-assist comments 3151882114 / 3151882124 on PR #23.
Both companions now derive the marker directory from the canonical
cancelMarkerPath helper instead of reconstructing it inline. Single
source of truth — if cancelMarkerPath ever moves, the mkdir follows.

The deeper drift gap (parallel block in divergent files, no automated
parity guard) is tracked in #24 for systematic fix on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test User and others added 11 commits April 28, 2026 16:43
Pre-fix: capturePidInfo(child.pid) ran synchronously after spawn()
returned. On Linux, /proc/<pid>/cmdline still reflected the parent's
argv during the fork→execve race window, so recorded argv0 mismatched
the live argv0 at cancel time and verifyPidInfo refused to signal with
'argv0_mismatch'. Surfaced as a flake on slower CI runners
(PR #23 run 25039611325, smoke claude job).

Fix: extract attachPidCapture(child, onSpawn) into lib/identity.mjs
(VERBATIM_FILES — byte-identity enforced across plugins). Helper wraps
capture in child.once('spawn', ...) so /proc reflects the post-execve
target argv. Both dispatcher libs collapse to a single call site;
no parallel logic, no dual path.

Regression test in both claude/gemini dispatcher unit suites asserts
onSpawn fires asynchronously via the 'spawn' event, not synchronously
in the spawn() executor turn — fails deterministically on the pre-fix
code, passes on the post-fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #22 sub-task 2 introduced the cancel-marker sentinel as inline
duplicates in claude-companion.mjs and gemini-companion.mjs — three
pure functions plus the inline write block in cmdCancel. The two
companions are NOT in VERBATIM_FILES (they import different
dispatchers), so nothing automated catches drift between these
parallel blocks. Today's gemini-code-assist nit (dirname(markerPath))
had to be applied twice manually; a one-sided edit would have silently
drifted.

Extract cancelMarkerPath, writeCancelMarker, consumeCancelMarker into
lib/cancel-marker.mjs. Add to VERBATIM_FILES so byte-identity is
enforced across both copies. Both companions collapse to import calls;
no dual path. writeCancelMarker throws on failure so the caller's
target-specific stderr warning stays at the call site.

Adds tests/unit/cancel-marker.test.mjs covering happy path, mode 0600,
ISO timestamp body, fs-error propagation, and missing-marker consume.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior commit's unit test imported only from the claude side, so
the coverage gate (CODEX_PLUGIN_COVERAGE=1) marked
plugins/gemini/scripts/lib/cancel-marker.mjs at 0% line/function and
failed the 85% target. Mirrors the existing `* as GeminiIdentity`
pattern in identity.test.mjs: import the gemini side and drive a
minimal happy-path so the byte-identical copy gets coverage credit.

Local coverage gate now passes — both cancel-marker copies at 100/100/100.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements Option Y+ from the external consultation round on cancel
exit-code semantics, plus the pre-spawn marker checks that came out of
adversarial review (Vectors 1, 3, 5, 8).

Exit-code contract (both companions, byte-symmetric):
  exit 0: signaled, already_dead, already_terminal, cancel_pending (NEW)
  exit 1: bad_args, not_found, bad_state (NEW), signal_failed, cancel_failed (NEW)
  exit 2: stale_pid, unverifiable, no_pid_info (when status=running)

cmdCancel now distinguishes truly-terminal jobs from queued jobs:
queued + cancel writes a cancel marker and returns cancel_pending exit 0,
durably recording operator intent. Unknown statuses surface as bad_state
exit 1 instead of being silently treated as queued. Marker write failure
in the queued branch returns cancel_failed exit 1, since the marker IS
the cancel mechanism for queued jobs (no SIGTERM fallback).

cmdRunWorker and executeRun both consume the cancel marker before
spawning the target binary. The cmdRunWorker check is a fast path; the
executeRun check is load-bearing — it narrows the queued race window
from "containment + scope copy + pre-snapshot + spawn" (potentially
seconds with a large repo) to the microseconds between the check and
child.once('spawn') firing. The executeRun check also covers foreground
runs that bypass cmdRunWorker.

Tests:
- queued cancel → cancel_pending, marker present, exit 0
- _run-worker honors marker → status=cancelled, no spawn
- queued + marker write fails → cancel_failed exit 1
- unknown status → bad_state exit 1
- capture_error on running job → exit 2 (was: exit 0; pre-existing test
  encoded the bug per consultation Round 2)

Docs:
- commands/{claude,gemini}-cancel.md: full status × exit-code taxonomy
  including cancel_failed and bad_state
- README.md: removed stale "Gemini cancel deferred" claims
- tests/unit/docs-contracts.test.mjs: updated to match wired-cancel
  reality (gemini cancel was wired in PR #22 but doc-tests still
  asserted "not_implemented")

Deferred to follow-up commits per the agreed 6-commit sequence:
- Class 2: assertSafeJobId at marker boundary (path traversal)
- Class 3: /proc precondition + Darwin /bin/ps absolute
- Class 4: automated docs-vs-emitted-statuses contract test
- Class 5: argv0 child-binary assertion + configureState keys

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer Vector 4 (adversarial round): cmdCancel parses state.json and
passes job.id straight to writeCancelMarker. cancelMarkerPath did string
concat with no validation, so a tampered state.json carrying a traversal
id (e.g. "../../../escape") would land the marker outside the jobs dir.

Fix: export assertSafeJobId from lib/state.mjs (already used internally
for writeJobFile / resolveJobFile / state.json upserts) and call it at
the top of cancelMarkerPath. writeCancelMarker and consumeCancelMarker
inherit the validation because they go through cancelMarkerPath.

End-to-end behavior with tampered state.json: cmdCancel queued branch
calls writeCancelMarker → throws "Unsafe jobId: ..." → caught by the
Fix 2 path from the prior commit → cancel_failed exit 1. No marker
escapes the jobs dir.

Tests:
- cancelMarkerPath rejects 8 tampered values (../../../, foo/bar, ..,
  null bytes, empty, etc.)
- writeCancelMarker / consumeCancelMarker inherit the gate
- Same coverage on the gemini side via * as GeminiCancelMarker

Class 2 in the agreed 6-commit Class sequence (1 → 2 → 3 → 4 → 5a → 5b).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two platform-specific identity-capture bugs from reviewer round 2 (Finding B).

Linux (3a): captureLinux treated existsSync('/proc/<pid>/stat') === false as
proof of process_gone. In a hardened container or sandbox where /proc is
unmounted, every existsSync of a per-pid stat returns false — falsely
classifying LIVE pids as process_gone, which cmdCancel would then return
as already_dead exit 0. Fix: precondition existsSync('/proc') first; if
absent, throw capture_error (cmdCancel maps to unverifiable + exit 2).

Darwin (3b): captureDarwin invoked PATH-resolved 'ps'. A stripped or
shimmed PATH (sandbox env scrubbing, hostile stub) could break ownership
capture in two ways: (1) ps not findable → capture_error (already
handled, but undesirable); (2) silent-exit-1 stub on PATH mimics BSD
'no such pid' signal, falsely classifying LIVE pids as process_gone.
Fix: pin /bin/ps absolute. /bin/ps is part of the macOS base install
and stable; if it's missing or denied, spawnSync.error → capture_error
(safe answer).

Tests:
- Linux: /proc mounted but per-pid stat missing → process_gone (existing)
- Linux: /proc unmounted → capture_error (NEW)
- Darwin: identity.test.mjs's PATH-shim parser tests rewritten to use
  spawnSync mocking via withPlatformAndChild (the new helper). Adds an
  explicit assert.equal(observedBinary, "/bin/ps") regression guard.
- Darwin: identity-capture-error.test.mjs's hostile-PATH tests inverted
  — they now prove PATH manipulation has NO effect on capture (Class 3b
  regression guards), since /bin/ps is invoked absolute.

Class 3 in the agreed 6-commit Class sequence (1 → 2 → 3 → 4 → 5a → 5b).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer round 2 Finding H: the existing #25 dispatcher tests
("spawnClaude / spawnGemini: onSpawn fires asynchronously via 'spawn'
event, not synchronously") only assert async timing of the callback.
A regressed implementation that captures pid_info synchronously but
defers the onSpawn callback would still pass them — yet the captured
argv0 would be the PARENT's (the test runner = "node"), not the child
binary. The reviewer's repro confirmed this: a hypothetical
attachPidCapture that records `{argv0: "node /parent/argv.mjs"}`
synchronously and fires onSpawn from child.once('spawn') passes every
assertion the existing tests make.

Fix: add a focused regression test that spawns a child with a different
binary from the parent (/bin/sleep, not node) and asserts the captured
argv0 reflects the child binary post-execve. Pre-execve capture would
record argv0 starting with "node ..."; post-execve capture records
"sleep" (Darwin ps -o comm=) or "/bin/sleep" (Linux /proc/<pid>/cmdline).

The dispatcher #25 tests stay as-is — they prove a different (still
valuable) property: that onSpawn fires asynchronously through the
spawnClaude/spawnGemini integration path. This new test pins the
underlying lib API (attachPidCapture) to post-execve correctness.

Skips: non-linux/darwin platforms, missing /bin/sleep, and macOS-under-
coverage (sandbox can deny ps; Linux CI run covers it).

Class 5a in the agreed 6-commit Class sequence (1 → 2 → 3 → 4 → 5a → 5b).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ing I)

Reviewer round 2 Finding I: cancel-marker.test.mjs called
configureState({envVar, fallbackBaseName, sessionEnvVar}) — but the
actual configureState API uses pluginDataEnv, fallbackStateRootDir,
sessionIdEnv. The wrong keys are silently ignored (the function only
applies recognized keys), so the call was a complete no-op. The tests
still passed because process.env.CLAUDE_PLUGIN_DATA was set on the
next line, providing the actual config — the configureState call was
dead code masking the API misuse.

Fix: use correct keys at all three call sites in cancel-marker.test.mjs:
- freshWorkspace() helper (claude side, line 23)
- "gemini cancelMarkerPath: same path-traversal defense" (gemini side, line 163, added in Class 2)
- "gemini cancel-marker: byte-identical helper exercised on the gemini side" (gemini side, line 186)

The configureState call now meaningfully resets the module config to
its claude-side / gemini-side defaults — the original intent of the
helpers, which the wrong keys were silently defeating.

The other configureState callsite in tests/smoke/gemini-companion.smoke.test.mjs:805
already uses correct keys (pluginDataEnv, sessionIdEnv) — no change there.

Class 5b in the agreed 6-commit Class sequence (1 → 2 → 3 → 4 → 5a → 5b).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@seungpyoson seungpyoson marked this pull request as ready for review April 28, 2026 13:47
Copy link
Copy Markdown
Owner Author

Follow-up commit 5d80c7a adds the issue #26 error-display work:

  • Adds structured JobRecord diagnostics for scope-prep failures: error_summary, error_cause, suggested_action, and disclosure_note.
  • Adds targeted guidance for unsafe symlinks, scope population failures, missing base refs, non-git workspaces, missing HEAD, missing custom scope paths, and invalid profiles.
  • Updates Claude/Gemini result docs and Gemini delegation docs so wired cancel --job behavior is not described as deferred.
  • Adds regression coverage for the new diagnostics and docs contracts.

Verification on the rebased commit:

  • npm run lint
  • npm run lint:self-test
  • node --test --test-reporter=spec tests/unit/job-record.test.mjs
  • node --test --test-reporter=spec tests/unit/docs-contracts.test.mjs
  • git diff --check
  • npm test — 405 pass, 6 skipped, 0 fail

Copy link
Copy Markdown
Owner Author

Implemented the repo-owned review reliability fixes from #28.

What changed:

  • Added custom-review mode for both Claude and Gemini so pinned bundles / selected files can be reviewed through explicit --scope-paths instead of misusing branch-diff.
  • Added preflight for both companions. It computes selected files, file count, byte count, and disclosure text without launching the provider.
  • Added scope_empty fail-closed behavior for empty branch-diff/custom scopes before target launch.
  • Added targeted scope_empty JobRecord diagnostics.
  • Updated result-handling guidance so target permission denials with no substantive output are rendered as review blocked / no findings produced.
  • Updated command/skill/spec/README docs to distinguish scope failures, target read denials, and the host-owned pre-launch provider disclosure block tracked in External-provider preflight mitigation and pre-launch denial boundary #27.
  • Added regression coverage for mode profiles, scope behavior, JobRecord diagnostics, docs contracts, and Claude/Gemini mock smoke paths.

Verification:

  • npm run lint
  • npm run lint:self-test
  • git diff --check
  • npm test -> 419 pass, 6 skipped, 0 fail
  • node --test --test-reporter=spec tests/unit/scope.test.mjs -> 156 pass, 0 fail

Pushed commit: 292feba

@sonarqubecloud
Copy link
Copy Markdown

@seungpyoson seungpyoson merged commit 539ea1b into main Apr 29, 2026
5 checks passed
@seungpyoson seungpyoson deleted the fix/022-followups branch April 29, 2026 05:58
seungpyoson added a commit that referenced this pull request Apr 30, 2026
Captures the load-bearing architectural decisions made during the early development of this repo so future contributors can understand why the implementation differs from a simple upstream port of openai/codex-plugin-cc.

Topics covered:
- Containment vs. scope as orthogonal dimensions (the most important architectural choice)
- ModeProfile as the single source of mode defaults (model tier, permission mode, disallowed tools, containment, scope, dispose, context-stripping, add-dir)
- One JobRecord shape across foreground/background/status/result paths
- Distinct identity types (job_id vs target session IDs vs resume chains vs PID ownership)
- Shared-library checks need behavior tests, not only byte identity
- Upstream relationship to openai/codex-plugin-cc

Doc was written during fix/022-followups development but never committed before PR #23 merged. This commit lands it on main as a standalone docs addition.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant