feat: test coverage catalog + failure triage + see-something-say-something (v0.9.5.0) by garrytan · Pull Request #259 · garrytan/gstack

garrytan · 2026-03-20T23:20:06Z

Summary

Extract {{TEST_COVERAGE_AUDIT}} shared resolver with 3 mode-specific placeholders (plan/ship/review)
Add /review Step 4.75 for codepath tracing during pre-landing review
Shared methodology: ASCII coverage diagrams, quality rubric, E2E decision matrix, regression detection iron rule, test framework auto-detection
Solo dev vs collaborative repo detection (bin/gstack-repo-mode) — solo devs get offered fixes, teams get blame+assign
Pre-existing test failure triage — classify in-branch vs pre-existing, offer options instead of blocking
"See Something, Say Something" universal principle across all skills
Shell injection prevention in gstack-repo-mode

Test Coverage

465 free tests pass (skill validation + gen-skill-docs)
2/2 coverage audit E2Es pass (ship + review, stable across 2 runs)
Full E2E suite: 7 pre-existing failures, all unrelated
Codex code review: PASS (from prior run; timed out on re-run due to diff size)

Pre-Landing Review

No issues found.

TODOS

Added: "See Something, Say Something" principle TODO for remaining skills

Test plan

All free tests pass (465 tests, 0 failures)
Ship coverage audit E2E passes
Review coverage audit E2E passes
Merge conflicts resolved cleanly
Generated SKILL.md files are fresh

🤖 Generated with Claude Code

DRY extraction of the test coverage audit methodology into a shared generator function with three explicit placeholders: - TEST_COVERAGE_AUDIT_PLAN (plan-eng-review) - TEST_COVERAGE_AUDIT_SHIP (ship) - TEST_COVERAGE_AUDIT_REVIEW (review) Shared across all modes: codepath tracing, ASCII diagram format, quality scoring rubric, E2E test decision matrix, regression rule, and test framework detection via CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace the thin 6-line Section 3 test review with the full shared methodology via {{TEST_COVERAGE_AUDIT_PLAN}}. Plan mode now: - Traces every codepath with full ASCII diagrams - Adds missing tests to the plan (not just "check for tests") - Writes test plan artifact for /qa consumption - Includes E2E/eval recommendations and regression detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace 135 lines of inline Step 3.4 methodology with {{TEST_COVERAGE_AUDIT_SHIP}}. Functionally identical output plus: - E2E test decision matrix (marks paths needing E2E vs unit) - Eval recommendations for LLM prompt changes - Regression detection iron rule - Test framework detection via CLAUDE.md first - Test plan artifact for /qa consumption Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add codepath tracing to the pre-landing review via {{TEST_COVERAGE_AUDIT_REVIEW}}. Review mode: - Produces ASCII coverage diagram (same methodology as plan/ship) - Generates tests for gaps via Fix-First (ASK user) - Subsumes Pass 2 "Test Gaps" checklist category - Gaps are INFORMATIONAL findings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

10 new tests verifying the three TEST_COVERAGE_AUDIT placeholders: - All modes share: codepath tracing, E2E matrix, regression rule - Plan mode: adds to plan + artifact, no ship-specific content - Ship mode: auto-generates + before/after count + coverage summary - Review mode: Fix-First ASK + INFORMATIONAL, no artifact - Regression guard: ship SKILL.md preserves all key phrases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Extract billing.ts fixture into coverage-audit-fixture.ts (DRY) - Refactor ship-coverage-audit E2E to use shared fixture - Add review-coverage-audit E2E for Step 4.75 - Update touchfiles: both E2Es depend on shared fixture Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The coverage audit E2E tests (ship + review) were only asserting exitReason === 'success' and readCalls > 0 — they passed even if the agent produced no coverage diagram. Add assertion that the output contains either GAP or TESTED markers. Found during /review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Codex adversarial review caught that plan-eng-review was inheriting "git diff origin/<base>...HEAD" from the shared resolver, but plan mode reviews a plan document, not a code diff. Plan mode now says: "Trace every codepath in the plan" and "Read the plan document." Ship and review modes keep the git diff instruction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…e-catalog # Conflicts: # scripts/gen-skill-docs.ts # test/gen-skill-docs.test.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e-catalog # Conflicts: # CHANGELOG.md # VERSION

* feat: add bin/gstack-repo-mode — solo vs collaborative detection with caching Detects whether a repo is solo-dev (one person does 80%+ of recent commits) or collaborative. Uses 90-day git shortlog window with 7-day cache in ~/.gstack/projects/{SLUG}/repo-mode.json. Config override via `gstack-config set repo_mode solo|collaborative` takes precedence over the heuristic. Minimum 5 commits required to classify (otherwise unknown). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: test failure ownership triage — see something say something Adds two new preamble sections to all gstack skills: - Repo Ownership Mode: explains solo vs collaborative behavior - See Something, Say Something: proactive issue flagging principle Adds {{TEST_FAILURE_TRIAGE}} template variable (opt-in, used by /ship): - Classifies test failures as in-branch vs pre-existing - Solo mode defaults to "investigate and fix now" - Collaborative mode offers "blame + assign GitHub issue" option - Also offers P0 TODO and skip options /ship Step 3 now triages test failures instead of hard-stopping on all failures. In-branch failures still block shipping. Pre-existing failures get user-directed triage based on repo mode. Adds P2 TODO for gstack notes system (deferred lightweight reminder). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files for Claude and Codex hosts All 22 Claude skills and 21 Codex skills regenerated with new preamble sections (Repo Ownership Mode, See Something Say Something) and {{TEST_FAILURE_TRIAGE}} resolved in ship/SKILL.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: validate repo mode values to prevent shell injection Codex adversarial review found that unvalidated config/cache values could be injected into shell via source <(gstack-repo-mode). Added validate_mode() that only allows solo|collaborative|unknown — anything else becomes "unknown". Prevents persistent code execution through malicious config.yaml or tampered cache JSON. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: shell injection via branch names + feature-branch sampling bias Codex code review found two issues: P1: eval $(gstack-slug) in gstack-repo-mode executes branch names as shell. Branch names like foo$(touch${IFS}pwned) are valid git refs and would execute arbitrary commands. Fix: compute SLUG directly with sed instead of eval'ing gstack-slug output. P2: git shortlog HEAD only sees current branch history. On feature branches that haven't merged main recently, other contributors disappear from the sample. Fix: use git shortlog on the default branch (origin/main) instead of HEAD. Also improved blame lookup in collaborative triage to check both the test file and the production code it covers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: broaden codex-host stripping test to accommodate triage section "Investigate and fix" now appears in TEST_FAILURE_TRIAGE (not just the Codex review step). Use CODEX_REVIEWS config string as a more specific marker for detecting the Codex review step in Codex-hosted skills. * fix: replace template placeholder in TODOS.md with readable text {{TEST_FAILURE_TRIAGE}} is template syntax but TODOS.md is not processed by gen-skill-docs — replaced with human-readable reference. * chore: bump version and changelog (v0.9.5.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add bin/ directory to project structure in CLAUDE.md * test: add triage resolver unit tests, plan-eng coverage audit E2E, and triage E2E - TEST_FAILURE_TRIAGE resolver: 6 unit tests verifying all triage steps (T1-T4), REPO_MODE branching, and safety default for ambiguous failures - plan-eng-coverage-audit E2E: tests /plan-eng-review coverage audit codepath (gap identified during eng review — existed on neither branch) - ship-triage E2E: planted-bug fixture with in-branch (truncate null) and pre-existing (divide-by-zero) failures; verifies correct classification - Touchfile entries for diff-based test selection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate stale Codex SKILL.md for retro Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Split `git remote get-url origin` into a separate variable with `|| true` so the script doesn't crash under `set -euo pipefail` in local-only repos. Falls back to REPO_MODE=unknown gracefully. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Changed preamble from `source <(...) || REPO_MODE=unknown` (which doesn't catch empty output) to `source <(...) || true` followed by `REPO_MODE=${REPO_MODE:-unknown}`. Regenerated all SKILL.md files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

math.test.js called process.exit(1) which killed the runner before string.test.js could execute. Changed test runner to use child_process so each test runs independently and both failure classes are exercised. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…e-catalog # Conflicts: # CHANGELOG.md

Fall back through origin/main → origin/master → HEAD when git symbolic-ref refs/remotes/origin/HEAD is not set. Prevents shortlog crash in repos where origin/HEAD isn't configured. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add assertions verifying both math.test.js (pre-existing failure) and string.test.js (in-branch failure) actually executed during triage. Prevents false passes where only one failure class is exercised. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove head -20 truncation that biased solo classification by dropping low-volume contributors from the denominator - Use atomic write (mktemp + mv) for cache to prevent concurrent preamble reads from seeing partial JSON Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…e-catalog Resolves conflicts: gen-skill-docs.ts (both repo-mode + search-before-building), test files (both coverage audit + plan-file-review-report tests), touchfiles (both repo-mode + ship-local-workflow entries). Regenerated all SKILL.md files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan and others added 10 commits March 20, 2026 07:57

Merge remote-tracking branch 'origin/main' into garrytan/test-coverag…

3860eb6

…e-catalog # Conflicts: # scripts/gen-skill-docs.ts # test/gen-skill-docs.test.ts

chore: bump version and changelog (v0.9.5.0)

8d25c58

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

boinger mentioned this pull request Mar 21, 2026

feat: /codebase-audit — full pipeline from diagnosis to fix plan to review #266

Open

6 tasks

garrytan and others added 2 commits March 21, 2026 09:27

Merge remote-tracking branch 'origin/main' into garrytan/test-coverag…

624c779

…e-catalog # Conflicts: # CHANGELOG.md # VERSION

garrytan changed the title ~~feat: shared test coverage audit across plan/ship/review (v0.9.5.0)~~ feat: test coverage catalog + failure triage + see-something-say-something (v0.9.5.0) Mar 21, 2026

garrytan and others added 8 commits March 21, 2026 09:55

Merge remote-tracking branch 'origin/main' into garrytan/test-coverag…

74add15

…e-catalog # Conflicts: # CHANGELOG.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: test coverage catalog + failure triage + see-something-say-something (v0.9.5.0)#259

feat: test coverage catalog + failure triage + see-something-say-something (v0.9.5.0)#259
garrytan wants to merge 20 commits intomainfrom
garrytan/test-coverage-catalog

garrytan commented Mar 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

garrytan commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Coverage

Pre-Landing Review

TODOS

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

garrytan commented Mar 20, 2026 •

edited

Loading