Skip to content

feat: add /egc-grok — full-repo audit using 1M context#54

Merged
Jamkris merged 2 commits intomainfrom
feat/grok-repo-audit
Apr 30, 2026
Merged

feat: add /egc-grok — full-repo audit using 1M context#54
Jamkris merged 2 commits intomainfrom
feat/grok-repo-audit

Conversation

@Jamkris
Copy link
Copy Markdown
Owner

@Jamkris Jamkris commented Apr 30, 2026

Summary

  • Introduce /egc-grok, a new slash command that audits the entire repo in one pass — language inventory, JS/TS import graph, circular dependencies (Tarjan SCC), and dead-file candidates.
  • Pattern matches the existing /egc-harness-audit design: a deterministic Node script is the source of truth for paths and counts; the LLM only adds a synthesis layer on top, so it cannot hallucinate file lists.
  • Designed to lean on Gemini's 1M context window — a vanilla CLI -> file-by-file workflow can't produce a repo-wide narrative in one shot, so this is the kind of "must-install" demo that answers "what does this give me on top of plain Gemini CLI?"

What's in the PR

File Role
scripts/lib/grok-engine.js Pure functions: walk, import extraction (require/import/export from/dynamic), graph build, Tarjan SCC, dead-file detection, text/json formatting.
scripts/grok.js CLI wrapper. --format text|json, --scope <dir>.
commands/egc-grok.toml Slash command with a strict synthesis contract pinning the LLM to script output.
workflows/grok.md Mirror workflow doc.
tests/lib/grok-engine.test.js 20 unit tests (import parsing, cycles incl. self-loop and DAG, dead-file heuristics, report shape, text/json output).
tests/run-all.js Register the new test file.

Why this is differentiated

  • Repo-scale, not file-scale. Vanilla gemini opens files one by one; /egc-grok reads the whole repo and emits a single report.
  • Deterministic core. The engine is a normal Node script — no LLM in the audit path. The LLM only adds the narrative + Top 3 actions on top of fixed JSON.
  • No new dependencies. All graph + cycle work uses Node stdlib.

Test plan

  • npm run lint — clean
  • npm test — 207/207 pass (20 new in lib/grok-engine.test.js)
  • Self-run node scripts/grok.js on this repo: 670 files inventoried, 54 JS/TS nodes / 39 edges, 0 cycles, 0 dead-file candidates after entry-pattern tuning
  • node scripts/grok.js --format json produces valid JSON parseable by JSON.parse
  • node scripts/grok.js --help prints usage
  • Try it inside Gemini CLI as /egc-grok end-to-end (manual smoke from a fresh shell)

Out of scope (deferred to v2)

  • --changed baseline mode for cost-efficient incremental audits
  • Python / Go / Rust import graph (v1 is JS/TS only — matches this repo's own surface)
  • Security smell scanning (overlaps with the existing /egc-code-review)

Summary by cubic

Add /egc-grok, a deterministic full-repo audit that inventories files, builds a JS/TS import graph, detects cycles, and flags dead-file candidates. The CLI is hardened and the report now surfaces read errors; Gemini’s 1M context adds a short synthesis on top.

  • New Features

    • CLI: node scripts/grok.js [scope] --format text|json.
    • Engine: walk, import parse, Tarjan SCC cycles, dead-file heuristics; text/json output.
    • Slash command + docs: commands/egc-grok.toml, workflows/grok.md.
    • Tests: engine + CLI tests, registered in tests/run-all.js. No new dependencies.
  • Bug Fixes

    • Normalize package.json main/bin paths to avoid false dead-file flags.
    • CLI: strict arg parsing, value checks, --format=/--scope= support, ensureDirectory validation.
    • Report: collect and display read errors in text and JSON outputs.
    • Rename to stripComments and document the string-literal limitation.

Written for commit f823410. Summary will update on new commits. Review in cubic

Summary by CodeRabbit

  • New Features

    • Introduced /egc-grok command for comprehensive repository auditing, analyzing dependencies, detecting circular dependencies, and identifying dead files with JSON or text output formats.
  • Documentation

    • Added workflow documentation detailing the deterministic audit process and output specifications.
  • Tests

    • Added test suite for the analysis engine covering import extraction, dependency detection, cycle identification, and dead-file discovery.

Introduce a deterministic Node engine plus a slash command that
audits the entire repo in one pass: language inventory, JS/TS
import graph, Tarjan-based circular-dependency detection, and
dead-file candidates. The slash command runs the script first so
Gemini's synthesis layer cites exact paths and counts instead of
re-deriving them.

- scripts/lib/grok-engine.js: walk, import extraction, graph,
  cycles, dead-file detection, text/json formatting.
- scripts/grok.js: CLI wrapper with --format text|json --scope.
- commands/egc-grok.toml + workflows/grok.md: slash command with
  a synthesis contract that pins the LLM to script output.
- tests/lib/grok-engine.test.js: 20 unit tests covering import
  parsing, graph construction, SCC cycle detection, dead-file
  heuristics, and report shape.
- tests/run-all.js: register the new test file.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Walkthrough

Introduces a new /egc-grok command infrastructure for automated repository audits. The system scans codebases, builds dependency graphs, detects circular dependencies using Tarjan's algorithm, identifies dead files, and outputs results in text or JSON format via a CLI interface.

Changes

Cohort / File(s) Summary
Command & CLI
commands/egc-grok.toml, scripts/grok.js
New command definition and CLI entry point; parses --scope and --format flags; invokes scripts/grok.js as the authoritative generator; conditionally synthesizes output (text mode prepends hotspots and actions; JSON returns script output unchanged).
Core Audit Engine
scripts/lib/grok-engine.js
New engine module implementing repository scanning, import extraction (CommonJS/ES), dependency graph construction, cycle detection via Tarjan's strongly connected components, and dead-file identification via reverse-graph analysis; exports analysis and formatting functions.
Testing
tests/lib/grok-engine.test.js, tests/run-all.js
Comprehensive test suite validating import extraction, path resolution, graph construction, cycle detection (2-cycle, self-loop, DAG scenarios), dead-file detection against entry patterns, and report serialization; test runner updated to include new test file.
Documentation
workflows/grok.md
Workflow documentation defining the deterministic audit process, output requirements, and constraints on fix proposals (limited to detected cycle/dead-file lists unless explicitly overridden).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

commands, docs, security

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: add /egc-grok — full-repo audit using 1M context' directly and clearly summarizes the main change: introduction of a new /egc-grok command for comprehensive repository auditing using Gemini's 1M context window.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/grok-repo-audit

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR adds /egc-grok, a deterministic full-repo audit command that walks the file tree, builds a JS/TS import graph, detects circular dependencies via Tarjan's SCC, and flags dead-file candidates — with synthesis delegated to Gemini's 1M context window on top of fixed script output. The implementation is clean and well-tested (20 new unit tests), and the pkg.main/pkg.bin normalization from the previous review round is properly addressed via normalizePackagePath.

Confidence Score: 5/5

Safe to merge; the only finding is a P2 robustness concern about recursive Tarjan SCC that won't affect typical repos.

All findings are P2. The recursive Tarjan SCC is a theoretical concern that only manifests on repos with import chains tens-of-thousands of nodes deep — well beyond any real-world repo this tool would run against. The core logic is correct, entry-pattern normalization is solid, and the test coverage is comprehensive.

scripts/lib/grok-engine.js — specifically the recursive detectCycles implementation if very large repos become a target in the future.

Important Files Changed

Filename Overview
scripts/lib/grok-engine.js Core engine with walk, import parsing, Tarjan SCC, dead-file detection, and text/JSON formatting — well-structured with good error collection; recursive Tarjan SCC may overflow for deep graphs in large repos
scripts/grok.js CLI wrapper with clean argument parsing, format validation, and directory verification — all edge cases handled correctly
commands/egc-grok.toml Well-formed TOML command with egc- prefix, non-empty description, and a tight synthesis contract that pins the LLM to script output
tests/lib/grok-engine.test.js Comprehensive unit tests covering import parsing, cycle detection (including self-loops and DAGs), dead-file heuristics, and report shape/format
tests/lib/grok-cli.test.js CLI arg-parsing tests covering flags, edge cases, and directory validation — all correct
tests/run-all.js Registers both new test files in the correct order
workflows/grok.md Mirrors the TOML command's synthesis contract; clear usage, deterministic engine, and rules sections

Reviews (2): Last reviewed commit: "fix: address review feedback on PR #54" | Re-trigger Greptile

Comment thread scripts/lib/grok-engine.js Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/grok.js`:
- Around line 30-33: The CLI parsing currently allows `--scope` with no value
and proceeds; update the `if (arg === '--scope')` branch so that if args[index +
1] is missing or startsWith('--') you throw an error instead of silently
defaulting, and set parsed.scope to path.resolve(args[index + 1]) only when
present; additionally, before the report-building code that uses parsed.scope
(the later block around where parsed.scope/REPO_ROOT is referenced near line
90), validate that parsed.scope exists and is a directory (using fs.statSync or
fs.existsSync + isDirectory) and throw a clear error if not, so the script fails
fast on invalid/missing `--scope`.

In `@scripts/lib/grok-engine.js`:
- Around line 57-61: The try/catch blocks that silently swallow filesystem/JSON
errors (e.g., the fs.readdirSync(current) catch and the other similar catches
around fs.readFileSync/JSON.parse) must stop masking failures: either rethrow
the caught error or push a structured error into a shared errors array and
surface it in the tool's final report/exit code. Locate the catch blocks that
wrap fs.readdirSync, fs.readFileSync and JSON.parse (the ones catching _err) and
replace the empty continue; behavior with explicit handling — for CLI runs
rethrow the error (throw _err) or for tolerant runs push {path: current, op:
"readdir"/"readFile"/"parse", error: _err} into a collector and ensure the
caller checks that collector and fails with a non-zero exit or prints the
aggregated errors before exiting.
- Around line 275-280: The code pushes pkg.main and pkg.bin values verbatim into
patterns, causing mismatches for entries like "./cli/index.js"; update the logic
around patterns (the patterns array, the pkg.main branch, and the binValues
loop) to normalize each entry before pushing: run a path normalization (e.g.,
path.posix.normalize), then strip any leading "./" or leading "/" so entries
match walkRepo paths, and use the normalized string when calling patterns.push;
apply the same normalization to pkg.main and every bin value.

In `@tests/lib/grok-engine.test.js`:
- Around line 222-229: Add a regression test for dot-prefixed entry paths:
update the test that exercises loadEntryPatterns to include a package.json
bin/main entry using a './' prefix (e.g., './cli/index.js' or './src/main.js')
so the loader normalizes and includes the path; call engine.loadEntryPatterns
and assert that the returned patterns contain the normalized path without the
'./' prefix (referencing loadEntryPatterns and the existing test block in
grok-engine.test.js to locate where to add/extend the assertion).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6f437a8c-954e-45e9-946a-1c283ff951ac

📥 Commits

Reviewing files that changed from the base of the PR and between a49a0f9 and a42f579.

📒 Files selected for processing (6)
  • commands/egc-grok.toml
  • scripts/grok.js
  • scripts/lib/grok-engine.js
  • tests/lib/grok-engine.test.js
  • tests/run-all.js
  • workflows/grok.md

Comment thread scripts/grok.js
Comment thread scripts/lib/grok-engine.js
Comment thread scripts/lib/grok-engine.js Outdated
Comment thread tests/lib/grok-engine.test.js
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/grok.js">

<violation number="1" location="scripts/grok.js:41">
P2: `arg.split('=')[1]` truncates values containing `=`. Use `arg.slice(arg.indexOf('=') + 1)` to capture the full value after the first `=`.</violation>
</file>

<file name="scripts/lib/grok-engine.js">

<violation number="1" location="scripts/lib/grok-engine.js:59">
P2: Filesystem errors are silently swallowed (`catch (_err) { continue }`), which can cause entire directories to be skipped without any indication. For an audit tool, this can produce misleadingly clean results (e.g., "0 cycles, 0 dead files") when the real problem is that files couldn't be read. Consider collecting errors and surfacing them in the report or exit status.</violation>

<violation number="2" location="scripts/lib/grok-engine.js:119">
P2: `stripCommentsAndStrings` does not actually strip string literals despite its name. Import-like patterns inside strings (e.g., `"require('foo')"`) will be incorrectly parsed as real imports, producing false edges in the dependency graph.</violation>

<violation number="3" location="scripts/lib/grok-engine.js:275">
P1: Entry paths from `package.json` (`main`, `bin`) are stored verbatim, but `walkRepo` returns paths via `path.relative()` which never includes a leading `./`. Since `isEntryFile` uses strict `rel === pattern` comparison, a `package.json` with `"main": "./index.js"` will never match the walk result `"index.js"`, causing false dead-file reports. Normalize entry paths by stripping the `./` prefix (e.g., `path.normalize(value.replace(/^\.\//,  ''))`).</violation>
</file>

<file name="tests/lib/grok-engine.test.js">

<violation number="1" location="tests/lib/grok-engine.test.js:89">
P2: Do not treat absolute-path specifiers as relative imports here; they can be rewritten into repo-local candidates during resolution.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread scripts/lib/grok-engine.js Outdated
Comment thread scripts/grok.js
Comment thread scripts/lib/grok-engine.js Outdated
Comment thread scripts/lib/grok-engine.js Outdated
test('isRelativeImport classifies relative vs package specifiers', () => {
assert.strictEqual(engine.isRelativeImport('./a'), true);
assert.strictEqual(engine.isRelativeImport('../a'), true);
assert.strictEqual(engine.isRelativeImport('/abs/path'), true);
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Do not treat absolute-path specifiers as relative imports here; they can be rewritten into repo-local candidates during resolution.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/lib/grok-engine.test.js, line 89:

<comment>Do not treat absolute-path specifiers as relative imports here; they can be rewritten into repo-local candidates during resolution.</comment>

<file context>
@@ -0,0 +1,292 @@
+test('isRelativeImport classifies relative vs package specifiers', () => {
+  assert.strictEqual(engine.isRelativeImport('./a'), true);
+  assert.strictEqual(engine.isRelativeImport('../a'), true);
+  assert.strictEqual(engine.isRelativeImport('/abs/path'), true);
+  assert.strictEqual(engine.isRelativeImport('lodash'), false);
+  assert.strictEqual(engine.isRelativeImport('@scope/pkg'), false);
</file context>
Fix with Cubic

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping this as-is intentionally.

require('/abs/path') is a valid Node.js form (Node will resolve it as an absolute filesystem path), and our isRelativeImport is really asking "is this a path-style specifier the resolver should try against the repo file set, vs. a bare npm specifier we should ignore?" An absolute path that doesn't exist inside the repo just falls through resolveRelativeImport and gets recorded under unresolved — there is no rewrite, no false edge, and no graph corruption.

If we excluded /-prefixed specifiers up front we'd silently drop them from the unresolved count, which is worse signal for the synthesis layer. If a real-world repo started using absolute paths in a misleading way, the right fix is to surface them in the report (we already do, via unresolved), not to misclassify them at the boundary.

Happy to revisit if there's a concrete failure mode I'm missing.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I've saved this as a new learning to improve future reviews.

P1 — entry-pattern path normalization (Greptile, cubic, CodeRabbit):
  package.json `main` / `bin` values like "./index.js" are now
  normalized so they match walkRepo's relative paths. Without this,
  the package's own main file would be falsely flagged as a dead-file
  candidate. Added regression tests covering ./, /, and bare prefixes.

P2 — CLI hardening (CodeRabbit, cubic):
  - --scope and --format now require a non-flag value; bare or
    flag-prefixed arguments raise "Missing value for ..." instead of
    silently consuming the next arg.
  - ensureDirectory validates the resolved scope before audit and
    raises a clean message instead of a stack trace.
  - --format=/--scope= now use slice(indexOf('=')+1) so values
    containing '=' (e.g. directory names) are preserved.

P2 — fs error transparency (CodeRabbit, cubic):
  walkRepo and buildGraph now collect read errors into a shared
  errors[] which is surfaced as a "Read errors" section in text
  output and an `errors` array in JSON. Audits stay tolerant but
  no longer hide failures behind misleadingly clean results.

P2 — naming honesty (cubic):
  stripCommentsAndStrings renamed to stripComments (the function
  only strips comments). Documented the known limitation that
  import-like patterns inside string literals can produce false
  edges; acceptable for v1 since the LLM synthesis layer treats
  them as candidates.

Tests: 35 grok unit + CLI tests pass; full suite 222/222.
@Jamkris Jamkris merged commit c8f970a into main Apr 30, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant