feat: add /egc-grok — full-repo audit using 1M context#54
Conversation
Introduce a deterministic Node engine plus a slash command that audits the entire repo in one pass: language inventory, JS/TS import graph, Tarjan-based circular-dependency detection, and dead-file candidates. The slash command runs the script first so Gemini's synthesis layer cites exact paths and counts instead of re-deriving them. - scripts/lib/grok-engine.js: walk, import extraction, graph, cycles, dead-file detection, text/json formatting. - scripts/grok.js: CLI wrapper with --format text|json --scope. - commands/egc-grok.toml + workflows/grok.md: slash command with a synthesis contract that pins the LLM to script output. - tests/lib/grok-engine.test.js: 20 unit tests covering import parsing, graph construction, SCC cycle detection, dead-file heuristics, and report shape. - tests/run-all.js: register the new test file.
WalkthroughIntroduces a new Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR adds Confidence Score: 5/5Safe to merge; the only finding is a P2 robustness concern about recursive Tarjan SCC that won't affect typical repos. All findings are P2. The recursive Tarjan SCC is a theoretical concern that only manifests on repos with import chains tens-of-thousands of nodes deep — well beyond any real-world repo this tool would run against. The core logic is correct, entry-pattern normalization is solid, and the test coverage is comprehensive. scripts/lib/grok-engine.js — specifically the recursive detectCycles implementation if very large repos become a target in the future. Important Files Changed
Reviews (2): Last reviewed commit: "fix: address review feedback on PR #54" | Re-trigger Greptile |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/grok.js`:
- Around line 30-33: The CLI parsing currently allows `--scope` with no value
and proceeds; update the `if (arg === '--scope')` branch so that if args[index +
1] is missing or startsWith('--') you throw an error instead of silently
defaulting, and set parsed.scope to path.resolve(args[index + 1]) only when
present; additionally, before the report-building code that uses parsed.scope
(the later block around where parsed.scope/REPO_ROOT is referenced near line
90), validate that parsed.scope exists and is a directory (using fs.statSync or
fs.existsSync + isDirectory) and throw a clear error if not, so the script fails
fast on invalid/missing `--scope`.
In `@scripts/lib/grok-engine.js`:
- Around line 57-61: The try/catch blocks that silently swallow filesystem/JSON
errors (e.g., the fs.readdirSync(current) catch and the other similar catches
around fs.readFileSync/JSON.parse) must stop masking failures: either rethrow
the caught error or push a structured error into a shared errors array and
surface it in the tool's final report/exit code. Locate the catch blocks that
wrap fs.readdirSync, fs.readFileSync and JSON.parse (the ones catching _err) and
replace the empty continue; behavior with explicit handling — for CLI runs
rethrow the error (throw _err) or for tolerant runs push {path: current, op:
"readdir"/"readFile"/"parse", error: _err} into a collector and ensure the
caller checks that collector and fails with a non-zero exit or prints the
aggregated errors before exiting.
- Around line 275-280: The code pushes pkg.main and pkg.bin values verbatim into
patterns, causing mismatches for entries like "./cli/index.js"; update the logic
around patterns (the patterns array, the pkg.main branch, and the binValues
loop) to normalize each entry before pushing: run a path normalization (e.g.,
path.posix.normalize), then strip any leading "./" or leading "/" so entries
match walkRepo paths, and use the normalized string when calling patterns.push;
apply the same normalization to pkg.main and every bin value.
In `@tests/lib/grok-engine.test.js`:
- Around line 222-229: Add a regression test for dot-prefixed entry paths:
update the test that exercises loadEntryPatterns to include a package.json
bin/main entry using a './' prefix (e.g., './cli/index.js' or './src/main.js')
so the loader normalizes and includes the path; call engine.loadEntryPatterns
and assert that the returned patterns contain the normalized path without the
'./' prefix (referencing loadEntryPatterns and the existing test block in
grok-engine.test.js to locate where to add/extend the assertion).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6f437a8c-954e-45e9-946a-1c283ff951ac
📒 Files selected for processing (6)
commands/egc-grok.tomlscripts/grok.jsscripts/lib/grok-engine.jstests/lib/grok-engine.test.jstests/run-all.jsworkflows/grok.md
There was a problem hiding this comment.
5 issues found across 6 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="scripts/grok.js">
<violation number="1" location="scripts/grok.js:41">
P2: `arg.split('=')[1]` truncates values containing `=`. Use `arg.slice(arg.indexOf('=') + 1)` to capture the full value after the first `=`.</violation>
</file>
<file name="scripts/lib/grok-engine.js">
<violation number="1" location="scripts/lib/grok-engine.js:59">
P2: Filesystem errors are silently swallowed (`catch (_err) { continue }`), which can cause entire directories to be skipped without any indication. For an audit tool, this can produce misleadingly clean results (e.g., "0 cycles, 0 dead files") when the real problem is that files couldn't be read. Consider collecting errors and surfacing them in the report or exit status.</violation>
<violation number="2" location="scripts/lib/grok-engine.js:119">
P2: `stripCommentsAndStrings` does not actually strip string literals despite its name. Import-like patterns inside strings (e.g., `"require('foo')"`) will be incorrectly parsed as real imports, producing false edges in the dependency graph.</violation>
<violation number="3" location="scripts/lib/grok-engine.js:275">
P1: Entry paths from `package.json` (`main`, `bin`) are stored verbatim, but `walkRepo` returns paths via `path.relative()` which never includes a leading `./`. Since `isEntryFile` uses strict `rel === pattern` comparison, a `package.json` with `"main": "./index.js"` will never match the walk result `"index.js"`, causing false dead-file reports. Normalize entry paths by stripping the `./` prefix (e.g., `path.normalize(value.replace(/^\.\//, ''))`).</violation>
</file>
<file name="tests/lib/grok-engine.test.js">
<violation number="1" location="tests/lib/grok-engine.test.js:89">
P2: Do not treat absolute-path specifiers as relative imports here; they can be rewritten into repo-local candidates during resolution.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
| test('isRelativeImport classifies relative vs package specifiers', () => { | ||
| assert.strictEqual(engine.isRelativeImport('./a'), true); | ||
| assert.strictEqual(engine.isRelativeImport('../a'), true); | ||
| assert.strictEqual(engine.isRelativeImport('/abs/path'), true); |
There was a problem hiding this comment.
P2: Do not treat absolute-path specifiers as relative imports here; they can be rewritten into repo-local candidates during resolution.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/lib/grok-engine.test.js, line 89:
<comment>Do not treat absolute-path specifiers as relative imports here; they can be rewritten into repo-local candidates during resolution.</comment>
<file context>
@@ -0,0 +1,292 @@
+test('isRelativeImport classifies relative vs package specifiers', () => {
+ assert.strictEqual(engine.isRelativeImport('./a'), true);
+ assert.strictEqual(engine.isRelativeImport('../a'), true);
+ assert.strictEqual(engine.isRelativeImport('/abs/path'), true);
+ assert.strictEqual(engine.isRelativeImport('lodash'), false);
+ assert.strictEqual(engine.isRelativeImport('@scope/pkg'), false);
</file context>
There was a problem hiding this comment.
Keeping this as-is intentionally.
require('/abs/path') is a valid Node.js form (Node will resolve it as an absolute filesystem path), and our isRelativeImport is really asking "is this a path-style specifier the resolver should try against the repo file set, vs. a bare npm specifier we should ignore?" An absolute path that doesn't exist inside the repo just falls through resolveRelativeImport and gets recorded under unresolved — there is no rewrite, no false edge, and no graph corruption.
If we excluded /-prefixed specifiers up front we'd silently drop them from the unresolved count, which is worse signal for the synthesis layer. If a real-world repo started using absolute paths in a misleading way, the right fix is to surface them in the report (we already do, via unresolved), not to misclassify them at the boundary.
Happy to revisit if there's a concrete failure mode I'm missing.
There was a problem hiding this comment.
Thanks for the feedback! I've saved this as a new learning to improve future reviews.
P1 — entry-pattern path normalization (Greptile, cubic, CodeRabbit):
package.json `main` / `bin` values like "./index.js" are now
normalized so they match walkRepo's relative paths. Without this,
the package's own main file would be falsely flagged as a dead-file
candidate. Added regression tests covering ./, /, and bare prefixes.
P2 — CLI hardening (CodeRabbit, cubic):
- --scope and --format now require a non-flag value; bare or
flag-prefixed arguments raise "Missing value for ..." instead of
silently consuming the next arg.
- ensureDirectory validates the resolved scope before audit and
raises a clean message instead of a stack trace.
- --format=/--scope= now use slice(indexOf('=')+1) so values
containing '=' (e.g. directory names) are preserved.
P2 — fs error transparency (CodeRabbit, cubic):
walkRepo and buildGraph now collect read errors into a shared
errors[] which is surfaced as a "Read errors" section in text
output and an `errors` array in JSON. Audits stay tolerant but
no longer hide failures behind misleadingly clean results.
P2 — naming honesty (cubic):
stripCommentsAndStrings renamed to stripComments (the function
only strips comments). Documented the known limitation that
import-like patterns inside string literals can produce false
edges; acceptable for v1 since the LLM synthesis layer treats
them as candidates.
Tests: 35 grok unit + CLI tests pass; full suite 222/222.
Summary
/egc-grok, a new slash command that audits the entire repo in one pass — language inventory, JS/TS import graph, circular dependencies (Tarjan SCC), and dead-file candidates./egc-harness-auditdesign: a deterministic Node script is the source of truth for paths and counts; the LLM only adds a synthesis layer on top, so it cannot hallucinate file lists.vanilla CLI -> file-by-fileworkflow can't produce a repo-wide narrative in one shot, so this is the kind of "must-install" demo that answers "what does this give me on top of plain Gemini CLI?"What's in the PR
scripts/lib/grok-engine.jsrequire/import/export from/dynamic), graph build, Tarjan SCC, dead-file detection, text/json formatting.scripts/grok.js--format text|json,--scope <dir>.commands/egc-grok.tomlworkflows/grok.mdtests/lib/grok-engine.test.jstests/run-all.jsWhy this is differentiated
geminiopens files one by one;/egc-grokreads the whole repo and emits a single report.Test plan
npm run lint— cleannpm test— 207/207 pass (20 new inlib/grok-engine.test.js)node scripts/grok.json this repo: 670 files inventoried, 54 JS/TS nodes / 39 edges, 0 cycles, 0 dead-file candidates after entry-pattern tuningnode scripts/grok.js --format jsonproduces valid JSON parseable byJSON.parsenode scripts/grok.js --helpprints usage/egc-grokend-to-end (manual smoke from a fresh shell)Out of scope (deferred to v2)
--changedbaseline mode for cost-efficient incremental audits/egc-code-review)Summary by cubic
Add
/egc-grok, a deterministic full-repo audit that inventories files, builds a JS/TS import graph, detects cycles, and flags dead-file candidates. The CLI is hardened and the report now surfaces read errors; Gemini’s 1M context adds a short synthesis on top.New Features
node scripts/grok.js [scope] --format text|json.commands/egc-grok.toml,workflows/grok.md.tests/run-all.js. No new dependencies.Bug Fixes
package.jsonmain/binpaths to avoid false dead-file flags.--format=/--scope=support,ensureDirectoryvalidation.stripCommentsand document the string-literal limitation.Written for commit f823410. Summary will update on new commits. Review in cubic
Summary by CodeRabbit
New Features
/egc-grokcommand for comprehensive repository auditing, analyzing dependencies, detecting circular dependencies, and identifying dead files with JSON or text output formats.Documentation
Tests