diff --git a/.cursor/rules/codegraph.mdc b/.cursor/rules/codegraph.mdc index dac86b3a..4b6073b4 100644 --- a/.cursor/rules/codegraph.mdc +++ b/.cursor/rules/codegraph.mdc @@ -13,22 +13,22 @@ Use codegraph for **structural** questions — what calls what, what would break | Question | Tool | |---|---| +| "How does X work? / trace X / explain a system / architecture" | `codegraph_explore` (seed with symbol names) | | "Where is X defined?" / "Find symbol named X" | `codegraph_search` | | "What calls function Y?" | `codegraph_callers` | | "What does Y call?" | `codegraph_callees` | | "What would break if I changed Z?" | `codegraph_impact` | | "Show me Y's signature / source / docstring" | `codegraph_node` | | "Give me focused context for a task/area" | `codegraph_context` | -| "Survey an unfamiliar module/topic" | `codegraph_explore` | | "What files exist under path/" | `codegraph_files` | | "Is the index healthy?" | `codegraph_status` | ### Rules of thumb +- **`codegraph_explore` is the workhorse for understanding questions** ("how does X work", "trace…", "explain the Y system"). Feed it the key symbol/file names and read its output (line-numbered source from many files in one call). If the question names nothing concrete, do one quick `codegraph_search`/`codegraph_context` to surface the names, then explore with them. Fill gaps with `codegraph_node`/Read — don't grep-and-read your way through; that's the loop explore replaces. +- **Delegating exploration to a subagent?** Tell it to call `codegraph_explore` first and trust the result. A generic "explore"-style agent defaults to grep+Read and treats codegraph as just a search index, throwing away the token savings. - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context. - **Don't grep first** when looking up a symbol by name. `codegraph_search` is faster and returns kind + location + signature in one call. -- **Don't chain `codegraph_search` + `codegraph_node`** when you just want context — `codegraph_context` is one call. -- **`codegraph_explore` is the heavy hitter** for unfamiliar areas — it returns full source from all relevant files in one call, but is token-heavy. If your harness supports parallel subagents (e.g., Claude Code's Task tool), spawn one for explore-class questions to keep main session context clean. - **Index lag**: the file watcher debounces ~500ms behind writes; don't re-query immediately after editing a file in the same turn. ### If `.codegraph/` doesn't exist diff --git a/CHANGELOG.md b/CHANGELOG.md index 7c32c152..57f19200 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -44,6 +44,17 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). VS Code ~12%. Agent-trust floor still holds — the Relationships section, scored cluster selection, and structured-source output are all retained. Thanks to [@essopsp](https://github.com/essopsp) for the repro. +- **MCP / tool guidance**: the tool descriptions and installed instructions + now steer agents to treat `codegraph_explore` as the workhorse for + understanding/architecture/"how does X work" questions — seed it with the + key symbol names (a quick `codegraph_search`/`codegraph_context` first if + the question names nothing concrete) and read its output, rather than + searching and then Reading each file. Diagnosed from a benchmark run where + Claude Code's Explore agent used `codegraph_search` + Read + grep (37 tool + calls, ~90k tokens) and never called `codegraph_explore`, vs a + general-purpose agent that led with explore (13 calls, ~55k tokens) for the + same VS Code question. Updated in lockstep across `server-instructions.ts`, + `instructions-template.ts`, and `.cursor/rules/codegraph.mdc`. ### Fixed - **MCP**: source-omission markers in `codegraph_explore` and @@ -51,6 +62,15 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). `... (trimmed) ...`, `... (truncated) ...`) instead of C-style `//` comments, which were misleading inside Python, Ruby, and other non-C fenced source blocks. +- **Search/explore ranking**: test-file detection now recognizes Kotlin + (`*Test.kt`, `jvmTest/`/`commonTest/`/`androidTest/` source sets), Swift + (`*Tests.swift`), and other camelCase test conventions, so test code is + properly deprioritized in `codegraph_explore` / `codegraph_context` + results. Previously only Java/JS/Python conventions were known, which let + test files dominate exploration of Kotlin/Swift codebases (e.g. an OkHttp + "trace a request" query returned 8/9 test files; now it surfaces + `Call.kt`, `OkHttpClient.kt`, `Request.kt`, `Response.kt`). Capital-led + matching keeps production files like `latest.kt` / `manifest.kt` unflagged. ## [0.7.10] - 2026-05-19 diff --git a/README.md b/README.md index 910d7801..49cf8d54 100644 --- a/README.md +++ b/README.md @@ -492,6 +492,16 @@ The `.codegraph/config.json` file controls indexing: **Missing symbols** — The MCP server auto-syncs on save (wait a couple seconds). Run `codegraph sync` manually if needed. Check that the file's language is supported and isn't excluded by config patterns. +## Star History + + + + + + Star History Chart + + + ## License MIT diff --git a/__tests__/installer-targets.test.ts b/__tests__/installer-targets.test.ts index 89ba6290..2b494468 100644 --- a/__tests__/installer-targets.test.ts +++ b/__tests__/installer-targets.test.ts @@ -41,6 +41,16 @@ function setHome(dir: string): { restore: () => void } { }; } +function setPlatform(platform: NodeJS.Platform): { restore: () => void } { + const previous = Object.getOwnPropertyDescriptor(process, 'platform'); + Object.defineProperty(process, 'platform', { value: platform }); + return { + restore() { + if (previous) Object.defineProperty(process, 'platform', previous); + }, + }; +} + describe('Installer targets — contract', () => { let tmpHome: string; let tmpCwd: string; @@ -219,6 +229,33 @@ describe('Installer targets — partial-state idempotency', () => { expect(result.files[0].action).toBe('created'); }); + it('opencode: Windows global install uses ~/.config/opencode instead of APPDATA', () => { + const platform = setPlatform('win32'); + const appData = path.join(tmpHome, 'AppData', 'Roaming'); + const prevAppData = process.env.APPDATA; + const prevXdgConfigHome = process.env.XDG_CONFIG_HOME; + process.env.APPDATA = appData; + delete process.env.XDG_CONFIG_HOME; + + try { + const opencode = getTarget('opencode')!; + const result = opencode.install('global', { autoAllow: true }); + const configFile = path.join(tmpHome, '.config', 'opencode', 'opencode.jsonc'); + const instructionsFile = path.join(tmpHome, '.config', 'opencode', 'AGENTS.md'); + const appDataConfigFile = path.join(appData, 'opencode', 'opencode.jsonc'); + + expect(result.files.map((f) => f.path)).toContain(configFile); + expect(result.files.map((f) => f.path)).toContain(instructionsFile); + expect(fs.existsSync(configFile)).toBe(true); + expect(fs.existsSync(instructionsFile)).toBe(true); + expect(fs.existsSync(appDataConfigFile)).toBe(false); + } finally { + platform.restore(); + if (prevAppData === undefined) delete process.env.APPDATA; else process.env.APPDATA = prevAppData; + if (prevXdgConfigHome === undefined) delete process.env.XDG_CONFIG_HOME; else process.env.XDG_CONFIG_HOME = prevXdgConfigHome; + } + }); + it('opencode: preserves line and block comments through install + idempotent re-run', () => { const opencode = getTarget('opencode')!; const dir = path.join(tmpHome, '.config', 'opencode'); @@ -298,9 +335,9 @@ describe('Installer targets — partial-state idempotency', () => { const opencode = getTarget('opencode')!; const result = opencode.install('local', { autoAllow: true }); const paths = result.files.map((f) => f.path); - // macOS realpath shenanigans (/var vs /private/var) — suffix match. - expect(paths.some((p) => p.endsWith('/opencode.jsonc'))).toBe(true); - expect(paths.some((p) => p.endsWith('/AGENTS.md'))).toBe(true); + // Avoid absolute-path quirks (/var vs /private/var, or \ vs /). + expect(paths.some((p) => path.basename(p) === 'opencode.jsonc')).toBe(true); + expect(paths.some((p) => path.basename(p) === 'AGENTS.md')).toBe(true); }); it('opencode: uninstall removes only mcp.codegraph, preserves comments and siblings', () => { diff --git a/__tests__/is-test-file.test.ts b/__tests__/is-test-file.test.ts new file mode 100644 index 00000000..e3fc6d03 --- /dev/null +++ b/__tests__/is-test-file.test.ts @@ -0,0 +1,53 @@ +/** + * isTestFile heuristic — test-file detection used to deprioritize test code in + * search/explore ranking. + * + * Regression coverage for the cold-query fix: the heuristic previously only + * knew Java/JS/Python conventions, so Kotlin (`*Test.kt`, `jvmTest/`), Swift + * (`*Tests.swift`), and camelCase test source-set dirs slipped through — which + * let OkHttp's tests flood `codegraph_explore` results on a plain-language + * query. The false-positive guards matter just as much: `latest.kt` / + * `manifest.kt` / a `RealCall.kt` production file must NOT be flagged. + */ +import { describe, it, expect } from 'vitest'; +import { isTestFile } from '../src/search/query-utils'; + +describe('isTestFile', () => { + it('flags Kotlin test files and source sets', () => { + expect(isTestFile('okhttp/src/jvmTest/kotlin/okhttp3/CallTest.kt')).toBe(true); + expect(isTestFile('okhttp/src/commonTest/kotlin/okhttp3/CompressionInterceptorTest.kt')).toBe(true); + expect(isTestFile('app/src/androidTest/java/com/example/FooTest.kt')).toBe(true); + expect(isTestFile('module/src/integrationTest/kotlin/BarSpec.kt')).toBe(true); + }); + + it('flags Swift test files', () => { + expect(isTestFile('Tests/SessionTests.swift')).toBe(true); + expect(isTestFile('Sources/FooTest.swift')).toBe(true); + }); + + it('still flags the previously-supported conventions', () => { + expect(isTestFile('foo/test_bar.py')).toBe(true); + expect(isTestFile('pkg/bar_test.go')).toBe(true); + expect(isTestFile('src/foo.test.ts')).toBe(true); + expect(isTestFile('src/foo.spec.ts')).toBe(true); + expect(isTestFile('com/example/FooTest.java')).toBe(true); + expect(isTestFile('com/example/FooTestCase.java')).toBe(true); + expect(isTestFile('project/__tests__/foo.ts')).toBe(true); + expect(isTestFile('project/tests/foo.rb')).toBe(true); + }); + + it('does NOT flag production files that merely contain "test" lowercase', () => { + // The fix is capital-led so camelCase boundaries distinguish these. + expect(isTestFile('src/latest/loader.kt')).toBe(false); + expect(isTestFile('lib/manifest.kt')).toBe(false); + expect(isTestFile('okhttp/src/jvmMain/kotlin/okhttp3/internal/connection/RealCall.kt')).toBe(false); + expect(isTestFile('src/contestEntry.ts')).toBe(false); + expect(isTestFile('pkg/greatest.go')).toBe(false); + }); + + it('does NOT flag ordinary production source', () => { + expect(isTestFile('src/flask/app.py')).toBe(false); + expect(isTestFile('src/vs/workbench/api/common/extensionHostMain.ts')).toBe(false); + expect(isTestFile('okhttp/src/commonJvmAndroid/kotlin/okhttp3/OkHttpClient.kt')).toBe(false); + }); +}); diff --git a/run-interactive-test.md b/run-interactive-test.md new file mode 100644 index 00000000..448c9e62 --- /dev/null +++ b/run-interactive-test.md @@ -0,0 +1,131 @@ +# Running the agent-behavior test (how agents actually use codegraph) + +This explains how to measure **how a Claude Code agent uses the codegraph MCP +tools** on a real repo — which tools it calls (does it lead with +`codegraph_explore`?), how many follow-up `Read`/`Grep`s it does, and the token +cost. Use it when changing tool guidance (`server-instructions.ts`, +`instructions-template.ts`, tool descriptions) or retrieval, to verify the +change actually shifts agent behavior. + +Scripts live in `scripts/agent-eval/`. + +## Why two harnesses (read this first) + +| | Interactive (`itrun.sh`) | Headless (`run-agent.sh`) | +|---|---|---| +| Drives | the real TUI via tmux | `claude -p` print mode | +| Subagent it picks | **Explore** (matches real UX) | general-purpose (diverges) | +| Metrics | tool breakdown (from session logs) + `Done(…)` token summary | exact per-tool calls + tokens/cost (stream-json) | +| Cost | Claude Max subscription | API $ (`total_cost_usd`) | + +**Headless `claude -p` does NOT reproduce what users see** — it silently picks +the general-purpose subagent, while interactive sessions delegate to the +read-first **Explore** subagent. So for "what does my session actually do," use +the interactive harness. For a clean per-tool/token breakdown in one shot, use +headless (and ask for the Explore subagent in the prompt if you want that path). + +## Prerequisites + +- **tmux 3.0+** +- A logged-in `claude` CLI (Claude Max or API). +- codegraph configured as an MCP server (`claude mcp list` shows `codegraph`). + The interactive harness uses your global config, so it runs whatever + `codegraph` resolves to — point that at your dev build (`npm link` / the + symlinked global) to test local changes. +- A target repo, cloned and indexed: + ```bash + git clone --depth 1 https://github.com/square/okhttp /tmp/corpus/okhttp + cd /tmp/corpus/okhttp && codegraph init -i + ``` + Good scale spread for a sweep: Alamofire (~100 files), Excalidraw (~600), + OkHttp (~640), VS Code (~10k). + +## Interactive test (the faithful one) + +```bash +scripts/agent-eval/itrun.sh