diff --git a/.cursor/rules/codegraph.mdc b/.cursor/rules/codegraph.mdc
index dac86b3a..4b6073b4 100644
--- a/.cursor/rules/codegraph.mdc
+++ b/.cursor/rules/codegraph.mdc
@@ -13,22 +13,22 @@ Use codegraph for **structural** questions — what calls what, what would break
| Question | Tool |
|---|---|
+| "How does X work?" / "Trace X" / "Explain a system or architecture" | `codegraph_explore` (seed with symbol names) |
| "Where is X defined?" / "Find symbol named X" | `codegraph_search` |
| "What calls function Y?" | `codegraph_callers` |
| "What does Y call?" | `codegraph_callees` |
| "What would break if I changed Z?" | `codegraph_impact` |
| "Show me Y's signature / source / docstring" | `codegraph_node` |
| "Give me focused context for a task/area" | `codegraph_context` |
-| "Survey an unfamiliar module/topic" | `codegraph_explore` |
| "What files exist under path/" | `codegraph_files` |
| "Is the index healthy?" | `codegraph_status` |
### Rules of thumb
+- **`codegraph_explore` is the workhorse for understanding questions** ("how does X work", "trace…", "explain the Y system"). Feed it the key symbol/file names and read its output (line-numbered source from many files in one call). If the question names nothing concrete, do one quick `codegraph_search`/`codegraph_context` to surface the names, then explore with them. Fill gaps with `codegraph_node`/Read — don't grep-and-read your way through; that's the loop explore replaces.
+- **Delegating exploration to a subagent?** Tell it to call `codegraph_explore` first and trust the result. A generic "explore"-style agent defaults to grep+Read and treats codegraph as just a search index, throwing away the token savings.
- **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context.
- **Don't grep first** when looking up a symbol by name. `codegraph_search` is faster and returns kind + location + signature in one call.
-- **Don't chain `codegraph_search` + `codegraph_node`** when you just want context — `codegraph_context` is one call.
-- **`codegraph_explore` is the heavy hitter** for unfamiliar areas — it returns full source from all relevant files in one call, but is token-heavy. If your harness supports parallel subagents (e.g., Claude Code's Task tool), spawn one for explore-class questions to keep main session context clean.
- **Index lag**: the file watcher debounces ~500ms behind writes; don't re-query immediately after editing a file in the same turn.
### If `.codegraph/` doesn't exist
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7c32c152..57f19200 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -44,6 +44,17 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
VS Code ~12%. Agent-trust floor still holds — the Relationships section,
scored cluster selection, and structured-source output are all retained.
Thanks to [@essopsp](https://github.com/essopsp) for the repro.
+- **MCP / tool guidance**: the tool descriptions and installed instructions
+ now steer agents to treat `codegraph_explore` as the workhorse for
+ understanding/architecture/"how does X work" questions — seed it with the
+ key symbol names (a quick `codegraph_search`/`codegraph_context` first if
+ the question names nothing concrete) and read its output, rather than
+ searching and then Reading each file. Diagnosed from a benchmark run where
+ Claude Code's Explore agent used `codegraph_search` + Read + grep (37 tool
+ calls, ~90k tokens) and never called `codegraph_explore`, vs a
+ general-purpose agent that led with explore (13 calls, ~55k tokens) for the
+ same VS Code question. Updated in lockstep across `server-instructions.ts`,
+ `instructions-template.ts`, and `.cursor/rules/codegraph.mdc`.
### Fixed
- **MCP**: source-omission markers in `codegraph_explore` and
@@ -51,6 +62,15 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
`... (trimmed) ...`, `... (truncated) ...`) instead of C-style `//`
comments, which were misleading inside Python, Ruby, and other non-C
fenced source blocks.
+- **Search/explore ranking**: test-file detection now recognizes Kotlin
+ (`*Test.kt`, `jvmTest/`/`commonTest/`/`androidTest/` source sets), Swift
+ (`*Tests.swift`), and other camelCase test conventions, so test code is
+ properly deprioritized in `codegraph_explore` / `codegraph_context`
+ results. Previously only Java/JS/Python conventions were known, which let
+ test files dominate exploration of Kotlin/Swift codebases (e.g. an OkHttp
+ "trace a request" query returned test files for 8 of its 9 results; now it surfaces
+ `Call.kt`, `OkHttpClient.kt`, `Request.kt`, `Response.kt`). Capital-led
+ matching keeps production files like `latest.kt` / `manifest.kt` unflagged.
## [0.7.10] - 2026-05-19
diff --git a/README.md b/README.md
index 910d7801..49cf8d54 100644
--- a/README.md
+++ b/README.md
@@ -492,6 +492,16 @@ The `.codegraph/config.json` file controls indexing:
**Missing symbols** — The MCP server auto-syncs on save (wait a couple seconds). Run `codegraph sync` manually if needed. Check that the file's language is supported and isn't excluded by config patterns.
+## Star History
+
+
+
+
+
+
+
+
+
## License
MIT
diff --git a/__tests__/installer-targets.test.ts b/__tests__/installer-targets.test.ts
index 89ba6290..2b494468 100644
--- a/__tests__/installer-targets.test.ts
+++ b/__tests__/installer-targets.test.ts
@@ -41,6 +41,16 @@ function setHome(dir: string): { restore: () => void } {
};
}
+function setPlatform(platform: NodeJS.Platform): { restore: () => void } {
+ const previous = Object.getOwnPropertyDescriptor(process, 'platform');
+ Object.defineProperty(process, 'platform', { value: platform });
+ return {
+ restore() {
+ if (previous) Object.defineProperty(process, 'platform', previous);
+ },
+ };
+}
+
describe('Installer targets — contract', () => {
let tmpHome: string;
let tmpCwd: string;
@@ -219,6 +229,33 @@ describe('Installer targets — partial-state idempotency', () => {
expect(result.files[0].action).toBe('created');
});
+ it('opencode: Windows global install uses ~/.config/opencode instead of APPDATA', () => {
+ const platform = setPlatform('win32');
+ const appData = path.join(tmpHome, 'AppData', 'Roaming');
+ const prevAppData = process.env.APPDATA;
+ const prevXdgConfigHome = process.env.XDG_CONFIG_HOME;
+ process.env.APPDATA = appData;
+ delete process.env.XDG_CONFIG_HOME;
+
+ try {
+ const opencode = getTarget('opencode')!;
+ const result = opencode.install('global', { autoAllow: true });
+ const configFile = path.join(tmpHome, '.config', 'opencode', 'opencode.jsonc');
+ const instructionsFile = path.join(tmpHome, '.config', 'opencode', 'AGENTS.md');
+ const appDataConfigFile = path.join(appData, 'opencode', 'opencode.jsonc');
+
+ expect(result.files.map((f) => f.path)).toContain(configFile);
+ expect(result.files.map((f) => f.path)).toContain(instructionsFile);
+ expect(fs.existsSync(configFile)).toBe(true);
+ expect(fs.existsSync(instructionsFile)).toBe(true);
+ expect(fs.existsSync(appDataConfigFile)).toBe(false);
+ } finally {
+ platform.restore();
+ if (prevAppData === undefined) delete process.env.APPDATA; else process.env.APPDATA = prevAppData;
+ if (prevXdgConfigHome === undefined) delete process.env.XDG_CONFIG_HOME; else process.env.XDG_CONFIG_HOME = prevXdgConfigHome;
+ }
+ });
+
it('opencode: preserves line and block comments through install + idempotent re-run', () => {
const opencode = getTarget('opencode')!;
const dir = path.join(tmpHome, '.config', 'opencode');
@@ -298,9 +335,9 @@ describe('Installer targets — partial-state idempotency', () => {
const opencode = getTarget('opencode')!;
const result = opencode.install('local', { autoAllow: true });
const paths = result.files.map((f) => f.path);
- // macOS realpath shenanigans (/var vs /private/var) — suffix match.
- expect(paths.some((p) => p.endsWith('/opencode.jsonc'))).toBe(true);
- expect(paths.some((p) => p.endsWith('/AGENTS.md'))).toBe(true);
+ // Avoid absolute-path quirks (/var vs /private/var, or \ vs /).
+ expect(paths.some((p) => path.basename(p) === 'opencode.jsonc')).toBe(true);
+ expect(paths.some((p) => path.basename(p) === 'AGENTS.md')).toBe(true);
});
it('opencode: uninstall removes only mcp.codegraph, preserves comments and siblings', () => {
diff --git a/__tests__/is-test-file.test.ts b/__tests__/is-test-file.test.ts
new file mode 100644
index 00000000..e3fc6d03
--- /dev/null
+++ b/__tests__/is-test-file.test.ts
@@ -0,0 +1,53 @@
+/**
+ * isTestFile heuristic — test-file detection used to deprioritize test code in
+ * search/explore ranking.
+ *
+ * Regression coverage for the cold-query fix: the heuristic previously only
+ * knew Java/JS/Python conventions, so Kotlin (`*Test.kt`, `jvmTest/`), Swift
+ * (`*Tests.swift`), and camelCase test source-set dirs slipped through — which
+ * let OkHttp's tests flood `codegraph_explore` results on a plain-language
+ * query. The false-positive guards matter just as much: `latest.kt` /
+ * `manifest.kt` / a `RealCall.kt` production file must NOT be flagged.
+ */
+import { describe, it, expect } from 'vitest';
+import { isTestFile } from '../src/search/query-utils';
+
+describe('isTestFile', () => {
+ it('flags Kotlin test files and source sets', () => {
+ expect(isTestFile('okhttp/src/jvmTest/kotlin/okhttp3/CallTest.kt')).toBe(true);
+ expect(isTestFile('okhttp/src/commonTest/kotlin/okhttp3/CompressionInterceptorTest.kt')).toBe(true);
+ expect(isTestFile('app/src/androidTest/java/com/example/FooTest.kt')).toBe(true);
+ expect(isTestFile('module/src/integrationTest/kotlin/BarSpec.kt')).toBe(true);
+ });
+
+ it('flags Swift test files', () => {
+ expect(isTestFile('Tests/SessionTests.swift')).toBe(true);
+ expect(isTestFile('Sources/FooTest.swift')).toBe(true);
+ });
+
+ it('still flags the previously supported conventions', () => {
+ expect(isTestFile('foo/test_bar.py')).toBe(true);
+ expect(isTestFile('pkg/bar_test.go')).toBe(true);
+ expect(isTestFile('src/foo.test.ts')).toBe(true);
+ expect(isTestFile('src/foo.spec.ts')).toBe(true);
+ expect(isTestFile('com/example/FooTest.java')).toBe(true);
+ expect(isTestFile('com/example/FooTestCase.java')).toBe(true);
+ expect(isTestFile('project/__tests__/foo.ts')).toBe(true);
+ expect(isTestFile('project/tests/foo.rb')).toBe(true);
+ });
+
+ it('does NOT flag production files that merely contain lowercase "test"', () => {
+ // Matching is capital-led (`Test`, not `test`), so camelCase boundaries distinguish these.
+ expect(isTestFile('src/latest/loader.kt')).toBe(false);
+ expect(isTestFile('lib/manifest.kt')).toBe(false);
+ expect(isTestFile('okhttp/src/jvmMain/kotlin/okhttp3/internal/connection/RealCall.kt')).toBe(false);
+ expect(isTestFile('src/contestEntry.ts')).toBe(false);
+ expect(isTestFile('pkg/greatest.go')).toBe(false);
+ });
+
+ it('does NOT flag ordinary production source', () => {
+ expect(isTestFile('src/flask/app.py')).toBe(false);
+ expect(isTestFile('src/vs/workbench/api/common/extensionHostMain.ts')).toBe(false);
+ expect(isTestFile('okhttp/src/commonJvmAndroid/kotlin/okhttp3/OkHttpClient.kt')).toBe(false);
+ });
+});
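Reviewer note: the tests above pin down a capital-led heuristic without showing it. The sketch below is one way such a check could be written. It is an assumption for illustration only, not the actual `isTestFile` in `src/search/query-utils`; the name `looksLikeTestFile` and every regular expression in it are hypothetical.

```typescript
// Hypothetical sketch of a capital-led test-file heuristic.
// Not the project's real isTestFile; names and patterns are assumptions.
function looksLikeTestFile(filePath: string): boolean {
  // Normalize Windows separators, then split into directories and file name.
  const segments = filePath.replace(/\\/g, '/').split('/');
  const fileName = segments.pop() ?? '';

  // Directory conventions: test/, tests/, spec/, __tests__/ (any case), plus
  // camelCase source sets such as jvmTest/ or integrationTest/, where a
  // capital "T" must follow a lowercase letter or digit.
  const inTestDir = segments.some(
    (dir) => /^(tests?|specs?|__tests__)$/i.test(dir) || /[a-z0-9]Tests?$/.test(dir),
  );
  if (inTestDir) return true;

  // Lowercase conventions need an explicit separator:
  // test_bar.py, bar_test.go, foo.test.ts, foo.spec.ts.
  if (/^test_/.test(fileName)) return true;
  if (/[._-](test|spec)\.[^.]+$/.test(fileName)) return true;

  // Capital-led suffixes: FooTest.kt, SessionTests.swift, FooTestCase.java,
  // BarSpec.kt. Lowercase "latest.kt" or "manifest.kt" cannot match here.
  return /[a-z0-9](Tests?|TestCase|Spec)\.[^.]+$/.test(fileName);
}
```

Because the camelCase patterns are case-sensitive, `contestEntry.ts` and `greatest.go` stay unflagged, while the separator requirement keeps the lowercase conventions from over-matching.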
diff --git a/run-interactive-test.md b/run-interactive-test.md
new file mode 100644
index 00000000..448c9e62
--- /dev/null
+++ b/run-interactive-test.md
@@ -0,0 +1,131 @@
+# Running the agent-behavior test (how agents actually use codegraph)
+
+This explains how to measure **how a Claude Code agent uses the codegraph MCP
+tools** on a real repo — which tools it calls (does it lead with
+`codegraph_explore`?), how many follow-up `Read`/`Grep`s it does, and the token
+cost. Use it when changing tool guidance (`server-instructions.ts`,
+`instructions-template.ts`, tool descriptions) or retrieval, to verify the
+change actually shifts agent behavior.
+
+Scripts live in `scripts/agent-eval/`.
+
+## Why two harnesses (read this first)
+
+| | Interactive (`itrun.sh`) | Headless (`run-agent.sh`) |
+|---|---|---|
+| Drives | the real TUI via tmux | `claude -p` print mode |
+| Subagent it picks | **Explore** (matches real UX) | general-purpose (diverges) |
+| Metrics | tool breakdown (from session logs) + `Done(…)` token summary | exact per-tool calls + tokens/cost (stream-json) |
+| Cost | Claude Max subscription | API $ (`total_cost_usd`) |
+
+**Headless `claude -p` does NOT reproduce what users see** — it silently picks
+the general-purpose subagent, while interactive sessions delegate to the
+read-first **Explore** subagent. So for "what does my session actually do," use
+the interactive harness. For a clean per-tool/token breakdown in one shot, use
+headless (and ask for the Explore subagent in the prompt if you want that path).
+
+## Prerequisites
+
+- **tmux 3.0+**
+- A logged-in `claude` CLI (Claude Max or API).
+- codegraph configured as an MCP server (`claude mcp list` shows `codegraph`).
+ The interactive harness uses your global config, so it runs whatever
+ `codegraph` resolves to — point that at your dev build (`npm link` / the
+ symlinked global) to test local changes.
+- A target repo, cloned and indexed:
+ ```bash
+ git clone --depth 1 https://github.com/square/okhttp /tmp/corpus/okhttp
+ cd /tmp/corpus/okhttp && codegraph init -i
+ ```
+ Good scale spread for a sweep: Alamofire (~100 files), Excalidraw (~600),
+ OkHttp (~640), VS Code (~10k).
+
+## Interactive test (the faithful one)
+
+```bash
+scripts/agent-eval/itrun.sh