Delegate codebase analysis from Claude to Kimi Code (`kimi-for-coding`, 256K context) and cut Claude token usage by up to ~90% on reading-heavy tasks.
| Task | Claude only | Claude + kimi-code-mcp | Savings |
|---|---|---|---|
| Analyze 200-file monorepo | ~250K tok | ~25K tok | 90% |
| Summarize 50-page RFC PDF | ~60K tok | ~6K tok | 90% |
| Cross-reference 100 commits | ~80K tok | ~8K tok | 90% |
*Illustrative figures; actual savings depend on the task.*
```bash
# 1. Install Kimi CLI and log in
curl -L code.kimi.com/install.sh | bash
kimi login

# 2. Install via npm
npm install -g kimi-mcp-server
```

Add to `.mcp.json` (project-level) or `~/.claude/mcp.json` (global):

```json
{
  "mcpServers": {
    "kimi-code": {
      "command": "npx",
      "args": ["-y", "kimi-mcp-server"]
    }
  }
}
```

Run `/mcp` in Claude Code to verify; you should see `kimi-code` with 8 tools.
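If `.mcp.json` already registers other servers, the `kimi-code` entry should be merged into the existing `mcpServers` object rather than overwriting the file. A small TypeScript sketch of that merge, assuming the file has already been read and parsed; `addKimiServer` is a hypothetical helper, not part of this package:

```typescript
// Minimal shape of an MCP config file; only the fields used here are typed.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a new config with the kimi-code entry added,
// keeping every other server and top-level key as it was.
function addKimiServer(config: McpConfig): McpConfig {
  const kimiEntry: McpServerEntry = {
    command: "npx",
    args: ["-y", "kimi-mcp-server"],
  };
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), "kimi-code": kimiEntry },
  };
}

// Example: an existing config that already registers another server.
const existing: McpConfig = JSON.parse(
  '{"mcpServers":{"other":{"command":"node","args":["other.js"]}}}',
);
const merged = addKimiServer(existing);
console.log(JSON.stringify(merged, null, 2));
```

Returning a new object instead of mutating the parsed input makes it easy to diff the result before writing it back.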
Tip
No CLI? Use API mode. kimi_verify (and kimi_query as a fallback) call the Kimi Code API directly — no Python CLI install required. Just provide an API key via $KIMICODE_API_KEY or ~/.kimi/config.toml (see Kimi Code API Setup). This makes Kimi usable as an external third-party verification agent that any Claude Code session can call to cross-check its own work.
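API mode needs a key from `$KIMICODE_API_KEY` or `~/.kimi/config.toml`. The sketch below shows one way such a lookup could work. It assumes the environment variable takes precedence and uses a naive line scan instead of a real TOML parser; `resolveApiKey` and `apiKeyFromConfig` are hypothetical names, not this server's actual code.

```typescript
// Hypothetical sketch of API-key resolution for API mode.
// Assumption: $KIMICODE_API_KEY wins over ~/.kimi/config.toml; the real
// server's precedence may differ.
type Env = Record<string, string | undefined>;

// Naive line-based scan: finds api_key inside the [providers.kimi-code]
// table. Not a full TOML parser (no multi-line strings, no inline tables).
function apiKeyFromConfig(tomlText: string): string | undefined {
  let inSection = false;
  for (const raw of tomlText.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("[")) {
      inSection = line === "[providers.kimi-code]";
      continue;
    }
    const match = inSection ? line.match(/^api_key\s*=\s*"([^"]*)"/) : null;
    if (match) return match[1];
  }
  return undefined;
}

function resolveApiKey(env: Env, tomlText: string): string | undefined {
  const fromEnv = env["KIMICODE_API_KEY"];
  if (fromEnv) return fromEnv;
  const fromConfig = apiKeyFromConfig(tomlText);
  // A value like "${SOME_VAR}" is a placeholder, so look the variable up.
  const placeholder = fromConfig?.match(/^\$\{(\w+)\}$/);
  if (placeholder) return env[placeholder[1]];
  return fromConfig;
}

const config = `[providers.kimi-code]
type = "kimi"
api_key = "sk-from-config"
`;
console.log(resolveApiKey({}, config)); // prints sk-from-config
console.log(resolveApiKey({ KIMICODE_API_KEY: "sk-from-env" }, config)); // prints sk-from-env
```

In a real process you would pass `process.env` as `env` and the file's contents as `tomlText`.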
- Claude calls the `kimi_analyze` tool when a task needs bulk codebase reading.
- MCP routes the request to Kimi Code (`kimi-for-coding`, 256K context), which reads the codebase, up to its 256K-token window, in one pass.
- The result is piped back as a structured response, and Claude acts on it with precise, targeted edits.
```
┌──────────────┐   stdio/MCP   ┌──────────────┐  subprocess   ┌──────────────┐
│ Claude Code  │ ◄───────────► │ kimi-code-mcp│ ────────────► │   Kimi CLI   │
│ (conductor)  │               │ (MCP server) │               │  (256K ctx)  │
└──────────────┘               └──────────────┘               └──────────────┘
```
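On the `stdio/MCP` link, the client and this server exchange JSON-RPC 2.0 messages, one JSON object per line. The sketch below builds the `tools/call` request a client would send for `kimi_analyze`. The argument names `prompt` and `work_dir` are assumptions for illustration (only `detail_level` is documented in this README); the server's `tools/list` response is the authoritative schema.

```typescript
// A JSON-RPC 2.0 request invoking an MCP tool (method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Serialize one request as a newline-terminated frame for the stdio transport.
// JSON.stringify without indentation escapes newlines inside strings,
// so the whole message stays on a single line.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  return JSON.stringify(request) + "\n";
}

// Assumed argument names, except detail_level (documented in this README).
const frame = buildToolCall(1, "kimi_analyze", {
  prompt: "Summarize the architecture of this repo",
  work_dir: "/path/to/repo",
  detail_level: "summary",
});
console.log(frame.trimEnd());
```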
MCP server that connects Kimi Code (model kimi-for-coding, 256K context, auto-upgraded) with Claude Code — letting Claude orchestrate while Kimi handles the heavy reading.
Kimi Code sits on the efficiency frontier: near-Claude intelligence at roughly 10x lower cost (see kimi.com/code).
Tip
Stop paying Claude to read files. Kimi Code delivers frontier-class code intelligence at a fraction of the cost (see chart above). Delegate bulk codebase scanning to Kimi (256K context, near-zero marginal cost beyond your Kimi plan) and let Claude focus on what it does best: reasoning, decisions, and precise code edits. One `kimi_analyze` call can replace 50+ file reads.
Kimi Code is an AI code agent by Moonshot AI. The model ID kimi-for-coding (1T MoE, 256K context) automatically receives backend upgrades — no version pinning required. It works across Terminal, IDE, and CLI — writing, debugging, refactoring, and analyzing code autonomously.
Key specs:
- 256K token context — reads entire codebases in one pass
- Parallel agent spawning — handles concurrent tasks
- Shell, file, and web access — full developer toolchain
- Install: `curl -L code.kimi.com/install.sh | bash`
Warning
Kimi Code membership required. This MCP server calls the Kimi CLI under the hood, which requires an active Kimi Code plan. Make sure you have a valid subscription and have run kimi login before use. See kimi.com/code for the latest pricing tiers and quotas.
If you prefer to build locally instead of using the npm package:

```bash
git clone https://github.com/howardpen9/kimi-code-mcp.git
cd kimi-code-mcp && npm install && npm run build
```

Then point your MCP config at the built entry point:

```json
{
  "mcpServers": {
    "kimi-code": {
      "command": "node",
      "args": ["/absolute/path/to/kimi-code-mcp/dist/index.js"]
    }
  }
}
```

Note
Kimi Code API and Moonshot API are separate providers — their API keys are not interchangeable.
There are two ways to configure the Kimi Code API for the CLI.

Option 1: OAuth login. In the Kimi Code CLI shell, run:

```bash
kimi
```

Then use the `/login` (or `/setup`) command:

```
/login
```

- Select Kimi Code as the platform
- Your browser opens for OAuth authorization
- Config is saved automatically to `~/.kimi/config.toml`

Option 2: manual API key.

- Visit code.kimi.com
- Sign in → Settings → API Keys
- Create a new key (starts with `sk-`, shown only once)

Then open the config file:

```bash
nano ~/.kimi/config.toml
```

Add:

```toml
[providers.kimi-code]
type = "kimi"
base_url = "https://api.kimi.com/coding/v1"
api_key = "sk-your-api-key"

[models.kimi-for-coding]
provider = "kimi-code"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking"]

[defaults]
model = "kimi-for-coding"
```

To keep the key out of the file, export it as an environment variable instead:

```bash
# Add to ~/.zshrc (macOS) or ~/.bashrc (Linux)
export KIMICODE_API_KEY="sk-your-api-key"
```

Then reference it in `config.toml`:

```toml
[providers.kimi-code]
type = "kimi"
base_url = "https://api.kimi.com/coding/v1"
api_key = "${KIMICODE_API_KEY}"
```

You can configure both Kimi Code and Moonshot side by side:

```toml
[providers.kimi-code]
type = "kimi"
base_url = "https://api.kimi.com/coding/v1"
api_key = "${KIMICODE_API_KEY}"

[providers.moonshot-cn]
type = "kimi"
base_url = "https://api.moonshot.cn/v1"
api_key = "${MOONSHOT_API_KEY}"

[models.kimi-for-coding]
provider = "kimi-code"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking"]

[models.kimi-k2]
provider = "moonshot-cn"
model = "kimi-k2-0905-preview"
max_context_size = 256000
capabilities = ["thinking"]

[defaults]
model = "kimi-for-coding"
```

Switch models at any time with `/model` or `/model kimi-k2` in the CLI.
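The `${KIMICODE_API_KEY}` form keeps the secret itself out of `config.toml`. Assuming the CLI performs plain `${NAME}` substitution from the environment (its exact rules are not documented here), the substitution amounts to something like this hypothetical `expandEnvRefs` helper:

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: replace every ${NAME} with env[NAME].
// Throws if a referenced variable is unset instead of inserting "".
function expandEnvRefs(value: string, env: Env): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      throw new Error(`environment variable ${name} is not set`);
    }
    return resolved;
  });
}

const env: Env = { KIMICODE_API_KEY: "sk-demo" };
console.log(expandEnvRefs("${KIMICODE_API_KEY}", env)); // prints sk-demo
```

Failing on an unset variable is a deliberate choice in this sketch: an empty `api_key` would otherwise surface later as a confusing authentication error.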
| Feature | Kimi Code | Moonshot |
|---|---|---|
| Focus | Optimized for coding | General-purpose chat |
| Endpoint | `api.kimi.com/coding/v1` | `api.moonshot.cn/v1` |
| API key | Separate; apply at code.kimi.com | Separate |
| SearchWeb / FetchURL | Built-in | Not available |
| Context | 262K (262,144 tokens) | 256K (256,000 tokens) |
Just tell Claude what you need. It will delegate to Kimi automatically:
| Prompt | What happens |
|---|---|
| "Analyze this codebase's architecture" | Kimi reads all files (256K ctx), Claude acts on the report |
| "Scan for security vulnerabilities, then review Kimi's findings" | Kimi audits, Claude cross-examines — AI pair review |
| "Map all dependencies of the auth module, then plan the refactoring" | Kimi builds the dependency graph, Claude plans the changes |
| "Review the recent changes for regressions and edge cases" | Kimi reviews full context (not just the diff), Claude synthesizes |
| "Resume the last Kimi session and ask about the API design" | Kimi retains 256K tokens of context across sessions |
Claude Code is powerful but expensive. Every file it reads costs tokens. Meanwhile, many tasks — pre-reviewing large codebases, scanning for patterns, generating audit reports — are high-certainty work that doesn't need Claude's full reasoning power.
Important
The cost equation: Claude reading 50 files to understand your architecture is expensive. Kimi reading the same 50 files via `kimi_analyze` costs near-zero Claude tokens. Claude then acts on Kimi's structured report using minimal tokens. Total savings: 60-80% fewer Claude tokens on analysis-heavy tasks.
```
   ┌─────────────────────────────┐
   │     You (the developer)     │
   └──────────┬──────────────────┘
              │ prompt
              ▼
   ┌─────────────────────────────┐
   │   Claude Code (conductor)   │
   │   - orchestrates workflow   │
   │   - makes decisions         │
   │   - writes & edits code     │
   └──────┬──────────────┬───────┘
  precise │              │ delegate
    edits │              │ bulk reading
 (tokens) │              │ (no Claude tokens)
          ▼              ▼
    ┌──────────┐   ┌──────────────┐
    │   your   │   │  Kimi Code   │
    │ codebase │   │  - 256K ctx  │
    └──────────┘   │  - reads all │
                   │  - reports   │
                   └──────────────┘
```
- Claude receives your task and decides it needs codebase understanding
- Claude calls `kimi_analyze` via MCP; Kimi reads the codebase (256K context) without spending Claude tokens
- Kimi returns a structured analysis
- Claude acts on the analysis with precise, targeted edits

Result: Claude spends tokens only on decision-making and code writing, not on reading files.
kimi-for-coding is a 1T MoE model designed for deep code comprehension. This enables AI pair review:
- Kimi pre-reviews: with 256K context it sees the entire codebase at once, surfacing security issues, anti-patterns, dead code, and architectural problems
- Claude cross-examines: it reviews Kimi's findings, challenges questionable items, and adds its own insights
- Two perspectives: different models catch different things, so what one misses, the other may find
Beyond ad-hoc analysis, you can use Kimi as a dedicated reviewer in your workflow:
```
┌──────────────┐    diff    ┌──────────────┐  structured  ┌──────────────┐
│   Your PR    │ ─────────► │  Kimi Code   │   findings   │ Claude Code  │
│  (changes)   │            │  (reviewer)  │ ───────────► │  (decision)  │
└──────────────┘            └──────────────┘              └──────────────┘
```
| When | What | Why |
|---|---|---|
| Before merging | Kimi scans diff + affected modules | Catch regressions early |
| Weekly | Full codebase sweep | Accumulated tech debt |
| Pre-release | Security-focused audit | Ship with confidence |
Each review session can be resumed (kimi_resume) — Kimi retains up to 256K tokens of context from previous sessions, building understanding over time.
| Review Type | Why Kimi Excels |
|---|---|
| Security audit | 256K context sees full attack surface, not just isolated files |
| Dead code detection | Can trace imports/exports across entire codebase |
| API consistency | Compares patterns across all endpoints simultaneously |
| Dependency analysis | Maps full dependency graph in one pass |
| Architecture review | Sees the forest and the trees at the same time |
| Tool | Description | Timeout |
|---|---|---|
| `kimi_analyze` | Deep codebase analysis (architecture, audit, refactoring) | 10 min |
| `kimi_query` | Quick programming questions, no codebase context (API fallback when CLI absent) | 2 min |
| `kimi_verify` | API mode: independent third-party verification of code/diffs/claims; no CLI required, context-driven | 5 min |
| `kimi_list_sessions` | List existing Kimi sessions with metadata | instant |
| `kimi_resume` | Resume a previous session (up to 256K token context) | 10 min |
| `kimi_status` | Check CLI installation, version, and authentication status | instant |
| `kimi_cache_status` | View session cache statistics and performance metrics | instant |
| `kimi_cache_invalidate` | Manually invalidate cached sessions (by dir or all) | instant |
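The per-tool timeouts suggest each call is raced against a deadline. The source layout notes that `kimi-runner.ts` handles timeouts; the following is not that code, only a minimal sketch of the usual `Promise.race` pattern. `TOOL_TIMEOUT_MS`, `withTimeout`, and `callTool` are hypothetical names, and the "instant" tools are simply given no deadline here.

```typescript
// Deadlines from the tool table, in milliseconds. Tools listed as "instant"
// are omitted, so they run without a deadline in this sketch.
const TOOL_TIMEOUT_MS: Record<string, number | undefined> = {
  kimi_analyze: 10 * 60_000,
  kimi_query: 2 * 60_000,
  kimi_verify: 5 * 60_000,
  kimi_resume: 10 * 60_000,
};

// Settle with `work`, or reject once `ms` elapses; the timer is always cleared.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer);
  }
}

// Look up the tool's deadline and apply it if one is defined.
function callTool<T>(name: string, work: Promise<T>): Promise<T> {
  const ms = TOOL_TIMEOUT_MS[name];
  return ms === undefined ? work : withTimeout(work, ms, name);
}

callTool("kimi_query", Promise.resolve("answer")).then((result) => console.log(result));
```

A real runner would also kill the spawned CLI process when the deadline fires; racing the promise alone only stops waiting for it.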
`kimi_analyze` and `kimi_resume` support these parameters to control output size:

| Parameter | Values | Default | Effect |
|---|---|---|---|
| `detail_level` | `summary` / `normal` / `detailed` | `normal` | Controls prompt-side verbosity instructions |
| `max_output_tokens` | number | `15000` | Hard ceiling; output is truncated at a clean boundary if exceeded |
| `include_thinking` | boolean | `false` | Include Kimi's internal reasoning chain (10-30K extra tokens) |

`kimi_query` also supports `max_output_tokens` and `include_thinking`.
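To make the defaults and the "clean boundary" truncation concrete, here is a sketch under stated assumptions: tokens are estimated at ~4 characters each (the ratio the compression table implies), a "clean boundary" is taken to mean the last paragraph or line break, and the `[output truncated]` marker is hypothetical. None of this is the server's actual implementation.

```typescript
// Documented defaults for the output-control parameters.
interface OutputOptions {
  detail_level?: "summary" | "normal" | "detailed";
  max_output_tokens?: number;
  include_thinking?: boolean;
}

function withDefaults(opts: OutputOptions): Required<OutputOptions> {
  return {
    detail_level: opts.detail_level ?? "normal",
    max_output_tokens: opts.max_output_tokens ?? 15000,
    include_thinking: opts.include_thinking ?? false,
  };
}

// Assumed heuristic: ~4 characters per token (~5K tokens ≈ ~20K chars).
const CHARS_PER_TOKEN = 4;

// Cut text to the token budget, backing up to the last paragraph break (or,
// failing that, the last line break) so no line is split in the middle.
// The appended marker is not counted against the budget.
function truncateAtBoundary(text: string, maxTokens: number): string {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  if (text.length <= maxChars) return text;
  const head = text.slice(0, maxChars);
  const para = head.lastIndexOf("\n\n");
  const line = head.lastIndexOf("\n");
  const cut = para > 0 ? para : line > 0 ? line : maxChars;
  return head.slice(0, cut) + "\n\n[output truncated]";
}

const { max_output_tokens } = withDefaults({ detail_level: "summary" });
console.log(truncateAtBoundary("para one\n\npara two", max_output_tokens));
```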
Note
The savings come from compression ratio, not from free reading. Kimi's subscription cost still applies, but the key benefit is reducing expensive Claude Code token consumption.
```
Without kimi-code-mcp                 With kimi-code-mcp (normal)
─────────────────────                 ───────────────────────────
Raw source: 50 files × ~4K = 200K     Kimi reads (subscription cost)
Claude reads: 200K tokens             5-15K token report
Claude token cost: $$$                $
```
Compression ratio by `detail_level`:

| Level | Compression | Output size | Equivalent source | Best for |
|---|---|---|---|---|
| `summary` | 40-100x | ~2-5K tokens | ~8-20K chars / ~200-500 lines of code | Quick orientation, file inventory |
| `normal` | 15-40x | ~5-15K tokens | ~20-60K chars / ~500-1500 lines of code | Architecture review, dependency mapping |
| `detailed` | 5-15x | ~15-40K tokens | ~60-160K chars / ~1500-4000 lines of code | Security audit with code snippets |
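The compression figures are simple ratios. A worked version using the comparison's numbers (50 files × ~4K tokens = 200K source tokens); the 10K and 40K report sizes are example points picked from the `normal` and `detailed` ranges, and only the tokens Claude spends reading are counted, not its prompts, reasoning, or edits:

```typescript
// Compression ratio = source tokens / report tokens.
// Share of reading tokens Claude avoids = 1 - report / source.
function compression(sourceTokens: number, reportTokens: number) {
  const ratio = sourceTokens / reportTokens;
  const savedFraction = 1 - reportTokens / sourceTokens;
  return { ratio, savedPercent: Math.round(savedFraction * 100) };
}

const source = 50 * 4_000; // 50 files × ~4K tokens = 200K
console.log(compression(source, 10_000)); // normal: { ratio: 20, savedPercent: 95 }
console.log(compression(source, 40_000)); // detailed: { ratio: 5, savedPercent: 80 }
```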
When savings happen:
- Large codebases (50+ files) — architecture understanding, cross-file scanning
- Security audits, dead code detection, API consistency checks
- Pre-review before targeted edits (scan first → edit specific files)
When to skip and let Claude read directly:
- Small codebases (<10 files) — direct reading is faster
- Single-file modifications — Claude's built-in file reading is sufficient
- When you need every line of code: `detailed` output approaches raw reading cost
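The rules of thumb above can be written down as a small routing function. This is a hypothetical sketch: the 50-file and 10-file thresholds come from the lists, while the `Task` fields and the middle "judgment-call" band are illustrative assumptions, not logic that Claude or this server actually runs.

```typescript
// Hypothetical description of a task, limited to the criteria listed above.
interface Task {
  fileCount: number;
  singleFileEdit: boolean;
  needsEveryLine: boolean;
}

type Route = "delegate-to-kimi" | "claude-reads-directly" | "judgment-call";

function route(task: Task): Route {
  // Skip cases: small repos, single-file edits, or line-by-line needs.
  if (task.singleFileEdit || task.needsEveryLine || task.fileCount < 10) {
    return "claude-reads-directly";
  }
  // Large codebases (50+ files) are where delegation pays off.
  if (task.fileCount >= 50) return "delegate-to-kimi";
  // 10-49 files: the guidance above gives no explicit rule.
  return "judgment-call";
}

console.log(route({ fileCount: 200, singleFileEdit: false, needsEveryLine: false }));
```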
Under the hood:
- Claude Code calls an MCP tool (e.g., `kimi_analyze`)
- This server spawns the `kimi` CLI with the prompt and codebase path
- Kimi autonomously reads files and analyzes the code (up to 256K tokens)
- The result is parsed from Kimi's JSON output and returned to Claude Code
- Claude acts on the structured results: edits, plans, or further analysis
The MCP server calls the Kimi CLI in non-interactive (print) mode:

```bash
kimi --work-dir <path> --print -p "<prompt>"
```

| Flag | Purpose |
|---|---|
| `--print` | Non-interactive mode; outputs the result and exits (required for subprocess use) |
| `-p` / `--prompt` | Pass the prompt directly (bypasses the interactive shell) |
| `--work-dir` / `-w` | Set the codebase root directory |
| `-S <id>` | Resume a specific session by ID |
| `--no-thinking` | Disable thinking mode |
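When scripting Kimi directly, passing these flags as an argument array is safer than building a shell string, since prompts often contain quotes. A sketch using only the documented flags; `buildKimiArgs` and its option names are hypothetical, and placing `-S` and `--no-thinking` after `-p` assumes the CLI accepts flags in any order:

```typescript
// Options for one non-interactive Kimi CLI run (hypothetical names).
interface KimiRunOptions {
  workDir: string;
  prompt: string;
  sessionId?: string; // resume via -S <id>
  thinking?: boolean; // false adds --no-thinking
}

// Build argv from the documented flags; each value is its own array element,
// so quotes or spaces in the prompt need no shell escaping.
function buildKimiArgs(opts: KimiRunOptions): string[] {
  const args = ["--work-dir", opts.workDir, "--print", "-p", opts.prompt];
  if (opts.sessionId) args.push("-S", opts.sessionId);
  if (opts.thinking === false) args.push("--no-thinking");
  return args;
}

const args = buildKimiArgs({ workDir: ".", prompt: 'Explain "main" entry points' });
console.log(["kimi", ...args].join(" "));
```

The resulting array can be handed to Node's `child_process.execFile("kimi", args)`, which runs the binary without a shell.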
Note
There is no kimi analyze subcommand. The MCP tool is named kimi_analyze, but the underlying CLI uses the flags above. Use this syntax to call Kimi directly for debugging or scripting.
For development (runs the TypeScript source directly via `tsx`, so no build step is needed after changes):

```json
{
  "mcpServers": {
    "kimi-code": {
      "command": "npx",
      "args": ["tsx", "/absolute/path/to/kimi-code-mcp/src/index.ts"]
    }
  }
}
```

Published as `kimi-mcp-server` on npm.

```bash
npx kimi-mcp-server              # run directly
npm install -g kimi-mcp-server   # install globally
```

Project structure:

```
src/
├── index.ts           # MCP server setup, tool definitions
├── kimi-runner.ts     # Spawns kimi CLI, parses output, handles timeouts
└── session-reader.ts  # Reads Kimi session metadata from ~/.kimi/
```
See CONTRIBUTING.md for guidelines.
See CHANGELOG.md for version history.
License: MIT