13 changes: 11 additions & 2 deletions plugins/copilot/README.md
@@ -45,8 +45,17 @@ Ask Copilot things like *"run the sleep cycle"*, *"what did the last sleep
propose?"*, *"adopt the staged sleep proposal"*. Copilot calls the MCP tools:
`sleep_status`, `sleep_dry_run`, `sleep_run`, `sleep_adopt`, `sleep_harvest`.

-Each tool takes optional `project`, `backend` (`mock`/`claude`/`codex`), and
-`scope` arguments. Default backend is `mock` (no API spend).
+Each tool takes optional `project`, `backend` (`mock`/`claude`/`codex`/`copilot`), and
+`scope` arguments. Default backend is `mock` (no API spend). The `copilot`
+backend drives the GitHub Copilot CLI (`copilot -p ... --output-format json`)
+and requires the `copilot` CLI to be installed and authenticated.
+
+For speed, the `copilot` backend runs each call against an isolated
+`COPILOT_HOME` with built-in MCP servers and custom instructions disabled, so
+your user MCP servers (including this project's own) are not spawned per call
+(~5x faster). Override with `SKILLOPT_SLEEP_COPILOT_HOME=<dir>`, pick a model
+with `SKILLOPT_SLEEP_COPILOT_MODEL`, or set `SKILLOPT_SLEEP_COPILOT_FULL_ENV=1`
+to use your real Copilot environment instead.
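
As a rough illustration of how the three environment variables above could interact, here is a minimal sketch. The function names, the temporary-directory default, and the rule that `SKILLOPT_SLEEP_COPILOT_FULL_ENV=1` takes precedence over the other settings are assumptions made for this example, not a description of the plugin's actual code.

```python
import tempfile
from typing import Dict, Mapping, Optional


def resolve_copilot_env(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return the environment a copilot-backend call might run with (hypothetical helper)."""
    env = dict(environ)
    if environ.get("SKILLOPT_SLEEP_COPILOT_FULL_ENV") == "1":
        # Assumed precedence: opting out of isolation leaves the real environment untouched.
        return env
    home = environ.get("SKILLOPT_SLEEP_COPILOT_HOME")
    if not home:
        # Assumed default: a fresh scratch directory with no user MCP config in it.
        home = tempfile.mkdtemp(prefix="skillopt-copilot-home-")
    env["COPILOT_HOME"] = home
    return env


def resolve_copilot_model(environ: Mapping[str, str]) -> Optional[str]:
    """Return the requested model name, or None to keep the CLI's default."""
    return environ.get("SKILLOPT_SLEEP_COPILOT_MODEL") or None
```

Keeping the resolution in pure functions over a mapping makes the precedence easy to test without touching the real process environment.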

## Verify the server directly (no Copilot needed)

4 changes: 2 additions & 2 deletions plugins/copilot/mcp_server.py
@@ -45,8 +45,8 @@
     "type": "object",
     "properties": {
         "project": {"type": "string", "description": "Project dir to evolve (default: cwd)."},
-        "backend": {"type": "string", "enum": ["mock", "claude", "codex"],
-                    "description": "mock = no API spend (default); claude/codex = real."},
+        "backend": {"type": "string", "enum": ["mock", "claude", "codex", "copilot"],
+                    "description": "mock = no API spend (default); claude/codex/copilot = real."},
         "scope": {"type": "string", "enum": ["invoked", "all"]},
     },
     "additionalProperties": False,
98 changes: 98 additions & 0 deletions plugins/copilot/skillopt/README.md
@@ -0,0 +1,98 @@
# SkillOpt — GitHub Copilot integration

Give **Copilot** (CLI or VS Code) direct access to the **SkillOpt** research
engine via a tiny **MCP server**. MCP is GitHub's supported way to extend
Copilot, so this works across Copilot CLI, VS Code, and other MCP clients with
the same server.

SkillOpt is **validation-gated, text-space skill optimization**: it reflects on
rollouts, makes bounded edits to a skill, and keeps a change only if it improves
a held-out validation set. This plugin exposes the repo's training and eval
entry points (`scripts/train.py`, `scripts/eval_only.py`) as Copilot tools.

> This is the companion to the **SkillOpt-Sleep** plugin (`../mcp_server.py`,
> `sleep_*` tools). Sleep evolves a *local coding agent* from your past
> sessions; this server drives the *research* training/eval loops on the
> benchmark configs in [`../../../configs`](../../../configs).

## What's here

| File | Purpose |
|---|---|
| `mcp_server.py` | stdlib-only MCP (stdio) server exposing `skillopt_*` tools |
| `mcp-config.example.json` | drop-in MCP server config |
| `copilot-instructions.snippet.md` | paste into `.github/copilot-instructions.md` |

## Install

Requires Python ≥ 3.10. The MCP server itself is pure stdlib, but the tools it
launches need SkillOpt's runtime deps — install the package first:

```bash
pip install -e . # or: pip install -r requirements.txt
```

1. **Register the MCP server.** Add the server to your Copilot MCP config
   (Copilot CLI: `~/.copilot/mcp-config.json`; VS Code: your MCP settings).
   Use `mcp-config.example.json` as a template and set `SKILLOPT_REPO` to this
   repo's path:

   ```json
   {
     "mcpServers": {
       "skillopt": {
         "command": "python3",
         "args": ["/abs/path/SkillOpt/plugins/copilot/skillopt/mcp_server.py"],
         "env": { "SKILLOPT_REPO": "/abs/path/SkillOpt" }
       }
     }
   }
   ```

2. **(Optional) Tell Copilot about it.** Append
   `copilot-instructions.snippet.md` to your repo's
   `.github/copilot-instructions.md` so Copilot reaches for the tools when the
   user asks to "optimize a skill" or "train on a benchmark".
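
If you would rather generate the config from step 1 than hand-edit the absolute paths, something like the following sketch could work. `build_mcp_config` is a hypothetical helper written for this example; it only mirrors the JSON shape shown in step 1 and is not shipped with the plugin.

```python
import json
from pathlib import Path


def build_mcp_config(repo_root: str) -> dict:
    """Build the MCP config dict with absolute paths derived from the repo root."""
    repo = Path(repo_root).expanduser().resolve()
    server = repo / "plugins" / "copilot" / "skillopt" / "mcp_server.py"
    return {
        "mcpServers": {
            "skillopt": {
                "command": "python3",
                "args": [str(server)],
                "env": {"SKILLOPT_REPO": str(repo)},
            }
        }
    }


if __name__ == "__main__":
    # Print a config for the current directory; paste it into your MCP config file.
    print(json.dumps(build_mcp_config("."), indent=2))
```

Run it from the repo root and merge the printed `skillopt` entry into any `mcpServers` block you already have.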

## Use

Ask Copilot things like *"what configs can I run?"*, *"optimize the searchqa
skill"*, or *"evaluate this skill on the dataset"*. Copilot calls the MCP tools:
`skillopt_list_configs`, `skillopt_train`, `skillopt_eval`.

| Tool | Required args | Notes |
|---|---|---|
| `skillopt_list_configs` | — | Lists `configs/**/*.yaml` you can pass as `config`. |
| `skillopt_train` | `config` | Runs a reflective optimization loop. Long-running; spends budget. |
| `skillopt_eval` | `config`, `skill` | Evaluates one skill markdown file; no training. |
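
To make the `configs/**/*.yaml` pattern concrete, here is a minimal sketch of a filesystem-only listing. The `list_configs` function and the choice to return sorted, repo-relative POSIX paths are assumptions for illustration; the real tool's output format may differ.

```python
from pathlib import Path
from typing import List


def list_configs(repo_root: str) -> List[str]:
    """Return repo-relative paths of every YAML file under <repo_root>/configs."""
    root = Path(repo_root)
    configs_dir = root / "configs"
    # "**/*.yaml" matches files directly in configs/ as well as in nested folders.
    return sorted(p.relative_to(root).as_posix() for p in configs_dir.glob("**/*.yaml"))
```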

Common optional args (both train and eval): `env`, `backend`,
`optimizer_model`, `target_model`, `out_root`, `cfg_options` (space-separated
`KEY=VALUE` YAML overrides), and `extra_args` (raw passthrough flags for the
underlying script). `skillopt_train` also accepts `num_epochs`, `batch_size`,
`seed`, and `use_gate`.
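
The `cfg_options` format is easiest to see with a small parser. This is only a sketch of how a space-separated `KEY=VALUE` string splits into overrides; `parse_cfg_options` is a hypothetical name, and the server may simply pass the string through to the underlying script without parsing it.

```python
from typing import Dict


def parse_cfg_options(cfg_options: str) -> Dict[str, str]:
    """Split 'KEY=VALUE KEY=VALUE' into a dict, keeping values as raw strings."""
    overrides: Dict[str, str] = {}
    for token in cfg_options.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {token!r}")
        overrides[key] = value
    return overrides
```

For example, `parse_cfg_options("seed=123 batch_size=40")` yields `{"seed": "123", "batch_size": "40"}`. Values containing spaces cannot be expressed in this format.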

Runs can be very long. The server's subprocess timeout defaults to 6 hours;
override it with the `SKILLOPT_RUN_TIMEOUT` environment variable (seconds).
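
A minimal sketch of the timeout behaviour described above, assuming the server reads the variable once per run and hands it to `subprocess.run`. The helper names and the handling of empty or non-positive values are assumptions for this example.

```python
import os
import subprocess
from typing import Mapping, Sequence

DEFAULT_TIMEOUT_S = 6 * 60 * 60  # 6 hours, the documented default


def resolve_timeout(environ: Mapping[str, str]) -> float:
    """Read SKILLOPT_RUN_TIMEOUT (seconds), falling back to the default."""
    raw = environ.get("SKILLOPT_RUN_TIMEOUT", "").strip()
    if not raw:
        return float(DEFAULT_TIMEOUT_S)
    timeout = float(raw)
    if timeout <= 0:
        # Assumed policy: reject nonsensical values instead of silently ignoring them.
        raise ValueError("SKILLOPT_RUN_TIMEOUT must be a positive number of seconds")
    return timeout


def run_with_timeout(cmd: Sequence[str], environ: Mapping[str, str] = os.environ) -> str:
    """Run a command and return its stdout, or a short message if it timed out."""
    timeout = resolve_timeout(environ)
    try:
        done = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout:.0f}s"
    return done.stdout
```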

## Verify the server directly (no Copilot needed)

```bash
printf '%s\n' \
'{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
'{"jsonrpc":"2.0","id":2,"method":"tools/list"}' \
'{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"skillopt_list_configs","arguments":{}}}' \
| SKILLOPT_REPO="$(pwd)" python3 plugins/copilot/skillopt/mcp_server.py
```

You should see the server info, the three `skillopt_*` tools, and the list of
benchmark configs.
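
The same check can be scripted in Python, which is handy in CI. The sketch below sends the three JSON-RPC messages from the shell command above; it assumes the server replies with one JSON object per line on stdout, and `verify_server` is a hypothetical helper name.

```python
import json
import os
import subprocess
import sys
from typing import List


def build_requests() -> List[str]:
    """Return the three newline-delimited JSON-RPC requests used in the shell check."""
    messages = [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
        {"jsonrpc": "2.0", "id": 3, "method": "tools/call",
         "params": {"name": "skillopt_list_configs", "arguments": {}}},
    ]
    return [json.dumps(m, separators=(",", ":")) for m in messages]


def verify_server(repo_root: str) -> list:
    """Pipe the requests into the server and parse its replies (assumed one JSON per line)."""
    server = os.path.join(repo_root, "plugins", "copilot", "skillopt", "mcp_server.py")
    env = dict(os.environ, SKILLOPT_REPO=os.path.abspath(repo_root))
    done = subprocess.run(
        [sys.executable, server],
        input="\n".join(build_requests()) + "\n",
        capture_output=True, text=True, env=env, timeout=60,
    )
    return [json.loads(line) for line in done.stdout.splitlines() if line.strip()]
```

Calling `verify_server(".")` from the repo root should return the parsed replies, which you can then inspect for the three `skillopt_*` tool names.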

## Notes / status

- MCP is the stable, official Copilot extension surface, so one server works
  across both Copilot CLI and IDE clients.
- `skillopt_list_configs` is filesystem-only and safe to call anytime;
  `skillopt_train` / `skillopt_eval` shell out to the repo scripts and require
  the SkillOpt runtime deps (and, for real backends, model credentials; see
  [`../../../.env.example`](../../../.env.example)).
33 changes: 33 additions & 0 deletions plugins/copilot/skillopt/copilot-instructions.snippet.md
@@ -0,0 +1,33 @@
<!--
Copy this block into your repo's .github/copilot-instructions.md so Copilot
knows the SkillOpt research-engine tools exist. (Copilot reads
copilot-instructions.md automatically as ambient guidance.)
-->

## SkillOpt (research skill-optimization engine)

This repo exposes the core **SkillOpt** training/eval engine via an MCP server
(`skillopt`). SkillOpt is validation-gated, text-space skill optimization: it
reflects on rollouts, makes bounded edits to a skill, and keeps a change only
if it improves a held-out validation set.

When the user asks to "optimize a skill", "train on <benchmark>", "run
SkillOpt", "evaluate this skill", or "what configs can I run", use the MCP
tools:

- `skillopt_list_configs` — list the benchmark YAML configs you can pass as `config`
- `skillopt_train` — run a reflective skill-optimization loop on a config (long-running; spends API/compute budget)
- `skillopt_eval` — evaluate a single skill markdown file on a dataset (no training)

Guidance:
- Always run `skillopt_list_configs` first if you don't already know a valid `config` path.
- `skillopt_train` and `skillopt_eval` are long-running and consume the user's
model backend/budget — confirm the `config`, `backend`, and model choices
with the user before launching, and surface the held-out gate result when the
run finishes.
- For one-off YAML overrides use `cfg_options` (e.g. `seed=123 batch_size=40`);
for any other underlying flag use `extra_args`.

This is distinct from the **SkillOpt-Sleep** MCP server (`skillopt-sleep`,
`sleep_*` tools), which evolves a local coding agent from past sessions rather
than running the research benchmarks.
11 changes: 11 additions & 0 deletions plugins/copilot/skillopt/mcp-config.example.json
@@ -0,0 +1,11 @@
{
  "mcpServers": {
    "skillopt": {
      "command": "python3",
      "args": ["plugins/copilot/skillopt/mcp_server.py"],
      "env": {
        "SKILLOPT_REPO": "${workspaceFolder}"
      }
    }
  }
}