Add Copilot as a SkillOpt-Sleep model backend (CopilotCliBackend) + research-engine MCP plugin by Dongbumlee · Pull Request #50 · microsoft/SkillOpt

Dongbumlee · 2026-06-12T15:35:34Z

What this adds (and how it differs from existing Copilot support)

The repo already ships a Copilot MCP client plugin (plugins/copilot/mcp_server.py) that lets the Copilot CLI trigger a SkillOpt-Sleep cycle. But until now the cycle's actual LLM work could only run on the mock, claude, or codex backends — there was no CopilotCliBackend, so Copilot could orchestrate a sleep cycle but not run one on itself.

This PR adds Copilot as a model backend (backend="copilot"), closing that loop:

| Layer | Before this PR | After |
| --- | --- | --- |
| Copilot as orchestrator (MCP client plugin) | ✅ exists | ✅ unchanged |
| Copilot as model backend (runs attempt / judge / reflect) | ❌ missing (`mock\|claude\|codex` only) | ✅ `backend="copilot"` |

The new backend is also the only Sleep CLI backend that is explicitly Windows-safe (UTF-8 decoding plus cross-platform tool shims); the claude and codex backends are Unix-oriented.
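As a rough illustration of the Windows decoding issue, the sketch below shows why a cp1252 decode fails on ordinary UTF-8 output and how forcing the codec on the subprocess avoids it. The helper names (`decodes_as_cp1252`, `run_utf8`) are hypothetical and not taken from the PR; only the `encoding="utf-8", errors="replace"` arguments come from the description.

```python
import subprocess
import sys

# U+201D (right double quotation mark) encodes to E2 80 9D in UTF-8.
# Byte 0x9D has no mapping in Python's cp1252 codec, so a cp1252 decode raises.
SAMPLE = "\u201cdone\u201d".encode("utf-8")


def decodes_as_cp1252(data: bytes) -> bool:
    """Return True if `data` decodes with cp1252 without raising."""
    try:
        data.decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False


def run_utf8(cmd: list[str]) -> str:
    """Run `cmd` and decode stdout as UTF-8 regardless of the OS locale."""
    proc = subprocess.run(
        cmd,
        capture_output=True,
        encoding="utf-8",   # explicit codec instead of the locale default
        errors="replace",   # malformed bytes become U+FFFD instead of raising
        check=False,
    )
    return proc.stdout


if __name__ == "__main__":
    # A child process that writes raw UTF-8 bytes, standing in for the CLI.
    child = [
        sys.executable, "-c",
        "import sys; sys.stdout.buffer.write('\\u201cdone\\u201d'.encode('utf-8'))",
    ]
    print(decodes_as_cp1252(SAMPLE))
    print(run_utf8(child) == "\u201cdone\u201d")
```

With `errors="replace"`, a truly malformed byte degrades to a replacement character rather than aborting the cycle, which seems the safer trade-off for free-form model output.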

Summary

Adds first-class GitHub Copilot support to SkillOpt, in two independent pieces:

- CopilotCliBackend for SkillOpt-Sleep: lets the self-evolution engine run its attempt / judge / reflect loop on the GitHub Copilot CLI, alongside the existing Claude and Codex CLI backends.
- A SkillOpt research-engine MCP server plugin (plugins/copilot/skillopt/): exposes skillopt_list_configs / skillopt_train / skillopt_eval to Copilot. This is a new, separate plugin from the existing SkillOpt-Sleep MCP server (plugins/copilot/mcp_server.py): it drives the research training/eval scripts rather than the Sleep cycle.

Why

SkillOpt-Sleep already supported claude and codex CLI backends, and a Copilot MCP client plugin already let Copilot trigger a cycle — but Copilot itself could not be the backend that runs the cycle. This extends the same CliBackend contract to the GitHub Copilot CLI so a Copilot user can drive validation-gated skill optimization end-to-end on Copilot, without switching tools.

What changed

Copilot backend (`skillopt_sleep/backend.py`)

- New resolve_copilot_path() + CopilotCliBackend, registered in get_backend() with aliases copilot, github_copilot, copilot_cli, gh_copilot. (Upstream backend.py had no Copilot backend; --backend choices were mock | claude | codex.)
- Invocation: copilot -p <prompt> --output-format json --stream off --no-color --log-level none --allow-all-tools -C <tempdir>, parsing assistant.message events from the JSONL stream. (-s/--silent returns empty stdout on Windows, so JSONL parsing is required.)
- Cross-platform attempt_with_tools: honest tool-call detection mirroring the Claude/Codex backends. It writes per-tool executable shims into the work dir and detects real invocations from a calllog (not self-reported markers). Shims are cross-platform: a .cmd batch shim on Windows and a chmod'd bash shim on POSIX, with an OS-specific tool hint.
- Windows UTF-8 fix: force encoding="utf-8", errors="replace" on the subprocess. With text=True alone, output is decoded with the locale encoding (cp1252 on a typical Windows install), which crashes on Copilot's UTF-8 output (byte 0x9d).
- Startup optimization (~4.9×, 36s → 7.4s per call): each Copilot invocation runs against an isolated COPILOT_HOME, so no user MCP servers are spawned (which also avoids recursively launching SkillOpt's own MCP servers), and passes --disable-builtin-mcps and --no-custom-instructions. Auth is unaffected because credentials are stored in the OS credential store.
- Escape hatches via env: SKILLOPT_SLEEP_COPILOT_HOME, SKILLOPT_SLEEP_COPILOT_MODEL, SKILLOPT_SLEEP_COPILOT_FULL_ENV=1.
- copilot added to --backend choices in __main__.py and experiments/run_experiment.py; the config comment and the existing Sleep MCP server's backend enum are updated to match.
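The invocation and JSONL parsing described in the list above can be sketched roughly as follows. The flag list is copied from the description; everything else is an assumption made for illustration, in particular the event shape (`{"type": "assistant.message", "data": {"content": ...}}`) and the choice to join messages with no separator. The PR's actual `_parse_jsonl_response` may differ.

```python
import json


def build_command(prompt: str, workdir: str, exe: str = "copilot") -> list[str]:
    """Assemble the argument list quoted in the PR description."""
    return [
        exe, "-p", prompt,
        "--output-format", "json",
        "--stream", "off",
        "--no-color",
        "--log-level", "none",
        "--allow-all-tools",
        "-C", workdir,
    ]


def parse_jsonl_response(stdout: str) -> str:
    """Concatenate the text of every assistant.message event in a JSONL stream.

    ASSUMPTION: each event is a JSON object shaped like
    {"type": "assistant.message", "data": {"content": "..."}}.
    """
    parts: list[str] = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # junk line (banner, warning, partial write): skip it
        if not isinstance(event, dict) or event.get("type") != "assistant.message":
            continue  # other event types carry no reply text
        data = event.get("data")
        content = data.get("content") if isinstance(data, dict) else None
        if isinstance(content, str):
            parts.append(content)
    return "".join(parts)
```

Skipping undecodable lines instead of raising keeps a stray warning on stdout from failing an otherwise valid response.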

Research-engine MCP plugin (`plugins/copilot/skillopt/`)

New, stdlib-only mcp_server.py driving scripts/train.py / scripts/eval_only.py, plus mcp-config.example.json, an instructions snippet, and a README — same structure as the existing plugins/copilot/ SkillOpt-Sleep server, but for the research loops. (No research-engine MCP plugin existed upstream.)
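For readers unfamiliar with what a stdlib-only stdio MCP server involves, here is a minimal sketch of the request loop, assuming newline-delimited JSON-RPC 2.0 messages on stdin/stdout. Only the three tool names come from the PR; the descriptions, the empty input schemas, and the function names are placeholders, and a usable server would also need to answer `initialize` and `tools/call`, which are omitted here.

```python
import json
import sys
from typing import Optional

# Tool names as listed in the PR description.
TOOL_NAMES = ("skillopt_list_configs", "skillopt_train", "skillopt_eval")


def handle_request(request: dict) -> Optional[dict]:
    """Answer one JSON-RPC message; return None for notifications."""
    req_id = request.get("id")
    if req_id is None:
        return None  # notifications carry no id and expect no reply
    method = request.get("method")
    if method == "tools/list":
        tools = [
            {
                "name": name,
                "description": f"(placeholder) {name}",
                # Hypothetical empty schema; the real plugin defines its own.
                "inputSchema": {"type": "object", "properties": {}},
            }
            for name in TOOL_NAMES
        ]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "error": {"code": -32601, "message": f"Method not found: {method}"},
    }


def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one JSON message per line and write one response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            request = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore malformed input in this sketch
        response = handle_request(request)
        if response is not None:
            stdout.write(json.dumps(response) + "\n")
            stdout.flush()


if __name__ == "__main__":
    serve()
```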

Tests (`tests/test_sleep_engine.py`)

TestCopilotBackend (7 tests, no real CLI required):
- _parse_jsonl_response — multi-message concat, junk-line skipping, empty / non-assistant events
- get_backend alias resolution
- isolated-home / full-env / home-override behavior
- attempt_with_tools honest detection via an OS-aware stub (runs on both Windows and Linux)
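The alias and environment behaviors these tests cover might look roughly like the sketch below. The alias names and environment variable names are taken from the PR description; the exact semantics are assumptions (that `SKILLOPT_SLEEP_COPILOT_FULL_ENV=1` skips isolation entirely and that `SKILLOPT_SLEEP_COPILOT_HOME` replaces the throwaway directory), and `normalize_backend` / `build_copilot_env` are hypothetical names rather than the PR's functions.

```python
import os
import tempfile

# Aliases listed in the PR description.
COPILOT_ALIASES = frozenset({"copilot", "github_copilot", "copilot_cli", "gh_copilot"})


def normalize_backend(name: str) -> str:
    """Map any Copilot alias to the canonical 'copilot' backend name."""
    key = name.strip().lower()
    return "copilot" if key in COPILOT_ALIASES else key


def build_copilot_env(base_env: dict[str, str]) -> dict[str, str]:
    """Return a copy of `base_env` prepared for one Copilot CLI call."""
    env = dict(base_env)
    if env.get("SKILLOPT_SLEEP_COPILOT_FULL_ENV") == "1":
        return env  # opt-out: keep the user's own Copilot home untouched
    # ASSUMPTION: an explicit override wins over a fresh temporary directory.
    home = env.get("SKILLOPT_SLEEP_COPILOT_HOME") or tempfile.mkdtemp(
        prefix="skillopt-copilot-home-"
    )
    env["COPILOT_HOME"] = home
    return env


if __name__ == "__main__":
    print(normalize_backend("GH_Copilot"))
    print(build_copilot_env(dict(os.environ))["COPILOT_HOME"])
```

Because the function takes the base environment as an argument instead of reading `os.environ` directly, each of the three cases can be tested without touching the real process environment.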

Validation

- Full suite: 104 passed, 2 skipped (the 2 skips are pre-existing and data-dependent).
- Lint: ruff check reports 0 new findings (pre-existing findings on main are unchanged).
- Real-data run via experiments/run_experiment.py with the copilot backend: researcher persona showed a 0.42 → 1.00 lift; the programmer persona's validation gate correctly rejected non-improving edits.
- Cross-platform attempt_with_tools verified live: real Copilot call on Windows (tool actually invoked, detected from the calllog) and the POSIX path on Linux/WSL.
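The shim-and-calllog idea behind attempt_with_tools can be sketched as below. This is an illustration under stated assumptions, not the PR's code: the function names and log line format are hypothetical, and the POSIX shim uses `/bin/sh` for portability where the PR describes a bash shim.

```python
import os
import stat
from pathlib import Path


def write_shim(workdir: Path, tool: str, calllog: Path) -> Path:
    """Write an executable shim that appends one line to `calllog` per call."""
    if os.name == "nt":
        shim = workdir / f"{tool}.cmd"
        # Text mode translates "\n" to CRLF on Windows, as batch files expect.
        shim.write_text(f'@echo off\necho {tool} %*>> "{calllog}"\n', encoding="utf-8")
    else:
        shim = workdir / tool
        shim.write_text(f'#!/bin/sh\necho "{tool} $*" >> "{calllog}"\n', encoding="utf-8")
        # Equivalent of chmod +x for user, group, and others.
        shim.chmod(shim.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return shim


def tools_called(calllog: Path) -> set[str]:
    """Return the names of tools that were really executed, per the log."""
    if not calllog.exists():
        return set()
    lines = calllog.read_text(encoding="utf-8", errors="replace").splitlines()
    return {line.split()[0] for line in lines if line.strip()}
```

Detection reads only the log the shim itself wrote, so a model that merely claims to have called a tool leaves no trace and is not counted.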

Notes

Two separate backend systems exist in the repo: the research skillopt/model/ system (async ModelBackend) and the SkillOpt-Sleep skillopt_sleep/backend.py system (sync CliBackend). This PR's Copilot backend follows the Sleep CliBackend pattern (alongside ClaudeCliBackend / CodexCliBackend); the research-engine MCP plugin simply shells out to the existing scripts/ entry points.

Exposes scripts/train.py and scripts/eval_only.py as Copilot MCP tools (skillopt_list_configs, skillopt_train, skillopt_eval) via a stdlib-only stdio server, mirroring the existing SkillOpt-Sleep plugin layout. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add CopilotCliBackend that drives the GitHub Copilot CLI in non-interactive mode (copilot -p ... --output-format json) and parses the JSONL event stream for assistant.message content. Registered as the 'copilot' backend (with aliases) and wired through the CLI, config, experiment harness, and the Copilot MCP server's backend enum.

- Force UTF-8 decoding of CLI output (fixes cp1252 UnicodeDecodeError on Windows when responses contain non-cp1252 bytes).
- Minimise per-call startup: isolated COPILOT_HOME with built-in MCPs and custom instructions disabled, so user MCP servers are not spawned per call (~5x faster: 36s -> 7.4s). Override via SKILLOPT_SLEEP_COPILOT_HOME / SKILLOPT_SLEEP_COPILOT_MODEL / SKILLOPT_SLEEP_COPILOT_FULL_ENV.

Validated end-to-end on real held-out tasks (researcher persona: 0.42 -> 1.00 lift; gate correctly rejects non-improving edits). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…d home) Covers _parse_jsonl_response (multi-message concat, junk-line skipping, empty/non-assistant events), get_backend alias resolution, and the isolated-COPILOT_HOME / full-env opt-out behavior. Pure logic, no CLI required. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…shims Adds honest tool-call detection for CopilotCliBackend, mirroring the Claude/Codex backends. Writes per-tool executable shims into the work dir and detects real invocations from a calllog (not self-reported markers). The Copilot backend is Windows-validated, so shims are cross-platform: a .cmd batch shim on Windows and a chmod'd bash shim on POSIX, with an OS-specific tool hint. Mirrors _call's flags/env (isolated COPILOT_HOME, --allow-all-tools, MCP/instruction disabling) and the UTF-8 subprocess fix. Adds test_attempt_with_tools_honest_detection: a CI-friendly, OS-aware stub stands in for the CLI, runs the shim, and asserts both JSONL parsing and log-based detection. Validated live on Windows (real Copilot call) and on Linux/WSL (POSIX path). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… azure_openai) The advertised backend choices in scripts/train.py use 'azure_openai', not 'openai'; align the inputSchema description hint accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Dongbumlee and others added 5 commits June 12, 2026 08:21

Dongbumlee marked this pull request as ready for review June 12, 2026 16:19

Dongbumlee changed the title ~~Add GitHub Copilot support: CopilotCliBackend for SkillOpt-Sleep + research-engine MCP plugin~~ Add Copilot as a SkillOpt-Sleep model backend (CopilotCliBackend) + research-engine MCP plugin Jun 12, 2026

Dongbumlee wants to merge 5 commits into microsoft:main from Dongbumlee:Dongbumlee/copilot-sleep-backend
