Add Copilot as a SkillOpt-Sleep model backend (CopilotCliBackend) + research-engine MCP plugin #50

Open
Dongbumlee wants to merge 5 commits into microsoft:main from Dongbumlee:Dongbumlee/copilot-sleep-backend

Dongbumlee commented Jun 12, 2026

What this adds (and how it differs from existing Copilot support)

The repo already ships a Copilot MCP client plugin (plugins/copilot/mcp_server.py) that lets the Copilot CLI trigger a SkillOpt-Sleep cycle. But until now the cycle's actual LLM work could only run on the mock, claude, or codex backends — there was no CopilotCliBackend, so Copilot could orchestrate a sleep cycle but not run one on itself.

This PR adds Copilot as a model backend (backend="copilot"), closing that loop:

| Layer | Before this PR | After |
| --- | --- | --- |
| Copilot as orchestrator (MCP client plugin) | ✅ exists | ✅ unchanged |
| Copilot as model backend (runs attempt / judge / reflect) | ❌ missing (mock \| claude \| codex only) | backend="copilot" |

It is also the only Sleep CLI backend that is explicitly Windows-safe (UTF-8 decoding plus cross-platform tool shims); the claude and codex backends are Unix-oriented.
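To illustrate why explicit UTF-8 decoding matters, here is a minimal sketch (not the backend's actual code). It uses a child Python process as a stand-in for the CLI. The child emits a right double quote, whose UTF-8 encoding contains byte 0x9d, a byte that Python's cp1252 codec cannot decode.

```python
import subprocess
import sys

# U+201D (right double quote) encodes to bytes E2 80 9D in UTF-8.
payload = "\u201d".encode("utf-8")

# Byte 0x9d has no mapping in Python's cp1252 codec, so strict decoding fails.
try:
    payload.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# Stand-in for the CLI: a child process that writes raw UTF-8 bytes to stdout.
child = "import sys; sys.stdout.buffer.write(b'\\xe2\\x80\\x9d')"
result = subprocess.run(
    [sys.executable, "-c", child],
    capture_output=True,
    encoding="utf-8",   # decode explicitly instead of relying on the locale default
    errors="replace",   # malformed bytes become U+FFFD instead of raising
)
decoded = result.stdout
```

Passing `encoding` and `errors` makes the result independent of the machine's locale, which is the property the backend needs on Windows.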

Summary

Adds first-class GitHub Copilot support to SkillOpt, in two independent pieces:

  1. CopilotCliBackend for SkillOpt-Sleep — lets the self-evolution engine run its attempt / judge / reflect loop on the GitHub Copilot CLI, alongside the existing Claude and Codex CLI backends.
  2. A SkillOpt research-engine MCP server plugin (plugins/copilot/skillopt/) — exposes skillopt_list_configs / skillopt_train / skillopt_eval to Copilot. This is a new, separate plugin from the existing SkillOpt-Sleep MCP server (plugins/copilot/mcp_server.py): it drives the research training/eval scripts rather than the Sleep cycle.

Why

SkillOpt-Sleep already supported claude and codex CLI backends, and a Copilot MCP client plugin already let Copilot trigger a cycle — but Copilot itself could not be the backend that runs the cycle. This extends the same CliBackend contract to the GitHub Copilot CLI so a Copilot user can drive validation-gated skill optimization end-to-end on Copilot, without switching tools.

What changed

Copilot backend (skillopt_sleep/backend.py)

  • New resolve_copilot_path() + CopilotCliBackend, registered in get_backend() with aliases copilot, github_copilot, copilot_cli, gh_copilot. (Upstream backend.py had no Copilot backend; --backend choices were mock | claude | codex.)
  • Invocation: copilot -p <prompt> --output-format json --stream off --no-color --log-level none --allow-all-tools -C <tempdir>, parsing assistant.message events from the JSONL stream. (-s/--silent returns empty stdout on Windows, so JSONL parsing is required.)
  • Cross-platform attempt_with_tools: honest tool-call detection mirroring the Claude/Codex backends — writes per-tool executable shims into the work dir and detects real invocations from a calllog (not self-reported markers). Shims are cross-platform: a .cmd batch shim on Windows and a chmod'd bash shim on POSIX, with an OS-specific tool hint.
  • Windows UTF-8 fix: force encoding="utf-8", errors="replace" on the subprocess — text=True decodes as cp1252 on Windows and crashes on Copilot's UTF-8 output (byte 0x9d).
  • Startup optimization (~4.9×, 36s → 7.4s per call): runs each Copilot invocation against an isolated COPILOT_HOME (no user MCP servers spawned — avoids recursively launching SkillOpt's own MCP servers), plus --disable-builtin-mcps and --no-custom-instructions. Auth is unaffected (stored in the OS credential store).
  • Escape hatches via env: SKILLOPT_SLEEP_COPILOT_HOME, SKILLOPT_SLEEP_COPILOT_MODEL, SKILLOPT_SLEEP_COPILOT_FULL_ENV=1.
  • copilot added to --backend choices in __main__.py and experiments/run_experiment.py; the config comment and the existing Sleep MCP server's backend enum are updated to match.
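The invocation and isolation rules above could be assembled roughly as follows. This is a sketch under stated assumptions, not the code in skillopt_sleep/backend.py: `build_copilot_call` is a hypothetical helper, the flags are the ones listed in this description, and the exact semantics of the environment overrides (an explicit home wins, `SKILLOPT_SLEEP_COPILOT_FULL_ENV=1` skips isolation and the two disabling flags) are assumed. `SKILLOPT_SLEEP_COPILOT_MODEL` is left out because the description does not say how it is passed to the CLI.

```python
import tempfile


def build_copilot_call(prompt, workdir, environ):
    """Return (argv, env) for one non-interactive Copilot call (hypothetical helper)."""
    argv = [
        "copilot", "-p", prompt,
        "--output-format", "json",
        "--stream", "off",
        "--no-color",
        "--log-level", "none",
        "--allow-all-tools",
        "-C", workdir,
    ]
    env = dict(environ)

    # Assumed opt-out: keep the user's normal Copilot home and settings.
    if environ.get("SKILLOPT_SLEEP_COPILOT_FULL_ENV") == "1":
        return argv, env

    # Isolated mode: no built-in MCP servers, no custom instructions.
    argv += ["--disable-builtin-mcps", "--no-custom-instructions"]

    # Assumed precedence: an explicit override wins, else a fresh empty directory.
    home = environ.get("SKILLOPT_SLEEP_COPILOT_HOME") or tempfile.mkdtemp(
        prefix="copilot-home-"
    )
    env["COPILOT_HOME"] = home
    return argv, env
```

Keeping this a pure function of `(prompt, workdir, environ)` makes the isolated-home, full-env, and home-override cases testable without a real CLI.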

Research-engine MCP plugin (plugins/copilot/skillopt/)

  • New, stdlib-only mcp_server.py driving scripts/train.py / scripts/eval_only.py, plus mcp-config.example.json, an instructions snippet, and a README — same structure as the existing plugins/copilot/ SkillOpt-Sleep server, but for the research loops. (No research-engine MCP plugin existed upstream.)
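For readers unfamiliar with the shape of a stdlib-only stdio MCP server, the following is a minimal sketch, not the plugin's mcp_server.py. It assumes newline-delimited JSON-RPC 2.0 on stdin/stdout and handles only `initialize` and `tools/list`. The three tool names come from this description; the descriptions, empty input schemas, and server name are placeholders, and `tools/call` (which would shell out to scripts/train.py or scripts/eval_only.py) is omitted.

```python
import json
import sys

# Tool names are from the PR description; descriptions and schemas are placeholders.
TOOLS = [
    {"name": name, "description": desc,
     "inputSchema": {"type": "object", "properties": {}}}
    for name, desc in [
        ("skillopt_list_configs", "List available configs (placeholder text)"),
        ("skillopt_train", "Run a training job (placeholder text)"),
        ("skillopt_eval", "Run an evaluation (placeholder text)"),
    ]
]


def handle(request):
    """Map one JSON-RPC request dict to a response dict (None for notifications)."""
    req_id = request.get("id")
    if req_id is None:
        return None  # notifications carry no id and get no reply
    method = request.get("method")
    if method == "initialize":
        result = {
            "protocolVersion": "2024-11-05",
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "skillopt-sketch", "version": "0.0.0"},
        }
    elif method == "tools/list":
        result = {"tools": TOOLS}
    else:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": f"Method not found: {method}"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON-RPC message per line and write one response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        response = handle(json.loads(line))
        if response is not None:
            stdout.write(json.dumps(response) + "\n")
            stdout.flush()
```

An entry point would simply call `serve()`; separating `handle` from the I/O loop keeps the dispatch logic testable in isolation.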

Tests (tests/test_sleep_engine.py)

  • TestCopilotBackend (7 tests, no real CLI required):
    • _parse_jsonl_response — multi-message concat, junk-line skipping, empty / non-assistant events
    • get_backend alias resolution
    • isolated-home / full-env / home-override behavior
    • attempt_with_tools honest detection via an OS-aware stub (runs on both Windows and Linux)
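The parsing behaviors covered by these tests can be sketched as below. This is an illustration only: the function name drops the leading underscore to mark it as a stand-in, and the event shape (`{"type": "assistant.message", "data": {"content": ...}}`) and the choice to join messages with no separator are assumptions, since the description does not spell out the JSONL schema.

```python
import json


def parse_jsonl_response(stdout_text):
    """Concatenate the text of every assistant.message event in a JSONL stream.

    Assumed event shape: {"type": "assistant.message", "data": {"content": str}}.
    """
    parts = []
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # junk line (banner, warning, partial write): skip it
        if not isinstance(event, dict) or event.get("type") != "assistant.message":
            continue  # other event types carry no answer text
        data = event.get("data")
        content = data.get("content") if isinstance(data, dict) else None
        if isinstance(content, str):
            parts.append(content)
    return "".join(parts)


sample = "\n".join([
    "not json at all",
    json.dumps({"type": "session.start", "data": {}}),
    json.dumps({"type": "assistant.message", "data": {"content": "Hello, "}}),
    json.dumps({"type": "assistant.message", "data": {"content": "world"}}),
])
answer = parse_jsonl_response(sample)
```

Skipping undecodable lines rather than raising keeps a single stray log line from discarding an otherwise valid response.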

Validation

  • Full suite: 104 passed, 2 skipped (the 2 skips are pre-existing and data-dependent).
  • Lint: ruff check reports 0 new findings (pre-existing findings on main are unchanged).
  • Real-data run via experiments/run_experiment.py with the copilot backend: researcher persona showed a 0.42 → 1.00 lift; the programmer persona's validation gate correctly rejected non-improving edits.
  • Cross-platform attempt_with_tools verified live: real Copilot call on Windows (tool actually invoked, detected from the calllog) and the POSIX path on Linux/WSL.
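The calllog-based detection can be illustrated with a small sketch. It is not the backend's implementation: `write_shim`, `tools_called`, and the `calllog.txt` file name are hypothetical, and the POSIX shim uses plain `sh` here for portability where the PR describes a bash shim. The point it shows is that a tool counts as called only if its shim actually ran and wrote to the log.

```python
import os
import stat
import subprocess
import tempfile
from pathlib import Path


def write_shim(workdir, tool):
    """Create an executable stub for `tool` that appends its name to a calllog."""
    workdir = Path(workdir)
    log = workdir / "calllog.txt"  # hypothetical log file name
    if os.name == "nt":
        shim = workdir / f"{tool}.cmd"
        shim.write_text(f'@echo off\necho {tool}>>"{log}"\n', encoding="utf-8")
    else:
        shim = workdir / tool
        shim.write_text(f'#!/bin/sh\necho {tool} >> "{log}"\n', encoding="utf-8")
        shim.chmod(shim.stat().st_mode | stat.S_IXUSR)  # make it executable
    return shim, log


def tools_called(log):
    """Return the set of tool names recorded in the calllog (empty if no log)."""
    if not log.exists():
        return set()
    lines = log.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


workdir = tempfile.mkdtemp()
shim, log = write_shim(workdir, "search")
before = tools_called(log)
subprocess.run([str(shim)], check=True)  # stands in for the model invoking the tool
after = tools_called(log)
```

Because detection reads the log rather than the model's text, a response that merely claims to have used a tool leaves `tools_called` empty.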

Notes

  • Two separate backend systems exist in the repo: the research skillopt/model/ system (async ModelBackend) and the SkillOpt-Sleep skillopt_sleep/backend.py system (sync CliBackend). This PR's Copilot backend follows the Sleep CliBackend pattern (alongside ClaudeCliBackend / CodexCliBackend); the research-engine MCP plugin simply shells out to the existing scripts/ entry points.
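As a rough picture of how the Sleep-side registration might look, the sketch below resolves the four aliases named in this PR to a single backend class. The class bodies, the `_ALIASES` table, and the lowercase normalization are assumptions for illustration; the real get_backend() also covers mock, claude, and codex, and its signature may differ.

```python
class CliBackend:
    """Stand-in for the synchronous Sleep backend base class (interface assumed)."""
    name = "base"


class CopilotCliBackend(CliBackend):
    """Stand-in for the Copilot backend; the real class wraps the Copilot CLI."""
    name = "copilot"


# Aliases listed in the PR description, all mapping to the same backend class.
_ALIASES = {
    "copilot": CopilotCliBackend,
    "github_copilot": CopilotCliBackend,
    "copilot_cli": CopilotCliBackend,
    "gh_copilot": CopilotCliBackend,
}


def get_backend(name):
    """Return a backend instance for `name`, or raise ValueError if unknown."""
    key = name.strip().lower()  # normalization is an assumption
    try:
        return _ALIASES[key]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None
```

A table-driven lookup keeps every alias pointing at one class, so adding a spelling never creates a second code path.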

Dongbumlee and others added 5 commits June 12, 2026 08:21
Exposes scripts/train.py and scripts/eval_only.py as Copilot MCP tools
(skillopt_list_configs, skillopt_train, skillopt_eval) via a stdlib-only
stdio server, mirroring the existing SkillOpt-Sleep plugin layout.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add CopilotCliBackend that drives the GitHub Copilot CLI in
non-interactive mode (copilot -p ... --output-format json) and parses the
JSONL event stream for assistant.message content. Registered as the
'copilot' backend (with aliases) and wired through the CLI, config,
experiment harness, and the Copilot MCP server's backend enum.

- Force UTF-8 decoding of CLI output (fixes cp1252 UnicodeDecodeError on
  Windows when responses contain non-cp1252 bytes).
- Minimise per-call startup: isolated COPILOT_HOME with built-in MCPs and
  custom instructions disabled, so user MCP servers are not spawned per
  call (~5x faster: 36s -> 7.4s). Override via SKILLOPT_SLEEP_COPILOT_HOME
  / SKILLOPT_SLEEP_COPILOT_MODEL / SKILLOPT_SLEEP_COPILOT_FULL_ENV.

Validated end-to-end on real held-out tasks (researcher persona:
0.42 -> 1.00 lift; gate correctly rejects non-improving edits).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…d home)

Covers _parse_jsonl_response (multi-message concat, junk-line skipping,
empty/non-assistant events), get_backend alias resolution, and the
isolated-COPILOT_HOME / full-env opt-out behavior. Pure logic, no CLI required.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…shims

Adds honest tool-call detection for CopilotCliBackend, mirroring the
Claude/Codex backends. Writes per-tool executable shims into the work dir
and detects real invocations from a calllog (not self-reported markers).
The Copilot backend is Windows-validated, so shims are cross-platform:
a .cmd batch shim on Windows and a chmod'd bash shim on POSIX, with an
OS-specific tool hint. Mirrors _call's flags/env (isolated COPILOT_HOME,
--allow-all-tools, MCP/instruction disabling) and the UTF-8 subprocess fix.

Adds test_attempt_with_tools_honest_detection: a CI-friendly, OS-aware
stub stands in for the CLI, runs the shim, and asserts both JSONL parsing
and log-based detection. Validated live on Windows (real Copilot call) and
on Linux/WSL (POSIX path).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… azure_openai)

The advertised backend choices in scripts/train.py use 'azure_openai',
not 'openai'; align the inputSchema description hint accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Dongbumlee marked this pull request as ready for review June 12, 2026 16:19
Dongbumlee changed the title from "Add GitHub Copilot support: CopilotCliBackend for SkillOpt-Sleep + research-engine MCP plugin" to "Add Copilot as a SkillOpt-Sleep model backend (CopilotCliBackend) + research-engine MCP plugin" Jun 12, 2026