diff --git a/README.md b/README.md
index 28c3da2e..94797774 100644
--- a/README.md
+++ b/README.md
@@ -77,7 +77,7 @@ harvest session transcripts → mine recurring tasks → replay offline
 | Platform | Folder | Install |
 |---|---|---|
 | **Claude Code** | [`plugins/claude-code`](plugins/claude-code) | `/plugin marketplace add ./plugins/claude-code` → `/sleep` |
-| **Codex** | [`plugins/codex`](plugins/codex) | `bash plugins/codex/install.sh` → `/sleep` |
+| **Codex** | [`plugins/codex`](plugins/codex) | `bash plugins/codex/install.sh` → `skillopt-sleep` skill |
 | **Copilot** | [`plugins/copilot`](plugins/copilot) | register `plugins/copilot/mcp_server.py` as an MCP server |
 
 **Validated on real models.** On the public
diff --git a/docs/sleep/PR_DRAFT.md b/docs/sleep/PR_DRAFT.md
index 5845bef9..86b940e2 100644
--- a/docs/sleep/PR_DRAFT.md
+++ b/docs/sleep/PR_DRAFT.md
@@ -15,7 +15,7 @@ Synthesizes SkillOpt (validation-gated bounded text edits), Claude Dreams
 Shipped as plugins for **three agents**, one engine + three thin shells:
 
 - **Claude Code** — `.claude-plugin` + `/sleep` command + skill + hooks
-- **Codex** — `~/.codex/prompts/sleep.md` + `~/.agents/skills` + `install.sh`
+- **Codex** — user-level `skillopt-sleep` skill + shared runner + `install.sh`
 - **Copilot** — a stdlib-only MCP server exposing `sleep_*` tools
 
 ## Design notes
diff --git a/docs/sleep/plugin_load_test.md b/docs/sleep/plugin_load_test.md
index 04bf28e6..c4206463 100644
--- a/docs/sleep/plugin_load_test.md
+++ b/docs/sleep/plugin_load_test.md
@@ -23,7 +23,7 @@ from scratch for this test. Two forms were used:
 | Shell | What was run | Result |
 |---|---|---|
 | **Claude Code** (`scripts/sleep.sh`) | `harvest`, full `run`, `adopt` | harvest found 2 sessions → 2 tasks; `run` staged a proposal; `adopt` honored the safety contract (no live change when nothing was accepted) |
-| **Codex** (`install.sh` + shared runner) | `install.sh` into a temp HOME | placed `~/.codex/prompts/sleep.md` and `~/.agents/skills/skillopt-sleep/SKILL.md` correctly |
+| **Codex** (`install.sh` + shared runner) | `install.sh` into a temp HOME | placed the user-level `~/.agents/skills/skillopt-sleep/SKILL.md` skill correctly and moved any legacy custom prompt aside instead of installing one |
 | **Copilot** (`mcp_server.py`) | `initialize` → `tools/list` → `tools/call sleep_harvest` | 5 tools listed; `sleep_harvest` returned real engine output (2 sessions → 2 tasks) |
 
 ### Genuine improvement (real model, fresh persona)
@@ -71,6 +71,6 @@ Shell checks:
 # Copilot MCP server
 printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' \
   | SKILLOPT_SLEEP_REPO="$(pwd)" python3 plugins/copilot/mcp_server.py
-# Codex installer (into a throwaway HOME)
+# Codex skill installer (into a throwaway HOME)
 HOME=$(mktemp -d) bash plugins/codex/install.sh
 ```
diff --git a/plugins/README.md b/plugins/README.md
index 0fe7b692..4f3150ff 100644
--- a/plugins/README.md
+++ b/plugins/README.md
@@ -22,7 +22,7 @@ literature (short-term experience → long-term competence).
 | Platform | Folder | Mechanism | Status |
 |---|---|---|---|
 | **Claude Code** | [`claude-code/`](claude-code) | `.claude-plugin` + `/sleep` command + skill + hooks | full, installable |
-| **Codex** | [`codex/`](codex) | `~/.codex/prompts/sleep.md` + `~/.agents/skills` + `AGENTS.md` | full |
+| **Codex** | [`codex/`](codex) | user-level `skillopt-sleep` skill + shared runner | full |
 | **Copilot** | [`copilot/`](copilot) | MCP server (`sleep_*` tools) + `copilot-instructions` | full (MCP) |
 
 All three call the **same** [`plugins/run-sleep.sh`](run-sleep.sh) → `python -m
diff --git a/plugins/codex/README.md b/plugins/codex/README.md
index f5960da0..376bc466 100644
--- a/plugins/codex/README.md
+++ b/plugins/codex/README.md
@@ -14,28 +14,35 @@ as the Claude Code plugin (`skillopt_sleep`), wrapped for Codex.
 ## What Codex supports (and what we use)
 
 Codex (`@openai/codex`) extends via **`AGENTS.md`** instructions, **skills** at
-`~/.agents/skills//SKILL.md`, and **custom prompts** at
-`~/.codex/prompts/.md` (invoked as `/`). This integration ships all
-three, plus a shared runner.
+`~/.agents/skills//SKILL.md`, and plugins that can distribute skills.
+Custom prompts are deprecated in Codex, so this integration is skill-first: the
+installed `skillopt-sleep` skill contains the launch commands and operating
+rules. The shared runner remains a plain shell entrypoint that the skill calls.
 
 ## Install
 
 ```bash
 git clone SkillOpt-Sleep
 cd SkillOpt-Sleep
-bash plugins/codex/install.sh # installs the /sleep prompt + skill
+bash plugins/codex/install.sh # installs the skill
 export SKILLOPT_SLEEP_REPO="$(pwd)" # so the runner is found from anywhere
 ```
 
+If a previous install created `~/.codex/prompts/sleep.md`, the installer moves
+that deprecated prompt aside with a `.skillopt-legacy*.bak` suffix.
+
 Requires Python ≥ 3.10 and the `codex` CLI on PATH.
 
 ## Use
 
+Mention `$skillopt-sleep` where Codex supports explicit skill mentions, or ask
+Codex in natural language:
+
 ```text
-/sleep status # what's happened
-/sleep dry-run # safe preview, stages nothing
-/sleep run # full cycle, stages a reviewed proposal (no live edits)
-/sleep adopt # apply the staged proposal (with backup)
+Use the skillopt-sleep skill to run status for this project.
+Use the skillopt-sleep skill to run a dry-run for this project.
+Use the skillopt-sleep skill to run the full cycle for this project with the Codex backend.
+Use the skillopt-sleep skill to adopt the latest staged proposal.
 ```
 
 Or call the engine directly:
@@ -53,7 +60,7 @@ identically — see [`../../docs/sleep/CONTROLLABLE_DREAMING.md`](../../docs/sle
 
 - Codex's `exec` runs shell, so the real-tool-loop replay (e.g. the
   `tool_called: search` benchmark seed) works natively.
-- Codex's standalone *plugin-package manifest* format is not yet a stable public
-  spec; this integration uses the documented `AGENTS.md` + skills + prompts
-  mechanisms, which are stable. If/when a `codex plugin` package format ships,
-  we'll add a one-file manifest.
+- This integration no longer installs a `.codex/prompts` slash command. Skills
+  are the reusable Codex workflow surface; mention `skillopt-sleep` explicitly
+  or ask for a sleep/dream/offline self-improvement run and Codex can load the
+  skill.
diff --git a/plugins/codex/install.sh b/plugins/codex/install.sh
index b7c0e14e..11b07352 100755
--- a/plugins/codex/install.sh
+++ b/plugins/codex/install.sh
@@ -1,24 +1,30 @@
 #!/usr/bin/env bash
-# Install the SkillOpt-Sleep Codex integration into the user's ~/.codex and
-# ~/.agents directories. Idempotent; prints what it does.
+# Install the SkillOpt-Sleep Codex integration as a user-level Codex skill.
+# Idempotent; prints what it does.
 set -euo pipefail
 
 REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
 CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
 AGENTS_SKILLS="${HOME}/.agents/skills"
+LEGACY_PROMPT="$CODEX_HOME/prompts/sleep.md"
 
 echo "[install] repo: $REPO_ROOT"
 
-# 1) custom /sleep prompt
-mkdir -p "$CODEX_HOME/prompts"
-cp "$REPO_ROOT/plugins/codex/prompts/sleep.md" "$CODEX_HOME/prompts/sleep.md"
-echo "[install] /sleep prompt -> $CODEX_HOME/prompts/sleep.md"
-
-# 2) user-level skill
+# 1) user-level skill
 mkdir -p "$AGENTS_SKILLS/skillopt-sleep"
 cp "$REPO_ROOT/plugins/codex/skills/skillopt-sleep/SKILL.md" "$AGENTS_SKILLS/skillopt-sleep/SKILL.md"
 echo "[install] skill -> $AGENTS_SKILLS/skillopt-sleep/SKILL.md"
 
+# 2) retire the old custom prompt entrypoint from previous installs
+if [ -f "$LEGACY_PROMPT" ]; then
+  backup="${LEGACY_PROMPT}.skillopt-legacy.bak"
+  if [ -e "$backup" ]; then
+    backup="${LEGACY_PROMPT}.skillopt-legacy.$(date +%Y%m%d%H%M%S).bak"
+  fi
+  mv "$LEGACY_PROMPT" "$backup"
+  echo "[install] legacy prompt -> $backup"
+fi
+
 # 3) record the repo location so the runner is found from anywhere
 echo "[install] add to your shell profile:"
 echo " export SKILLOPT_SLEEP_REPO=\"$REPO_ROOT\""
@@ -29,8 +35,10 @@ cat < mine recurring tasks -> replay offline -> consolidate validated memory + skills behind a held-out gate."
 ---
 
-# SkillOpt-Sleep (Codex skill)
+# SkillOpt-Sleep: offline self-evolution for a local Codex agent
 
-This skill drives the `skillopt_sleep` engine — an offline "sleep cycle" that
-makes a Codex agent better at the user's recurring work without retraining.
+SkillOpt-Sleep gives the user's Codex agent a sleep cycle. While the user is
+offline or on demand, it reviews past local sessions, re-runs recurring tasks
+on the user's own budget, and consolidates what it learns into memory and
+skills. It keeps only changes that pass a held-out validation gate, and live
+files change only after the user explicitly adopts a staged proposal. There is
+no model-weight training.
 
 ## When to use
 
-Trigger when the user wants to: review past sessions, learn their preferences,
-consolidate feedback into long-term memory/skills, run a nightly/offline
-self-improvement cycle, or adopt a staged proposal.
+Trigger when the user wants any of:
 
-## How to run it
+- Codex to learn from past sessions or get better the more they use it;
+- a nightly/scheduled or on-demand sleep/dream/offline self-improvement run;
+- to review past sessions and distill recurring tasks;
+- to consolidate feedback into memory or managed skills;
+- to run `status`, `harvest`, `dry-run`, `run`, or `adopt` for SkillOpt-Sleep.
+
+## The cycle
+
+1. **Harvest** - read local session transcripts according to the engine
+   configuration and normalize them into session digests.
+2. **Mine** - turn digests into recurring `TaskRecord`s with outcomes and
+   checkable references where possible.
+3. **Replay** - re-run mined tasks offline under the current skill and memory.
+4. **Consolidate** - reflect on failures and propose bounded edits.
+5. **Gate** - accept edits only when the held-out validation score improves.
+6. **Stage** - write the proposal under
+   `/.skillopt-sleep/staging//`; nothing live changes.
+7. **Adopt** - only after explicit user approval, copy staged files over live
+   files with backups.
+
+## How to drive it
 
 Invoke the bundled runner via shell (Codex `exec` has shell access). The runner
-finds the engine and a Python ≥ 3.10 automatically:
+finds the engine and a Python >= 3.10 automatically.
 
 ```bash
 # point at the repo if it isn't auto-detected from CWD:
 export SKILLOPT_SLEEP_REPO=/path/to/SkillOpt-Sleep
-bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" --project "$(pwd)"
+
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" status --project "$(pwd)"
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" harvest --project "$(pwd)"
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" dry-run --project "$(pwd)" --backend mock
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" run --project "$(pwd)" --backend codex
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" adopt --project "$(pwd)"
 ```
 
-`` ∈ `status | dry-run | run | adopt | harvest`. Use `--backend codex`
-for real improvement on the user's own Codex budget (default `mock` = no spend).
+Actions are `status`, `harvest`, `dry-run`, `run`, and `adopt`.
+
+- Default backend is `mock`, which is deterministic and spends no API budget.
+- `--backend codex` uses the user's Codex budget for real improvement.
+- Keep `dry-run --backend mock` as the first smoke check unless the user
+  explicitly asked for a real optimization run.
 
 ## Steps
 
 1. Run the requested action; capture stdout.
-2. For `run`/`dry-run`: read the staged `report.md` it prints and show the user
-   the held-out baseline → candidate score and the exact proposed edits.
-3. `run` only **stages** a proposal under `/.skillopt-sleep/staging/`;
-   nothing live changes until `adopt`. Offer `/sleep adopt`.
-4. Never hand-edit the user's `AGENTS.md` / skills yourself — only `adopt` does,
-   and it backs up first.
+2. For `dry-run` and `run`, report the held-out baseline -> candidate score,
+   gate action, task count, session count, and exact proposed edits.
+3. If a staging directory is printed, read `report.md` before summarizing.
+4. `run` only stages a proposal; nothing live changes until `adopt`.
+5. Offer adoption only after the user has reviewed the staged proposal.
+6. Never hand-edit the user's `AGENTS.md`, memory, or skills as a substitute
+   for `adopt`; adoption is the safety boundary and writes backups first.
+
+## Hard rules
+
+- Harvest is read-only. Do not edit archived sessions or raw transcripts.
+- Keep raw secrets, credentials, private user data, and unsanitized transcript
+  contents out of messages, logs, generated artifacts, and commits.
+- Show validation evidence before recommending adoption.
+- Treat generated edits as proposals, not as source of truth.
+- Do not rely on deprecated custom prompts or `/sleep` slash commands for this
+  Codex integration. This skill is the entrypoint.
 
 ## Validate
 
 ```bash
+python -m skillopt_sleep dry-run --project "$(pwd)" --backend mock --json
 python -m skillopt_sleep.experiments.run_gbrain --backend codex \
   --seeds brief-writer --data-root /path/to/gbrain-evals/eval/data/skillopt-v1 \
   --nights 2 --limit-replay 3 --limit-holdout 3
 ```
-A deficient skill goes 0.00 → 1.00 on a held-out set; the optimizer's edits are
-gated on real-task performance.
+
+A deficient skill goes 0.00 -> 1.00 on a held-out set; the optimizer's edits
+are gated on real-task performance.
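
Reviewer sketch, not part of the patch: the legacy-prompt retirement step added to `install.sh` can be exercised in isolation. The function name `retire_legacy_prompt` and the `demo_home` directory are hypothetical, introduced only for illustration; the logic is copied from the `# 2)` block in this diff and assumes it behaves as written there.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the "# 2) retire the old custom prompt" step of install.sh.
# retire_legacy_prompt is a hypothetical helper name, not something the repo defines.
set -euo pipefail

retire_legacy_prompt() {
  local legacy_prompt="$1"
  # Nothing to do when no previous install left a prompt behind.
  [ -f "$legacy_prompt" ] || return 0
  local backup="${legacy_prompt}.skillopt-legacy.bak"
  # Fall back to a timestamped name so an earlier backup is not overwritten.
  if [ -e "$backup" ]; then
    backup="${legacy_prompt}.skillopt-legacy.$(date +%Y%m%d%H%M%S).bak"
  fi
  mv "$legacy_prompt" "$backup"
  echo "$backup"
}

# Work in a throwaway directory rather than a real ~/.codex.
demo_home="$(mktemp -d)"
mkdir -p "$demo_home/prompts"

# First retirement: the plain .skillopt-legacy.bak name is free.
echo "old prompt" > "$demo_home/prompts/sleep.md"
first_backup="$(retire_legacy_prompt "$demo_home/prompts/sleep.md")"

# Second retirement: the plain name is taken, so the timestamped name is used.
echo "older prompt" > "$demo_home/prompts/sleep.md"
second_backup="$(retire_legacy_prompt "$demo_home/prompts/sleep.md")"

printf '%s\n%s\n' "$first_backup" "$second_backup"
```

Both backup names match the `.skillopt-legacy*.bak` pattern described in the README hunk. One thing the sketch suggests is worth checking in review: the timestamped fallback has one-second resolution, so a third retirement within the same second would appear to reuse the second backup's name.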