2 changes: 1 addition & 1 deletion README.md
@@ -77,7 +77,7 @@ harvest session transcripts → mine recurring tasks → replay offline
| Platform | Folder | Install |
|---|---|---|
| **Claude Code** | [`plugins/claude-code`](plugins/claude-code) | `/plugin marketplace add ./plugins/claude-code` → `/sleep` |
| **Codex** | [`plugins/codex`](plugins/codex) | `bash plugins/codex/install.sh` → `/sleep` |
| **Codex** | [`plugins/codex`](plugins/codex) | `bash plugins/codex/install.sh` → `skillopt-sleep` skill |
| **Copilot** | [`plugins/copilot`](plugins/copilot) | register `plugins/copilot/mcp_server.py` as an MCP server |

**Validated on real models.** On the public
2 changes: 1 addition & 1 deletion docs/sleep/PR_DRAFT.md
@@ -15,7 +15,7 @@ Synthesizes SkillOpt (validation-gated bounded text edits), Claude Dreams
Shipped as plugins for **three agents**, one engine + three thin shells:

- **Claude Code** — `.claude-plugin` + `/sleep` command + skill + hooks
- **Codex** — `~/.codex/prompts/sleep.md` + `~/.agents/skills` + `install.sh`
- **Codex** — user-level `skillopt-sleep` skill + shared runner + `install.sh`
- **Copilot** — a stdlib-only MCP server exposing `sleep_*` tools

## Design notes
4 changes: 2 additions & 2 deletions docs/sleep/plugin_load_test.md
@@ -23,7 +23,7 @@ from scratch for this test. Two forms were used:
| Shell | What was run | Result |
|---|---|---|
| **Claude Code** (`scripts/sleep.sh`) | `harvest`, full `run`, `adopt` | harvest found 2 sessions → 2 tasks; `run` staged a proposal; `adopt` honored the safety contract (no live change when nothing was accepted) |
| **Codex** (`install.sh` + shared runner) | `install.sh` into a temp HOME | placed `~/.codex/prompts/sleep.md` and `~/.agents/skills/skillopt-sleep/SKILL.md` correctly |
| **Codex** (`install.sh` + shared runner) | `install.sh` into a temp HOME | placed the user-level `~/.agents/skills/skillopt-sleep/SKILL.md` skill correctly and moved any legacy custom prompt aside instead of installing one |
| **Copilot** (`mcp_server.py`) | `initialize` → `tools/list` → `tools/call sleep_harvest` | 5 tools listed; `sleep_harvest` returned real engine output (2 sessions → 2 tasks) |

### Genuine improvement (real model, fresh persona)
@@ -71,6 +71,6 @@ Shell checks:
# Copilot MCP server
printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' \
| SKILLOPT_SLEEP_REPO="$(pwd)" python3 plugins/copilot/mcp_server.py
# Codex installer (into a throwaway HOME)
# Codex skill installer (into a throwaway HOME)
HOME=$(mktemp -d) bash plugins/codex/install.sh
```
2 changes: 1 addition & 1 deletion plugins/README.md
@@ -22,7 +22,7 @@ literature (short-term experience → long-term competence).
| Platform | Folder | Mechanism | Status |
|---|---|---|---|
| **Claude Code** | [`claude-code/`](claude-code) | `.claude-plugin` + `/sleep` command + skill + hooks | full, installable |
| **Codex** | [`codex/`](codex) | `~/.codex/prompts/sleep.md` + `~/.agents/skills` + `AGENTS.md` | full |
| **Codex** | [`codex/`](codex) | user-level `skillopt-sleep` skill + shared runner | full |
| **Copilot** | [`copilot/`](copilot) | MCP server (`sleep_*` tools) + `copilot-instructions` | full (MCP) |

All three call the **same** [`plugins/run-sleep.sh`](run-sleep.sh) → `python -m
31 changes: 19 additions & 12 deletions plugins/codex/README.md
@@ -14,28 +14,35 @@ as the Claude Code plugin (`skillopt_sleep`), wrapped for Codex.
## What Codex supports (and what we use)

Codex (`@openai/codex`) extends via **`AGENTS.md`** instructions, **skills** at
`~/.agents/skills/<name>/SKILL.md`, and **custom prompts** at
`~/.codex/prompts/<name>.md` (invoked as `/<name>`). This integration ships all
three, plus a shared runner.
`~/.agents/skills/<name>/SKILL.md`, and plugins that can distribute skills.
Custom prompts are deprecated in Codex, so this integration is skill-first: the
installed `skillopt-sleep` skill contains the launch commands and operating
rules. The shared runner remains a plain shell entrypoint that the skill calls.

## Install

```bash
git clone <repo-url> SkillOpt-Sleep
cd SkillOpt-Sleep
bash plugins/codex/install.sh # installs the /sleep prompt + skill
bash plugins/codex/install.sh # installs the skill
export SKILLOPT_SLEEP_REPO="$(pwd)" # so the runner is found from anywhere
```

If a previous install created `~/.codex/prompts/sleep.md`, the installer moves
that deprecated prompt aside with a `.skillopt-legacy*.bak` suffix.
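
The backup naming can be sketched in isolation. This is a minimal illustration, not the installer itself: the function name `retire_legacy_prompt` is hypothetical, and the suffix scheme simply mirrors what is described above. The real logic lives in `plugins/codex/install.sh`.

```shell
#!/usr/bin/env bash
# Sketch only: retire_legacy_prompt is a hypothetical helper name, not part
# of the installer's interface.
retire_legacy_prompt() {
  local prompt="$1"
  # Nothing to do when no legacy prompt exists.
  [ -f "$prompt" ] || return 0
  local backup="${prompt}.skillopt-legacy.bak"
  # Avoid clobbering an earlier backup by adding a timestamp.
  if [ -e "$backup" ]; then
    backup="${prompt}.skillopt-legacy.$(date +%Y%m%d%H%M%S).bak"
  fi
  mv "$prompt" "$backup"
  # Print the backup path so the caller can log it.
  echo "$backup"
}
```

Calling `retire_legacy_prompt "$HOME/.codex/prompts/sleep.md"` would move the file aside and print where it went; a second call after another legacy file appears gets a timestamped name instead of overwriting the first backup.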

Requires Python ≥ 3.10 and the `codex` CLI on PATH.

## Use

Mention `$skillopt-sleep` where Codex supports explicit skill mentions, or ask
Codex in natural language:

```text
/sleep status # what's happened
/sleep dry-run # safe preview, stages nothing
/sleep run # full cycle, stages a reviewed proposal (no live edits)
/sleep adopt # apply the staged proposal (with backup)
Use the skillopt-sleep skill to run status for this project.
Use the skillopt-sleep skill to run a dry-run for this project.
Use the skillopt-sleep skill to run the full cycle for this project with the Codex backend.
Use the skillopt-sleep skill to adopt the latest staged proposal.
```

Or call the engine directly:
@@ -53,7 +60,7 @@ identically — see [`../../docs/sleep/CONTROLLABLE_DREAMING.md`](../../docs/sle

- Codex's `exec` runs shell, so the real-tool-loop replay (e.g. the
`tool_called: search` benchmark seed) works natively.
- Codex's standalone *plugin-package manifest* format is not yet a stable public
spec; this integration uses the documented `AGENTS.md` + skills + prompts
mechanisms, which are stable. If/when a `codex plugin` package format ships,
we'll add a one-file manifest.
- This integration no longer installs a `.codex/prompts` slash command. Skills
are the reusable Codex workflow surface; mention `skillopt-sleep` explicitly
or ask for a sleep/dream/offline self-improvement run and Codex can load the
skill.
30 changes: 19 additions & 11 deletions plugins/codex/install.sh
@@ -1,24 +1,30 @@
#!/usr/bin/env bash
# Install the SkillOpt-Sleep Codex integration into the user's ~/.codex and
# ~/.agents directories. Idempotent; prints what it does.
# Install the SkillOpt-Sleep Codex integration as a user-level Codex skill.
# Idempotent; prints what it does.
set -euo pipefail

REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
AGENTS_SKILLS="${HOME}/.agents/skills"
LEGACY_PROMPT="$CODEX_HOME/prompts/sleep.md"

echo "[install] repo: $REPO_ROOT"

# 1) custom /sleep prompt
mkdir -p "$CODEX_HOME/prompts"
cp "$REPO_ROOT/plugins/codex/prompts/sleep.md" "$CODEX_HOME/prompts/sleep.md"
echo "[install] /sleep prompt -> $CODEX_HOME/prompts/sleep.md"

# 2) user-level skill
# 1) user-level skill
mkdir -p "$AGENTS_SKILLS/skillopt-sleep"
cp "$REPO_ROOT/plugins/codex/skills/skillopt-sleep/SKILL.md" "$AGENTS_SKILLS/skillopt-sleep/SKILL.md"
echo "[install] skill -> $AGENTS_SKILLS/skillopt-sleep/SKILL.md"

# 2) retire the old custom prompt entrypoint from previous installs
if [ -f "$LEGACY_PROMPT" ]; then
  backup="${LEGACY_PROMPT}.skillopt-legacy.bak"
  if [ -e "$backup" ]; then
    backup="${LEGACY_PROMPT}.skillopt-legacy.$(date +%Y%m%d%H%M%S).bak"
  fi
  mv "$LEGACY_PROMPT" "$backup"
  echo "[install] legacy prompt -> $backup"
fi

# 3) record the repo location so the runner is found from anywhere
echo "[install] add to your shell profile:"
echo " export SKILLOPT_SLEEP_REPO=\"$REPO_ROOT\""
@@ -29,8 +35,10 @@ cat <<EOF
[install] Optional — add this to ~/.codex/AGENTS.md so Codex always knows the tool:

## SkillOpt-Sleep
An offline self-improvement cycle is available. To run it:
\`bash "$REPO_ROOT/plugins/run-sleep.sh" status\`. Use \`/sleep\` for the guided flow.
Use the skillopt-sleep skill when I ask to run a sleep/dream/offline
self-improvement cycle. The runner is:
\`bash "$REPO_ROOT/plugins/run-sleep.sh" status --project "\$(pwd)"\`.

Done. Try: /sleep status
Done. Try asking Codex:
Use the skillopt-sleep skill to run status for this project.
EOF
21 changes: 0 additions & 21 deletions plugins/codex/prompts/sleep.md

This file was deleted.

84 changes: 64 additions & 20 deletions plugins/codex/skills/skillopt-sleep/SKILL.md
@@ -1,49 +1,93 @@
---
name: skillopt-sleep
description: Nightly offline self-evolution for a Codex agent. Reviews past sessions, replays recurring tasks, and consolidates validated memory + skills behind a held-out gate. Use when the user wants Codex to learn from past usage, run a "sleep"/"dream" cycle, or schedule offline self-optimization.
description: "Use when the user wants Codex to self-improve from past usage, asks about a nightly/offline 'sleep' or 'dream' cycle, wants Codex to review past sessions, learn preferences, consolidate memory/skills, run dry-run/run/adopt/status for SkillOpt-Sleep, or schedule offline self-optimization. Drives the skillopt_sleep engine: harvest past sessions -> mine recurring tasks -> replay offline -> consolidate validated memory + skills behind a held-out gate."
---

# SkillOpt-Sleep (Codex skill)
# SkillOpt-Sleep: offline self-evolution for a local Codex agent

This skill drives the `skillopt_sleep` engine — an offline "sleep cycle" that
makes a Codex agent better at the user's recurring work without retraining.
SkillOpt-Sleep gives the user's Codex agent a sleep cycle. While the user is
offline or on demand, it reviews past local sessions, re-runs recurring tasks
on the user's own budget, and consolidates what it learns into memory and
skills. It keeps only changes that pass a held-out validation gate, and live
files change only after the user explicitly adopts a staged proposal. There is
no model-weight training.

## When to use

Trigger when the user wants to: review past sessions, learn their preferences,
consolidate feedback into long-term memory/skills, run a nightly/offline
self-improvement cycle, or adopt a staged proposal.
Trigger when the user wants any of:

## How to run it
- Codex to learn from past sessions or get better the more they use it;
- a nightly/scheduled or on-demand sleep/dream/offline self-improvement run;
- to review past sessions and distill recurring tasks;
- to consolidate feedback into memory or managed skills;
- to run `status`, `harvest`, `dry-run`, `run`, or `adopt` for SkillOpt-Sleep.

## The cycle

1. **Harvest** - read local session transcripts according to the engine
configuration and normalize them into session digests.
2. **Mine** - turn digests into recurring `TaskRecord`s with outcomes and
checkable references where possible.
3. **Replay** - re-run mined tasks offline under the current skill and memory.
4. **Consolidate** - reflect on failures and propose bounded edits.
5. **Gate** - accept edits only when the held-out validation score improves.
6. **Stage** - write the proposal under
`<project>/.skillopt-sleep/staging/<date>/`; nothing live changes.
7. **Adopt** - only after explicit user approval, copy staged files over live
files with backups.
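
The gate in step 5 reduces to a strict comparison of two held-out scores. The following is a minimal sketch, assuming scores are plain decimal numbers such as `0.00` and `1.00`. The name `gate_accepts` is hypothetical, and the engine's actual gate may apply further criteria.

```shell
# Sketch only: gate_accepts is a hypothetical name, not an engine API.
gate_accepts() {
  local baseline="$1" candidate="$2"
  # awk does the floating-point comparison; exit status 0 means "accept",
  # which happens only when the candidate strictly beats the baseline.
  awk -v b="$baseline" -v c="$candidate" 'BEGIN { exit !(c + 0 > b + 0) }'
}
```

For example, `gate_accepts 0.00 1.00 && echo "stage proposal"` would stage, while equal scores are rejected because the score did not improve.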

## How to drive it

Invoke the bundled runner via shell (Codex `exec` has shell access). The runner
finds the engine and a Python 3.10 automatically:
finds the engine and a Python >= 3.10 interpreter automatically.

```bash
# point at the repo if it isn't auto-detected from CWD:
export SKILLOPT_SLEEP_REPO=/path/to/SkillOpt-Sleep
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" <action> --project "$(pwd)"

bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" status --project "$(pwd)"
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" harvest --project "$(pwd)"
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" dry-run --project "$(pwd)" --backend mock
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" run --project "$(pwd)" --backend codex
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" adopt --project "$(pwd)"
```

`<action>` ∈ `status | dry-run | run | adopt | harvest`. Use `--backend codex`
for real improvement on the user's own Codex budget (default `mock` = no spend).
Actions are `status`, `harvest`, `dry-run`, `run`, and `adopt`.

- Default backend is `mock`, which is deterministic and spends no API budget.
- `--backend codex` uses the user's Codex budget for real improvement.
- Keep `dry-run --backend mock` as the first smoke check unless the user
explicitly asked for a real optimization run.
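
The smoke-check-first rule can be sketched as a small wrapper. This is an illustration under stated assumptions: the function name `sleep_smoke_then_run` is hypothetical, the runner path and flags are the ones shown above, and the runner is assumed to exit non-zero when an action fails.

```shell
# Sketch only: sleep_smoke_then_run is a hypothetical wrapper name.
sleep_smoke_then_run() {
  local project="${1:-$(pwd)}"
  if [ -z "${SKILLOPT_SLEEP_REPO:-}" ]; then
    echo "set SKILLOPT_SLEEP_REPO to the SkillOpt-Sleep checkout" >&2
    return 2
  fi
  local runner="$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh"
  # 1) Deterministic smoke check on the mock backend; spends no budget.
  bash "$runner" dry-run --project "$project" --backend mock || return 1
  # 2) Real run on the user's Codex budget; this stages a proposal only.
  bash "$runner" run --project "$project" --backend codex
}
```

The wrapper stops before spending any Codex budget if the mock dry-run fails, and it never calls `adopt`, so adoption stays an explicit user decision.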

## Steps

1. Run the requested action; capture stdout.
2. For `run`/`dry-run`: read the staged `report.md` it prints and show the user
the held-out baseline → candidate score and the exact proposed edits.
3. `run` only **stages** a proposal under `<project>/.skillopt-sleep/staging/`;
nothing live changes until `adopt`. Offer `/sleep adopt`.
4. Never hand-edit the user's `AGENTS.md` / skills yourself — only `adopt` does,
and it backs up first.
2. For `dry-run` and `run`, report the held-out baseline -> candidate score,
gate action, task count, session count, and exact proposed edits.
3. If a staging directory is printed, read `report.md` before summarizing.
4. `run` only stages a proposal; nothing live changes until `adopt`.
5. Offer adoption only after the user has reviewed the staged proposal.
6. Never hand-edit the user's `AGENTS.md`, memory, or skills as a substitute
for `adopt`; adoption is the safety boundary and writes backups first.
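
Step 3 can be sketched as a helper that locates the newest staged report. This is a minimal sketch, assuming the `<date>` directory names under `staging/` sort chronologically (for example ISO dates). The name `latest_sleep_report` is hypothetical and not part of the runner.

```shell
# Sketch only: latest_sleep_report is a hypothetical helper name.
latest_sleep_report() {
  local project="${1:-$(pwd)}"
  local staging="$project/.skillopt-sleep/staging"
  local latest
  # Assumes <date> directory names sort chronologically, so the last entry
  # after sorting is the newest staged proposal.
  latest="$(find "$staging" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | sort | tail -n 1)"
  if [ -n "$latest" ] && [ -f "$latest/report.md" ]; then
    echo "$latest/report.md"
  else
    # No staged proposal (or no report) to summarize.
    return 1
  fi
}
```

A typical use is `report="$(latest_sleep_report "$(pwd)")" && cat "$report"`, which prints nothing and returns non-zero when no proposal has been staged yet.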

## Hard rules

- Harvest is read-only. Do not edit archived sessions or raw transcripts.
- Keep raw secrets, credentials, private user data, and unsanitized transcript
contents out of messages, logs, generated artifacts, and commits.
- Show validation evidence before recommending adoption.
- Treat generated edits as proposals, not as source of truth.
- Do not rely on deprecated custom prompts or `/sleep` slash commands for this
Codex integration. This skill is the entrypoint.

## Validate

```bash
python -m skillopt_sleep dry-run --project "$(pwd)" --backend mock --json
python -m skillopt_sleep.experiments.run_gbrain --backend codex \
--seeds brief-writer --data-root /path/to/gbrain-evals/eval/data/skillopt-v1 \
--nights 2 --limit-replay 3 --limit-holdout 3
```
A deficient skill goes 0.00 → 1.00 on a held-out set; the optimizer's edits are
gated on real-task performance.

A deficient skill goes 0.00 -> 1.00 on a held-out set; the optimizer's edits
are gated on real-task performance.