Automatically improve agent guidance through iterative testing and scoring.
Autoagent is an OpenClaw skill that optimizes any agent guidance (prompts, AGENTS.md entries, skill definitions) with a Karpathy-style training loop: a cron job repeatedly proposes one edit, tests it with a subagent, scores the result, and reverts the edit if the score declined.
- Setup - Asks where your guidance lives and what it should do
- Sandbox - Copies guidance to a test folder with fixtures
- Optimize - Runs every 5 minutes via cron:
  - Analyzes current guidance
  - Proposes improvement
  - Tests with subagent
  - Scores result
  - Keeps or discards change
- Logs - Check scores.md for history
/autoagent
Every invocation starts fresh with interactive setup:
- Sandbox Location - Where should I create the folder? (absolute path)
- Success Criteria - A short discussion to define what "good" looks like, followed by a draft scoring.md proposed for your approval
- External Scripts - Any scripts/tools the guidance relies on?
- Cron Schedule - How often to run (default: 5 minutes)
Then creates a sandbox at your specified path with:
- `guidance-under-test.md` - Original (read-only)
- `current-guidance.md` - Working version
- `fixtures/test-cases.json` - Test inputs
- `scoring.md` - User-approved success criteria
- `scores.md` - Score history
- `scripts/` - (optional) copies of referenced scripts
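As a rough illustration of that layout, the sketch below builds the same file tree in Python. It is not the skill's actual setup code: the function name `create_sandbox`, the empty placeholder contents, and the use of file permissions to make the original read-only are all assumptions made for the example.

```python
import stat
import shutil
from pathlib import Path


def create_sandbox(sandbox: Path, guidance: Path) -> None:
    """Create the sandbox layout described above (hypothetical helper)."""
    (sandbox / "fixtures").mkdir(parents=True, exist_ok=True)

    original = sandbox / "guidance-under-test.md"
    if original.exists():
        # A previous run left a read-only copy; make it writable before overwriting.
        original.chmod(stat.S_IRUSR | stat.S_IWUSR)

    # Untouched original plus the working copy that each iteration edits.
    shutil.copyfile(guidance, original)
    shutil.copyfile(guidance, sandbox / "current-guidance.md")

    # Assumption: "read-only" is enforced with file permissions (0o444).
    original.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

    # Placeholders; the real skill fills these in during setup.
    (sandbox / "fixtures" / "test-cases.json").write_text("[]\n", encoding="utf-8")
    (sandbox / "scoring.md").write_text("", encoding="utf-8")
    (sandbox / "scores.md").write_text("", encoding="utf-8")
```

The optional `scripts/` folder is left out here, since it only exists when the guidance references external scripts.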
Each cron run:
- Analyze current guidance + score history
- Propose ONE specific edit
- Apply edit to current-guidance.md
- Run subagent with new guidance
- Score result (0-100)
- Log decision (keep/discard)
- Revert if score declined
- Check for plateau (auto-pause after 10 identical scores)
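The keep/revert and plateau rules above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's implementation: `run_iterations` is a hypothetical name, the scores are passed in directly instead of coming from a subagent run, a tie is assumed to count as "keep" (only a decline triggers a revert), and "10 identical scores" is assumed to mean ten consecutive runs with the same score.

```python
from collections import deque

PLATEAU_WINDOW = 10  # from the text: auto-pause after 10 identical scores


def run_iterations(new_scores, baseline):
    """Apply the keep/revert rule to a sequence of run scores.

    Returns (current_score, history, paused), where history holds one
    (score_before, score_after, decision) tuple per run.
    """
    current = baseline
    history = []
    recent = deque(maxlen=PLATEAU_WINDOW)
    paused = False

    for score in new_scores:
        # Revert only when the score declined; ties are kept (assumption).
        decision = "discard" if score < current else "keep"
        history.append((current, score, decision))
        if decision == "keep":
            current = score

        # Plateau check: the last PLATEAU_WINDOW runs all scored the same.
        recent.append(score)
        if len(recent) == PLATEAU_WINDOW and len(set(recent)) == 1:
            paused = True
            break

    return current, history, paused
```

Note that a discarded edit leaves the current score unchanged, so the next run is compared against the last kept score, not against the discarded one.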
User: /autoagent
Agent: Where should I create the sandbox folder? Default: `../../autoagent-sandbox/` (resolves to `/clawd/autoagent-sandbox/`)
> news
Agent: Let's define how we'll measure success. What does a "good" result look like for this task?
> It should summarize news into morning briefs with bullet points
Agent: What format should the output be in?
> Markdown with sections for Top Stories, Industry News, and Quick Hits
Agent: [proposes draft scoring.md]
[User approves scoring]
Agent: Does the guidance rely on any scripts, tools, or external software?
> No
Agent: Run optimization every 5 minutes (default), or different interval?
> Default is fine
[Sandbox created at `../../autoagent-news/` → `/clawd/autoagent-news/`, cron started]
Agent: Optimization started at `/clawd/autoagent-news/`. I'll check back every 5 minutes. Monitor progress in `scores.md`.
[5 minutes later...]
- Score: 45 → 52 (kept change)
- Score: 52 → 48 (discarded)
- Score: 52 → 61 (kept change)
...
| File | Description |
|---|---|
| `SKILL.md` | Full skill definition |
| `setup-prompt.md` | Setup phase questions |
| `iteration-prompt.md` | Iteration loop instructions |
| `templates/` | Score and fixture templates |
| `examples/` | Test examples |
- OpenClaw with cron support
- LLM for guidance analysis
- File system access
MIT