Skip to content

feat(harness): PI + gptme + mini-SWE wave — 3 columns, board 32/45 (RFC-006 Phase 5)#57

Merged
explosivebit merged 2 commits into
mainfrom
feat/harness-wave-pi-gptme-mini
Jun 3, 2026
Merged

feat(harness): PI + gptme + mini-SWE wave — 3 columns, board 32/45 (RFC-006 Phase 5)#57
explosivebit merged 2 commits into
mainfrom
feat/harness-wave-pi-gptme-mini

Conversation

@explosivebit

Copy link
Copy Markdown
Contributor

What

Three model-agnostic harnesses — PI, gptme, mini-SWE-agent — built + smoked
in parallel by a 3-agent team (offloading per-harness Docker recipe-discovery),
integrated serially. Board now 32/45 scored, 8 harness columns.

Results (be_01)

harness scored notes
PI 3/4 devstral 7.46 · codestral 6.67 · qwen3-235b 6.17 (native tool_calls only)
gptme 1/4 codestral 7.17 (no turn cap → verify-loop timeouts elsewhere)
mini-SWE 2/4 qwen3-coder-30b 7.08 · qwen-3-14b 6.54 (textbased loop)

★ The tool-call-parser finding (explains the whole matrix)

A model's proxy/vLLM backend decides whether it emits native tool_calls or
text-format (XML/markdown fences). qwen3-coder-30b emits text → native-tool
harnesses (PI, opencode) no-op on it (this is why opencode is 1/4!). Native-tool
models on the proxy: devstral, codestral, qwen3-235b, glm-4-32b. Text-tolerant
harnesses (aider, goose, crush, gptme, mini-SWE-textbased) work on text-format models.

How

  • PI: models.json via config_files + PI_CODING_AGENT_DIR; native tool_calls.
  • gptme: env-only; tiktoken cache baked (no-egress); wall-clock-bounded.
  • mini-SWE: bare validator loop (L0+L2+L7); --environment-class local (no nested
    Docker) + litellm_textbased; MSWEA guards; trajectory → /tmp.
    • config_files launcher (from the opencode wave) handles PI/mini config.

Verify

734 tests green; ruff + mypy --strict clean; 15/15 stack specs valid.

Refs: rfc-006-stack-executor

🤖 Generated with Claude Code

explosivebit and others added 2 commits June 3, 2026 20:12
…006 Phase 5)

Three model-agnostic harnesses built + smoked IN PARALLEL by a 3-agent team
(offloading the per-harness Docker recipe-discovery from the main context). All
edit the real cwd; the patch is captured by host git-diff.

- pi (@earendil-works/pi-coding-agent): models.json via config_files +
  PI_CODING_AGENT_DIR; native tool_calls → runs devstral/codestral/qwen3-235b/
  glm-4-32b (qwen3-coder-30b emits text-format tool calls its proxy backend can't
  parse → pi no-ops; a real compat finding).
- gptme: env-only (OPENAI_BASE_URL + MODEL=local/<m>, prefix stripped); tiktoken
  cache baked (no-egress); no turn cap → wall-clock bounds it (patch written
  before the verify loop, so a timeout-kill still yields a valid patch).
- mini-SWE-agent: the bare validator loop (L0+L2+L7). --environment-class local
  (bash in /workspace, NO nested Docker) + litellm_textbased (fenced bash, not
  native tool_calls) + MSWEA_CONFIGURED/COST_TRACKING guards. Trajectory → /tmp.

Cross-harness finding: a model's PROXY BACKEND tool-call-parser determines native-
tool support — qwen3-coder-30b returns text-format calls (explains opencode 1/4);
devstral/codestral/qwen3-235b/glm-4-32b emit native tool_calls.

ruff + mypy --strict clean; 15/15 stack specs valid. supported_harnesses =
{aider, goose, opencode, crush, cline, pi, gptme, mini-swe}.

Refs: rfc-006-stack-executor

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 3-harness wave's scored columns (--add-stack each), merged into board.json:
- PI (native tool_calls): devstral 7.46, codestral 6.67, qwen3-235b 6.17 (3/4;
  glm-4-32b fail). Native-tool models only — the tool-parser finding holds.
- gptme (text-tolerant): codestral 7.17 (1/4 — no turn cap → verify-loop timeouts
  on the other models; wall-clock kills before a clean finish).
- mini-SWE (textbased loop): qwen3-coder-30b 7.08, qwen-3-14b 6.54 (2/4).
Board now 32/45 scored, 8 harness columns. Harness display name pi→PI.

Refs: rfc-006-stack-executor

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@explosivebit explosivebit merged commit 164f14f into main Jun 3, 2026
3 checks passed
@explosivebit explosivebit deleted the feat/harness-wave-pi-gptme-mini branch June 3, 2026 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant