feat(harness): cline CLI recipe integrated — column deferred (proxy responses friction)#56
Merged
Merged
Conversation
Sixth agentic harness (model-agnostic, cline). Edits the real cwd; smoked
end-to-end on qwen3-coder-30b (qwen-3-14b is too weak for its tool-use — it
hallucinated submit without writing; a real compat data point).
- infra/docker/harness-cline: node base + cline@3.0.15 (Bun binary, both arches).
- _cline_invocation recipe: UNIQUE two-step `sh -c "cline auth --provider
openai-native --baseurl <proxy>/v1 && cline --yolo '<prompt>' || true"`. A custom
base URL is seedable only via `cline auth` (persists to /tmp data dir, out of
/workspace); `|| true` masks cline's noisy exit code — the produced patch is the
real signal. The key is the literal $LITELLM_MASTER_KEY (never written to a file).
- stacks/cline/stack.yaml: command sh, L1+L2.
- tests: cline recipe-proven; supported_harnesses={aider,goose,opencode,crush,cline}.
ruff + mypy --strict clean; 13/13 stack specs valid.
Refs: rfc-006-stack-executor
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…proxy friction)
The cline recipe is integrated, but its column is DEFERRED, not shipped 0/4:
cline's `openai-native` provider uses the OpenAI **Responses API**, which our
OpenRouter-backed proxy rejects on multi-call tasks ("Invalid Responses API
request"). The trivial isolation smoke works (one chat-completions write lands),
but the real be_01 task (read → multi-edit → verify) hits the responses error
mid-run → no usable patch → 0/4. That's an INFRA/protocol friction, not a
harness×model incompatibility — shipping a 0/4 column would mislabel it. The
run-config (_CLINE_MODELS + _STACK_MODELS) is ready; the column runs once the
friction is resolved (route cline's models via a responses-compatible backend,
or find a cline chat-completions mode). Recipe/image/stack already on main.
Refs: rfc-006-stack-executor
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Sixth agentic harness — cline, model-agnostic. Recipe + image + stack
integrated; smoked end-to-end. The column is deferred (not shipped 0/4) — see below.
How
_cline_invocation: UNIQUE two-step — a custom base URL is seedable only viacline auth(no env/flag), provider idopenai-native(NOT openai-compatible,whose CLI wizard is buggy, cline #9656).
sh -c "cline auth … --baseurl <proxy>/v1 && cline --yolo '<prompt>' || true". Edits the real /workspace; state in /tmp;key referenced as literal
$LITELLM_MASTER_KEY(never written to a file).|| truemasks cline's noisy exit code — the produced patch is the signal.
stacks/cline/stack.yaml: commandsh, L1+L2.supported_harnessesnow includes cline.Why the column is deferred (honest)
cline's
openai-nativeprovider uses the OpenAI Responses API, which ourOpenRouter-backed proxy rejects on multi-call tasks (
Invalid Responses API request). The trivial isolation smoke works (a single chat-completions writelands — verified
File created successfully at: /workspace/hello.txtonqwen3-coder-30b), but the real
be_01task (read → multi-edit → verify) hits theresponses error mid-run → no usable patch → 0/4. That's an INFRA/protocol
friction, not a harness×model incompatibility — shipping a 0/4 column would
mislabel it on the compat matrix. The recipe + run-config are ready; the column
runs once the friction is resolved (route cline's models via a responses-compatible
backend, or a cline chat-completions mode). qwen-3-14b separately is too weak for
cline's tool-use (hallucinated submit without writing).
Verify
731 tests green; ruff + mypy --strict clean; 13/13 stack specs valid.
Refs: rfc-006-stack-executor
🤖 Generated with Claude Code