TL;DR
Agentic-Flow is shifting from "an AI agent orchestrator" to an agentic meta-harness — a runtime whose main job is to build, improve, and verify the harness around a model, not to be a model. The slogan: freeze the model, evolve the harness. The first pieces shipped in agentic-flow@2.1.0. This issue explains the move in plain language and invites feedback on the direction.
What's a "harness," and what's a "meta-harness"?
The model is the LLM (Claude, GPT, a local model, etc.).
The harness is everything around the model that turns it into a useful agent:
- how it plans a task,
- what context/files it's shown,
- how it reviews its own output,
- when it retries,
- which tools it can call,
- what it remembers,
- how "success" is scored.
A meta-harness is a system whose product is that harness — it chooses models, improves the harness, runs agents on it, and verifies the whole thing is safe and trustworthy.
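One way to picture the split is as a data structure: the model is a single opaque function, and the harness is the set of policies wrapped around it. The sketch below is illustrative only; every name in it (`Model`, `Harness`, `runTask`) is hypothetical and not part of agentic-flow's API.

```typescript
// Hypothetical types for illustration; not agentic-flow's actual API.
type Model = (prompt: string) => string;

interface Harness {
  plan: (task: string) => string[];               // how it plans a task
  selectContext: (step: string) => string;        // what context/files it's shown
  review: (output: string) => boolean;            // how it reviews its own output
  maxRetries: number;                             // when it retries
  tools: Record<string, (arg: string) => string>; // which tools it can call
  memory: string[];                               // what it remembers
  score: (outputs: string[]) => number;           // how "success" is scored
}

// Run a task: the model stays fixed, every decision around it is harness policy.
function runTask(model: Model, harness: Harness, task: string): number {
  const outputs: string[] = [];
  for (const step of harness.plan(task)) {
    let output = "";
    for (let attempt = 0; attempt <= harness.maxRetries; attempt++) {
      output = model(`${harness.selectContext(step)}\n${step}`);
      if (harness.review(output)) break; // accepted, stop retrying
    }
    harness.memory.push(output);
    outputs.push(output);
  }
  return harness.score(outputs);
}
```

Swapping in a different `Harness` object changes the agent's behavior without touching `model`, which is the sense in which the harness is the thing being built.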
Why move this way?
Because the measured lever in modern agentic systems is the harness, not a bigger model. A cheap model inside a well-built, self-improving harness can match an expensive model at a fraction of the cost. So instead of always reaching for a bigger model, we make the harness smarter.
The four pillars
| Pillar | In plain terms |
| --- | --- |
| 🧭 Route | Send each request to the cheapest model that's still good enough for it. |
| 🧬 Evolve | Let the system improve its own harness and repair code automatically: same model, better results. |
| 🤝 Orchestrate | Run the agents, tools, memory, and swarms on top. |
| 🔏 Verify | A safety gate on every harness change, plus signed provenance so you can trust what shipped. |
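Taken together, Evolve and Verify amount to a propose-then-gate loop: a candidate harness change is kept only if it passes the safety gate and scores better than the current harness. A minimal sketch under that assumption follows; `evolve`, `mutate`, `evaluate`, and `gate` are hypothetical names, and the actual loop in agentic-flow may differ.

```typescript
// Hypothetical sketch of "freeze the model, evolve the harness".
// H is any harness configuration; the model never changes inside this loop.
interface EvolveOptions<H> {
  mutate: (current: H, round: number) => H; // propose a changed harness
  evaluate: (candidate: H) => number;       // higher is better (e.g. tests passed)
  gate: (candidate: H) => boolean;          // safety gate on every harness change
  rounds: number;
}

function evolve<H>(initial: H, opts: EvolveOptions<H>): { best: H; score: number } {
  let best = initial;
  let score = opts.evaluate(initial);
  for (let round = 0; round < opts.rounds; round++) {
    const candidate = opts.mutate(best, round);
    if (!opts.gate(candidate)) continue; // rejected by the gate: never adopted
    const candidateScore = opts.evaluate(candidate);
    if (candidateScore > score) {        // keep only strict improvements
      best = candidate;
      score = candidateScore;
    }
  }
  return { best, score };
}
```

Checking the gate before scoring means an unsafe candidate is never evaluated, let alone adopted.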
What already shipped in 2.1.0
- Route: cost-optimal model routing. Learn from your own eval logs and pick the cheapest model predicted to clear a quality bar. Measured: ~28.5% cheaper than always using the top model while keeping ~98% of answers above the bar.
- Evolve: agentic-flow-repair, an autonomous "freeze the model, evolve the harness" loop that repairs a repo, gated by the repo's own tests in a shell-free, secret-scrubbed sandbox.
- Verify: harness MCP tools (harness_repair / harness_manifest / harness_verify) and an Ed25519 witness manifest so you can sign your agent/harness config and detect tampering.
- Positioning: README + package now lead with the meta-harness identity (ADR-073/074/075/076).
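The routing rule in the Route item (pick the cheapest model predicted to clear a quality bar) can be sketched in a few lines. This is not the `CostOptimalRouter` API, whose signature isn't shown in this issue; the `Candidate` shape, the `pickCheapest` function, and the fallback behavior are assumptions made for illustration, and the numbers are made up.

```typescript
// Hypothetical sketch of the routing rule; not the package's CostOptimalRouter.
interface Candidate {
  model: string;
  costPerCall: number;      // in any consistent unit
  predictedQuality: number; // 0..1, e.g. learned from your own eval logs
}

function pickCheapest(candidates: Candidate[], qualityBar: number): Candidate {
  if (candidates.length === 0) throw new Error("no candidate models");
  const goodEnough = candidates.filter((c) => c.predictedQuality >= qualityBar);
  if (goodEnough.length === 0) {
    // Assumed fallback: nothing clears the bar, so use the best-quality model.
    return candidates.reduce((a, b) => (b.predictedQuality > a.predictedQuality ? b : a));
  }
  return goodEnough.reduce((a, b) => (b.costPerCall < a.costPerCall ? b : a));
}

const choice = pickCheapest(
  [
    { model: "small", costPerCall: 1, predictedQuality: 0.62 },
    { model: "medium", costPerCall: 4, predictedQuality: 0.91 },
    { model: "large", costPerCall: 20, predictedQuality: 0.97 },
  ],
  0.9,
);
console.log(choice.model); // prints "medium"
```

In this toy example both "medium" and "large" clear the 0.9 bar, so the cheaper of the two wins. The hard part in practice is producing a trustworthy `predictedQuality` per request, which is where the eval logs come in.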
Try it
npm i agentic-flow@2.1.0
# autonomous repair (deterministic, no Docker needed):
npx agentic-flow-repair ./your-repo --mock
import { CostOptimalRouter } from 'agentic-flow/router/cost-optimal';
import { repair } from 'agentic-flow/repair';
import { signFiles, verifySignedManifest } from 'agentic-flow/harness/provenance';
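The exact signatures of `signFiles` and `verifySignedManifest` aren't shown in this issue, so the sketch below does not call them. It uses only Node's built-in `node:crypto` module to illustrate the general idea behind an Ed25519-signed manifest: hash each file's contents, sign the list of hashes, and later detect any change. `buildManifest`, `signManifest`, and `verifyManifest` are hypothetical helper names, and the manifest format is an assumption.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical manifest shape: file name -> SHA-256 of its contents.
type Manifest = Record<string, string>;

function buildManifest(files: Record<string, string>): Manifest {
  const manifest: Manifest = {};
  for (const name of Object.keys(files).sort()) { // sorted for a stable byte encoding
    manifest[name] = createHash("sha256").update(files[name]).digest("hex");
  }
  return manifest;
}

function signManifest(manifest: Manifest, privateKey: KeyObject): Buffer {
  // Ed25519 takes no separate digest algorithm, hence `null`.
  return sign(null, Buffer.from(JSON.stringify(manifest)), privateKey);
}

function verifyManifest(manifest: Manifest, signature: Buffer, publicKey: KeyObject): boolean {
  return verify(null, Buffer.from(JSON.stringify(manifest)), publicKey, signature);
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const files = { "agent.json": '{"model":"frozen"}', "harness.json": '{"retries":2}' };
const signature = signManifest(buildManifest(files), privateKey);

console.log(verifyManifest(buildManifest(files), signature, publicKey)); // prints true

const tampered = { ...files, "harness.json": '{"retries":99}' };
console.log(verifyManifest(buildManifest(tampered), signature, publicKey)); // prints false
```

Any edit to a file changes its hash, which changes the signed bytes, so verification fails. A real deployment also needs a trusted way to distribute the public key.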
Honest scope
The in-package repair engine is fully working and tested. The headline SWE-bench-Lite "Test-Driven Repair" product numbers (~58–68%) come from the upstream @metaharness/darwin Docker harness — that's the documented deployment path, not bundled here.
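The test gate for agentic-flow-repair is described as shell-free and secret-scrubbed. The sketch below shows one plausible ingredient of such a gate using Node's `node:child_process`: run the test command as an argument array with no shell, and pass it an environment with credential-like variables removed. It is not the package's implementation and not a full sandbox; `scrubEnv`, `testsPass`, the name pattern, and the timeout are assumptions.

```typescript
import { spawnSync } from "node:child_process";

// Assumed heuristic: drop variables whose names look like credentials.
const SECRET_PATTERN = /(TOKEN|SECRET|PASSWORD|API_KEY|PRIVATE_KEY)/i;

function scrubEnv(env: Record<string, string | undefined>): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value !== undefined && !SECRET_PATTERN.test(key)) clean[key] = value;
  }
  return clean;
}

// Run the repo's test command as an argv array with no shell in between,
// so strings coming from a patch are never interpreted as shell syntax.
function testsPass(command: string, args: string[], cwd: string): boolean {
  const result = spawnSync(command, args, {
    cwd,
    shell: false,
    env: scrubEnv(process.env),
    timeout: 60_000, // assumed limit; a hung test run counts as a failure
  });
  return result.status === 0;
}
```

A candidate repair would be accepted only when `testsPass` returns `true` for the repo's own test command.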
Where we're heading (feedback welcome)
- Route: turn real usage into routing training data automatically; a native (FastGRNN) backend by default.
- Evolve: evolve agentic-flow's own agent policies against its benchmark suite; wire the full SWE-bench TDR path.
- Verify: harness verify as a CI/pre-publish gate; key-management guidance.
- Docs: an end-to-end "build your own meta-harness" guide.
Questions for the community:
- Does the "harness, not the model, is the lever" framing match your experience?
- Which pillar is most useful to you first — routing, repair, or provenance?
- What would make you adopt cost-optimal routing in production (data format, integrations, guardrails)?
Background: ADR-073/074/075/076 in docs/adr/. Related packages: @metaharness/router, @metaharness/darwin.