feat(client,config): proper Opus 4.7 support — effort tier, 1h prompt-cache, thinking display by hakula139 · Pull Request #29 · hakula139/oxide-code

hakula139 · 2026-04-24T07:17:18Z

Summary

Treating Opus 4.7 as "just another model-name swap" leaves real capability on the table and sends a few wire fields that don't match what Claude Code itself sends. This PR makes claude-opus-4-7 a first-class target: the wire body now opts into the new agentic fields, the user-facing config gains effort and prompt_cache_ttl knobs, and max_tokens finally scales with intent instead of sitting at a hand-picked 32k.

Ground truth for every wire decision came from pointing claude -p at a local HTTP proxy and diffing the request body against what this client emits. That's what flagged a few non-obvious things the migration guide under-specified:

thinking.display defaults flipped to "omitted" on 4.7 — to keep the existing show_thinking UX working we now explicitly send "summarized" when the user opts in.
anthropic-beta is still gated per-model (interleaved thinking, context management, fine-grained tool streaming) — no blanket header stuffing.
context_management.edits with clear_thinking_20251015 is what Claude Code sends for 4.6+; we match.
cache_control.ttl: "1h" is opt-in via ENABLE_PROMPT_CACHING_1H — we default it on for agentic sessions where the system-prompt prefix is stable for hours.

Wire-body changes (`feat(client)`)

output_config.effort populated from resolved Config.effort, respecting per-model capability. Opus 4.7 gets xhigh; 4.6-class models get low..max; older models see no field.
thinking.display emitted on 4.7 — "summarized" when show_thinking = true, else omitted so the API picks its own default ("omitted"). 4.6 and older keep the legacy adaptive-thinking block unchanged.
context_management.edits = [{type: clear_thinking_20251015}] on 4.6+ (anything with context_management: true in the capability table).
CacheControl gained an optional ttl field; long-lived system-prompt breakpoints now carry ttl: "1h" when prompt_cache_ttl = 1h (the default).

Config surface (`feat(config)`)

New user-facing knobs, all optional, all with sensible per-model defaults:

Setting	Env	Default	Notes
`effort`	`ANTHROPIC_EFFORT`	per-model	`low`\|`medium`\|`high`\|`xhigh`\|`max`; clamped against model capability (Opus 4.7 unlocks `xhigh`; 4.6-class unlocks `max`).
`max_tokens`	`ANTHROPIC_MAX_TOKENS`	effort-derived	If unset, scales with the resolved effort so `max` actually buys room to think.
`prompt_cache_ttl`	`OX_PROMPT_CACHE_TTL`	`1h`	`5m`\|`1h`; opt-out to `5m` if short-lived caching is preferred.

Model-capability plumbing (`feat(model)`)

New effort_max flag — Opus 4.6+ and Sonnet 4.6+ (where the max tier is valid).
New effort_xhigh flag — Opus 4.7 only.
New Capabilities methods: accepts_effort, clamp_effort, default_effort. Config resolution asks the capability table instead of string-matching model IDs.
Table row for claude-opus-4-7 upgraded from "same as 4.6" to its actual superset (effort_xhigh = true).

Housekeeping (landed as separate commits)

Test-fn naming normalized: compute_betas_opus_46_* → compute_betas_opus_4_6_* (also _45 and _47). Easier to grep, consistent with model version strings.
let _ = expr; → _ = expr; sweep across the crate. Matches the idiom already used in most of the tree.
Struct-field reordering by concern: ClientConfig, Config, Capabilities, CreateMessageRequest, and CacheControl now group related fields adjacently (connection → model → request tuning → cache-control bits) so the shape of the file matches the conceptual layering.

Docs

docs/research/anthropic-api.md — new Agentic Request Body Fields section covering output_config.effort, context_management.edits, and cache_control.ttl, cross-linked to the migration guide.
docs/research/extended-thinking.md — new Display modes (Opus 4.7+) subsection explaining thinking.display and its coupling to show_thinking.
docs/guide/configuration.md — refreshed examples, tables, and env-var list; added an Authentication-section paragraph explaining why we prefer ANTHROPIC_API_KEY over an on-disk key (project-local ox.toml is easy to commit by accident). Also fixed a 4-space → 3-space indent under an ordered-list item to match the rest of the docs; markdownlint doesn't enforce this (MD007 only covers top-level unordered lists) so it's convention-only.

Test plan

cargo fmt --all --check
cargo build
cargo clippy -p oxide-code --all-targets -- -D warnings
cargo test -p oxide-code — 915 passed
pnpm lint — 0 errors across 16 files
pnpm spellcheck — 0 issues
Ground-truth wire capture via local HTTP proxy intercepting claude -p; diffed body shape against Opus 4.7 (effort, thinking.display, context_management, cache_control.ttl all match field-for-field).

Deferred

apiKeyHelper-style secret-manager indirection (shell-out for the API key) — called out in the Authentication section as the obvious next step if we want to move away from env / plaintext entirely. Not in scope here.
Runtime /model slash-command handoff for switching effort mid-session. The Effort enum and accepts_effort path are picker-ready; the reducer wiring is a follow-up.

Two tiny housekeeping items surfaced during the Opus 4.7 migration audit, landed together because they share the same scope and have zero behavior change: - Rename four test functions whose names compacted the model-family digit (`opus_46`, `sonnet_45`, `opus_47`) to the canonical underscore-separated form (`opus_4_6`, `sonnet_4_5`, `opus_4_7`), matching the API model id with `-` → `_` and the shape already used by neighbouring `compute_betas_haiku_4_5_*` and `model::tests::opus_4_7_*` fns. - Replace three `let _ = <expr>;` statements with `_ = <expr>;`. The bare-`_` form is a stable Rust statement that reads as "explicitly discard" without introducing a binding.

Opus 4.7's `output_config.effort` parameter introduces two new upper-bound levels — `xhigh` (new for 4.7) and `max` (Opus-only). Older models that accept `effort` at `low`/`medium`/`high` 400 on either of these. Rather than scatter per-model ceilings across request-building code, encode them in the capability table. - `effort_max` — Opus 4.6, Opus 4.7. - `effort_xhigh` — Opus 4.7 only. Flags are independent of each other and of the existing `effort` gate (which still decides whether `output_config.effort` is sent at all) — the matrix is small enough to enumerate and per-model allowlists in the upstream migration guide are not substring-shaped, so lifting them into the capability struct keeps the clamp a pure lookup. Also renames `opus_4_7_is_treated_as_4_6_equivalent` to `opus_4_7_uniquely_supports_xhigh` (the old name is a lie now that 4.7 diverges) and tightens the substring-predicate test's variable names (`expect_one_million` → `expect_context_1m`, `expect_thinking` → `expect_interleaved_thinking`) to match the field names they compare against. Fields are held behind `#[cfg_attr(not(test), expect(dead_code))]` until the next commit wires them into `Config::load`'s clamp.

Adds `[client].effort` / `ANTHROPIC_EFFORT` with five wire tokens (`low`, `medium`, `high`, `xhigh`, `max`) and resolves them once at `Config::load` against the per-model capability ceiling from `crate::model::Capabilities`. Callers read `Config.effort: Option<Effort>` and forward unchanged — no per-request clamping or model sniffing. Resolution: 1. `ANTHROPIC_EFFORT` env → `FromStr`. Unknown tokens fail loudly with an actionable message listing the five accepted ones. 2. Fallback to `[client].effort` (TOML, via `Deserialize`). 3. Fallback to `Capabilities::default_effort()` — `Xhigh` on 4.7, `High` on other effort-capable models, `None` elsewhere. 4. Clamp via `Capabilities::clamp_effort(pick)` — the highest supported level ≤ `pick`. Never silently escalates (an `xhigh` pick on Opus 4.6 clamps down to `high`, not up to `max`). Model rows without the `effort` capability produce `None` regardless of user pick. `max_tokens` default now scales with the resolved effort to match claude-code 2.1.119: 64 000 for `xhigh` / `max`, 32 000 for `high`, 16 384 otherwise. User overrides via env / TOML still win. `Config.effort` is gated behind `#[cfg_attr(not(test), expect(...))]` until the next commit wires it into `stream_message` — the `expect` will fire unfulfilled then, prompting removal. Also refactors the `Capabilities::{accepts_effort, clamp_effort, default_effort}` methods to take `self` by value (clippy catches the trivially-copy reference).

Anthropic silently dropped the default ephemeral cache TTL from 1 h to 5 m on 2026-03-06, quietly cutting prompt-caching savings from 80%+ to 40-55% on any session longer than 5 min. The hit-rate recovery is worth the write premium for real agent sessions, so make 1 h the default and let users opt down. - New `PromptCacheTtl { FiveMin, OneHour }` enum with explicit `#[serde(rename)]` tokens (`"5m"` / `"1h"`) matching the Anthropic wire format and claude-code's observed packet shape. - `Config.prompt_cache_ttl: PromptCacheTtl` (non-optional — every request carries a TTL, the only question is which). - Resolution: `OX_PROMPT_CACHE_TTL` env → `[client].prompt_cache_ttl` TOML → default `OneHour`. Invalid tokens fail loudly. - `CacheControl` grows an optional `ttl` field that serializes as `"1h"` only when set; 5 m keeps today's exact wire shape (no `ttl` key, matching the server default). `scope` gating stays unchanged — 3 P still drops global scope but keeps the TTL. - `static_prefix_cache_control(is_first_party, ttl)` forwards the config-resolved TTL; the 3 P wiremock test now pins `ttl: "1h"` rides through so a future refactor can't silently regress it.

The config / capability structs had grown to the point where related fields were scattered (e.g. `api_key` and `base_url` separated by `model` and `max_tokens`; `context_management` and `context_1m` separated by three `effort_*` flags). Regroup so adjacent lines stay conceptually related: - `ClientConfig` (TOML `[client]`) / `Config` (resolved): connection (`api_key` / `auth`, `base_url`) → model selection (`model`, `effort`) → request tuning (`max_tokens`, `prompt_cache_ttl`, `thinking`) → display (`show_thinking`). - `Capabilities` (model.rs): context-related flags stay together (`context_management` now adjacent to `context_1m`), then the three `effort_*` flags, then `structured_outputs`. Also updates every construction site (struct literals in tests, `ClientConfig::merge`, `Config::load`, the `MODELS` table, the `capability_flags_match_upstream_substring_predicates` assertion block) and the module-level docstring bullet list to match the new field order. No behavior change.

…splay Closes the four wire-level gaps between oxide-code and claude-code 2.1.119 observed in the packet capture. Everything was already resolved at `Config::load`; this commit just forwards the config values into the request body. - `output_config.effort`: `OutputConfig` is now a generic wrapper carrying `format` + `effort`, both skip-if-none. A new `OutputConfig::new(format, effort)` returns `None` when both are empty so we never ship a bare `{}`. `stream_message` populates `effort` from `Config.effort` (already clamped against the model ceiling at load); `complete` populates `format` for the existing structured-outputs path. - `context_management.edits`: new `ContextManagement { edits: [...] }` body field with a single `clear_thinking_20251015 keep=all` entry, gated on `Capabilities::context_management` — the same flag that drives the matching beta header, so body and header stay in sync. One-shot `complete` calls skip it, matching claude-code. - `thinking.display`: `ThinkingConfig::Adaptive` grows an optional `display: Option<ThinkingDisplay>` whose one variant `Summarized` opts 4.7 into streaming summarized thinking text. `Config::load` sets `Some(Summarized)` iff `show_thinking`; older models ignore the unknown field. `Omitted` is deliberately not modelled — the default (`None` → field absent) already yields 4.7's `omitted` wire value. New tests (6 streaming-body + 1 complete-body) pin each body field against a captured serialized request. The existing 3 P base-URL regression test now also asserts the TTL rides through, so future refactors can't silently lose one of the three cache-related wire bits.

- `docs/research/anthropic-api.md`: new "Agentic Request Body Fields" section covering `output_config.effort`, `context_management.edits`, and `cache_control.ttl` — the three body knobs that partner an existing beta header and are silent no-ops when shipped without it. Covers per-model ceilings, the body-header coupling invariant oxide-code maintains, and the 2026-03 TTL-default drop. Extends the per-model beta table's key rules to flag that `effort` + `context-management` need their body fields; extends Sources to document that the body-field research is empirical (live packet capture against a local SSE proxy on 2026-04-24). - `docs/research/extended-thinking.md`: new "Display modes (Opus 4.7+)" subsection in "Thinking Configuration" covering the silent default flip from `summarized` to `omitted` and how oxide-code couples `thinking.display` to `config.show_thinking`. Updates the closing paragraph to mention the field.

Documents the two new `[client]` knobs landed in the Opus 4.7 migration: `effort` (intelligence-vs-latency tier, maps to `output_config.effort`) and `prompt_cache_ttl` (cache duration with a 1 h default to counter Anthropic's 2026-03 default drop). - Rewrites the `[client]` config table to group the new keys with existing ones in a readable order (connection → model → tuning). - Adds per-model default tables and a tier guide for `effort`, lifted from the Opus 4.7 migration guide's recommendations. - Notes that `max_tokens` default now derives from the resolved `effort` (64 K / 32 K / 16 384 tiers) and that setting either env / TOML value overrides the derivation. - Links the `prompt_cache_ttl` default rationale back to the new Agentic Request Body Fields research section. - Calls out the `show_thinking`-to-`thinking.display` coupling on Opus 4.7 in the `[tui]` section. - Extends the env-variable table with `ANTHROPIC_EFFORT` and `OX_PROMPT_CACHE_TTL` so every knob has env parity.

Tighten the "prefer env var for secrets" note so the actual reason is visible: `ox.toml` is resolved by walking up from CWD, so a project-local copy is one stray `git add` from a leak. Also mirror what Claude Code does (env var or Keychain-backed OAuth; plaintext file only as a fallback). Fix the nested list under item 3. to 3-space indent, matching `docs/research/system-prompt.md` and CommonMark content-alignment. Markdownlint passes either way — MD007 only governs top-level unordered lists — so the convention has to be enforced by convention, not config.

codecov · 2026-04-24T07:18:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

The pattern `lookup(m).map(|info| info.capabilities).unwrap_or_default()` had grown to four call sites (two new in this PR, two pre-existing). Extract a named helper so the "unknown model → all-false capabilities" decay lives in one place and callers read as `capabilities_for(m)`. - `config::Config::load` — sets up the per-model effort / default-effort clamp at startup. - `client::anthropic::Client::stream_message` — gates the new `context_management.edits` body field on the cap flag. - `client::anthropic::compute_betas` — per-model beta-header set. - `client::anthropic::supports_structured_outputs` — swaps the `is_some_and` short-circuit for the same helper; behavior is identical since `Capabilities::default().structured_outputs == false`. No behavior change. The four sites now share one definition of "unknown model fallback" instead of re-stating it.

Three small cleanups on the `OutputConfig` body wrapper that went in with this PR's agentic-fields commit: - Drop the unused `#[derive(Default)]`. Every construction flows through `OutputConfig::new`; no call site invokes `::default()` and no struct literal builds one directly. - Fix the intra-doc link `[`Self::is_empty`]` — that method never existed; the emptiness check lives in `Self::new`. - Collapse `Self::new`'s if/else to a single `.then_some(...)`. Same behavior, same skip-serializing-if gates on the inner fields, two lines instead of six.

Aligns the wire `User-Agent: claude-cli/...` claim and the billing header's `cc_version=...` field with the version this PR's body-shape research was pinned against. Every wire comment and doc reference in this PR cites `claude-code 2.1.119`; the constant was the one place that still said 2.1.101. The `build_billing_header_includes_cc_version` test reads the constant via `format!`, so no test-side churn is needed. The `docs/research/system-prompt.md` "Based on v2.1.101" line stays — that research is tied to the version it was performed against, not to the currently-claimed wire identity.

…ility Two related tweaks in `Capabilities`: - `use crate::config::Effort;` was repeated 3× inside the `Capabilities` methods (plus once inside each of the two test fns). Hoist to the module scope — the `config → model::lookup` direction is the only crate-level dependency edge between these two modules, so pulling the type in at the top doesn't introduce a cycle. Signatures now read as bare `Effort` instead of fully-qualified `crate::config::Effort`. - `accepts_effort` drops from `pub(crate)` to private. Its only caller is `clamp_effort` three lines below; no test exercises it directly. Per CLAUDE.md §Visibility: default to the smallest visibility needed. No behavior change.

Two subsumed tests removed from the Opus 4.7 migration: - `effort_serialize_matches_wire_tokens` + `effort_parses_all_valid_tokens` in `config.rs` checked adjacent halves of the same round-trip over the same variant list. Merged into `effort_round_trips_through_serde_and_fromstr`, matching the existing `prompt_cache_ttl_round_trips_through_serde_and_fromstr` shape sibling in the same file. The rejection test stays — that's a distinct code path. - `default_effort_matches_observed_claude_code_wire` in `model.rs` duplicated the assertions in `load_effort_default_follows_model_ceiling` (config.rs), which exercises the same per-model map through `Config::load` — the only production caller of `default_effort`. Per CLAUDE.md §Testing: drop tests subsumed by more thorough ones. The section divider above the surviving `clamp_effort` test is tightened accordingly. 915 → 913 tests; coverage of the removed assertions is retained.

The rustdoc above `Config` enumerated which field went in which bucket (connection / model / tuning / display), which the eight field declarations immediately below already show. Per CLAUDE.md §Comments ("comment the WHY, not the WHAT"), this is narration rather than context — trim to a single-line summary. `ClientConfig`'s parallel narration in `config/file.rs` stays: it describes the TOML-key grouping a user will type, which is a genuinely different audience than Rust callers reading the field list.

The old `sse_body(&[&str])` form pushed each fixture to write the frame header inline, so every frame landed as a two-line raw string: r#"event: content_block_delta data: {...}"#, That forces the literal newline into the source and defeats rustfmt's column alignment. It also lets a caller silently omit the `data:` prefix or the `\n\n` terminator, which would produce a malformed test fixture that still "worked" until a parser change caught it. New form: pass `(event, data)` pairs, and let the helper emit the SSE format. Call sites read as a sequence of typed frames; the helper owns the `\n\n` terminator and the `event: ` / `data: ` line prefixes. JSON data stays verbatim so packet-capture faithfulness is preserved. `format_collect` would be the idiomatic concat, but clippy (nursery) flags it; use `writeln!` against a preallocated buffer instead.

hakula139 added 9 commits April 24, 2026 15:15

hakula139 self-assigned this Apr 24, 2026

hakula139 added the enhancement New feature or request label Apr 24, 2026

hakula139 added 7 commits April 24, 2026 15:48

hakula139 merged commit 582ba15 into main Apr 24, 2026
4 checks passed

hakula139 deleted the feat/opus-4-7-migration branch April 24, 2026 08:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(client,config): proper Opus 4.7 support — effort tier, 1h prompt-cache, thinking display#29

feat(client,config): proper Opus 4.7 support — effort tier, 1h prompt-cache, thinking display#29
hakula139 merged 16 commits intomainfrom
feat/opus-4-7-migration

hakula139 commented Apr 24, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hakula139 commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Wire-body changes (feat(client))

Config surface (feat(config))

Model-capability plumbing (feat(model))

Housekeeping (landed as separate commits)

Docs

Test plan

Deferred

Uh oh!

codecov Bot commented Apr 24, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hakula139 commented Apr 24, 2026 •

edited

Loading

Wire-body changes (`feat(client)`)

Config surface (`feat(config)`)

Model-capability plumbing (`feat(model)`)