Conversation
Two tiny housekeeping items surfaced during the Opus 4.7 migration audit, landed together because they share the same scope and have zero behavior change: - Rename four test functions whose names compacted the model-family digit (`opus_46`, `sonnet_45`, `opus_47`) to the canonical underscore-separated form (`opus_4_6`, `sonnet_4_5`, `opus_4_7`), matching the API model id with `-` → `_` and the shape already used by neighbouring `compute_betas_haiku_4_5_*` and `model::tests::opus_4_7_*` fns. - Replace three `let _ = <expr>;` statements with `_ = <expr>;`. The bare-`_` form is a stable Rust statement that reads as "explicitly discard" without introducing a binding.
Opus 4.7's `output_config.effort` parameter introduces two new upper-bound levels — `xhigh` (new for 4.7) and `max` (Opus-only). Older models that accept `effort` at `low`/`medium`/`high` 400 on either of these. Rather than scatter per-model ceilings across request-building code, encode them in the capability table. - `effort_max` — Opus 4.6, Opus 4.7. - `effort_xhigh` — Opus 4.7 only. Flags are independent of each other and of the existing `effort` gate (which still decides whether `output_config.effort` is sent at all) — the matrix is small enough to enumerate and per-model allowlists in the upstream migration guide are not substring-shaped, so lifting them into the capability struct keeps the clamp a pure lookup. Also renames `opus_4_7_is_treated_as_4_6_equivalent` to `opus_4_7_uniquely_supports_xhigh` (the old name is a lie now that 4.7 diverges) and tightens the substring-predicate test's variable names (`expect_one_million` → `expect_context_1m`, `expect_thinking` → `expect_interleaved_thinking`) to match the field names they compare against. Fields are held behind `#[cfg_attr(not(test), expect(dead_code))]` until the next commit wires them into `Config::load`'s clamp.
Adds `[client].effort` / `ANTHROPIC_EFFORT` with five wire tokens
(`low`, `medium`, `high`, `xhigh`, `max`) and resolves them once at
`Config::load` against the per-model capability ceiling from
`crate::model::Capabilities`. Callers read `Config.effort:
Option<Effort>` and forward unchanged — no per-request clamping or
model sniffing.
Resolution:
1. `ANTHROPIC_EFFORT` env → `FromStr`. Unknown tokens fail loudly with
an actionable message listing the five accepted ones.
2. Fallback to `[client].effort` (TOML, via `Deserialize`).
3. Fallback to `Capabilities::default_effort()` — `Xhigh` on 4.7,
`High` on other effort-capable models, `None` elsewhere.
4. Clamp via `Capabilities::clamp_effort(pick)` — the highest
supported level ≤ `pick`. Never silently escalates (an `xhigh`
pick on Opus 4.6 clamps down to `high`, not up to `max`). Model
rows without the `effort` capability produce `None` regardless of
user pick.
`max_tokens` default now scales with the resolved effort to match
claude-code 2.1.119: 64 000 for `xhigh` / `max`, 32 000 for `high`,
16 384 otherwise. User overrides via env / TOML still win.
`Config.effort` is gated behind `#[cfg_attr(not(test), expect(...))]`
until the next commit wires it into `stream_message` — the `expect`
will fire unfulfilled then, prompting removal.
Also refactors the `Capabilities::{accepts_effort, clamp_effort,
default_effort}` methods to take `self` by value (clippy catches the
trivially-copy reference).
Anthropic silently dropped the default ephemeral cache TTL from 1 h
to 5 m on 2026-03-06, quietly cutting prompt-caching savings from
80%+ to 40-55% on any session longer than 5 min. The hit-rate
recovery is worth the write premium for real agent sessions, so
make 1 h the default and let users opt down.
- New `PromptCacheTtl { FiveMin, OneHour }` enum with explicit
`#[serde(rename)]` tokens (`"5m"` / `"1h"`) matching the Anthropic
wire format and claude-code's observed packet shape.
- `Config.prompt_cache_ttl: PromptCacheTtl` (non-optional — every
request carries a TTL, the only question is which).
- Resolution: `OX_PROMPT_CACHE_TTL` env → `[client].prompt_cache_ttl`
TOML → default `OneHour`. Invalid tokens fail loudly.
- `CacheControl` grows an optional `ttl` field that serializes as
`"1h"` only when set; 5 m keeps today's exact wire shape (no
`ttl` key, matching the server default). `scope` gating stays
unchanged — 3 P still drops global scope but keeps the TTL.
- `static_prefix_cache_control(is_first_party, ttl)` forwards the
config-resolved TTL; the 3 P wiremock test now pins `ttl: "1h"`
rides through so a future refactor can't silently regress it.
The config / capability structs had grown to the point where related fields were scattered (e.g. `api_key` and `base_url` separated by `model` and `max_tokens`; `context_management` and `context_1m` separated by three `effort_*` flags). Regroup so adjacent lines stay conceptually related: - `ClientConfig` (TOML `[client]`) / `Config` (resolved): connection (`api_key` / `auth`, `base_url`) → model selection (`model`, `effort`) → request tuning (`max_tokens`, `prompt_cache_ttl`, `thinking`) → display (`show_thinking`). - `Capabilities` (model.rs): context-related flags stay together (`context_management` now adjacent to `context_1m`), then the three `effort_*` flags, then `structured_outputs`. Also updates every construction site (struct literals in tests, `ClientConfig::merge`, `Config::load`, the `MODELS` table, the `capability_flags_match_upstream_substring_predicates` assertion block) and the module-level docstring bullet list to match the new field order. No behavior change.
…splay
Closes the four wire-level gaps between oxide-code and claude-code
2.1.119 observed in the packet capture. Everything was already
resolved at `Config::load`; this commit just forwards the config
values into the request body.
- `output_config.effort`: `OutputConfig` is now a generic wrapper
carrying `format` + `effort`, both skip-if-none. A new
`OutputConfig::new(format, effort)` returns `None` when both are
empty so we never ship a bare `{}`. `stream_message` populates
`effort` from `Config.effort` (already clamped against the model
ceiling at load); `complete` populates `format` for the existing
structured-outputs path.
- `context_management.edits`: new `ContextManagement { edits: [...] }`
body field with a single `clear_thinking_20251015 keep=all` entry,
gated on `Capabilities::context_management` — the same flag that
drives the matching beta header, so body and header stay in sync.
One-shot `complete` calls skip it, matching claude-code.
- `thinking.display`: `ThinkingConfig::Adaptive` grows an optional
`display: Option<ThinkingDisplay>` whose one variant `Summarized`
opts 4.7 into streaming summarized thinking text. `Config::load`
sets `Some(Summarized)` iff `show_thinking`; older models ignore
the unknown field. `Omitted` is deliberately not modelled — the
default (`None` → field absent) already yields 4.7's `omitted`
wire value.
New tests (6 streaming-body + 1 complete-body) pin each body field
against a captured serialized request. The existing 3 P base-URL
regression test now also asserts the TTL rides through, so future
refactors can't silently lose one of the three cache-related wire
bits.
- `docs/research/anthropic-api.md`: new "Agentic Request Body Fields" section covering `output_config.effort`, `context_management.edits`, and `cache_control.ttl` — the three body knobs that partner an existing beta header and are silent no-ops when shipped without it. Covers per-model ceilings, the body-header coupling invariant oxide-code maintains, and the 2026-03 TTL-default drop. Extends the per-model beta table's key rules to flag that `effort` + `context-management` need their body fields; extends Sources to document that the body-field research is empirical (live packet capture against a local SSE proxy on 2026-04-24). - `docs/research/extended-thinking.md`: new "Display modes (Opus 4.7+)" subsection in "Thinking Configuration" covering the silent default flip from `summarized` to `omitted` and how oxide-code couples `thinking.display` to `config.show_thinking`. Updates the closing paragraph to mention the field.
Documents the two new `[client]` knobs landed in the Opus 4.7 migration: `effort` (intelligence-vs-latency tier, maps to `output_config.effort`) and `prompt_cache_ttl` (cache duration with a 1 h default to counter Anthropic's 2026-03 default drop). - Rewrites the `[client]` config table to group the new keys with existing ones in a readable order (connection → model → tuning). - Adds per-model default tables and a tier guide for `effort`, lifted from the Opus 4.7 migration guide's recommendations. - Notes that `max_tokens` default now derives from the resolved `effort` (64 K / 32 K / 16 384 tiers) and that setting either env / TOML value overrides the derivation. - Links the `prompt_cache_ttl` default rationale back to the new Agentic Request Body Fields research section. - Calls out the `show_thinking`-to-`thinking.display` coupling on Opus 4.7 in the `[tui]` section. - Extends the env-variable table with `ANTHROPIC_EFFORT` and `OX_PROMPT_CACHE_TTL` so every knob has env parity.
Tighten the "prefer env var for secrets" note so the actual reason is visible: `ox.toml` is resolved by walking up from CWD, so a project-local copy is one stray `git add` from a leak. Also mirror what Claude Code does (env var or Keychain-backed OAuth; plaintext file only as a fallback). Fix the nested list under item 3. to 3-space indent, matching `docs/research/system-prompt.md` and CommonMark content-alignment. Markdownlint passes either way — MD007 only governs top-level unordered lists — so the convention has to be enforced by convention, not config.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
The pattern `lookup(m).map(|info| info.capabilities).unwrap_or_default()` had grown to four call sites (two new in this PR, two pre-existing). Extract a named helper so the "unknown model → all-false capabilities" decay lives in one place and callers read as `capabilities_for(m)`. - `config::Config::load` — sets up the per-model effort / default-effort clamp at startup. - `client::anthropic::Client::stream_message` — gates the new `context_management.edits` body field on the cap flag. - `client::anthropic::compute_betas` — per-model beta-header set. - `client::anthropic::supports_structured_outputs` — swaps the `is_some_and` short-circuit for the same helper; behavior is identical since `Capabilities::default().structured_outputs == false`. No behavior change. The four sites now share one definition of "unknown model fallback" instead of re-stating it.
Three small cleanups on the `OutputConfig` body wrapper that went in with this PR's agentic-fields commit: - Drop the unused `#[derive(Default)]`. Every construction flows through `OutputConfig::new`; no call site invokes `::default()` and no struct literal builds one directly. - Fix the intra-doc link `[`Self::is_empty`]` — that method never existed; the emptiness check lives in `Self::new`. - Collapse `Self::new`'s if/else to a single `.then_some(...)`. Same behavior, same skip-serializing-if gates on the inner fields, two lines instead of six.
Aligns the wire `User-Agent: claude-cli/...` claim and the billing header's `cc_version=...` field with the version this PR's body-shape research was pinned against. Every wire comment and doc reference in this PR cites `claude-code 2.1.119`; the constant was the one place that still said 2.1.101. The `build_billing_header_includes_cc_version` test reads the constant via `format!`, so no test-side churn is needed. The `docs/research/system-prompt.md` "Based on v2.1.101" line stays — that research is tied to the version it was performed against, not to the currently-claimed wire identity.
…ility Two related tweaks in `Capabilities`: - `use crate::config::Effort;` was repeated 3× inside the `Capabilities` methods (plus once inside each of the two test fns). Hoist to the module scope — the `config → model::lookup` direction is the only crate-level dependency edge between these two modules, so pulling the type in at the top doesn't introduce a cycle. Signatures now read as bare `Effort` instead of fully-qualified `crate::config::Effort`. - `accepts_effort` drops from `pub(crate)` to private. Its only caller is `clamp_effort` three lines below; no test exercises it directly. Per CLAUDE.md §Visibility: default to the smallest visibility needed. No behavior change.
Two subsumed tests removed from the Opus 4.7 migration: - `effort_serialize_matches_wire_tokens` + `effort_parses_all_valid_tokens` in `config.rs` checked adjacent halves of the same round-trip over the same variant list. Merged into `effort_round_trips_through_serde_and_fromstr`, matching the existing `prompt_cache_ttl_round_trips_through_serde_and_fromstr` shape sibling in the same file. The rejection test stays — that's a distinct code path. - `default_effort_matches_observed_claude_code_wire` in `model.rs` duplicated the assertions in `load_effort_default_follows_model_ceiling` (config.rs), which exercises the same per-model map through `Config::load` — the only production caller of `default_effort`. Per CLAUDE.md §Testing: drop tests subsumed by more thorough ones. The section divider above the surviving `clamp_effort` test is tightened accordingly. 915 → 913 tests; coverage of the removed assertions is retained.
The rustdoc above `Config` enumerated which field went in which
bucket (connection / model / tuning / display), which the eight
field declarations immediately below already show. Per CLAUDE.md
§Comments ("comment the WHY, not the WHAT"), this is narration
rather than context — trim to a single-line summary.
`ClientConfig`'s parallel narration in `config/file.rs` stays: it
describes the TOML-key grouping a user will type, which is a
genuinely different audience than Rust callers reading the field
list.
The old `sse_body(&[&str])` form pushed each fixture to write the
frame header inline, so every frame landed as a two-line raw string:
r#"event: content_block_delta
data: {...}"#,
That forces the literal newline into the source and defeats rustfmt's
column alignment. It also lets a caller silently omit the `data:`
prefix or the `\n\n` terminator, which would produce a malformed
test fixture that still "worked" until a parser change caught it.
New form: pass `(event, data)` pairs, and let the helper emit the
SSE format. Call sites read as a sequence of typed frames; the
helper owns the `\n\n` terminator and the `event: ` / `data: `
line prefixes. JSON data stays verbatim so packet-capture
faithfulness is preserved.
`format_collect` would be the idiomatic concat, but clippy (nursery)
flags it; use `writeln!` against a preallocated buffer instead.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Treating Opus 4.7 as "just another model-name swap" leaves real capability on the table and sends a few wire fields that don't match what Claude Code itself sends. This PR makes
claude-opus-4-7a first-class target: the wire body now opts into the new agentic fields, the user-facing config gainseffortandprompt_cache_ttlknobs, andmax_tokensfinally scales with intent instead of sitting at a hand-picked 32k.Ground truth for every wire decision came from pointing
claude -pat a local HTTP proxy and diffing the request body against what this client emits. That's what flagged a few non-obvious things the migration guide under-specified:thinking.displaydefaults flipped to"omitted"on 4.7 — to keep the existingshow_thinkingUX working we now explicitly send"summarized"when the user opts in.anthropic-betais still gated per-model (interleaved thinking, context management, fine-grained tool streaming) — no blanket header stuffing.context_management.editswithclear_thinking_20251015is what Claude Code sends for 4.6+; we match.cache_control.ttl: "1h"is opt-in viaENABLE_PROMPT_CACHING_1H— we default it on for agentic sessions where the system-prompt prefix is stable for hours.Wire-body changes (
feat(client))output_config.effortpopulated from resolvedConfig.effort, respecting per-model capability. Opus 4.7 getsxhigh; 4.6-class models getlow..max; older models see no field.thinking.displayemitted on 4.7 —"summarized"whenshow_thinking = true, else omitted so the API picks its own default ("omitted"). 4.6 and older keep the legacy adaptive-thinking block unchanged.context_management.edits = [{type: clear_thinking_20251015}]on 4.6+ (anything withcontext_management: truein the capability table).CacheControlgained an optionalttlfield; long-lived system-prompt breakpoints now carryttl: "1h"whenprompt_cache_ttl = 1h(the default).Config surface (
feat(config))New user-facing knobs, all optional, all with sensible per-model defaults:
effortANTHROPIC_EFFORTlow|medium|high|xhigh|max; clamped against model capability (Opus 4.7 unlocksxhigh; 4.6-class unlocksmax).max_tokensANTHROPIC_MAX_TOKENSmaxactually buys room to think.prompt_cache_ttlOX_PROMPT_CACHE_TTL1h5m|1h; opt-out to5mif short-lived caching is preferred.Model-capability plumbing (
feat(model))effort_maxflag — Opus 4.6+ and Sonnet 4.6+ (where themaxtier is valid).effort_xhighflag — Opus 4.7 only.Capabilitiesmethods:accepts_effort,clamp_effort,default_effort. Config resolution asks the capability table instead of string-matching model IDs.claude-opus-4-7upgraded from "same as 4.6" to its actual superset (effort_xhigh = true).Housekeeping (landed as separate commits)
compute_betas_opus_46_*→compute_betas_opus_4_6_*(also_45and_47). Easier to grep, consistent with model version strings.let _ = expr;→_ = expr;sweep across the crate. Matches the idiom already used in most of the tree.ClientConfig,Config,Capabilities,CreateMessageRequest, andCacheControlnow group related fields adjacently (connection → model → request tuning → cache-control bits) so the shape of the file matches the conceptual layering.Docs
docs/research/anthropic-api.md— new Agentic Request Body Fields section coveringoutput_config.effort,context_management.edits, andcache_control.ttl, cross-linked to the migration guide.docs/research/extended-thinking.md— new Display modes (Opus 4.7+) subsection explainingthinking.displayand its coupling toshow_thinking.docs/guide/configuration.md— refreshed examples, tables, and env-var list; added an Authentication-section paragraph explaining why we preferANTHROPIC_API_KEYover an on-disk key (project-localox.tomlis easy to commit by accident). Also fixed a 4-space → 3-space indent under an ordered-list item to match the rest of the docs; markdownlint doesn't enforce this (MD007 only covers top-level unordered lists) so it's convention-only.Test plan
cargo fmt --all --checkcargo buildcargo clippy -p oxide-code --all-targets -- -D warningscargo test -p oxide-code— 915 passedpnpm lint— 0 errors across 16 filespnpm spellcheck— 0 issuesclaude -p; diffed body shape against Opus 4.7 (effort,thinking.display,context_management,cache_control.ttlall match field-for-field).Deferred
apiKeyHelper-style secret-manager indirection (shell-out for the API key) — called out in the Authentication section as the obvious next step if we want to move away from env / plaintext entirely. Not in scope here./modelslash-command handoff for switching effort mid-session. TheEffortenum andaccepts_effortpath are picker-ready; the reducer wiring is a follow-up.