Skip to content

OAuth forward-mode refresh tokens + DCR refresh_token grant advertisement#88

Open
BorisTyshkevich wants to merge 3 commits intomainfrom
fix/oauth-register-refresh-token-grant
Open

OAuth forward-mode refresh tokens + DCR refresh_token grant advertisement#88
BorisTyshkevich wants to merge 3 commits intomainfrom
fix/oauth-register-refresh-token-grant

Conversation

@BorisTyshkevich
Copy link
Copy Markdown
Collaborator

@BorisTyshkevich BorisTyshkevich commented Apr 22, 2026

Summary

In forward mode altinity-mcp previously rejected grant_type=refresh_token outright, so MCP-client sessions died when the upstream ID token expired (~1 h with Auth0 defaults), forcing a full browser re-auth. Lengthening the upstream lifetime would weaken the very property forward mode exists for: ClickHouse validates the bearer end-to-end via token_processors.

This PR ships forward-mode refresh, opt-in behind a new upstream_offline_access flag. The MCP refresh token is a stateless JWE wrapping the upstream IdP's refresh token (mirroring the gating-mode pattern that already exists, keyed by gating_secret_key). The forward-mode invariant is preserved: the bearer that reaches ClickHouse is still the upstream-IdP-signed JWT.

The PR also includes a small companion fix to dynamic client registration — without it, strict OAuth clients (Claude.ai) never attempt grant_type=refresh_token regardless of what the server supports, because RFC 7591 §3.2.1 makes the registration response's grant_types authoritative for the client.

What changed

1. Forward-mode refresh tokens (new feature, opt-in)

When upstream_offline_access: true and forward mode:

  • offline_access is appended to the upstream authorize redirect.
  • The upstream refresh_token is captured and wrapped in a JWE keyed by gating_secret_key. The cleartext upstream refresh never leaves MCP. The MCP client receives only the opaque JWE.
  • On grant_type=refresh_token, MCP decrypts the JWE, calls upstream /oauth/token with grant_type=refresh_token, re-validates the new ID token via the existing JWKS path (or upstream userinfo when no id_token is returned, mirroring the callback's behavior), mints a new JWE around the rotated upstream refresh, and returns the new pair.
  • access_token reaching ClickHouse remains the upstream JWT verbatim. No CH-side change.

Default false. Three reasons:

  1. Stolen-token blast radius. Today in forward mode, a stolen MCP token is valid ≤ upstream ID-token TTL (~1 h). Turning refresh on raises that to refresh_token_ttl_seconds (default 30 d). Operators who chose forward mode for the short ceiling must consciously accept the new envelope.
  2. offline_access may not be enabled upstream. Auth0 requires it on the tenant API; Okta on the app grant types; Azure AD as exposed scope. Sending offline_access to a non-configured IdP can hard-fail or silently strip it. Default-off lets operators stage IdP first.
  3. Refresh-rotation policy is an upstream operator decision (often a separate identity team).

2. DCR grant_types includes refresh_token (one-line fix)

/.well-known/oauth-authorization-server already advertises refresh_token in grant_types_supported, and /oauth/token already accepts the grant. But /oauth/register returned \"grant_types\": [\"authorization_code\"], which is authoritative per RFC 7591. Strict clients (Claude.ai) honor the registration response and never attempt refresh.

-\"grant_types\": []string{\"authorization_code\"},
+\"grant_types\": []string{\"authorization_code\", \"refresh_token\"},

Without this, the new forward-mode refresh path is server-supported but client-unreachable.

3. Rename: gating primitives → JWE primitives

The encoder/decoder/secret helpers are no longer gating-mode-specific:

Before After
`encodeGatingArtifact` `encodeJWEArtifact`
`decodeGatingArtifact` `decodeJWEArtifact`
`mustGatingSecret` `mustJWESecret`
`oauthGatingSecret` `oauthJWESecret`

Pure mechanical search-and-replace; no behavior change. Internal call sites only — the `OAuthConfig.GatingSecretKey` config field and YAML key are preserved for backward compatibility with existing deployments.

Files changed

  • `pkg/config/config.go` — new `UpstreamOfflineAccess` field on `OAuthConfig`.
  • `pkg/jwe_auth/jwe_auth.go` — added `upstream_refresh_token` to the JWE claims whitelist.
  • `cmd/altinity-mcp/oauth_server.go` — rename + struct extension + scope-on-authorize + JWE mint on auth-code exchange + new `handleOAuthTokenRefreshForward` + DCR grant_types fix.
  • `cmd/altinity-mcp/oauth_server_test.go` — extended mock IdP with optional `refreshHandler` (strict single-use rotation, models Auth0/Okta-style reuse detection); new `TestOAuthForwardModeRefresh` + `TestOAuthAuthorizeOfflineAccessScope`.
  • `docs/oauth_authorization.md` — rewrote the forward-mode refresh section to describe the opt-in model, JWE wrapping, operator setup, and stateless limitations.

Test plan

  • `go build ./...` — clean
  • `go vet ./...` — clean
  • `TESTCONTAINERS_RYUK_DISABLED=true go test -run 'TestOAuth|TestRegisterOAuth|TestEncode|TestPKCE' ./cmd/altinity-mcp/` — pass
  • `go test ./pkg/jwe_auth/ ./pkg/config/` — pass
  • New tests cover: auth-code response includes refresh_token, refresh grants new upstream ID token + rotates, IdP rotation invalidates rotated-out MCP refresh (surfaces upstream invalid_grant), malformed refresh rejected, gating_secret rotation invalidates outstanding tokens, `offline_access` scope appended/omitted on authorize per flag/mode.
  • e2e image built: `ghcr.io/altinity/altinity-mcp:forward-mode-refresh-c54aa28` (multi-arch amd64/arm64).
  • End-to-end verify against Claude.ai + an Auth0 tenant with `offline_access` enabled and rotation: session survives past upstream ID-token TTL without re-auth prompt.

Backward compatibility

  • `upstream_offline_access` defaults `false` — existing forward-mode deployments unchanged.
  • `gating_secret_key` config field/YAML name preserved.
  • Gating mode untouched.
  • DCR change is additive (clients that didn't request `refresh_token` still won't get one).
  • No CH-side change. No helm-chart change beyond the new optional field.

Security envelope

Property Gating mode Forward + refresh (new) Forward (default)
Bearer reaching CH MCP-issued JWT Upstream-IdP JWT Upstream-IdP JWT
Who validates identity at CH MCP Upstream IdP via JWKS Upstream IdP via JWKS
Session lifetime refresh_token_ttl_seconds upstream IdP refresh policy upstream ID-token lifetime
Effect of stolen MCP refresh new MCP tokens until JWE exp new upstream ID tokens until JWE exp + IdP rotation policy if enabled n/a (no MCP refresh)
Mitigation rotate gating_secret_key rotate gating_secret_key + IdP-side reuse detection n/a

🤖 Generated with Claude Code

The /oauth/register endpoint returned only `authorization_code` in
`grant_types`, even though:

- `/.well-known/oauth-authorization-server` already advertised
  `refresh_token` via `grant_types_supported`
- the `/oauth/token` endpoint already accepted `grant_type=refresh_token`
- the auth_code exchange already minted a `refresh_token` in the
  response body

Per RFC 7591 §3.2.1, the `grant_types` list in the registration
response is authoritative for the client. Strict clients (observed with
Claude.ai) treat an omitted grant as forbidden and silently skip the
refresh flow — forcing end users to re-authorize through the browser on
every new chat session.

Include `refresh_token` alongside `authorization_code` so the
registration response matches server capabilities. Add a regression
assertion to the existing `dynamic_client_registration` subtest that
pins both grants in the response body.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
BorisTyshkevich added a commit that referenced this pull request Apr 22, 2026
Brings the refresh_token grant advertisement fix into the
interserver-auth test build so Claude.ai can exercise the full refresh
flow against gated cluster-secret deployments.

Will be dropped automatically once #88 merges to main and this branch
rebases onto main.
Forward mode previously rejected grant_type=refresh_token outright, so
MCP-client sessions died when the upstream ID token expired (~1 h with
Auth0 defaults). Lengthening the upstream lifetime would weaken the very
property that motivates forward mode (CH validates the bearer end-to-end
via token_processors).

When the new upstream_offline_access flag is set in forward mode:
- offline_access is appended to the upstream authorize redirect
- the upstream refresh token is captured and wrapped in a stateless JWE
  keyed by gating_secret_key; the JWE is returned to the MCP client as
  refresh_token (the cleartext upstream refresh never leaves MCP)
- on grant_type=refresh_token, MCP decrypts the JWE, calls the upstream
  /oauth/token with the cleartext, re-validates the new ID token via the
  existing JWKS path (or upstream userinfo when no id_token is returned),
  mints a new JWE around the rotated upstream refresh, and returns the
  new pair. The access_token reaching ClickHouse remains the upstream
  IdP's signed JWT verbatim.

The flag defaults to false so existing forward-mode deployments are
unaffected. Three reasons for the default: (1) refresh widens the stolen-
token blast radius from upstream ID-token TTL to refresh_token_ttl_seconds
(default 30 d), which operators must consciously accept; (2) offline_access
requires upstream IdP configuration that may not yet be in place; (3)
refresh-rotation policy is a separate operator decision often owned by the
identity team.

Also renames encodeGatingArtifact/decodeGatingArtifact/mustGatingSecret
/oauthGatingSecret to encodeJWEArtifact/decodeJWEArtifact/mustJWESecret
/oauthJWESecret since the primitives are now shared by both modes. The
config field name (gating_secret_key) is preserved for YAML backward
compatibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@BorisTyshkevich BorisTyshkevich changed the title Advertise refresh_token grant in dynamic client registration response OAuth forward-mode refresh tokens + DCR refresh_token grant advertisement Apr 26, 2026
When forwarding the upstream id_token as the bearer, the old code reported
upstream tokenResp.ExpiresIn (the access_token's TTL) as expires_in. Auth0
returns expires_in=86400 for the access_token while the id_token's own
exp is iat+3600, so MCP advertised a 24h lifetime to clients that were
in fact getting a 1h bearer. Claude.ai (and other MCP clients) schedule
proactive refresh from expires_in, so the session broke at the real
bearer expiry with `jwt::error::token_verification_exception, e.what() =
token expired` on the ClickHouse side.

Use the id_token's JWT exp (already extracted via ValidateUpstreamIdentity-
Token's OAuthClaims.ExpiresAt) when the bearer is the id_token; fall back
to tokenResp.ExpiresIn for the access_token-only path; 1h as last-resort
default. Applied in both handleOAuthCallback and handleOAuthTokenRefresh-
Forward.

Also adds INFO/WARN logging across the forward-mode auth-code + refresh
flow (dispatcher entry, upstream token exchange success/failure, JWE
mint, JWE decode, upstream refresh call) since the previous code was
silent on whether Auth0 returned a refresh_token, whether MCP minted
one, and whether refresh hits ever reached the handler — making
diagnostics like the one above significantly faster.

Tests: updated TestOAuthForwardModeBrowserLoginUsesUpstreamBearerToken
(prior assertion `expires_in <= 1800` codified the bug; now asserts
the id_token's real exp ~3600); added matching expires_in assertions
in TestOAuthForwardModeRefresh for both the auth-code and refresh
response paths to prevent regression.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant