Skip to content

Restore refresh-token silent renew with renew-or-signout on 401#5700

Draft
samsondav wants to merge 2 commits into
canton-network:mainfrom
Avro-Digital:sam-avro/issue-5682-renew-or-signout
Draft

Restore refresh-token silent renew with renew-or-signout on 401#5700
samsondav wants to merge 2 commits into
canton-network:mainfrom
Avro-Digital:sam-avro/issue-5682-renew-or-signout

Conversation

@samsondav
Copy link
Copy Markdown
Contributor

@samsondav samsondav commented May 27, 2026

Alternative to #5683 — same bug, different design. Posting both so maintainers can pick.

PR #5545 set automaticSilentRenew={false} on the react-oidc-context provider and removed offline_access from the scope request. Either change on its own would have been fine; together they leave the UIs with no refresh-token mechanism at all, so every access-token expiry forces a full re-login (5 minutes on a default Keycloak deployment).

This PR restores token renewal for both Keycloak and Auth0 without re-enabling the library's timer-based silent renew (which races the new 401 handler under refresh-token rotation, see "Why a single renewal path" below).

What changes

1. renewOrSignout in UserProvider (UserContext.tsx). On splice:auth-expired (fired from BaseApiMiddleware / LedgerApiContext on any 401), try one auth.signinSilent() before signing the user out. Single-flighted via a renewInFlight ref so concurrent 401s collapse to one grant. On success, react-query cache invalidation is deferred until the refreshed access token has propagated into userAccessToken state — a separate effect watches that state and fires queryClient.invalidateQueries() so the deferred invalidation doesn't race the state update.

2. Library's automaticSilentRenew stays false (matches what #5545 set). With Splice owning the renewal path, there is no need for the library's proactive timer — and re-enabling it would race the 401 handler (next section).

3. Optional enable_offline_scope: boolean on the rs256 AuthConfig for Auth0 deployments where the IdP only returns a refresh token when offline_access is in the scope. Default false: Keycloak issues a session-bound refresh token without it, and on Keycloak offline_access produces a long-lived offline refresh token (larger XSS surface in a browser-based SPA).

Why a single renewal path

An earlier revision of this PR re-enabled automaticSilentRenew={true} alongside renewOrSignout, on the theory that the proactive timer would cover normal expiry and the 401 handler would cover edge cases (suspended tabs, clock skew). Adversarial review surfaced a race: the library timer fires signinSilent at exp - 60s using the stored refresh token; a near-simultaneous user request 401s and the handler fires its own signinSilent. With IdPs that rotate refresh tokens (Auth0 by default, Keycloak when configured), one of the two concurrent grants is rejected — the refresh token was already exchanged. The handler's catch block then signs the user out even though the other renewal succeeded.

Dropping automaticSilentRenew to false and routing all renewal through the single-flighted handler eliminates the race. Trade: one extra 401 round-trip per token expiry per active session (the user's first request after expiry takes a renewal + retry instead of using a pre-refreshed token). That cost is bounded and the net UX is far better than the forced re-login this PR replaces.

How this compares to #5683

#5683 takes a different approach: keep automaticSilentRenew={true} and make enable_offline_scope a mandatory operator-set boolean. That forces the Auth0-vs-Keycloak decision at configuration time but leaves two cases open:

  • Auth0 without offline_access opt-in: no refresh token, silent renew has nothing to use, user is logged out at access-token TTL.
  • The same silent-renew vs concurrent-401 race documented above.

This PR keeps the schema field optional (no breaking config change) and uses the renewOrSignout design to handle both cases at runtime. Pick whichever shape fits.

Testing

New tests under apps/common/frontend/src/__tests__/auth/:

  • UserContextRenewOrSignout.test.tsx: signinSilent called when a refresh token is present; invalidateQueries fires after token state propagates (deferred path); removeUser called when no refresh token; failure path falls through to signout; invalidateQueries not called on the signout path; single-flight under concurrent expired events; listener removed on unmount.
  • AuthProvider.test.tsx: asserts automaticSilentRenew=false (the design intent).
  • authSchema.test.ts: parse behaviour of optional enable_offline_scope.
  • oidcAuthToProviderProps.test.ts: scope construction with and without the opt-in.

Existing 401-fallback coverage in BaseApiMiddleware.test.ts and LedgerApiClient.test.ts is untouched — the handler still fires splice:auth-expired on 401, only the handler's response changes.

apps/sv/frontend/src/__tests__/auth/redirect.test.tsx is updated to wrap UserProvider in QueryClientProvider (now required because UserProvider calls useQueryClient). apps/sv/frontend/package.json pins @tanstack/react-query: 5.90.20 directly to match the other apps and prevent singleton-resolution drift.

References

  • Refresh-token path: oidc-client-ts.js:2533 (if (user?.refresh_token) return this._useRefreshToken(state)).
  • Iframe fallback gate: oidc-client-ts.js:2540 (throws "No silent_redirect_uri configured" when no refresh token and no silent_redirect_uri — Splice never sets silent_redirect_uri, so the iframe path is unreachable).
  • Auth0 docs require offline_access to receive a refresh token: https://auth0.com/docs/secure/tokens/refresh-tokens/get-refresh-tokens

Fixes #5682

…on-network#5682)

PR canton-network#5545 set `automaticSilentRenew={false}` on the `react-oidc-context`
provider at the same time as it removed `offline_access` from the
scope request. With both in place, the UIs no longer use any refresh
token, so users are forced to re-authenticate every time the access
token expires - every 5 minutes against a default Keycloak deployment.

This change restores token renewal across both IdP shapes without
introducing the iframe-based silent-auth fallback that is broken by
default on Safari ITP and Firefox Total Cookie Protection.

Two layers:

1. Re-enable `automaticSilentRenew` on `OidcAuthProvider`. The library
   uses the `refresh_token` grant when one is present in the stored
   `User`. The iframe fallback (oidc-client-ts.js:2540) is structurally
   disabled because Splice does not pass `silent_redirect_uri`, so
   `signinSilent()` throws "No silent_redirect_uri configured" instead
   of attempting the iframe path.

2. Replace `signoutOnExpiry` with `renewOrSignout` in `UserProvider`.
   On `splice:auth-expired` (fired from `BaseApiMiddleware` /
   `LedgerApiContext` on any 401), the handler attempts one
   `auth.signinSilent()` before giving up and signing the user out.
   This recovers two scenarios:
   - The proactive `automaticSilentRenew` timer never fired (suspended
     tab, clock skew, or the access token expired before the
     `accessTokenExpiring` notification window).
   - A request was sent with a valid token but reached the server
     after expiry, producing a 401 concurrent with an in-flight silent
     renew.
   Single-flighted via `renewInFlight` so concurrent 401s from a
   fan-out collapse into one `signinSilent` call. On successful refresh
   we `invalidateQueries` so queries that already failed (react-query's
   `retryQuery` short-circuits on 401) pick up the new token.

`enable_offline_scope: boolean` is added as an optional rs256
`AuthConfig` field for Auth0 deployments where the IdP only returns a
refresh token when `offline_access` is explicitly requested. Default
false (Keycloak issues a session-bound refresh token without it; using
`offline_access` requests a long-lived offline refresh token, a larger
XSS surface for browser-based clients).

Tests:
- `AuthProvider.test.tsx`: asserts `automaticSilentRenew=true` and that
  `silent_redirect_uri` is not passed.
- `UserContextRenewOrSignout.test.tsx`: `signinSilent` is called when a
  refresh token is present; `removeUser` is called when not; failure
  path falls through to signout; concurrent 401 events collapse to a
  single `signinSilent` call.
- `authSchema.test.ts`: parse behavior of `enable_offline_scope`.
- `oidcAuthToProviderProps.test.ts`: scope construction with and
  without the opt-in, the `daml_ledger_api` no-openid carve-out, and
  no-leak of the field into provider props.

Fixes canton-network#5682

Signed-off-by: Sam Davies <sam@avrofi.com>
After running a full review fleet (correctness, security, test coverage,
style, cross-cutting, holistic, codex), several findings:

- Codex high finding: with `automaticSilentRenew={true}` AND the new
  401-driven `renewOrSignout`, both paths can fire `signinSilent`
  concurrently. With IdPs that rotate refresh tokens, the second call
  fails (refresh token already exchanged) and the catch block signs the
  user out even though the first renew succeeded. Fix: drop
  `automaticSilentRenew` to `false`. The 401-driven path becomes the
  only renewal mechanism, single-flighted via `renewInFlight`. Trade:
  one extra 401 round-trip per token expiry per active session;
  eliminates the rotation collision.

- Correctness + codex: `invalidateQueries()` was called immediately
  after `signinSilent` resolved, before `userAccessToken` state had
  propagated. Refetches went out through clients still holding the
  stale token and produced a second 401. Fix: defer invalidation via
  a `pendingInvalidation` ref set in renewOrSignout and consumed in a
  new effect that watches `userAccessToken`.

- Cross-cutting + codex critical: `apps/sv/frontend/.../redirect.test.tsx`
  rendered `UserProvider` without wrapping it in `QueryClientProvider`.
  After this PR's `useQueryClient()` call inside UserProvider, the test
  throws at render. Fix: wrap the test harness in QueryClientProvider.

- Cross-cutting: `apps/sv/frontend/package.json` did not declare
  `@tanstack/react-query` directly, relying on transitive hoist.
  Version drift could give two react-query instances in the bundle.
  Fix: pin `@tanstack/react-query: 5.90.20` matching the other apps.

- Style: `AuthProvider.tsx` comment said "structurally disabled" — a
  vague LLM-tell phrase. Rewritten to state the mechanical fact
  directly. The `renewOrSignout` block comment was trimmed from 10
  lines to 6.

- Test coverage: added assertions that `invalidateQueries` fires on
  successful refresh (after the deferred-effect path completes) and is
  NOT called on the signout path. Added a listener-cleanup test that
  ensures `signinSilent` is not invoked after unmount.

Signed-off-by: Sam Davies <sam@avrofi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Starting from 0.6.5 all UIs require re-login after access token expires

5 participants