diff --git a/.changeset/clean-exit-on-expired-signin.md b/.changeset/clean-exit-on-expired-signin.md new file mode 100644 index 00000000..14ee4ae7 --- /dev/null +++ b/.changeset/clean-exit-on-expired-signin.md @@ -0,0 +1,9 @@ +--- +'ePDS': minor +--- + +Sign-in pages no longer strand users on a "session expired" dead end, and Resend no longer offers codes that won't work. + +**Affects:** End users + +**End users:** if your sign-in times out (you closed the tab and came back, or your wait was longer than the page can keep alive in the background), you are now taken back to the app you were signing in to so it can offer you a retry. The page also no longer offers Resend in the rare case where the new code wouldn't work — instead it tells you the sign-in has timed out and gives you a Start over button. No more typing a fresh code that fails. If for some reason the automatic return is not possible, the page shows a "Return to sign in" button so you can get back to the app yourself in one click. diff --git a/.changeset/par-heartbeat-keeps-slow-signins-alive.md b/.changeset/par-heartbeat-keeps-slow-signins-alive.md new file mode 100644 index 00000000..98de12d3 --- /dev/null +++ b/.changeset/par-heartbeat-keeps-slow-signins-alive.md @@ -0,0 +1,9 @@ +--- +'ePDS': minor +--- + +Slow sign-ins are less likely to time out before you finish entering your code. + +**Affects:** End users + +**End users:** if you take a few minutes to find your sign-in code in your inbox before entering it, you will no longer be bounced to a "session expired" page when you submit it. Closing the tab or walking away for a long stretch can still expire the flow, in which case the existing error pages still apply — but reading email at human speed should not. diff --git a/docs/design/par-expiry-and-clean-exits.md b/docs/design/par-expiry-and-clean-exits.md new file mode 100644 index 00000000..bd4adf1a --- /dev/null +++ b/docs/design/par-expiry-and-clean-exits.md @@ -0,0 +1,460 @@ +# PAR Expiry, Heartbeats, and Clean Exits + +Working design doc for the OTP / PAR / auth_flow timer interactions and the +"stuck in a loop" failure modes a slow user can trip during OAuth login. + +This is a living document — keep it current as scope, decisions, or +findings change. + +## Problem + +A real user reported being stuck in a login loop: + +1. They start an OAuth login (flow=email-first, `prompt=login`). +2. They receive the OTP email but wait >10 minutes before entering it. +3. The OTP-expired UI fires as expected. +4. They click "Resend" and receive a fresh OTP. +5. They enter the second OTP and see "Your sign-in took too long to + complete and timed out. Please start sign-in again." +6. There is no working path forward — the page is a dead end. + +We had landed several recent OTP-expiry fixes (PRs leading up to +`dacf1d2`, `0e62bd6`, etc.) and added e2e scenarios specifically for OTP +expiry, so the immediate question was: how did this slip through? + +## The three timers + +| Timer | Default | Where | +| ----------------------- | -------------- | ---------------------------------------------------- | +| OTP TTL | 10 min | better-auth `verification` row (auth-service SQLite) | +| PAR (`request_uri`) TTL | 5 min, sliding | `@atproto/oauth-provider` request store (pds-core) | +| auth_flow TTL | 60 min | auth-service SQLite + `epds_auth_flow` cookie | + +Each lives in a different layer because each layer owns a different +concept (better-auth: email-OTP only; oauth-provider: OAuth handshake +only; auth-service: glue ePDS adds to bridge them). + +### What each one gates + +- **OTP TTL** — the 8-char code in the email. Fixed at row creation. + Resend creates a new row. +- **PAR TTL** — the `request_uri` handle that points to all the OAuth + parameters (client_id, scope, code_challenge, redirect_uri, state) + the client pushed at flow start. Without it the OAuth flow has no + memory of the original request. **Sliding**: every successful + `requestManager.get()` resets `expiresAt` to `now + 5 min`. There is + no absolute ceiling — pinged often enough, the row lives indefinitely. + When the timer fires, the row is deleted in the same call that + throws. +- **auth_flow TTL** — auth-service's bridge that remembers which OAuth + flow a user is in across page navigations. Without it, `/auth/complete` + doesn't know which `request_uri` to redirect back to. + +### How the bug fires + +User sits on the OTP form for 6 minutes. During those 6 minutes: + +- OTP (10 min) — alive. +- PAR (5 min sliding) — **dead at minute 5**, because nothing on the + OTP form calls `requestManager.get()` while the user reads. +- auth_flow (60 min) — alive. + +User submits OTP. Auth-service verifies it (OTP alive, auth_flow alive +— looks fine). It then bridges back to the OAuth flow via PAR → dead → +user lands on whichever error page is downstream of the failing `get()` +with no recovery path. + +## Why the existing scenarios missed it + +| Scenario | OTP expired | PAR expired | auth_flow expired | Resend in flow | +| -------------------------------- | ----------- | ----------- | ----------------- | -------------- | +| `@otp-expiry` (existing) | Yes | **No** | No | Yes | +| `@par-callback-error` (existing) | No | Yes | No | No | +| `@otp-and-par-expiry` (new) | Yes | Yes | No | Yes | +| `@multiple-resend` (new) | No | Yes | No | Yes (twice) | +| `@prompt-login` (new) | No | Yes | No | No | +| `@recovery` (new) | No | Yes | No | No | + +`@otp-expiry` only ages the OTP row and _deliberately keeps the PAR +alive_ to isolate the auth-service-side TTL fix from the PDS layer. +But PAR's hardcoded 5-min TTL is shorter than OTP's 10-min TTL, so in +production the PAR is already dead by the time OTP expiry ever fires. +The scenario passes because it artificially preserves a row that +wouldn't exist in real life. + +`@par-callback-error` only covers the _response shape_ when PAR is +dead at callback (styled HTML vs raw JSON). It does not exercise +Resend at all and asserts nothing about user recovery. + +The four new scenarios cover four real-user paths that were untested. + +### Surfaces the bug presents on + +The same root cause (PAR dead before final bridge) presents three +different error pages depending on which downstream call hits PAR +first: + +- auth-service `/auth/choose-handle`: "Session expired, please start + over" (basic + multiple-Resend scenarios) +- pds-core friendly redirect from `/oauth/epds-callback`'s catch + block: "Your sign-in took too long to complete and timed out…" + (recovery scenario) +- Browser stalls navigating to `/oauth/epds-callback` and never + reaches `/welcome` (prompt=login scenario) + +Each of these is a dead end for the user today. + +## Strategy + +Two layers, in priority order: + +### 1. Clean exit — the contract + +**Every error a slow user can trip must end with a working path back +to the OAuth client.** No stranded pages, no loops. This is the +non-negotiable outcome. + +Audit every page that today says "Session expired" / "Authentication +failed" / "Your sign-in took too long" / similar, and confirm each +one offers either: + +- an automatic redirect to the OAuth client's `redirect_uri` with an + OAuth-spec error (`error=access_denied`, `error_description=…`, + `state`, `iss`) so the client can show its own retry UI; or +- a "Start over" button that re-initiates the OAuth flow from scratch. + +The four RED scenarios surface three different dead-end pages already +— that is the starting list for the audit. + +### 2. Heartbeat — the UX polish + +Keep the PAR alive while the user is genuinely on the page so the +common case (slow inbox / slow typist) doesn't hit the dead end at +all. PAR has a built-in refresh primitive (`pingParRequest`) which is +already wired into a few server-side handoff points but not from the +pages where humans actually wait. + +Apply heartbeats only where it materially helps and doesn't weaken +security: + +- OTP form (login) — main case +- Recovery form — same shape +- Skip handle picker (already has server-side ping at render time) +- Skip email-input form (user is fast, PAR is fresh) + +Heartbeats are **opportunistic**. If they fail, security is unchanged +and the user falls through to the clean-exit path from layer 1. +Timeouts will happen regardless of heartbeats. + +### Security guardrails for heartbeats + +The heartbeat must not be a way to extend more than it should: + +- Only refreshes PAR (timer 2). Doesn't touch OTP or auth_flow TTLs. +- Tied to a valid `epds_auth_flow` cookie — no flow → no ping. +- Bounded by auth_flow's 60-min ceiling — once auth_flow dies, the + ping has nothing to look up and becomes a no-op. PAR cannot be kept + alive past 60 min via heartbeat. +- Rate-limited per flow. +- Read-only on auth-service; only effect is sliding upstream's timer + via an already-internal call. +- Not credential-bearing. + +### What the strategy explicitly does NOT do + +- Does not bump any of the three TTL defaults. +- Does not regenerate a fresh PAR mid-flow (PAR carries client-supplied + params auth-service can't recreate without round-tripping to the + OAuth client). +- Does not extend OTP or auth_flow TTLs. + +## Tracked work + +### RED scenarios (committed, currently failing) + +In `features/passwordless-authentication.feature`, all tagged +`@otp-and-par-expiry`: + +- `Expired OTP + expired PAR — Resend must still recover the flow` +- `Two Resend cycles after silent PAR death still complete the flow` + (`@multiple-resend`) +- `prompt=login + expired PAR — flow must still complete` + (`@prompt-login`) +- `Recovery via backup email + expired PAR — flow must still complete` + (`@recovery`) + +All four reproduce on `origin/main` (commit `ffc17bd`). All four are +preserved as worst-case "PAR is hard-dead" coverage even after the +heartbeat lands — they go GREEN only when clean-exit makes the dead +end usable. + +### Dead-end pages → target UX + +Each row is a `renderError(...)` call site that a slow user can reach +on the auth path. "Page" is the route showing the error. "Trigger" is +which TTL/state condition caused it. "Today" describes the current +DOM/redirect outcome — every entry that says "static page" is a dead +end (the shared `renderError` template has no retry link, no Start +Over button — see `packages/shared/src/render-error.ts`). "Target" is +the proposed clean-exit UX. + +> **Note (post-resolution):** the table records the audit at decision +> time. Some Target cells speculate about a `/_internal/par-…`-style +> lookup or bare `/oauth/authorize` restart. The resolved approach +> (see "Resolved decisions" below) replaced both with a single shape: +> resolve the OAuth client's published metadata via +> `@certified-app/shared`'s `resolveClientMetadata()` to recover +> `redirect_uris[0]`, then issue the spec error redirect. No new +> pds-core internal endpoints; no bare authorize restart. The table +> is preserved as-written to keep the audit honest about the option +> space; the "Resolved decisions" section is the source of truth for +> what shipped. + +| # | Page | Trigger | Today | Target | +| --- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| 1 | `/auth/complete` (no cookie) | `epds_auth_flow` cookie missing | Static "Authentication session expired. Please try again." | Redirect back to OAuth client with `error=access_denied` if `redirect_uri` recoverable; else static page with Start Over link to fresh `/oauth/authorize` for the original client. | +| 2 | `/auth/complete` (no flow row) | auth_flow row missing/expired | Static "Authentication session expired. Please try again." | Same as #1. The flow row carried `requestUri` + `clientId`; if it's gone we don't know `redirect_uri` and must fall back to Start Over. | +| 3 | `/auth/complete` (better-auth session error) | better-auth session lookup throws | Static "Authentication failed. Please try again." | Redirect to client with `error=server_error` if recoverable; otherwise Start Over. | +| 4 | `/auth/choose-handle` (no cookie / no flow) | same as #1/#2 but on handle picker | Static "Session expired, please start over" | Same as #1. | +| 5 | `/auth/choose-handle` (better-auth session error) | session lookup throws | Static "Authentication failed. Please try again." | Same as #3. | +| 6 | `/auth/choose-handle` (no session user) | better-auth session has no user | Static "Session expired, please start over" | Same as #1. | +| 7 | `/auth/choose-handle` (PAR ping fails on render) | upstream PAR row dead before user even sees the picker | Static "Session expired, please start over" | Redirect to client (we still hold `flow.clientId` here, can resolve `redirect_uri` via `/_internal/par-…`-style lookup OR we can defer — see open question below). | +| 8 | `/auth/choose-handle` (PAR ping fails on POST) | PAR died while user was picking | Static "Session expired, please start over" | Same as #7. | +| 9 | `/oauth/epds-callback` catch (PAR-expired branch) | upstream `requestManager.get()` threw `expired`/`unknown request_uri` | Already does the right thing IF `redirect_uri` was captured before the throw: redirects to client with `error=access_denied` + description. Falls back to a styled-but-static "Your sign-in took too long…" page when `redirect_uri` was not recoverable. | Keep redirect path. For the static-fallback case, add a Start Over link/button so the user can re-initiate. | +| 10 | `/oauth/epds-callback` catch (server_error branch) | any other throw | Static "Authentication failed." | Same shape: redirect when possible, Start Over fallback when not. | +| 11 | `/auth/recover` (missing request_uri) | direct GET without query param | Static "Missing request_uri parameter" | Out of scope — this isn't an expiry case, it's misuse. Leave as is. | +| 12 | Recovery OTP verify success → `/auth/complete` chain | recovery flow hands back to `/auth/complete`; if PAR died during the recovery loop, falls into #1/#2/#9 | Inherits the surface of whichever route fires | Inherits the fix. | + +### Failure clusters + +The table collapses into three clusters: + +- **A: redirect-when-possible** (#1, #2, #3, #4, #5, #6, #9, #10) — + user reaches an error after a sign-in attempt; redirect them back + to the OAuth client with the OAuth-spec error so the client's own + UI handles retry. This is what the OAuth spec says to do anyway. + +- **B: PAR-dead-on-handle-page** (#7, #8) — discovered before the + callback hop. Fix is to feed the same redirect-when-possible logic + with the `redirect_uri` we can pull from the (still-alive) auth_flow + row before it's too late. + +- **C: nothing recoverable** — when we have neither a live auth_flow + nor a captured `redirect_uri`, the user must Start Over. Static + pages that hit this case need a Start Over link/button. Today they + are silent dead ends. + +### Building blocks we already have + +- `handleCallbackError()` in `packages/pds-core/src/lib/epds-callback-error.ts` + already does the redirect-when-possible logic for `/oauth/epds-callback`. + Lift it (or its shape) into a shared helper auth-service can call. +- `flow.clientId` is on every auth_flow row, so given a live flow we + can always find the OAuth client's metadata (`redirect_uris`, etc.) + to construct a redirect target. +- `renderError()` is the shared template; extending it to optionally + render a Start Over button (with a `clientId` or fully-qualified + authorize URL) is a one-place change. + +### Resolved decisions + +Guiding principle: **minimum friction for the end user** — fewest +steps to retry, no manual restart when an automatic bounce-back is +possible. + +1. **Start Over destination = the OAuth client.** Always redirect to + the client's registered `redirect_uri` with the OAuth-spec error + query params. The client's own UI handles "try again", which is + one click on familiar branding. This is also what RFC 6749 §4.1.2.1 + prescribes. We cannot synthesise a valid `/oauth/authorize` URL + without the dead PAR's `state`/`code_challenge`/etc., so a "bare + re-authorize" path isn't actually feasible. The Start Over page is + only the last-resort fallback when no clientId is in scope at all. + +2. **Cluster B (PAR-dead-on-handle-page) → redirect to the client.** + We have `flow.clientId` on every auth*flow row and OAuth clients + publish metadata independently of the PAR row, so + `redirect_uris[0]` is resolvable. The original `state` is lost + (it lived in the PAR). RFC 6749 §4.1.2.1 explicitly \_requires* + `state` in the error response when it was present in the + authorization request, so this is a pragmatic degradation — not + a spec-permitted shortcut. We chose redirect-without-state over + stranding the user because every spec-compliant OAuth client + already has to handle the case where it cannot correlate an + error response with an in-flight attempt (e.g. cross-device + resumption, browser session loss), so an "anonymous error, + restart" outcome is universally recoverable on the client side + even if it is not the spec's preferred shape. + +3. **Recovery flows must carry `clientId`.** The current `clientId: +null` on recovery's auth_flow row is a side-effect of the DB API, + not a design choice. Thread the clientId from the login page into + the recovery link (URL or cookie) so the recovery page can + populate it on the auth_flow row at creation. This puts recovery + flows on the same redirect-to-client path as everything else, no + manual restart. + +The shape of the implementation falls out of these: + +- One shared helper `redirectToClientWithError(res, clientId, code, +description, state?)` that resolves client metadata and issues the + RFC 6749 error redirect. Called from every cluster A and B site. +- Lift `handleCallbackError`'s redirect logic into that helper so + pds-core and auth-service share one code path. +- `renderError` gains an optional Start Over href, used only by the + cluster C fallback when no clientId is in scope. + +### White-boxing budget + +This work must avoid adding to the white-boxing surface catalogued in +`docs/design/pds-white-boxing.md`. Concretely: + +- **Auth-service resolves client metadata itself.** + `@certified-app/shared` already exposes `resolveClientMetadata(clientId)` + which fetches the public client metadata document directly. This + means cluster A/B redirects can be built in auth-service without + any new pds-core internal endpoint and without any new + `provider.` access. Zero new white-boxing. +- **Cluster A/B implementation lives entirely in auth-service.** The + shared helper above is an `auth-service` lib (or a `shared` + utility), not a pds-core route. Pds-core continues to have its own + `handleCallbackError` for cluster A on `/oauth/epds-callback` + because that path is already inside pds-core; the helper shape can + be aligned across both services without forcing them to share a + module. +- **Heartbeat reuses existing pds-core `/_internal/ping-request`.** + No new pds-core endpoint. Auth-service adds a new public + `/auth/ping` route that calls the existing internal one + server-side; the public route is an auth-service-only concern. +- **No new `requestManager` calls.** Nothing in this work needs to + read the dead PAR row. We use only the OAuth client metadata + (which is auth-service-resolvable) and `flow.clientId` (which + auth-service already stores). +- **Recovery flow's clientId** is threaded through query-string / + cookie at link-render time, not by exposing expired PAR rows from + pds-core. The login page already has the clientId in scope. + +### Future open questions (defer) + +- Heartbeat interval (3 min vs shorter) — pick during impl. +- `?no-heartbeat=1` test toggle — useful regression-testing primitive + but tied to heartbeat rollout, not clean-exit. + +## Decisions log + +- **Clean exit > heartbeat in priority.** Heartbeats are polish; + reliable recovery from dead ends is the contract. +- **Heartbeats only where reasonable.** OTP form + recovery form for + now; not handle picker, not email-input form. +- **Don't change TTLs.** They are correct for what each layer owns. + +## Out of scope (for now) + +- Changes to upstream `@atproto/oauth-provider` constants. +- Re-architecting the auth_flow / PAR / OTP separation. +- Generic session-keepalive that crosses concerns. + +## Bug found post-PR-154: Resend offers a code that cannot complete the flow + +**What broke:** the user lands on the OTP form after the OTP has expired +(>10 min), clicks Resend, gets a fresh code, types it in — and is +bounced back to the OAuth client with an `error=access_denied` redirect. +The clean-exit contract is technically satisfied (user is recoverable, +not stranded inside ePDS), but the experience is a lie: we offered them +Resend as if it would complete the flow, accepted their fresh OTP, then +told them sign-in failed. + +**Why it happens:** the heartbeat keeps the upstream PAR alive only +while the page is actively pinging. A user who waits >10 min for the +OTP has typically also let the PAR die at minute 5 (browser tab +backgrounded, laptop closed, network hiccup hit two pings in a row, +etc.). When they click Resend, better-auth happily issues a new OTP +because it doesn't know about the PAR. Auth-service then signs the +callback against the dead `request_uri` and pds-core's +`handleCallbackError` redirects the user back to the OAuth client +with `error=access_denied`. + +**Resolved approach:** prevent the dishonest cycle in the first place. + +- **Proactive notice (no redirect).** When the heartbeat's + `/auth/ping` returns `par_expired`, the OTP form replaces its + current state with a clear notice: "Your sign-in has timed out. + The code we sent will no longer work. Start sign-in again from the + app you came from." The notice carries a single "Start over" + button. The OTP boxes, Resend button, and Verify button are all + disabled — nothing the user types or clicks on the form will + silently fail. + + No automatic redirect. The user makes the choice to click Start + over (which navigates to `/auth/abort`). Clicking elsewhere on + the page (e.g. the Powered by Certified link) still works as + expected. + +- **Reactive gates kept for defence in depth.** Even with the + proactive notice, race conditions exist (heartbeat hasn't + fired yet, user already mid-typing, etc.). Both the Resend click + and the Verify submit ping `/auth/ping` first — if it returns + `par_expired`, the handler navigates to `/auth/abort` instead of + proceeding, so a user who beats the proactive notice still gets + cleanly bounced rather than typing a code that won't work. + +- **`GET /auth/abort` route.** Browser-driven server-side clean exit. + Reads the auth_flow cookie, runs the same `cleanExit()` helper as + the unrecoverable-error paths in /auth/complete and + /auth/choose-handle: redirect to the OAuth client's `redirect_uri` + with `error=access_denied` per RFC 6749 §4.1.2.1, or fall back to + a styled "Return to sign in" page when the client is unknown. + Cookie is cleared because the flow is being abandoned. + +**Why a 3-minute heartbeat interval against a 5-minute window?** +PAR's sliding inactivity timer (5 min) is reset on every successful +`requestManager.get()` call upstream. 3 min is comfortably under 5 +and tolerates one missed ping (network blip, suspended tab between +ticks). Tighter would be wasted requests; looser would risk the timer +expiring between pings. + +**Why slide the timer indefinitely?** The upstream code in +`@atproto/oauth-provider`'s `request-manager.ts` resets `expiresAt` +on every successful read by design — sliding-on-touch is the +intended pattern, not a workaround. The 5-min figure was introduced +in the initial OAuth provider commit (June 2024) with no inline +rationale, no subsequent discussion in PR/commit history, and +nothing in the OAuth/PAR specs that mandates either the value or +the sliding behaviour. Practical bounds: + +- The 60-min auth_flow TTL caps how long heartbeat-driven sliding + can keep a PAR alive — once the auth_flow row dies, /auth/ping + returns `flow_expired` and the browser stops pinging. +- The httpOnly `epds_auth_flow` cookie gates `/auth/ping`, so an + attacker without the cookie cannot ping. +- PAR consumption is single-use — once the auth code is issued, + the row is deleted. + +## Status + +- Reproduction scenarios: 4 RED scenarios committed on + `fix/otp-resend-after-par-expiry` and now GREEN — they assert + "browser lands back at the demo client with an auth error", which + the demo translates to `?error=auth_failed`. +- Audit: complete. All 12 dead-end sites mapped to clusters A / B / C + with resolved decisions. +- Heartbeat impl: landed on commit `b1fc940`. `/auth/ping` route + + 3-min `setInterval` on OTP form + recovery form. Covered by 15 + unit tests and one e2e (`@par-heartbeat`). +- Clean-exit impl: landed on commit `2e4d327`. Six auth-service + dead-ends rewired to `cleanExit()`; pds-core's + `handleCallbackError` extended with a `signedClientId` fallback + (delivered via a new `client_id` field on the HMAC-signed + callback). Covered by 7 new pds-core tests, 4 new shared crypto + tests, and the four `@otp-and-par-expiry` e2e scenarios. +- Resend-honesty impl (post-PR-154 user report): `/auth/abort` + route + proactive notice on the OTP form + reactive abort gates + on Resend / Verify clicks. Resend no longer issues a fresh OTP + that cannot complete the flow — instead the user goes straight + to the OAuth client with the spec-compliant error redirect. + Covered by 4 new auth-abort tests, 7 new login-page render-shape + tests, and the new `@resend-after-par-dead` e2e scenario. The + pre-existing `@otp-and-par-expiry` and `@multiple-resend` + scenarios were updated to match the new contract (the user no + longer sees the misleading "OTP expired → Resend → fresh OTP + → fails" cycle). diff --git a/e2e/step-definitions/auth.steps.ts b/e2e/step-definitions/auth.steps.ts index 2d0a3bb4..bca6225f 100644 --- a/e2e/step-definitions/auth.steps.ts +++ b/e2e/step-definitions/auth.steps.ts @@ -346,10 +346,25 @@ Then( }, ) -Then('the user can try again', async function (this: EpdsWorld) { - const page = getPage(this) - await expect(page.locator('.otp-box').first()).toBeEnabled() -}) +Then( + 'the OTP entry boxes are visible and enabled', + async function (this: EpdsWorld) { + const page = getPage(this) + const boxes = page.locator('.otp-box') + const count = await boxes.count() + if (count === 0) { + throw new Error('No .otp-box elements found — OTP form is not rendered') + } + // Every box must be both visible AND enabled. Asserting on .first() + // alone hid regressions where a partial form (e.g. a stale + // "verifying..." latch on later boxes) blocked further attempts even + // though the first box looked fine. + for (let i = 0; i < count; i++) { + await expect(boxes.nth(i)).toBeVisible() + await expect(boxes.nth(i)).toBeEnabled() + } + }, +) Then('further attempts are rejected', async function (this: EpdsWorld) { const page = getPage(this) @@ -445,7 +460,11 @@ Then( async function (this: EpdsWorld, expected: string) { const page = getPage(this) await expect(page.locator('#error-msg')).toBeVisible({ timeout: 10_000 }) - await expect(page.locator('#error-msg')).toHaveText(expected, { + // toContainText (not toHaveText) so the OTP-expired error + // banner can carry the inline "Send a new code" action button + // alongside the message text. Equality matching would fail + // whenever the inline action surfaces. + await expect(page.locator('#error-msg')).toContainText(expected, { timeout: 10_000, }) }, @@ -566,25 +585,45 @@ async function callPdsExpiryHook( } /** - * Read the PAR request_uri from the current auth-service login page - * URL and stash it on the world for subsequent expiry hooks. The URL - * is `https:///oauth/authorize?request_uri=urn:...&...` while the - * user is on the login/OTP form. Throws if the URL doesn't carry it, - * which means the surrounding scenario is mis-ordered (the page must - * be on the auth-service side before this step runs). + * Read the PAR request_uri from the current auth-service page URL and + * stash it on the world for subsequent expiry hooks. While the user is + * on the login/OTP form the URL is + * `https:///oauth/authorize?request_uri=urn:...&...`, but on + * downstream pages (e.g. /auth/recover) the parameter has been dropped. + * Falls back to a previously-stashed `world.lastRequestUri` so a + * scenario can capture the URI early (via the dedicated capture step + * below) and consult it after navigation. Throws when neither source + * has a value, which means the scenario is mis-ordered. */ function captureRequestUriFromPage(world: EpdsWorld): string { const page = getPage(world) - const requestUri = new URL(page.url()).searchParams.get('request_uri') - if (!requestUri) { - throw new Error( - `Expected request_uri in page URL but found none: ${page.url()}`, - ) + const fromUrl = new URL(page.url()).searchParams.get('request_uri') + if (fromUrl) { + world.lastRequestUri = fromUrl + return fromUrl } - world.lastRequestUri = requestUri - return requestUri + if (world.lastRequestUri) { + return world.lastRequestUri + } + throw new Error( + `Expected request_uri in page URL or previously captured but found none: ${page.url()}`, + ) } +/** + * Capture and stash the PAR request_uri from the current auth-service + * page URL so a later step can refer to it after navigating away. Used + * by scenarios where the OTP form is left for /auth/recover (recovery) + * or any other downstream page where the request_uri has dropped off + * the URL. + */ +When( + 'the PAR request_uri is captured for later expiry', + function (this: EpdsWorld) { + captureRequestUriFromPage(this) + }, +) + When( 'the PAR request_uri has expired before the bridge fires', async function (this: EpdsWorld) { @@ -595,43 +634,55 @@ When( }, ) -Then('the response body is not raw JSON', async function (this: EpdsWorld) { - const page = getPage(this) - // The OTP form's submit is JS-driven and async, and Playwright's - // fill() returns before the bridge redirects. Wait for the - // browser to actually leave the auth-service host and arrive at - // pds-core's epds-callback (where, in this scenario, the - // catch-block renders the error page). Without this wait we'd - // read the still-rendering OTP form's body. - const pdsHost = new URL(testEnv.pdsUrl).host - await page.waitForURL( - (url) => url.host === pdsHost && url.pathname.includes('epds-callback'), - { timeout: 30_000 }, - ) - const body = await page.locator('body').innerText() - // The pre-fix behaviour returned a body that started with - // {"error": "Authentication failed"}. A graceful HTML page won't — - // its

/

text isn't valid JSON. The regex catches any - // {"error": ...} shape so a future leak of a different JSON - // payload is still caught. - if (/^\s*\{\s*"error"/.test(body)) { - throw new Error( - `pds-core leaked raw JSON to the browser: ${body.slice(0, 200)}`, - ) - } -}) +// --------------------------------------------------------------------------- +// Clean-exit assertions for @otp-and-par-expiry scenarios +// --------------------------------------------------------------------------- +// +// When the upstream PAR is hard-dead (the test hook deletes the row), +// no amount of heartbeat can revive it — but we still owe the user a +// clean exit per RFC 6749 §4.1.2.1: redirect them back to the OAuth +// client's redirect_uri with `error=access_denied` so the client's +// own UI can handle retry. The demo client translates that to +// `?error=auth_failed` on its landing page. + +Then( + 'the browser lands back at the demo client with an auth error', + async function (this: EpdsWorld) { + const origin = new URL(testEnv.demoUrl).origin + const page = getPage(this) + await page.waitForURL(`${origin}/?error=auth_failed*`, { + timeout: 30_000, + }) + }, +) + +// --------------------------------------------------------------------------- +// PAR heartbeat liveness (@par-heartbeat) +// --------------------------------------------------------------------------- +// +// The OTP form auto-fires a fetch to /auth/ping every 3 minutes. Waiting +// 3 minutes wall-clock is unacceptable for an e2e scenario, so this step +// invokes the same fetch synchronously from the page's own JS context +// — same origin, same cookies, same browser security boundary — and +// asserts the response. That proves the wiring (page can reach +// /auth/ping → auth-service forwards to pds-core's +// /_internal/ping-request → returns 200) without waiting for the +// interval to tick. Then( - 'the response body explains that sign-in timed out', + 'a heartbeat fetched from the OTP form returns ok:true', async function (this: EpdsWorld) { const page = getPage(this) - const body = await page.locator('body').innerText() - // Don't pin exact wording — just require something that mentions - // the timeout / expiry so a human reading it understands why - // their sign-in failed. - if (!/expir|timed? ?out|too long/i.test(body)) { + const body = await page.evaluate(async () => { + const r = await fetch('/auth/ping', { + credentials: 'include', + cache: 'no-store', + }) + return (await r.json()) as { ok: boolean; reason?: string } + }) + if (!body.ok) { throw new Error( - `Error page should mention the timeout but said: "${body.slice(0, 500)}"`, + `Expected /auth/ping to return ok:true but got: ${JSON.stringify(body)}`, ) } }, diff --git a/features/passwordless-authentication.feature b/features/passwordless-authentication.feature index e5a9a589..e13b9f5c 100644 --- a/features/passwordless-authentication.feature +++ b/features/passwordless-authentication.feature @@ -264,7 +264,7 @@ Feature: Passwordless authentication via email OTP When more than 10 minutes pass before the user enters the OTP And the user enters the OTP code Then the verification form shows an "OTP expired" error - And the user can try again + And the OTP entry boxes are visible and enabled When the user requests a new OTP via the resend button Then a fresh OTP email arrives in the mail trap for the test email When the user enters the OTP code @@ -272,6 +272,156 @@ Feature: Passwordless authentication via email OTP Then the browser is redirected back to the demo client And the demo client has a valid OAuth access token + # --- Hard-dead PAR clean-exit scenarios (@otp-and-par-expiry) --- + # + # These four scenarios reproduce the dead-end pages a real user used + # to trip when the upstream PAR (request_uri) row died before the + # OAuth flow could complete. Upstream's PAR has a 5-min sliding + # inactivity timeout that we cannot bump (hardcoded in + # @atproto/oauth-provider) and that, once tripped, deletes the row + # in the same call that throws — so no in-flight recovery is + # possible. + # + # The fix is twofold (see docs/design/par-expiry-and-clean-exits.md): + # + # 1. While the user is on the OTP / recovery form, a 3-min + # heartbeat to /auth/ping slides the PAR's inactivity timer so + # the row stays alive. This is exercised by @par-heartbeat. + # + # 2. When the PAR is *genuinely* dead (closed tab, walked away, + # explicit upstream cleanup), every dead-end auth-service page + # must redirect the user back to the OAuth client with the + # RFC 6749 §4.1.2.1 error parameters so the client's own UI + # handles retry — never strand them on a static page. + # + # These scenarios use /_internal/test/delete-par to simulate (2) — + # the row is hard-deleted and cannot be revived. The contract they + # assert is therefore "land back at the demo client with an auth + # error" (the demo translates `error=access_denied` to + # `?error=auth_failed` on its landing page), NOT "the OAuth flow + # successfully completes". Layer 1 (heartbeat) is what prevents + # the dead PAR; layer 2 (clean-exit) is what makes recovery + # one-click when prevention fails. + @email @otp-and-par-expiry + Scenario: Expired OTP + expired PAR — clean exit back to the OAuth client + # With the proactive notice + reactive abort gate (added after the + # original PR-154 user report) the user no longer goes through the + # "OTP expired → Resend → fresh OTP → fails anyway" cycle. The + # Verify submit sees the dead PAR via /auth/ping and bails to + # /auth/abort directly, so the contract this scenario asserts + # is just "lands at the demo client with an auth error" — same + # end state, fewer misleading UI steps. + When the demo client initiates an OAuth login + Then the browser is redirected to the auth service login page + And the login page displays an email input form + When the user enters a unique test email and submits + Then an OTP email arrives in the mail trap for the test email + And the login page shows an OTP verification form + When more than 10 minutes pass before the user enters the OTP + And the PAR request_uri has expired before the bridge fires + And the user enters the OTP code + Then the browser lands back at the demo client with an auth error + + @email @otp-and-par-expiry @multiple-resend + Scenario: Two Resend cycles after silent PAR death — clean exit back to the OAuth client + # First Resend cycle works (PAR is still alive). Second Resend + # triggers after the PAR has died: the reactive abort gate sees + # /auth/ping returns par_expired, so the resend handler bails + # to /auth/abort instead of issuing a third OTP that the user + # could never use. End state: user is back at the demo client + # with the spec-compliant error redirect, NOT in a misleading + # "fresh code that won't work" cycle. + When the demo client initiates an OAuth login + Then the browser is redirected to the auth service login page + And the login page displays an email input form + When the user enters a unique test email and submits + Then an OTP email arrives in the mail trap for the test email + And the login page shows an OTP verification form + When the user requests a new OTP via the resend button + Then a fresh OTP email arrives in the mail trap for the test email + When the PAR request_uri has expired before the bridge fires + And the user requests a new OTP via the resend button + Then the browser lands back at the demo client with an auth error + + @email @otp-and-par-expiry @prompt-login + Scenario: prompt=login + expired PAR — clean exit back to the OAuth client + Given a returning user has a PDS account + When the demo client starts a new OAuth flow with prompt=login + And the user enters the test email on the login page + Then an OTP email arrives in the mail trap + And the login page shows an OTP verification form + When the PAR request_uri has expired before the bridge fires + And the user enters the OTP code + Then the browser lands back at the demo client with an auth error + + @email @otp-and-par-expiry @recovery + Scenario: Recovery via backup email + expired PAR — clean exit back to the OAuth client + Given a returning user has a PDS account + And the test user has a verified backup email + And the demo client initiates OAuth with the test email as login_hint + # /auth/recover is a plain URL with no request_uri query parameter, + # so we must snapshot the request_uri while we're still on the + # original /oauth/authorize page (the login_hint step lands here). + And the PAR request_uri is captured for later expiry + When the user navigates to the recovery page + And the user enters the backup email on the recovery page + Then an OTP email arrives in the mail trap for the backup email + When the PAR request_uri has expired before the bridge fires + And the user enters the recovery OTP + Then the browser lands back at the demo client with an auth error + + # --- PAR heartbeat (live PAR, no delete) --- + # + # Proves the in-page heartbeat reaches /auth/ping and that + # /auth/ping forwards to pds-core's /_internal/ping-request, + # i.e. the wiring works end-to-end through Caddy. This is + # heartbeat liveness, not heartbeat efficacy — the unit tests in + # heartbeat.test.ts cover routing logic. We don't wall-clock-wait + # 5 min for the timer to lapse; instead the browser fires the + # heartbeat fetch synchronously from the OTP form's page context + # and we assert the response shape. + @email @par-heartbeat + Scenario: OTP form's heartbeat reaches /auth/ping with ok:true + When the demo client initiates an OAuth login + Then the browser is redirected to the auth service login page + When the user enters a unique test email and submits + Then an OTP email arrives in the mail trap for the test email + And the login page shows an OTP verification form + Then a heartbeat fetched from the OTP form returns ok:true + + # --- Resend after PAR death --- + # + # If the user lands on the OTP form with the upstream PAR already + # dead (>5 min wait, missed heartbeat, suspended tab, etc.), they + # should NOT be allowed to click Resend, get a fresh code, type it + # in, and only then be told sign-in failed. Offering Resend in + # that state is misleading: better-auth would happily issue a new + # OTP that pds-core's /oauth/epds-callback would then bounce back + # to the OAuth client with error=access_denied because the PAR is + # gone and cannot be revived. + # + # The Resend click handler now pings /auth/ping first. If the + # response is `par_expired` (or `flow_expired` / `no_cookie`), the + # browser navigates to /auth/abort instead of issuing the new OTP. + # /auth/abort runs the same cleanExit() as the unrecoverable-error + # paths, so the user lands at the OAuth client with the standard + # RFC 6749 §4.1.2.1 error params. No fresh OTP gets issued, no + # OTP-typing UX cycle. + @email @resend-after-par-dead + Scenario: Clicking Resend after the PAR has died bails to the OAuth client without issuing a new OTP + When the demo client initiates an OAuth login + Then the browser is redirected to the auth service login page + When the user enters a unique test email and submits + Then an OTP email arrives in the mail trap for the test email + And the login page shows an OTP verification form + When the PAR request_uri has expired before the bridge fires + # The Resend step wipes the inbox before clicking the button, so + # the "no new OTP email" assertion below measures the resend + # click in isolation rather than the OTP that already arrived. + And the user requests a new OTP via the resend button + Then the browser lands back at the demo client with an auth error + And no new OTP email is sent to the test email + # --- PAR (request_uri) expiry --- # # The PAR ("Pushed Authorization Request") record lives in pds-core's @@ -316,18 +466,20 @@ Feature: Passwordless authentication via email OTP # The test deletes the PAR row before the callback fires, mirroring # the production failure where /oauth/epds-callback's first # requestManager.get() (Step 2 in the handler) throws - # InvalidRequestError("Unknown request_uri"). With no live PAR row - # there is no redirect_uri to recover, so the user must land on a - # styled HTML page on the PDS host — not the previous raw JSON - # body — explaining that the sign-in timed out. + # InvalidRequestError("Unknown request_uri"). The signed callback + # URL now carries `client_id` (per the clean-exit work in this PR), + # so even though the PAR is dead the catch block can resolve the + # client's metadata and redirect the user back to the OAuth client + # with the standard RFC 6749 §4.1.2.1 error parameters. The demo + # client translates `error=access_denied` into `?error=auth_failed` + # on its landing page; that's the contract this scenario asserts. # - # The redirect-back-to-client path (driven by the redirect_uri / - # state captured during a successful Step 2) is exercised by the - # other paths inside the same handler that throw later in the - # flow; we don't have an easy e2e harness to inject a mid-flow - # failure today. + # The static HTML fallback inside pds-core's catch block only fires + # in the worst case where neither the captured PAR data nor the + # signed `client_id` resolves to a usable redirect URI — much + # harder to reach now and not exercised here. @email @par-callback-error - Scenario: Expired PAR shows a styled HTML page instead of leaking JSON + Scenario: Expired PAR redirects back to the OAuth client instead of stranding the user Given a returning user has a PDS account When the demo client initiates an OAuth login And the user enters the test email on the login page @@ -335,8 +487,7 @@ Feature: Passwordless authentication via email OTP And the login page shows an OTP verification form When the PAR request_uri has expired before the bridge fires And the user enters the OTP code - Then the response body is not raw JSON - And the response body explains that sign-in timed out + Then the browser lands back at the demo client with an auth error # --- Brute force protection --- @@ -344,7 +495,7 @@ Feature: Passwordless authentication via email OTP When the user requests an OTP for a unique test email And enters an incorrect OTP code Then the verification form shows an error message - And the user can try again + And the OTP entry boxes are visible and enabled Scenario: Too many failed OTP attempts locks out the token When the user requests an OTP for a unique test email diff --git a/packages/auth-service/src/__tests__/__helpers__/heartbeat-router-harness.ts b/packages/auth-service/src/__tests__/__helpers__/heartbeat-router-harness.ts new file mode 100644 index 00000000..d0a4af75 --- /dev/null +++ b/packages/auth-service/src/__tests__/__helpers__/heartbeat-router-harness.ts @@ -0,0 +1,102 @@ +/** + * Shared test harness for the heartbeat router (which now hosts both + * GET /auth/ping and GET /auth/abort). Lifted out of the per-route + * test files to keep them readable and to keep Sonar's + * duplicated-lines gate happy — both files would otherwise carry + * the same boilerplate verbatim. + */ +import express from 'express' +import cookieParser from 'cookie-parser' +import type { AuthServiceContext } from '../../context.js' + +export interface FakeFlow { + requestUri: string + clientId: string | null + handleMode: null +} + +/** + * Build an Express app with cookie-parser + the heartbeat router + * mounted, backed by an in-memory Map of fake auth_flow rows. The + * caller passes the `createHeartbeatRouter` factory directly so a + * test using `vi.mock` to replace transitive dependencies (e.g. + * cleanExit, pingParRequest) doesn't have to import the router + * before its mocks are in scope. + */ +export function buildHeartbeatApp( + createHeartbeatRouter: (ctx: AuthServiceContext) => express.Router, + flows: Map, +): express.Express { + const ctx = { + db: { + getAuthFlow(flowId: string): FakeFlow | undefined { + return flows.get(flowId) + }, + }, + } as unknown as AuthServiceContext + const app = express() + app.use(cookieParser()) + app.use(createHeartbeatRouter(ctx)) + return app +} + +export interface HarnessResponse { + status: number + setCookie: string[] + location: string | null + cacheControl: string | null + body: unknown +} + +/** + * Spin up the harness on an ephemeral port, GET the supplied path, + * tear the server down. The /auth/ping caller wants the JSON body; + * the /auth/abort caller wants the redirect Location and Set-Cookie + * headers; expose all of them and let each test pick what it cares + * about. + */ +export async function harnessGet( + app: express.Express, + path: string, + cookie?: string, +): Promise { + const server = app.listen(0) + try { + server.unref() + const port = await new Promise((resolve, reject) => { + server.once('error', reject) + server.once('listening', () => { + const addr = server.address() + if (typeof addr === 'object' && addr) resolve(addr.port) + else reject(new Error('Failed to resolve ephemeral port')) + }) + }) + const res = await fetch(`http://127.0.0.1:${port}${path}`, { + method: 'GET', + headers: cookie ? { Cookie: cookie } : {}, + redirect: 'manual', + }) + // Body parsing: /auth/ping returns JSON; /auth/abort returns a + // redirect with no body. Try JSON; on failure, fall back to + // null. Caller decides which result they care about. + let body: unknown = null + try { + body = await res.json() + } catch { + body = null + } + return { + status: res.status, + setCookie: res.headers.getSetCookie(), + location: res.headers.get('location'), + cacheControl: res.headers.get('cache-control'), + body, + } + } finally { + await new Promise((resolve) => { + server.close(() => { + resolve() + }) + }) + } +} diff --git a/packages/auth-service/src/__tests__/auth-abort.test.ts b/packages/auth-service/src/__tests__/auth-abort.test.ts new file mode 100644 index 00000000..48f1d696 --- /dev/null +++ b/packages/auth-service/src/__tests__/auth-abort.test.ts @@ -0,0 +1,141 @@ +/** + * Tests for the /auth/abort browser-driven clean-exit route. + * + * The route exists so the OTP / recovery forms can bail to the + * OAuth client when they detect the flow can no longer complete + * (PAR or auth_flow gone). Asserts: + * - cookie is cleared (the flow is being abandoned) + * - cleanExit is called with the right opts (clientId from the + * flow row, access_denied code, "took too long" description) + * - works whether or not a flow row exists (cookie present but + * row gone, vs no cookie at all) + */ +import { + describe, + it, + expect, + vi, + beforeEach, + beforeAll, + afterAll, +} from 'vitest' +import { + buildHeartbeatApp, + harnessGet, + type FakeFlow, +} from './__helpers__/heartbeat-router-harness.js' +import type express from 'express' + +const PDS_URL = 'https://core:3000' +const SECRET = 'test-secret' +const CLIENT_ID = 'https://demo.example/client-metadata.json' + +const ORIGINAL_PDS_URL = process.env.PDS_INTERNAL_URL +const ORIGINAL_SECRET = process.env.EPDS_INTERNAL_SECRET + +beforeAll(() => { + process.env.PDS_INTERNAL_URL = PDS_URL + process.env.EPDS_INTERNAL_SECRET = SECRET +}) + +afterAll(() => { + if (ORIGINAL_PDS_URL === undefined) delete process.env.PDS_INTERNAL_URL + else process.env.PDS_INTERNAL_URL = ORIGINAL_PDS_URL + if (ORIGINAL_SECRET === undefined) delete process.env.EPDS_INTERNAL_SECRET + else process.env.EPDS_INTERNAL_SECRET = ORIGINAL_SECRET +}) + +// Mock cleanExit at the module boundary so we can assert what /auth/abort +// passes without standing up the full redirect/metadata machinery. +const cleanExitMock = vi.hoisted(() => vi.fn()) +vi.mock('../lib/clean-exit.js', () => ({ + cleanExit: cleanExitMock, +})) + +// Stub pingParRequest too — same reason as in heartbeat.test.ts. +const pingMock = vi.hoisted(() => vi.fn()) +vi.mock('../lib/ping-par-request.js', () => ({ + pingParRequest: pingMock, +})) + +beforeEach(() => { + cleanExitMock.mockReset() + cleanExitMock.mockImplementation(({ res }: { res: express.Response }) => { + // Simulate a response so the route doesn't hang the test + // client. Returns a resolved promise so the route's + // `await cleanExit(...)` proceeds normally. + res + .status(303) + .setHeader('Location', 'https://demo.example/?error=auth_failed') + .end() + return Promise.resolve() + }) + pingMock.mockReset() +}) + +const { createHeartbeatRouter } = await import('../routes/heartbeat.js') + +function buildApp(flows: Map): express.Express { + return buildHeartbeatApp(createHeartbeatRouter, flows) +} + +async function getAbort( + app: express.Express, + cookie?: string, +): Promise<{ status: number; setCookie: string[]; location: string | null }> { + const r = await harnessGet(app, '/auth/abort', cookie) + return { status: r.status, setCookie: r.setCookie, location: r.location } +} + +describe('GET /auth/abort', () => { + it('calls cleanExit with the flow.clientId when the cookie + flow row are alive', async () => { + const flows = new Map([ + [ + 'flow-1', + { + requestUri: 'urn:ietf:params:oauth:request_uri:req-abc', + clientId: CLIENT_ID, + handleMode: null, + }, + ], + ]) + const app = buildApp(flows) + + await getAbort(app, 'epds_auth_flow=flow-1') + + expect(cleanExitMock).toHaveBeenCalledWith( + expect.objectContaining({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: expect.stringMatching(/took too long/i), + }), + ) + }) + + it('clears the auth_flow cookie regardless of whether a flow row exists', async () => { + const app = buildApp(new Map()) + const got = await getAbort(app, 'epds_auth_flow=stale-flow-id') + // Express sets a cookie with `Max-Age=0` (or `Expires` in the past) to clear it. + const cleared = got.setCookie.some((c) => + /epds_auth_flow=;.*Expires=Thu, 01 Jan 1970/.test(c), + ) + expect(cleared).toBe(true) + }) + + it('passes clientId: null when no cookie is present', async () => { + const app = buildApp(new Map()) + await getAbort(app) + expect(cleanExitMock).toHaveBeenCalledWith( + expect.objectContaining({ clientId: null }), + ) + }) + + it('passes clientId: null when the cookie points to a non-existent flow', async () => { + const app = buildApp(new Map()) + await getAbort(app, 'epds_auth_flow=missing-flow') + expect(cleanExitMock).toHaveBeenCalledWith( + expect.objectContaining({ clientId: null }), + ) + }) +}) diff --git a/packages/auth-service/src/__tests__/build-epds-callback-url.test.ts b/packages/auth-service/src/__tests__/build-epds-callback-url.test.ts new file mode 100644 index 00000000..faf25ab1 --- /dev/null +++ b/packages/auth-service/src/__tests__/build-epds-callback-url.test.ts @@ -0,0 +1,213 @@ +/** + * Tests for buildEpdsCallbackUrl — the HMAC-signed callback URL that + * /auth/complete (and /auth/choose-handle) emit to bridge the user + * from auth-service into pds-core. Covers the two contracts that + * matter beyond round-trip: client_id is signed in, and the + * random-mode handle sentinel is preserved. + */ +import { describe, it, expect, beforeAll, afterAll } from 'vitest' + +const ORIGINAL_PDS_URL = process.env.PDS_INTERNAL_URL +const ORIGINAL_SECRET = process.env.EPDS_INTERNAL_SECRET + +beforeAll(() => { + // Importing complete.ts pulls in lib/clean-exit.js which calls + // requireInternalEnv() at router-creation time. We don't actually + // call createCompleteRouter from these tests, but the import side + // effects still fire, so satisfy them. + process.env.PDS_INTERNAL_URL = 'https://core:3000' + process.env.EPDS_INTERNAL_SECRET = 'test-secret' +}) + +afterAll(() => { + if (ORIGINAL_PDS_URL === undefined) delete process.env.PDS_INTERNAL_URL + else process.env.PDS_INTERNAL_URL = ORIGINAL_PDS_URL + if (ORIGINAL_SECRET === undefined) delete process.env.EPDS_INTERNAL_SECRET + else process.env.EPDS_INTERNAL_SECRET = ORIGINAL_SECRET +}) + +import { buildEpdsCallbackUrl } from '../routes/complete.js' +import { verifyCallback, type CallbackParams } from '@certified-app/shared' + +const PDS_PUBLIC_URL = 'https://pds.example' +const SECRET = 'test-callback-secret' +const REQUEST_URI = 'urn:ietf:params:oauth:request_uri:req-abc' +const EMAIL = 'user@example.com' +const CLIENT_ID = 'https://demo.example/client-metadata.json' + +function paramsFromUrl(url: string): URLSearchParams { + return new URL(url).searchParams +} + +/** + * Read a required URL param. Throws if absent — tests that get here + * have already asserted the param was set, so a missing value is a + * test-fixture bug, not a runtime branch we want to handle silently. + * Avoids the non-null `!` assertion on every `.get()` call. + */ +function requiredParam(q: URLSearchParams, name: string): string { + const v = q.get(name) + if (v === null) throw new Error(`expected URL param ${name} to be set`) + return v +} + +describe('buildEpdsCallbackUrl', () => { + it('targets pds-core /oauth/epds-callback at the configured public URL', () => { + const url = buildEpdsCallbackUrl({ + flowRequestUri: REQUEST_URI, + flowClientId: CLIENT_ID, + email: EMAIL, + isNewAccount: false, + pdsPublicUrl: PDS_PUBLIC_URL, + epdsCallbackSecret: SECRET, + }) + const u = new URL(url) + expect(u.origin).toBe(PDS_PUBLIC_URL) + expect(u.pathname).toBe('/oauth/epds-callback') + }) + + it('round-trips via verifyCallback for an existing user', () => { + const url = buildEpdsCallbackUrl({ + flowRequestUri: REQUEST_URI, + flowClientId: CLIENT_ID, + email: EMAIL, + isNewAccount: false, + pdsPublicUrl: PDS_PUBLIC_URL, + epdsCallbackSecret: SECRET, + }) + const q = paramsFromUrl(url) + expect(q.get('approved')).toBe('1') + expect(q.get('new_account')).toBe('0') + expect(q.get('email')).toBe(EMAIL) + expect(q.get('request_uri')).toBe(REQUEST_URI) + expect(q.get('client_id')).toBe(CLIENT_ID) + + const ts = requiredParam(q, 'ts') + const sig = requiredParam(q, 'sig') + const params: CallbackParams = { + request_uri: REQUEST_URI, + email: EMAIL, + approved: '1', + new_account: '0', + client_id: CLIENT_ID, + } + expect(verifyCallback(params, ts, sig, SECRET)).toBe(true) + }) + + it('round-trips via verifyCallback for a new account', () => { + const url = buildEpdsCallbackUrl({ + flowRequestUri: REQUEST_URI, + flowClientId: CLIENT_ID, + email: EMAIL, + isNewAccount: true, + pdsPublicUrl: PDS_PUBLIC_URL, + epdsCallbackSecret: SECRET, + }) + const q = paramsFromUrl(url) + expect(q.get('new_account')).toBe('1') + const params: CallbackParams = { + request_uri: REQUEST_URI, + email: EMAIL, + approved: '1', + new_account: '1', + client_id: CLIENT_ID, + } + expect( + verifyCallback( + params, + requiredParam(q, 'ts'), + requiredParam(q, 'sig'), + SECRET, + ), + ).toBe(true) + }) + + it('omits client_id from the URL when flowClientId is null', () => { + const url = buildEpdsCallbackUrl({ + flowRequestUri: REQUEST_URI, + flowClientId: null, + email: EMAIL, + isNewAccount: false, + pdsPublicUrl: PDS_PUBLIC_URL, + epdsCallbackSecret: SECRET, + }) + const q = paramsFromUrl(url) + expect(q.has('client_id')).toBe(false) + // And it still round-trips via verifyCallback (the signed payload + // uses the empty-string sentinel for absent client_id). + const params: CallbackParams = { + request_uri: REQUEST_URI, + email: EMAIL, + approved: '1', + new_account: '0', + } + expect( + verifyCallback( + params, + requiredParam(q, 'ts'), + requiredParam(q, 'sig'), + SECRET, + ), + ).toBe(true) + }) + + it('preserves the random-mode handle sentinel: omitting `handle` is the trigger for pds-core to call generateRandomHandle()', () => { + // Neither the inputs to buildEpdsCallbackUrl nor the output URL + // carry a `handle` field — the absence is the signal. This pins + // the contract documented in the function's JSDoc. + const url = buildEpdsCallbackUrl({ + flowRequestUri: REQUEST_URI, + flowClientId: CLIENT_ID, + email: EMAIL, + isNewAccount: true, + pdsPublicUrl: PDS_PUBLIC_URL, + epdsCallbackSecret: SECRET, + }) + const q = paramsFromUrl(url) + expect(q.has('handle')).toBe(false) + // verifyCallback with absent handle accepts the signature, mirroring + // the sentinel test in shared/src/__tests__/crypto.test.ts. + const params: CallbackParams = { + request_uri: REQUEST_URI, + email: EMAIL, + approved: '1', + new_account: '1', + client_id: CLIENT_ID, + } + expect( + verifyCallback( + params, + requiredParam(q, 'ts'), + requiredParam(q, 'sig'), + SECRET, + ), + ).toBe(true) + }) + + it("rejects a tampered client_id at the verifier (the value is signed, an attacker cannot redirect a victim's flow at a different OAuth client)", () => { + const url = buildEpdsCallbackUrl({ + flowRequestUri: REQUEST_URI, + flowClientId: CLIENT_ID, + email: EMAIL, + isNewAccount: false, + pdsPublicUrl: PDS_PUBLIC_URL, + epdsCallbackSecret: SECRET, + }) + const q = paramsFromUrl(url) + const tampered: CallbackParams = { + request_uri: REQUEST_URI, + email: EMAIL, + approved: '1', + new_account: '0', + client_id: 'https://attacker.example/client-metadata.json', + } + expect( + verifyCallback( + tampered, + requiredParam(q, 'ts'), + requiredParam(q, 'sig'), + SECRET, + ), + ).toBe(false) + }) +}) diff --git a/packages/auth-service/src/__tests__/clean-exit.test.ts b/packages/auth-service/src/__tests__/clean-exit.test.ts new file mode 100644 index 00000000..19f4924c --- /dev/null +++ b/packages/auth-service/src/__tests__/clean-exit.test.ts @@ -0,0 +1,356 @@ +/** + * Tests for cleanExit() — the response emitter that turns "session + * expired" dead-ends into clean RFC 6749 §4.1.2.1 redirects (Tier 1) + * or a styled HTML page with a Start Over button (Tier 2). Drives + * every branch by mocking buildClientErrorRedirect (the URL builder + * that hits the network in production) and resolveClientMetadata + * (the Tier-2 client_uri lookup) at the module boundary. + */ +import { + describe, + it, + expect, + vi, + beforeEach, + beforeAll, + afterAll, +} from 'vitest' +import type { Response } from 'express' + +const PDS_URL = 'https://pds.example' +const CLIENT_ID = 'https://demo.example/client-metadata.json' +const REDIRECT_URI = 'https://demo.example/api/oauth/callback' + +const ORIGINAL_PDS_URL = process.env.PDS_INTERNAL_URL +const ORIGINAL_SECRET = process.env.EPDS_INTERNAL_SECRET + +beforeAll(() => { + // Same as the redirect-to-client-error test — sibling module imports + // require these env vars at evaluation time. + process.env.PDS_INTERNAL_URL = 'https://core:3000' + process.env.EPDS_INTERNAL_SECRET = 'test-secret' +}) + +afterAll(() => { + if (ORIGINAL_PDS_URL === undefined) delete process.env.PDS_INTERNAL_URL + else process.env.PDS_INTERNAL_URL = ORIGINAL_PDS_URL + if (ORIGINAL_SECRET === undefined) delete process.env.EPDS_INTERNAL_SECRET + else process.env.EPDS_INTERNAL_SECRET = ORIGINAL_SECRET +}) + +const buildRedirectMock = vi.hoisted(() => vi.fn()) +// Mock the start-over-href resolver at the shared package boundary +// rather than the underlying resolveClientMetadata, because the +// resolver's internal call path uses a relative import that vi.mock +// can't intercept transitively. +const resolveStartOverHrefMock = vi.hoisted(() => vi.fn()) + +vi.mock('../lib/redirect-to-client-error.js', () => ({ + buildClientErrorRedirect: buildRedirectMock, +})) +vi.mock('@certified-app/shared', async (importActual) => { + const actual = await importActual>() + return { + ...actual, + resolveStartOverHref: resolveStartOverHrefMock, + } +}) + +import { cleanExit } from '../lib/clean-exit.js' + +beforeEach(() => { + buildRedirectMock.mockReset() + resolveStartOverHrefMock.mockReset() +}) + +/** Build a minimal Response double that records the calls cleanExit + * makes on it. Same shape as the epds-callback-error test stub. */ +function makeResStub() { + const headers: Record = {} + let statusCode: number | undefined + let contentType: string | undefined + let body: string | undefined + let redirectStatus: number | undefined + let redirectLocation: string | undefined + + const res = { + setHeader(name: string, value: string) { + headers[name.toLowerCase()] = value + return res + }, + status(code: number) { + statusCode = code + return res + }, + type(t: string) { + contentType = t + return res + }, + send(s: string) { + body = s + return res + }, + redirect(status: number, location: string) { + redirectStatus = status + redirectLocation = location + }, + } as unknown as Response + + return { + res, + inspect: () => ({ + headers, + statusCode, + contentType, + body, + redirectStatus, + redirectLocation, + }), + } +} + +describe('cleanExit — Tier 1 (redirect to OAuth client)', () => { + it('issues a 303 redirect to the URL built by buildClientErrorRedirect when clientId is present', async () => { + const target = `${REDIRECT_URI}?error=access_denied&error_description=expired&iss=${encodeURIComponent(PDS_URL)}` + buildRedirectMock.mockResolvedValueOnce(target) + const { res, inspect } = makeResStub() + + await cleanExit({ + res, + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'Sign-in took too long.', + }) + + const got = inspect() + expect(got.redirectStatus).toBe(303) + expect(got.redirectLocation).toBe(target) + // The URL builder should have received our exact opts. + expect(buildRedirectMock).toHaveBeenCalledWith({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'Sign-in took too long.', + state: undefined, + }) + // No Start Over lookup needed — the redirect path doesn't fall + // through to the styled HTML. + expect(resolveStartOverHrefMock).not.toHaveBeenCalled() + // No HTML body emitted. + expect(got.body).toBeUndefined() + }) + + it('forwards state to buildClientErrorRedirect', async () => { + buildRedirectMock.mockResolvedValueOnce('https://demo.example/cb?error=...') + const { res } = makeResStub() + await cleanExit({ + res, + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'server_error', + description: 'd', + state: 'XNMi', + }) + expect(buildRedirectMock).toHaveBeenCalledWith( + expect.objectContaining({ state: 'XNMi' }), + ) + }) + + it('sets Cache-Control: no-store on the redirect path', async () => { + buildRedirectMock.mockResolvedValueOnce('https://demo.example/cb?error=...') + const { res, inspect } = makeResStub() + await cleanExit({ + res, + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + expect(inspect().headers['cache-control']).toBe('no-store') + }) +}) + +describe('cleanExit — Tier 2 (Start Over fallback when redirect fails)', () => { + it('renders the styled HTML page with a Start Over button using the resolved href', async () => { + buildRedirectMock.mockResolvedValueOnce(null) + resolveStartOverHrefMock.mockResolvedValueOnce('https://demo.example/') + const { res, inspect } = makeResStub() + + await cleanExit({ + res, + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'Sign-in took too long.', + }) + + const got = inspect() + expect(got.redirectLocation).toBeUndefined() + expect(got.statusCode).toBe(400) + expect(got.contentType).toBe('html') + expect(got.body).toContain('Sign-in took too long.') + expect(got.body).toContain('class="start-over"') + expect(got.body).toContain('href="https://demo.example/"') + expect(got.body).toContain('>Return to sign in') + // The Start Over resolver is responsible for picking client_uri / + // origin / sanitising schemes — that logic is owned (and unit + // tested) by `@certified-app/shared`'s start-over-href.test.ts. + // cleanExit just trusts whatever the resolver returns. + expect(resolveStartOverHrefMock).toHaveBeenCalledWith( + CLIENT_ID, + expect.any(Object), + ) + }) + + it('omits the Start Over button when the resolver returns null', async () => { + buildRedirectMock.mockResolvedValueOnce(null) + resolveStartOverHrefMock.mockResolvedValueOnce(null) + const { res, inspect } = makeResStub() + + await cleanExit({ + res, + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + + const got = inspect() + expect(got.body).not.toContain('class="start-over"') + expect(got.body).toContain('d') + }) + + it('honours fallbackStatus override (e.g. 500 for server_error)', async () => { + buildRedirectMock.mockResolvedValueOnce(null) + resolveStartOverHrefMock.mockResolvedValueOnce(null) + const { res, inspect } = makeResStub() + + await cleanExit({ + res, + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'server_error', + description: 'Internal failure', + fallbackStatus: 500, + }) + expect(inspect().statusCode).toBe(500) + }) + + it('uses "Authentication failed" as the fallback title for server_error code', async () => { + // Mismatched title vs body would mis-diagnose the failure for both + // users and operators — the body says "internal failure" so the + // heading shouldn't say "Sign-in session expired". + buildRedirectMock.mockResolvedValueOnce(null) + resolveStartOverHrefMock.mockResolvedValueOnce(null) + const { res, inspect } = makeResStub() + await cleanExit({ + res, + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'server_error', + description: 'Internal failure', + }) + const body = inspect().body! + expect(body).toContain('Authentication failed') + expect(body).toContain('

Authentication failed

') + expect(body).not.toContain('Sign-in session expired') + }) + + it('uses "Sign-in session expired" as the fallback title for access_denied code', async () => { + buildRedirectMock.mockResolvedValueOnce(null) + resolveStartOverHrefMock.mockResolvedValueOnce(null) + const { res, inspect } = makeResStub() + await cleanExit({ + res, + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'Sign-in took too long.', + }) + const body = inspect().body! + expect(body).toContain('Sign-in session expired') + expect(body).not.toContain('Authentication failed') + }) + + it('honours an explicit fallbackTitle override regardless of code', async () => { + buildRedirectMock.mockResolvedValueOnce(null) + resolveStartOverHrefMock.mockResolvedValueOnce(null) + const { res, inspect } = makeResStub() + await cleanExit({ + res, + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + fallbackTitle: 'Custom heading', + }) + expect(inspect().body).toContain('Custom heading') + }) + + it('sets Cache-Control: no-store on the HTML fallback too', async () => { + buildRedirectMock.mockResolvedValueOnce(null) + resolveStartOverHrefMock.mockResolvedValueOnce(null) + const { res, inspect } = makeResStub() + await cleanExit({ + res, + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + expect(inspect().headers['cache-control']).toBe('no-store') + }) +}) + +describe('cleanExit — no clientId in scope', () => { + it('skips Tier 1 entirely and renders a button-less HTML page', async () => { + const { res, inspect } = makeResStub() + + await cleanExit({ + res, + clientId: null, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'Your sign-in took too long.', + }) + + const got = inspect() + // No Start Over lookup, no redirect builder call, no Start Over + // button — there's nothing to link to. + expect(buildRedirectMock).not.toHaveBeenCalled() + expect(resolveStartOverHrefMock).not.toHaveBeenCalled() + expect(got.redirectLocation).toBeUndefined() + expect(got.statusCode).toBe(400) + expect(got.contentType).toBe('html') + expect(got.body).toContain('Your sign-in took too long.') + expect(got.body).not.toContain('class="start-over"') + }) + + it('treats undefined clientId the same as null', async () => { + const { res, inspect } = makeResStub() + + await cleanExit({ + res, + clientId: undefined, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + + expect(buildRedirectMock).not.toHaveBeenCalled() + expect(inspect().contentType).toBe('html') + }) + + it('still sets Cache-Control: no-store when no clientId is in scope', async () => { + const { res, inspect } = makeResStub() + await cleanExit({ + res, + clientId: null, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + expect(inspect().headers['cache-control']).toBe('no-store') + }) +}) diff --git a/packages/auth-service/src/__tests__/heartbeat-toggle.test.ts b/packages/auth-service/src/__tests__/heartbeat-toggle.test.ts new file mode 100644 index 00000000..cb05ed42 --- /dev/null +++ b/packages/auth-service/src/__tests__/heartbeat-toggle.test.ts @@ -0,0 +1,144 @@ +/** + * Tests for the server-side and rendered-page sides of the PAR-heartbeat + * test toggle. + * + * Two surfaces: + * 1. heartbeatEnabledFor(req) — the function login-page.ts and + * recovery.ts call to decide whether to inline the heartbeat + * fetch into the page. + * 2. The rendered HTML — a smoke test that proves the OTP form and + * recovery OTP form actually contain (or omit) the /auth/ping + * fetch wired up by the toggle. + */ +import { describe, it, expect, afterEach } from 'vitest' +import type { Request } from 'express' +import { heartbeatEnabledFor, renderLoginPage } from '../routes/login-page.js' +import { renderRecoveryOtpForm } from '../routes/recovery.js' + +const ORIGINAL_TEST_HOOKS = process.env.EPDS_TEST_HOOKS + +afterEach(() => { + if (ORIGINAL_TEST_HOOKS === undefined) delete process.env.EPDS_TEST_HOOKS + else process.env.EPDS_TEST_HOOKS = ORIGINAL_TEST_HOOKS +}) + +function reqWith(query: Record): Request { + return { query, body: undefined } as unknown as Request +} + +function reqWithBody(body: Record): Request { + return { query: {}, body } as unknown as Request +} + +describe('heartbeatEnabledFor', () => { + it('is enabled by default in production (no test hooks)', () => { + delete process.env.EPDS_TEST_HOOKS + expect(heartbeatEnabledFor(reqWith({}))).toBe(true) + }) + + it('ignores no_heartbeat=1 when EPDS_TEST_HOOKS is unset', () => { + delete process.env.EPDS_TEST_HOOKS + expect(heartbeatEnabledFor(reqWith({ no_heartbeat: '1' }))).toBe(true) + }) + + it('is enabled when EPDS_TEST_HOOKS=1 but no_heartbeat is unset', () => { + process.env.EPDS_TEST_HOOKS = '1' + expect(heartbeatEnabledFor(reqWith({}))).toBe(true) + }) + + it('is disabled when both EPDS_TEST_HOOKS=1 AND no_heartbeat=1', () => { + process.env.EPDS_TEST_HOOKS = '1' + expect(heartbeatEnabledFor(reqWith({ no_heartbeat: '1' }))).toBe(false) + }) + + it('does not match arbitrary truthy no_heartbeat values', () => { + // Tighten the toggle: only '1' disables. Anything else (incl. 'true') + // is treated as a no-op rather than a footgun. + process.env.EPDS_TEST_HOOKS = '1' + expect(heartbeatEnabledFor(reqWith({ no_heartbeat: 'true' }))).toBe(true) + }) + + it('honours no_heartbeat=1 in form-encoded request bodies', () => { + // The recovery flow's POST handlers re-render the form from + // body fields, not query params, so the toggle must work + // through req.body for symmetry with req.query. + process.env.EPDS_TEST_HOOKS = '1' + expect(heartbeatEnabledFor(reqWithBody({ no_heartbeat: '1' }))).toBe(false) + }) + + it('treats body.no_heartbeat as disabled only on the literal string "1"', () => { + process.env.EPDS_TEST_HOOKS = '1' + expect(heartbeatEnabledFor(reqWithBody({ no_heartbeat: 'true' }))).toBe( + true, + ) + expect(heartbeatEnabledFor(reqWithBody({}))).toBe(true) + }) + + it('ignores body.no_heartbeat when EPDS_TEST_HOOKS is unset', () => { + delete process.env.EPDS_TEST_HOOKS + expect(heartbeatEnabledFor(reqWithBody({ no_heartbeat: '1' }))).toBe(true) + }) +}) + +function renderLoginPageWithHeartbeat(heartbeatEnabled: boolean): string { + return renderLoginPage({ + flowId: 'flow-1', + clientId: 'https://example.com/client-metadata.json', + clientName: 'Example', + branding: {}, + customCss: null, + customFaviconUrl: null, + customFaviconUrlDark: null, + loginHint: '', + initialStep: 'email', + otpAlreadySent: false, + csrfToken: 'csrf', + authBasePath: '/api/auth', + pdsPublicUrl: 'https://pds.example.com', + otpLength: 6, + otpCharset: 'numeric', + heartbeatEnabled, + }) +} + +function renderRecoveryOtpFormWithHeartbeat(heartbeatEnabled: boolean): string { + return renderRecoveryOtpForm({ + email: 'user@example.com', + csrfToken: 'csrf', + requestUri: 'urn:ietf:params:oauth:request_uri:req-abc', + otpLength: 6, + otpCharset: 'numeric', + heartbeatEnabled, + }) +} + +describe('renderLoginPage heartbeat wiring', () => { + it('inlines /auth/ping when heartbeat is enabled', () => { + const html = renderLoginPageWithHeartbeat(true) + expect(html).toContain("'/auth/ping'") + expect(html).toContain('var heartbeatEnabled = true;') + }) + + it('emits a disabled flag when heartbeat is off', () => { + const html = renderLoginPageWithHeartbeat(false) + expect(html).toContain('var heartbeatEnabled = false;') + // The fetch literal is still in the bundle, gated at runtime by + // the flag — that's fine and matches how the IIFE composes. + // What matters: the flag is false. + }) +}) + +describe('renderRecoveryOtpForm heartbeat wiring', () => { + it('inlines /auth/ping when heartbeat is enabled', () => { + const html = renderRecoveryOtpFormWithHeartbeat(true) + expect(html).toContain("'/auth/ping'") + // The if-guard at the top of the recovery script reads the flag + // and bails when false, so check the flag's compile-time value. + expect(html).toContain('if (!true) return;') + }) + + it('emits an early-return guard when heartbeat is off', () => { + const html = renderRecoveryOtpFormWithHeartbeat(false) + expect(html).toContain('if (!false) return;') + }) +}) diff --git a/packages/auth-service/src/__tests__/heartbeat.test.ts b/packages/auth-service/src/__tests__/heartbeat.test.ts new file mode 100644 index 00000000..7eec2398 --- /dev/null +++ b/packages/auth-service/src/__tests__/heartbeat.test.ts @@ -0,0 +1,200 @@ +/** + * Tests for the /auth/ping heartbeat route. + * + * Strategy: stand up the router with a hand-rolled `ctx` that satisfies + * just the slice the route reads (`db.getAuthFlow`). Mock + * `pingParRequest` at the module boundary so we can assert what the + * route forwards to it without hitting (or having to spy on) the + * test's own fetch back into the in-process express server. + */ +import { + describe, + it, + expect, + vi, + beforeEach, + beforeAll, + afterAll, +} from 'vitest' +import { + buildHeartbeatApp, + harnessGet, + type FakeFlow, +} from './__helpers__/heartbeat-router-harness.js' +import type express from 'express' + +// Use https:// in the test fixture so SonarQube's S5332 hotspot doesn't +// flag this file. The mocked pingParRequest never actually issues a +// request — the literal flows from process.env to the route handler +// and back out as a positional argument we assert on. +const PDS_URL = 'https://core:3000' +const SECRET = 'test-secret' + +const ORIGINAL_PDS_URL = process.env.PDS_INTERNAL_URL +const ORIGINAL_SECRET = process.env.EPDS_INTERNAL_SECRET + +beforeAll(() => { + process.env.PDS_INTERNAL_URL = PDS_URL + process.env.EPDS_INTERNAL_SECRET = SECRET +}) + +afterAll(() => { + if (ORIGINAL_PDS_URL === undefined) delete process.env.PDS_INTERNAL_URL + else process.env.PDS_INTERNAL_URL = ORIGINAL_PDS_URL + if (ORIGINAL_SECRET === undefined) delete process.env.EPDS_INTERNAL_SECRET + else process.env.EPDS_INTERNAL_SECRET = ORIGINAL_SECRET +}) + +// Mock the ping-par-request module so we can drive the route's success +// and failure branches without standing up a fake pds-core server. +const pingMock = vi.hoisted(() => vi.fn()) +vi.mock('../lib/ping-par-request.js', () => ({ + pingParRequest: pingMock, +})) + +beforeEach(() => { + pingMock.mockReset() +}) + +// Late import so the vi.mock above is in effect. +const { createHeartbeatRouter } = await import('../routes/heartbeat.js') + +function buildApp(flows: Map): express.Express { + return buildHeartbeatApp(createHeartbeatRouter, flows) +} + +async function getPing( + app: express.Express, + cookie?: string, +): Promise<{ status: number; cacheControl: string | null; body: unknown }> { + const r = await harnessGet(app, '/auth/ping', cookie) + return { status: r.status, cacheControl: r.cacheControl, body: r.body } +} + +describe('GET /auth/ping', () => { + it('returns no_cookie when the auth_flow cookie is missing', async () => { + const app = buildApp(new Map()) + + const res = await getPing(app) + + expect(res.status).toBe(200) + expect(res.body).toEqual({ ok: false, reason: 'no_cookie' }) + // We do NOT call pds-core when there is nothing to ping. + expect(pingMock).not.toHaveBeenCalled() + }) + + it('returns flow_expired when the auth_flow row is gone', async () => { + const app = buildApp(new Map()) + + const res = await getPing(app, 'epds_auth_flow=missing-flow') + + expect(res.status).toBe(200) + expect(res.body).toEqual({ ok: false, reason: 'flow_expired' }) + expect(pingMock).not.toHaveBeenCalled() + }) + + it('forwards a successful ping and returns ok:true', async () => { + const flows = new Map([ + [ + 'flow-1', + { + requestUri: 'urn:ietf:params:oauth:request_uri:req-abc', + clientId: 'https://demo.example.com/client-metadata.json', + handleMode: null, + }, + ], + ]) + const app = buildApp(flows) + pingMock.mockResolvedValueOnce({ ok: true }) + + const res = await getPing(app, 'epds_auth_flow=flow-1') + + expect(res.status).toBe(200) + expect(res.body).toEqual({ ok: true }) + expect(pingMock).toHaveBeenCalledTimes(1) + expect(pingMock).toHaveBeenCalledWith( + 'urn:ietf:params:oauth:request_uri:req-abc', + PDS_URL, + SECRET, + ) + }) + + it('returns par_expired when pds-core reports the request_uri is gone', async () => { + const flows = new Map([ + [ + 'flow-1', + { + requestUri: 'urn:ietf:params:oauth:request_uri:req-dead', + clientId: null, + handleMode: null, + }, + ], + ]) + const app = buildApp(flows) + pingMock.mockResolvedValueOnce({ ok: false, status: 404 }) + + const res = await getPing(app, 'epds_auth_flow=flow-1') + + expect(res.status).toBe(200) + expect(res.body).toEqual({ ok: false, reason: 'par_expired' }) + }) + + it('returns transient on operational errors so the browser keeps polling', async () => { + const flows = new Map([ + [ + 'flow-1', + { + requestUri: 'urn:ietf:params:oauth:request_uri:req-blip', + clientId: null, + handleMode: null, + }, + ], + ]) + const app = buildApp(flows) + pingMock.mockResolvedValueOnce({ ok: false, status: 502 }) + + const res = await getPing(app, 'epds_auth_flow=flow-1') + + // Only a 404 from pds-core terminates keepalive — a 5xx blip + // (or any non-404 failure) is reported as transient so the + // browser keeps polling and the next tick can recover. Treating + // a single dropped packet as terminal would re-introduce the + // dead-end the heartbeat exists to prevent. + expect(res.status).toBe(200) + expect(res.body).toEqual({ ok: false, reason: 'transient' }) + }) + + it('returns transient when pingParRequest reports a thrown error (no status)', async () => { + const flows = new Map([ + [ + 'flow-1', + { + requestUri: 'urn:ietf:params:oauth:request_uri:req-throw', + clientId: null, + handleMode: null, + }, + ], + ]) + const app = buildApp(flows) + // pingParRequest catches network/timeout errors and reports them + // as `{ ok: false, err }` with no `status` field. Same transient + // semantics — the next tick may recover. + pingMock.mockResolvedValueOnce({ + ok: false, + err: new Error('network blip'), + }) + + const res = await getPing(app, 'epds_auth_flow=flow-1') + + expect(res.status).toBe(200) + expect(res.body).toEqual({ ok: false, reason: 'transient' }) + }) + + it('sets Cache-Control: no-store so a shared cache cannot serve a stale response across flows', async () => { + const app = buildApp(new Map()) + + const res = await getPing(app) + + expect(res.cacheControl).toBe('no-store') + }) +}) diff --git a/packages/auth-service/src/__tests__/login-page.test.ts b/packages/auth-service/src/__tests__/login-page.test.ts index 4c28a586..5376f713 100644 --- a/packages/auth-service/src/__tests__/login-page.test.ts +++ b/packages/auth-service/src/__tests__/login-page.test.ts @@ -430,6 +430,7 @@ describe('renderLoginPage handle login button', () => { pdsPublicUrl: 'https://pds.example.com', otpLength: 6, otpCharset: 'numeric', + heartbeatEnabled: false, }) } @@ -516,6 +517,7 @@ function renderDefault(): string { pdsPublicUrl: 'https://pds.example.com', otpLength: 6, otpCharset: 'numeric', + heartbeatEnabled: false, }) } @@ -570,7 +572,168 @@ describe('renderLoginPage OTP verify-form double-submit latch (regression)', () const branchEnd = html.indexOf('verifying = false;', branchStart) expect(branchEnd).toBeGreaterThan(branchStart) const branch = html.slice(branchStart, branchEnd) - expect(branch).toContain('showError(result.error);') + // Either the plain showError or the inline-action variant is fine + // (both surface the message to the user); just assert SOME error + // surface is invoked. + expect(branch).toMatch(/showError(?:WithAction)?\(/) expect(branch).toContain('clearOtpBoxes();') }) }) + +describe('renderLoginPage inline Resend action on expired OTP', () => { + // The OTP-expired error used to surface only as "OTP expired" / + // "Invalid or expired code" text inside the error banner. Users + // missed the separate Resend button below the form. The inline + // action button surfaces "Send a new code" right next to the + // error message and triggers the same Resend flow. + + it('declares showErrorWithAction with a textContent-only label sink', () => { + const html = renderDefault() + expect(html).toContain('function showErrorWithAction(') + // The label is set via .textContent, never via innerHTML — a + // reflected error string that happened to look like HTML must + // not be able to inject script tags. + const fnStart = html.indexOf('function showErrorWithAction(') + expect(fnStart).toBeGreaterThan(0) + const fnEnd = html.indexOf('function clearError', fnStart) + expect(fnEnd).toBeGreaterThan(fnStart) + const fnBody = html.slice(fnStart, fnEnd) + expect(fnBody).toContain('btn.textContent = actionLabel') + expect(fnBody).not.toContain('innerHTML') + }) + + it('detects OTP-expired errors via a substring-stable regex', () => { + const html = renderDefault() + // The detection must catch: + // - better-auth's "Invalid or expired code" + // - auth-service's "OTP expired" + // - any future wording with "expir" or "too long" in it + expect(html).toMatch(/var isExpired = \/expir\|too long\/i\.test/) + }) + + it('renders the inline action with the "Send a new code" label and triggers the Resend button', () => { + const html = renderDefault() + // The inline action label and the click target must be present. + expect(html).toContain("'Send a new code'") + expect(html).toContain("document.getElementById('btn-resend').click()") + }) + + it('falls back to the plain showError on non-expired errors', () => { + const html = renderDefault() + // The non-expired branch must NOT route through + // showErrorWithAction (otherwise an "Invalid code" message + // would carry an inappropriate "Send a new code" link). + expect(html).toMatch( + /if \(isExpired\) \{[\s\S]*?\} else \{[\s\S]*?showError\(result\.error\);\s*\}/, + ) + }) +}) + +describe('renderLoginPage flow-aborted notice + reactive abort gates', () => { + // The proactive notice fires when /auth/ping reports the flow is + // unrecoverable (par_expired / flow_expired / no_cookie). It + // disables every form control and shows a Start over button that + // navigates to /auth/abort. The reactive gates (Resend click, + // Verify submit) ping /auth/ping just-in-time and bail to + // /auth/abort if the flow is dead — defence in depth on top of + // the proactive notice. + + it('inlines /auth/abort as the Start over destination', () => { + const html = renderDefault() + expect(html).toContain("'/auth/abort'") + }) + + it('declares showFlowAbortedNotice as idempotent (flowAborted flag)', () => { + const html = renderDefault() + // The idempotence guard prevents duplicate banners if both the + // proactive heartbeat tick AND a reactive gate fire the notice. + expect(html).toContain('var flowAborted = false') + expect(html).toMatch( + /function showFlowAbortedNotice\(\)\s*\{\s*if \(flowAborted\) return;/, + ) + }) + + it('disables every form control when the notice fires', () => { + const html = renderDefault() + const fnStart = html.indexOf('function showFlowAbortedNotice()') + expect(fnStart).toBeGreaterThan(0) + const fnEnd = html.indexOf('function abortIfFlowDead', fnStart) + expect(fnEnd).toBeGreaterThan(fnStart) + const fnBody = html.slice(fnStart, fnEnd) + // OTP boxes, Resend, Back, and Verify must all get disabled — + // anything left enabled would let the user click into a path + // that silently fails. + expect(fnBody).toMatch(/otpBoxes\[i\]\.disabled = true/) + expect(fnBody).toMatch(/resendBtn\.disabled = true/) + expect(fnBody).toMatch(/backBtn\.disabled = true/) + expect(fnBody).toMatch(/verifyBtn\.disabled = true/) + }) + + it('renders the Start over button with a textContent label sink', () => { + const html = renderDefault() + const fnStart = html.indexOf('function showFlowAbortedNotice()') + const fnEnd = html.indexOf('function abortIfFlowDead', fnStart) + const fnBody = html.slice(fnStart, fnEnd) + expect(fnBody).toContain("startOverBtn.textContent = 'Start over'") + // No innerHTML — same XSS guard as the inline-action button. + expect(fnBody).not.toContain('innerHTML') + }) + + it('triggers the proactive notice when the heartbeat reports a non-transient ok:false', () => { + const html = renderDefault() + // The pingHeartbeat handler must call showFlowAbortedNotice + // when reason !== 'transient'. Transient failures must not + // trigger the notice. Substring-based slicing rather than a + // greedy regex (.*?) so Sonar doesn't flag this as ReDoS. + const guardIdx = html.indexOf( + "if (body && body.ok === false && body.reason !== 'transient')", + ) + expect(guardIdx).toBeGreaterThan(0) + // The two effects (stop pinging, show the notice) must both + // appear after the guard. Use a bounded slice to keep the + // assertion linear. + const branchSlice = html.slice(guardIdx, guardIdx + 600) + expect(branchSlice).toContain('stopHeartbeat();') + expect(branchSlice).toContain('showFlowAbortedNotice();') + }) + + it('gates the Resend click on abortIfFlowDead', () => { + const html = renderDefault() + // The Resend click handler must call abortIfFlowDead and + // bail if it returns true. + const handlerStart = html.indexOf("'btn-resend').addEventListener") + expect(handlerStart).toBeGreaterThan(0) + const handlerEnd = html.indexOf( + "'btn-back').addEventListener", + handlerStart, + ) + expect(handlerEnd).toBeGreaterThan(handlerStart) + const handlerBody = html.slice(handlerStart, handlerEnd) + expect(handlerBody).toMatch(/if \(await abortIfFlowDead\(\)\) return/) + // The abort gate must run BEFORE sendOtp — calling sendOtp + // first would issue an OTP that cannot be used. + const gateIdx = handlerBody.indexOf('abortIfFlowDead') + const sendIdx = handlerBody.indexOf('sendOtp(currentEmail)') + expect(gateIdx).toBeGreaterThan(0) + expect(sendIdx).toBeGreaterThan(gateIdx) + }) + + it('gates the Verify submit on abortIfFlowDead', () => { + const html = renderDefault() + // Verify gate runs BEFORE verifyOtp — same reason: don't + // consume the OTP if the flow can't complete anyway. + const handlerStart = html.indexOf("'form-verify-otp').addEventListener") + expect(handlerStart).toBeGreaterThan(0) + const handlerEnd = html.indexOf( + "'btn-resend').addEventListener", + handlerStart, + ) + expect(handlerEnd).toBeGreaterThan(handlerStart) + const handlerBody = html.slice(handlerStart, handlerEnd) + expect(handlerBody).toMatch(/if \(await abortIfFlowDead\(\)\) return/) + const gateIdx = handlerBody.indexOf('abortIfFlowDead') + const verifyIdx = handlerBody.indexOf('verifyOtp(currentEmail, otp)') + expect(gateIdx).toBeGreaterThan(0) + expect(verifyIdx).toBeGreaterThan(gateIdx) + }) +}) diff --git a/packages/auth-service/src/__tests__/redirect-to-client-error.test.ts b/packages/auth-service/src/__tests__/redirect-to-client-error.test.ts new file mode 100644 index 00000000..d0a59a06 --- /dev/null +++ b/packages/auth-service/src/__tests__/redirect-to-client-error.test.ts @@ -0,0 +1,191 @@ +/** + * Tests for buildClientErrorRedirect — the auth-service helper that + * resolves an OAuth client's published metadata and constructs an + * RFC 6749 §4.1.2.1 error redirect URL pointing at redirect_uris[0]. + * + * Drives every branch (happy redirect, missing metadata, no + * redirect_uris, unparseable URL, non-http(s) scheme) by mocking + * resolveClientMetadata at the module boundary. + */ +import { + describe, + it, + expect, + vi, + beforeEach, + beforeAll, + afterAll, +} from 'vitest' + +const PDS_URL = 'https://pds.example' +const CLIENT_ID = 'https://demo.example/client-metadata.json' +const REDIRECT_URI = 'https://demo.example/api/oauth/callback' + +const ORIGINAL_PDS_URL = process.env.PDS_INTERNAL_URL +const ORIGINAL_SECRET = process.env.EPDS_INTERNAL_SECRET + +beforeAll(() => { + // The lib doesn't read these directly, but importing transitively + // pulls in modules that do, and an unset `EPDS_INTERNAL_SECRET` + // would crash a sibling import. + process.env.PDS_INTERNAL_URL = 'https://core:3000' + process.env.EPDS_INTERNAL_SECRET = 'test-secret' +}) + +afterAll(() => { + if (ORIGINAL_PDS_URL === undefined) delete process.env.PDS_INTERNAL_URL + else process.env.PDS_INTERNAL_URL = ORIGINAL_PDS_URL + if (ORIGINAL_SECRET === undefined) delete process.env.EPDS_INTERNAL_SECRET + else process.env.EPDS_INTERNAL_SECRET = ORIGINAL_SECRET +}) + +const resolveClientMetadataMock = vi.hoisted(() => vi.fn()) +vi.mock('@certified-app/shared', async (importActual) => { + const actual = await importActual>() + return { + ...actual, + resolveClientMetadata: resolveClientMetadataMock, + } +}) + +import { buildClientErrorRedirect } from '../lib/redirect-to-client-error.js' + +beforeEach(() => { + resolveClientMetadataMock.mockReset() +}) + +describe('buildClientErrorRedirect', () => { + it('builds a redirect URL with all RFC 6749 §4.1.2.1 query params on the happy path', async () => { + resolveClientMetadataMock.mockResolvedValueOnce({ + redirect_uris: [REDIRECT_URI], + }) + const target = await buildClientErrorRedirect({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'Sign-in took too long.', + }) + expect(target).not.toBeNull() + const url = new URL(target!) + expect(url.origin + url.pathname).toBe(REDIRECT_URI) + expect(url.searchParams.get('error')).toBe('access_denied') + expect(url.searchParams.get('error_description')).toBe( + 'Sign-in took too long.', + ) + expect(url.searchParams.get('iss')).toBe(PDS_URL) + expect(url.searchParams.has('state')).toBe(false) + }) + + it('preserves state when supplied', async () => { + resolveClientMetadataMock.mockResolvedValueOnce({ + redirect_uris: [REDIRECT_URI], + }) + const target = await buildClientErrorRedirect({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'server_error', + description: 'Internal error.', + state: 'XNMi-ebr4JAUAEWa-52HEA', + }) + const url = new URL(target!) + expect(url.searchParams.get('state')).toBe('XNMi-ebr4JAUAEWa-52HEA') + expect(url.searchParams.get('error')).toBe('server_error') + }) + + it('returns null when resolveClientMetadata throws', async () => { + resolveClientMetadataMock.mockRejectedValueOnce( + new Error('network unreachable'), + ) + const target = await buildClientErrorRedirect({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + expect(target).toBeNull() + }) + + it('returns null when metadata has no redirect_uris', async () => { + resolveClientMetadataMock.mockResolvedValueOnce({}) + const target = await buildClientErrorRedirect({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + expect(target).toBeNull() + }) + + it('returns null when redirect_uris is empty', async () => { + resolveClientMetadataMock.mockResolvedValueOnce({ redirect_uris: [] }) + const target = await buildClientErrorRedirect({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + expect(target).toBeNull() + }) + + it('returns null when redirect_uris[0] is not a valid URL', async () => { + resolveClientMetadataMock.mockResolvedValueOnce({ + redirect_uris: ['not a url at all'], + }) + const target = await buildClientErrorRedirect({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + expect(target).toBeNull() + }) + + it('rejects a redirect_uris[0] with a non-http(s) scheme', async () => { + // Defence in depth — atproto upstream validates redirect_uris at + // PAR creation, but the catch path that calls this helper exists + // precisely to spare the user a 500. An unhandled `javascript:` + // redirect would defeat that purpose. + resolveClientMetadataMock.mockResolvedValueOnce({ + redirect_uris: ['javascript:alert(1)'], + }) + const target = await buildClientErrorRedirect({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + expect(target).toBeNull() + }) + + it('accepts http:// (for localhost dev loops)', async () => { + resolveClientMetadataMock.mockResolvedValueOnce({ + redirect_uris: ['http://localhost:3002/api/oauth/callback'], + }) + const target = await buildClientErrorRedirect({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + expect(target).not.toBeNull() + expect(target!.startsWith('http://localhost:3002/')).toBe(true) + }) + + it('uses redirect_uris[0] when multiple are registered', async () => { + // Documented limitation: the dead-PAR path lost the in-flight choice + // of which URI the client used. Pinning [0] is RFC-compliant ("any + // registered URI") but loses correlation. See the deferred Copilot + // threads on PR #154 for the full rationale. + resolveClientMetadataMock.mockResolvedValueOnce({ + redirect_uris: ['https://demo.example/cb-1', 'https://demo.example/cb-2'], + }) + const target = await buildClientErrorRedirect({ + clientId: CLIENT_ID, + pdsUrl: PDS_URL, + code: 'access_denied', + description: 'd', + }) + const url = new URL(target!) + expect(url.pathname).toBe('/cb-1') + }) +}) diff --git a/packages/auth-service/src/__tests__/render-error.test.ts b/packages/auth-service/src/__tests__/render-error.test.ts index 9597698d..57124b0e 100644 --- a/packages/auth-service/src/__tests__/render-error.test.ts +++ b/packages/auth-service/src/__tests__/render-error.test.ts @@ -33,4 +33,77 @@ describe('renderError', () => { const html = renderError('x') expect(html).toMatch(/