Commit d48bfd1

feat: sandbox /api/fetch-url via an external Cloudflare Worker proxy
The `/api/fetch-url` endpoint proxies arbitrary user-supplied URLs so the browser can load attachments without CORS issues. Even with the existing SSRF defenses (HTTPS-only, private-IP blocking, DNS rebinding prevention via a custom undici Agent), the outbound fetch still runs in the main Node process; any bypass would expose API keys, MongoDB credentials, OIDC secrets, and the internal Docker network.

Move the actual fetch into a Cloudflare Worker running in a V8 isolate on the edge. The Worker has no access to any of our secrets, databases, or internal network; a bypass inside the Worker reaches only the public internet.

The main app delegates to the Worker only when `FETCH_PROXY_URL` is configured, so self-hosted users and local dev continue to work unchanged via the existing in-process path.

- fetch-proxy/: new Cloudflare Worker (hostname validation, redirect re-validation, streamed byte cap, constant-time secret check, strict response headers). Ships with 20 vitest tests and a README covering deployment and local dev.
- src/routes/api/fetch-url/+server.ts: delegate to the proxy when `FETCH_PROXY_URL` is set, mapping proxy headers back to the existing client-facing contract so consumers (`loadAttachmentsFromUrls.ts`, `UrlFetchModal.svelte`) need no changes.
- src/routes/api/fetch-url/fetch-url.spec.ts: new server tests covering both the direct and delegated paths.
- src/lib/server/config.ts, .env: register `FETCH_PROXY_URL` and `FETCH_PROXY_SECRET`.

https://claude.ai/code/session_01GSrTH9N3bLfoWYhnJdSRyP
1 parent ce11de3 commit d48bfd1

13 files changed: +4023 -1 lines changed

.env

Lines changed: 9 additions & 0 deletions
@@ -135,6 +135,15 @@ MCP_FORWARD_HF_USER_TOKEN=
 EXA_API_KEY=
 # Timeout in milliseconds for MCP tool calls (default: 120000 = 2 minutes)
 MCP_TOOL_TIMEOUT_MS=
+
+### Fetch proxy (optional) ###
+# When FETCH_PROXY_URL is set, /api/fetch-url delegates outbound fetches to a
+# sandboxed Cloudflare Worker (see fetch-proxy/README.md). This isolates the
+# actual fetch from this process's network, env vars, and database. Leaving
+# these blank falls back to the in-process SSRF-safe fetch path.
+FETCH_PROXY_URL=
+FETCH_PROXY_SECRET=
+
 ENABLE_DATA_EXPORT=true
 
 ### Rate limits ###

fetch-proxy/.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
node_modules
.wrangler
.dev.vars
dist

fetch-proxy/README.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# chat-ui fetch proxy

A sandboxed URL fetch proxy for `chat-ui`, deployed as a [Cloudflare Worker](https://developers.cloudflare.com/workers/). It runs in a V8 isolate on Cloudflare's edge, completely separated from the main app's network namespace, environment variables, and secrets.

## Why

The `/api/fetch-url` endpoint in chat-ui proxies arbitrary user-supplied URLs so the browser can load attachments without hitting CORS. Even though it has strong SSRF protections (private-IP blocking, DNS rebinding prevention), the actual outbound fetch still runs in the main Node.js process. Any bypass would expose `OPENAI_API_KEY`, `MONGODB_URL`, OIDC secrets, and the internal Docker network.

Running the fetch inside a Cloudflare Worker gives true network isolation for free: the Worker's egress is Cloudflare's edge network, which can't route to our internal services. A bypass inside the Worker reaches only the public internet.

## How it works

```
┌─────────────────┐  HTTPS + secret   ┌──────────────────┐    HTTPS    ┌──────────┐
│ Main App        │ ────────────────► │ Cloudflare       │ ──────────► │ External │
│ (SvelteKit)     │                   │ Worker (V8)      │             │ URL      │
└─────────────────┘                   └──────────────────┘             └──────────┘
```

The main app delegates to this Worker when `FETCH_PROXY_URL` is set in its environment. Otherwise it falls back to the existing in-process fetch path, so self-hosted users and local dev continue to work unchanged.
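
As a rough illustration of the delegation rule above, the sketch below builds the request the main app might send to the Worker. The helper name `buildProxyRequest` and the `ProxyConfig` shape are hypothetical (the real `+server.ts` is not shown here), and the sketch assumes the Worker is served from the root of `FETCH_PROXY_URL`.

```typescript
// Hypothetical config shape; only the two variable names come from the README.
interface ProxyConfig {
  FETCH_PROXY_URL?: string;
  FETCH_PROXY_SECRET?: string;
}

interface ProxyRequest {
  url: string;
  headers: Record<string, string>;
}

// Returns the request to send to the Worker, or null when the proxy is not
// configured and the caller should use the in-process fetch path instead.
function buildProxyRequest(target: string, config: ProxyConfig): ProxyRequest | null {
  if (!config.FETCH_PROXY_URL) return null;

  // Assumption: the Worker lives at the origin root, so "/fetch" replaces any path.
  const endpoint = new URL("/fetch", config.FETCH_PROXY_URL);
  // searchParams.set percent-encodes the target URL for us.
  endpoint.searchParams.set("url", target);

  return {
    url: endpoint.toString(),
    headers: { "X-Proxy-Secret": config.FETCH_PROXY_SECRET ?? "" },
  };
}
```

A caller would pass the result to `fetch(request.url, { headers: request.headers })` and take the existing in-process path whenever the helper returns `null`.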

## API

### `GET /fetch?url=<encoded-url>`

Headers:

- `X-Proxy-Secret: <secret>` must match the Worker's `FETCH_PROXY_SECRET` secret.
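
A minimal sketch of how the `X-Proxy-Secret` check could be done without an early exit on the first mismatching byte. The function name `constantTimeEqual` is hypothetical and this is not the Worker's actual code; Workers also provide a non-standard `crypto.subtle.timingSafeEqual`, which the real implementation may use instead. Constant-time behaviour in JavaScript is best effort, since the engine is free to optimize.

```typescript
// Compares the caller-supplied secret with the expected one.
// The loop length depends only on the caller-supplied value, so the running
// time does not reveal the expected secret's length or a matching prefix.
function constantTimeEqual(provided: string, expected: string): boolean {
  const encoder = new TextEncoder();
  const p = encoder.encode(provided);
  const e = encoder.encode(expected);

  // A length mismatch already counts as a difference.
  let diff = p.length ^ e.length;
  for (let i = 0; i < p.length; i++) {
    // Reads past the end of `e` yield undefined, which is treated as 0.
    diff |= p[i] ^ (e[i] ?? 0);
  }
  return diff === 0;
}
```

The accumulated `diff` is inspected only once, after every byte has been visited, which is what distinguishes this from a plain `===` comparison.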

On success, returns the upstream body as `application/octet-stream` plus:

- `X-Original-Content-Type`: upstream `Content-Type`
- `X-Original-Status`: upstream HTTP status
- `X-Final-Url`: final URL after redirect following
- `Content-Disposition`: passed through if the upstream supplied one
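
For illustration, a caller could collect these response headers into a plain object before mapping them onto its own contract. The `UpstreamInfo` type and `readUpstreamInfo` name are hypothetical; how the main app actually maps the headers is not shown in this README.

```typescript
// Hypothetical container for the metadata headers listed above.
interface UpstreamInfo {
  contentType: string | null;
  status: number | null;
  finalUrl: string | null;
  contentDisposition: string | null;
}

// Reads the proxy's metadata headers. Header lookup is case-insensitive.
function readUpstreamInfo(headers: Headers): UpstreamInfo {
  const rawStatus = headers.get("X-Original-Status");
  // Accept only a three-digit status; anything else is reported as null.
  const status = rawStatus !== null && /^\d{3}$/.test(rawStatus) ? Number(rawStatus) : null;

  return {
    contentType: headers.get("X-Original-Content-Type"),
    status,
    finalUrl: headers.get("X-Final-Url"),
    contentDisposition: headers.get("Content-Disposition"),
  };
}
```

With a `Response` from the proxy in hand, this would be called as `readUpstreamInfo(response.headers)`.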

Enforced limits (configurable via `wrangler.toml`):

- `MAX_RESPONSE_BYTES`: default 10 MB
- `FETCH_TIMEOUT_MS`: default 30 000 (30 seconds)
- `MAX_REDIRECTS`: default 5
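
To make the `MAX_RESPONSE_BYTES` limit concrete, here is a sketch of a reader that counts bytes as chunks arrive and cancels the stream once the cap is exceeded. The name `readCapped` is hypothetical, and unlike a pass-through proxy this version collects the (capped) body in memory, so treat it as an illustration of the counting logic only.

```typescript
// Reads a stream chunk by chunk and rejects as soon as the running total
// exceeds maxBytes, cancelling the stream so no further data is pulled.
async function readCapped(
  body: ReadableStream<Uint8Array>,
  maxBytes: number
): Promise<Uint8Array> {
  const reader = body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;

    total += value.byteLength;
    if (total > maxBytes) {
      await reader.cancel("response too large");
      throw new Error(`response exceeds ${maxBytes} bytes`);
    }
    chunks.push(value);
  }

  // Concatenate the collected chunks into one buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

Because the check runs per chunk, an oversized response is rejected after at most one chunk past the cap rather than after the whole body has been downloaded.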

### `GET /health`

Returns `200 ok`. No auth required.

## Security

- **HTTPS only.** HTTP, FTP, file, and javascript schemes are rejected.
- **Hostnames only.** Raw IPv4 / IPv6 literals are rejected; a hostname is required. Cloudflare's edge network cannot route to RFC1918 addresses anyway, but we keep the string-level check as defence-in-depth.
- **Redirect re-validation.** Every `Location` header is re-parsed and re-validated before we follow it. Max 5 hops by default.
- **Size cap enforced mid-stream.** The body reader aborts the upstream connection as soon as the cumulative byte count exceeds `MAX_RESPONSE_BYTES`, so a large response never fully buffers.
- **Constant-time secret comparison.** Avoids leaking secret length or prefix via timing.
- **Strict response headers.** `Content-Security-Policy: default-src 'none'`, `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: no-referrer` on every response.
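
The first three rules can be sketched together: validate the target, fetch with redirects disabled, and re-validate each `Location` before following it. The names `validateTarget`, `FetchLike`, and `fetchWithValidatedRedirects` are hypothetical, and the explicit `localhost` rejection is an assumption based on the SSRF smoke test in this README rather than on the Worker's source.

```typescript
// Throws unless `raw` is an https URL whose host is a name, not an IP literal.
function validateTarget(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error("invalid URL");
  }
  if (url.protocol !== "https:") throw new Error("only https is allowed");

  const host = url.hostname;
  // The URL parser brackets IPv6 literals and normalizes IPv4 forms
  // (decimal, hex, octal) to dotted decimal, so two string checks suffice.
  if (host.startsWith("[") || /^\d{1,3}(\.\d{1,3}){3}$/.test(host)) {
    throw new Error("IP literals are not allowed");
  }
  // Assumption: localhost names are rejected as well.
  if (host === "localhost" || host.endsWith(".localhost")) {
    throw new Error("localhost is not allowed");
  }
  return url;
}

// Minimal fetch shape so the loop can be exercised without a network.
type FetchLike = (
  url: string,
  init: { redirect: "manual" }
) => Promise<{ status: number; headers: { get(name: string): string | null } }>;

// Follows at most maxRedirects redirects, re-validating every hop.
async function fetchWithValidatedRedirects(
  start: string,
  doFetch: FetchLike,
  maxRedirects = 5
) {
  let current = validateTarget(start);
  for (let hop = 0; hop <= maxRedirects; hop++) {
    const res = await doFetch(current.toString(), { redirect: "manual" });
    const location = res.headers.get("Location");
    const isRedirect = res.status >= 300 && res.status < 400 && location !== null;
    if (!isRedirect) return { res, finalUrl: current.toString() };
    // Resolve relative redirects against the current URL, then re-validate.
    current = validateTarget(new URL(location, current).toString());
  }
  throw new Error("too many redirects");
}
```

Passing the fetch function in as a parameter keeps the redirect loop testable with a stub; in a Worker the global `fetch` could be supplied directly.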

## Deploy

1. Install dependencies:
   ```bash
   cd fetch-proxy
   npm install
   ```
2. Authenticate wrangler against your Cloudflare account:
   ```bash
   npx wrangler login
   ```
3. Generate a strong random secret and upload it:
   ```bash
   # macOS/Linux
   openssl rand -base64 48 | npx wrangler secret put FETCH_PROXY_SECRET
   ```
4. Deploy the Worker:
   ```bash
   npx wrangler deploy
   ```
   Wrangler will print the Worker URL (for example `https://chat-ui-fetch-proxy.<account>.workers.dev`).
5. Configure the main chat-ui app. In `.env.local`:
   ```bash
   FETCH_PROXY_URL=https://chat-ui-fetch-proxy.<account>.workers.dev
   FETCH_PROXY_SECRET=<the same value you put in step 3>
   ```

## Develop locally

```bash
cd fetch-proxy
npm install
npm run dev  # starts wrangler dev on http://localhost:8787
```

Create a `.dev.vars` file next to `wrangler.toml` with your local secret so `wrangler dev` can pick it up. Do not commit it; `.gitignore` already excludes it.

```
FETCH_PROXY_SECRET=dev
```

Quick smoke test:

```bash
# Health check
curl http://localhost:8787/health

# Authorized fetch
curl -H "X-Proxy-Secret: dev" \
  "http://localhost:8787/fetch?url=https%3A%2F%2Fexample.com"

# Unauthorized
curl -H "X-Proxy-Secret: wrong" \
  "http://localhost:8787/fetch?url=https%3A%2F%2Fexample.com"

# SSRF attempt
curl -H "X-Proxy-Secret: dev" \
  "http://localhost:8787/fetch?url=https%3A%2F%2Flocalhost%2Fadmin"
```

## Tests

```bash
npm test
```
