
perf: lazy-load snapshot polyfills (-2.65 MB / -26.5%) #34061

Open
nathanwhit wants to merge 11 commits into denoland:main from nathanwhit:perf/lazy-load-snapshot

@nathanwhit
Member

Summary

Lazifies a large fraction of the JS code currently baked into the CLI startup snapshot. End result:

| | Snapshot blob | Delta |
|---|---|---|
| Before this stack (main) | ~11.4 MB (11,438,579 bytes) | |
| After (full stack) | 7,331,556 bytes | ≈ −4.1 MB / ≈ −36% from main |
| Final commit only (web-streams lazification) | 9,980,849 → 7,331,556 bytes | −2.65 MB / −26.5% |

Verified with DENO_LOG_LAZY_LOAD=1 deno run hello.js: 0 lazy loads at startup, with stdout both as a TTY and as a pipe. A program that doesn't use fetch, streams, fs.promises, or node:repl no longer pays the parse cost for any of those subtrees.
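The deltas in the table can be recomputed from the byte counts quoted in this PR's commit messages (11,438,579 bytes on main, 9,980,849 before the final commit, 7,331,556 after it). A minimal TypeScript sketch, assuming decimal megabytes (1 MB = 1,000,000 bytes), which is the convention that yields the title's −2.65 MB:

```typescript
// Byte counts quoted in this PR's description and commit messages.
const MAIN_BYTES = 11_438_579; // snapshot blob on main
const MID_STACK_BYTES = 9_980_849; // before the web-streams commit
const FINAL_BYTES = 7_331_556; // after the final commit

interface SizeDelta {
  bytes: number; // absolute reduction in bytes
  mb: number; // reduction in decimal MB (1 MB = 1,000,000 bytes), 2 decimals
  percent: number; // reduction relative to the starting size, 1 decimal
}

function sizeDelta(before: number, after: number): SizeDelta {
  const bytes = before - after;
  return {
    bytes,
    mb: Math.round(bytes / 10_000) / 100,
    percent: Math.round((bytes / before) * 1000) / 10,
  };
}

console.log("final commit only:", sizeDelta(MID_STACK_BYTES, FINAL_BYTES));
console.log("whole stack vs main:", sizeDelta(MAIN_BYTES, FINAL_BYTES));
```

With these inputs the final commit alone comes to 2,649,293 bytes (2.65 MB, 26.5%), and the whole stack against main to 4,107,023 bytes (about 4.11 MB, 35.9%).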

What's now lazy

Web platform (final commit)

The 208 KB 06_streams.js polyfill and every ext module that pulls it:

  • ReadableStream / WritableStream / TransformStream and all their inner controllers/readers (13 stream classes)
  • Request / Response / fetch / EventSource (chain through 22_body.js → 06_streams.js)
  • caches / CacheStorage / Cache
  • CompressionStream / DecompressionStream
  • node:stream/web
  • Deno.serve / Deno.serveHttp / Deno.upgradeWebSocket / Deno.Command / Deno.run / Deno.spawn* / Deno.kill / Deno.openKv

Node polyfills (earlier in stack)

  • HTTP cluster: node:http / node:http2 / node:https / node:_http_* / node:internal/http*
  • Crypto cluster: node:crypto / node:internal/crypto/{cipher,hash,...}
  • Streams cluster: node:zlib, node:repl, node:internal/repl, node:readline, node:readline/promises
  • Process cluster: node:child_process, node:internal/child_process, node:dgram, node:cluster
  • TLS cluster: node:tls, node:_tls_common, node:_tls_wrap
  • Misc: node:fs/promises, node:assert/strict, node:internal/event_target, node:internal/fs/utils

Kept eager (loading them is on the hot path of every program):
node:stream, node:stream/promises, node:net, node:tty, node:module, node:process.

Overview of changes

Infrastructure (build(snapshot) + refactor(core))

  • DENO_SNAPSHOT_IMPORT_GRAPH=<file> env var: dump JSONL of every esm/lazy-script edge during snapshot build. Used to identify exactly which scripts are dragging which polyfills into the snapshot.
  • DENO_LOG_LAZY_LOAD=1 runtime env var: prints a stderr line each time a lazy_loaded_esm / lazy_loaded_js entry actually parses at runtime. Cache hits suppressed.
  • Captured __bootstrap in 01_core.js so deferred loadExtScript calls still find core/primordials/internals after 99_main.js deletes globalThis.__bootstrap.
  • Residual .ts transpile in build.rs: pre-transpile any lazy_loaded_js / lazy_loaded_esm file that wasn't consumed at snapshot time so the runtime loader receives parseable JS rather than TypeScript.
  • Lazy-ESM resolve fallback: in module_map, if static-import resolve fails, fall back to the lazy ESM source list before erroring (lets node:_http_* re-export work without eager registration).
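The DENO_LOG_LAZY_LOAD rule (print on an actual load, stay quiet on a cache hit) can be modeled as a small registry. This is a sketch under stated assumptions, not the PR's code: `logEnabled` stands in for DENO_LOG_LAZY_LOAD=1 being set, `parse` stands in for parsing a lazy_loaded_esm / lazy_loaded_js entry, and the `lazy load: <specifier>` line format is invented here.

```typescript
// Hypothetical model of env-gated lazy-load logging with cache-hit suppression.
class LazyLoadRegistry {
  private readonly cache = new Map<string, unknown>();
  private readonly logEnabled: boolean;
  private readonly log: (line: string) => void;

  constructor(logEnabled: boolean, log: (line: string) => void) {
    this.logEnabled = logEnabled;
    this.log = log;
  }

  load<T>(specifier: string, parse: () => T): T {
    if (this.cache.has(specifier)) {
      // Cache hit: nothing is parsed, so nothing is printed.
      return this.cache.get(specifier) as T;
    }
    // Cache miss: this is the event worth reporting on stderr.
    if (this.logEnabled) this.log(`lazy load: ${specifier}`);
    const value = parse();
    this.cache.set(specifier, value);
    return value;
  }
}
```

In a runtime, `logEnabled` would be read once from the environment (for example `Deno.env.get("DENO_LOG_LAZY_LOAD") === "1"`) and `log` would write to stderr, so a program that never triggers a lazy load prints nothing.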

Bug fixes pulled out of the lazification work

  • fix(ext/node): defer lazyLoadProcess() to deprecated() wrapper to break the assert.ts ↔ process.ts cycle exposed by lazification.
  • fix(core): drop the module-map borrow before recursively re-evaluating a lazy ESM module — the prior code held it across module.evaluate(scope) and panicked on RefCell::borrow_mut during recursive lazy_load_esm.

Final commit — web-streams chain

  • runtime/js/98_global_scope_shared.js: converts every streams-pulling global to propNonEnumerableLazyLoaded / wrapper-function form.
  • runtime/js/99_main.js: stops spreading denoNs with {...denoNs} (which invokes every getter); uses ObjectDefineProperties + getOwnPropertyDescriptors instead. Same fix for the unstable-feature merge loop. Wraps the wasm-streaming callback and defers registerDeclarativeServer to the addMainModuleHandler callback.
  • ext/web/13_message_port.js: drops top-level streams import; markNotSerializable registration moved into 06_streams.js itself (inverts the dep so message_port no longer drags streams).
  • ext/node/polyfills/01_require.js: lazifies internal/child_process (which pulled 40_process.js → 22_body.js) and stream/web (which pulled 14_compression.js).
  • ext/node/polyfills/internal/streams/fast-utf8-stream.js: replaces import * as fs from "node:fs" with createLazyLoader("node:fs"). The static import was re-entering node:fs's evaluating body and TDZ-trapping on lazyUtf8Stream().default.
  • ext/node/polyfills/internal/fs/{handle,promises}.ts: defers every top-level promisify(lazyFs().X) to first-call wrappers. Same TDZ cycle: node:fs's export const promises = mod.promises line re-triggers get promises while lazyInternalPromises().default is still in TDZ.
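The {...denoNs} problem is easy to reproduce: object spread reads every own enumerable property, which runs any accessor, while copying property descriptors moves the accessor without calling it. A minimal sketch with a hypothetical one-property namespace; 99_main.js itself uses the ObjectDefineProperties / ObjectGetOwnPropertyDescriptors primordials rather than the globals shown here.

```typescript
// Stand-in for the Deno namespace object with one lazy accessor.
let loads = 0;
const denoNs: Record<string, unknown> = {};
Object.defineProperty(denoNs, "serve", {
  enumerable: true,
  configurable: true,
  get() {
    loads++; // in the runtime this would pull in the streams/fetch/http code
    return function serve(): string {
      return "served";
    };
  },
});

// Spread performs a [[Get]] on every own enumerable property, so the getter
// runs and the copy ends up with a plain data property.
const spreadCopy: Record<string, unknown> = { ...denoNs };

// Copying descriptors moves the accessor itself; nothing is loaded yet.
const descriptorCopy = Object.defineProperties(
  {} as Record<string, unknown>,
  Object.getOwnPropertyDescriptors(denoNs),
);
```

After this runs, `loads` is 1 (from the spread only); reading `descriptorCopy.serve` later is what triggers the second load.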

Outcome

| Surface | Improvement |
|---|---|
| Snapshot size | 11.4 MB → 7.33 MB (≈ −4.1 MB / ≈ −36%) |
| deno run empty.js startup | Parses 0 lazy loads in TTY and pipe modes |
| import 'node:crypto' cost | Paid only by users of crypto (3 lazy loads) |
| import 'node:http' cost | Paid only by users of http (9 lazy loads) |
| fetch('...') first-call cost | Loads 26_fetch.js + 22_body.js + 06_streams.js on demand |

Programs that don't touch streams/fetch/http/repl/Deno.serve no longer pay the parse cost.

Test plan

  • cargo test passes
  • cargo test --test node_compat: failures drop from 43 to ~38; the remainder are pre-existing on main (an IO Safety violation in fork and the v8 weak-handle GC flake in test-repl-tab-complete-buffer)
  • DENO_LOG_LAZY_LOAD=1 deno run empty.js prints 0 lazy loads (TTY and pipe)
  • Smokes: Deno.serve, fetch, new ReadableStream/Request/Response, structuredClone(new ReadableStream()) rejection, fs.promises.readdir/readFile, node:child_process.spawn, node:stream/web

Adds an env-var-gated graph collector that records every ESM static import,
op_lazy_load_esm call, and loadExtScript call observed during snapshot
creation. Set DENO_SNAPSHOT_IMPORT_GRAPH=<path> when building to emit one
JSON edge per line at <path>; otherwise it's a no-op. Each record carries
{from, to, kind} where kind is "esm" | "lazy_esm" | "lazy_script". The
caller of lazy_esm/lazy_script edges is recovered by walking the v8 stack
and skipping the ext:core/01_core.js wrapper frame. Used to identify
modules that anchor entire subtrees in the snapshot's static closure.
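As an illustration of how such a dump can be used, the sketch below reads JSONL records of the documented `{from, to, kind}` shape and ranks a module's direct static imports by the size of the `esm` closure each one drags in. The helper names are hypothetical and the script is not part of the PR.

```typescript
// Shape of one record, as documented above; one JSON object per line.
interface ImportEdge {
  from: string;
  to: string;
  kind: "esm" | "lazy_esm" | "lazy_script";
}

type EdgeKind = ImportEdge["kind"];

function parseEdges(jsonl: string): ImportEdge[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ImportEdge);
}

// Every module reachable from `root` through edges of the given kinds.
function closure(edges: ImportEdge[], root: string, kinds: ReadonlySet<EdgeKind>): Set<string> {
  const adjacency = new Map<string, string[]>();
  for (const e of edges) {
    if (!kinds.has(e.kind)) continue;
    const targets = adjacency.get(e.from) ?? [];
    targets.push(e.to);
    adjacency.set(e.from, targets);
  }
  const seen = new Set<string>([root]);
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    for (const next of adjacency.get(node) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  return seen;
}

// Direct static imports of `root`, largest static closure first.
function rankAnchors(edges: ImportEdge[], root: string): Array<[string, number]> {
  const esmOnly: ReadonlySet<EdgeKind> = new Set<EdgeKind>(["esm"]);
  return edges
    .filter((e) => e.from === root && e.kind === "esm")
    .map((e): [string, number] => [e.to, closure(edges, e.to, esmOnly).size])
    .sort((a, b) => b[1] - a[1]);
}
```

Restricting the closure to `esm` edges mirrors what lands in the snapshot unconditionally; adding `lazy_esm` / `lazy_script` shows what would load once the lazy entry points fire.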

Every lazy_loaded_js script's IIFE preamble destructures
globalThis.__bootstrap to get at core/primordials/internals, but
runtime bootstrap (runtime/js/99_main.js) deletes that property to hide
internals from user code. So a script that's residual (or otherwise
loaded after bootstrap completes) sees `undefined` and throws.

Capture the bootstrap object into a closure variable at snapshot eval
time and have loadExtScript reinstall it on globalThis for the duration
of the synchronous op_load_ext_script call, then remove it again in a
finally. The whole window is synchronous JS so no other code observes
the temporary reinstall. Removes a previously-implicit precondition
that lazy_loaded_js entries must be consumed at snapshot time.
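The reinstall-then-remove pattern fits in a few lines. This is a minimal sketch, not the code in 01_core.js: `capturedBootstrap`, `withTemporaryGlobal` and `loadExtScriptSketch` are hypothetical names, and the callback stands in for the synchronous op_load_ext_script call.

```typescript
// Hypothetical: captured once while the snapshot is being built, before
// 99_main.js deletes globalThis.__bootstrap.
const capturedBootstrap = { core: {}, primordials: {}, internals: {} };

function withTemporaryGlobal<T>(key: string, value: unknown, run: () => T): T {
  const g = globalThis as unknown as Record<string, unknown>;
  g[key] = value;
  try {
    // `run` is synchronous, so no other JS can observe the global meanwhile.
    return run();
  } finally {
    delete g[key]; // hidden from user code again, even if `run` throws
  }
}

// Stand-in for loadExtScript: `evaluate` plays the role of a script whose
// IIFE preamble destructures globalThis.__bootstrap.
function loadExtScriptSketch<T>(evaluate: () => T): T {
  return withTemporaryGlobal("__bootstrap", capturedBootstrap, evaluate);
}
```

The `finally` is what keeps the precondition-free design safe: a throwing script cannot leave internals exposed on globalThis.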

When a lazy_loaded_js entry is consumed at snapshot time, it's compiled
via the snapshot's extension_transpiler, so what ends up in the
snapshot blob is plain JS. When the same entry is residual (not
consumed during snapshot creation), build.rs previously include_str!'d the raw
file. That works for .js/.mjs entries but fails at runtime for .ts
entries because loadExtScript hands the raw TypeScript directly to
v8::Script::compile, which throws on any TS-only syntax (`this: any`
parameter annotations, type imports, etc.).

For each residual lazy_loaded_js file, run
deno_runtime::transpile::maybe_transpile_source (the same function the
snapshot path uses), write the resulting JS to
$OUT_DIR/residual_sources/<sanitized>.js,
and include_str! that. lazy_loaded_esm entries are unchanged: they go
through op_lazy_load_esm which already transpiles via the module
loader at runtime.
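The residual-source step can be sketched as a pure planning function. Everything here is hypothetical: the entry paths, the `consumedAtSnapshot` set, and especially `sanitize`, since the PR only says `<sanitized>.js`. The actual step lives in build.rs and runs maybe_transpile_source on each planned file.

```typescript
interface ResidualSource {
  entry: string;
  needsTranspile: boolean; // .ts entries must become JS before v8::Script::compile
  outPath: string;
}

// Assumed sanitizing scheme: keep [A-Za-z0-9_], replace everything else with "_".
function sanitize(entry: string): string {
  return entry.replace(/[^A-Za-z0-9_]/g, "_");
}

function planResidualSources(
  entries: readonly string[],
  consumedAtSnapshot: ReadonlySet<string>,
  outDir: string,
): ResidualSource[] {
  return entries
    .filter((entry) => !consumedAtSnapshot.has(entry)) // residual = not consumed
    .map((entry) => ({
      entry,
      needsTranspile: entry.endsWith(".ts"),
      outPath: `${outDir}/residual_sources/${sanitize(entry)}.js`,
    }));
}
```

Entries consumed at snapshot time never appear in the plan, because their compiled JS is already in the snapshot blob.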

deprecate() previously called `process ??= lazyLoadProcess()` in its
body, so merely wrapping a function loaded node:process eagerly. assert.ts's
body calls deprecate(CallTracker, ...) at module scope, which means
loading assert.ts forces node:process to evaluate. process.ts's body
in turn imports node:path, whose body loadExtScripts path/_win32.ts,
which loadExtScripts assert.ts. With anything else also triggering an
assert.ts load (e.g. once we lazify upstream entry points), this
becomes a snapshot-time circular dependency.

Move the lazyLoadProcess() call inside the returned `deprecated`
wrapper so node:process only loads when the deprecated function is
actually invoked. The noDeprecation fast-path now runs per-invocation
instead of being baked into the wrapper choice at deprecate() time;
the cost is negligible and we keep deprecate() side-effect-free.
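A minimal sketch of the reshaped helper, assuming a simplified signature: `loadProcess` stands in for lazyLoadProcess() and `ProcessLike` for the parts of node:process the wrapper touches. The real deprecate() in ext/node takes different parameters and does more bookkeeping.

```typescript
interface ProcessLike {
  noDeprecation: boolean;
  emitWarning(message: string): void;
}

function deprecate<A extends unknown[], R>(
  fn: (...args: A) => R,
  message: string,
  loadProcess: () => ProcessLike,
): (...args: A) => R {
  let process: ProcessLike | undefined;
  let warned = false;
  return function deprecated(this: unknown, ...args: A): R {
    // node:process is loaded on the first call, not when deprecate() runs.
    if (process === undefined) process = loadProcess();
    // noDeprecation is consulted on every call instead of once at wrap time.
    if (!warned && !process.noDeprecation) {
      warned = true;
      process.emitWarning(message);
    }
    return fn.apply(this, args);
  };
}
```

Calling deprecate() at module scope (as assert.ts does) is now free of side effects; only invoking the deprecated function pays for node:process.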

01_require.js eagerly imports node:_http_agent/_common/_outgoing/_server
and eagerly loadExtScripts http.ts/http2.ts/https.ts at module body
time, so the entire node:_http_* + node:net + node:stream subtree gets
materialized into the snapshot whether or not the program actually
uses HTTP.

Install the seven entries as one-shot lazy getters on
nativeModuleExports via createLazyLoader (for the ESM _http_*
modules) and loadExtScript thunks (for http/http2/https). The getter
fires on first access and replaces itself with a data property, so
subsequent require()s are zero-overhead. Presence-check sites that
previously did `nativeModuleExports[id]` now use `in` to avoid
forcing the load.

Snapshot shrinks by ~10 KB; no public API change.
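A sketch of the one-shot lazy export, assuming a plain object stands in for nativeModuleExports; `defineOneShotLazy` is a hypothetical helper, not the createLazyLoader / loadExtScript code in 01_require.js. It also shows why presence checks switched to `in`: `in` is a has-property check and never calls the getter.

```typescript
function defineOneShotLazy<T>(target: Record<string, unknown>, key: string, load: () => T): void {
  Object.defineProperty(target, key, {
    enumerable: true,
    configurable: true, // must be configurable so the getter can replace itself
    get() {
      const value = load();
      // Swap the accessor for a data property: later reads skip this getter.
      Object.defineProperty(target, key, {
        value,
        writable: true,
        enumerable: true,
        configurable: true,
      });
      return value;
    },
  });
}
```

After the first `require()`-style read, the property is an ordinary data property, which is what makes subsequent lookups zero-overhead.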

@nathanwhit force-pushed the perf/lazy-load-snapshot branch from 12dfc97 to 152e138 (May 14, 2026 16:01)

… but keep node:stream eager

Previous commits in this stack tried to make `node:stream` itself lazy.
That was wrong for startup time: every Deno program ends up loading
node:stream at runtime startup, because
`__bootstrapNodeProcess(warmup=false)` calls
`createWritableStdioStream(io.stdout, "stdout")` ->
`new (lazyStream().Writable)({...})` for `process.stdout` and
`process.stderr` regardless of whether the program ever uses streams.
Lazy-loading a module that everyone loads is a net startup-time loss
(roughly an 11% regression observed by the user); parse+compile at
startup is slower than v8 snapshot deserialization.

So node:stream stays in `esm`, and node:stream/promises with it.
What does end up lazy is the surrounding chain that doesn't load at
startup unless the user touches it:

* `node:repl` moves from `esm` to `lazy_loaded_esm` (no program needs
  repl at startup outside `deno repl`).
* `01_require.js` switches `tls`/`net`/`repl`/`fs/promises`/
  `_tls_common`/`_tls_wrap`/`internal/repl`/`internal/crypto/cipher`
  into `lazyNodeModules`.
* `02_init.js` only calls `__setupChildProcessIpcChannel` when
  `op_node_child_ipc_pipe()` reports a parent pipe; otherwise
  `child_process.ts` (and its node:stream-extending classes) never
  evaluate at runtime.
* `runtime/js/99_main.js` drops `nodeBootstrap({warmup: true})`. The
  warmup branch only built placeholder stdin/stdout/stderr streams
  that the non-warmup branch then unconditionally overwrites, so its
  only observable effect was pulling node:stream + node:net into the
  snapshot at build time.
* `ext/node/polyfills/fs.ts`: `Utf8Stream` becomes a getter on the
  return object so loading fs.ts at snapshot eval doesn't immediately
  pull `internal/streams/fast-utf8-stream.js` (which statically imports
  node:fs and forces the fs_esm.ts namespace to materialize, which in
  turn fires all the lazy stream getters off the fs.ts return object).
* `ext/node/polyfills/_process/streams.mjs`: `initStdin` calls
  `lazyTty()` before `new readStream(fd)` in the TTY case. With
  `node:tty` lazy, we have to force its body to evaluate
  (`setReadStream` is the side effect) before bootstrap uses the
  constructor.

Instrumentation added in this commit (companion to the existing
`DENO_SNAPSHOT_IMPORT_GRAPH` knob from earlier in the stack):

* New env var `DENO_LOG_LAZY_LOAD=1` prints a stderr line each time a
  lazy_loaded_esm entry actually loads (cache miss) or a
  lazy_loaded_js entry actually parses at runtime. Cache hits are
  suppressed. Use it to see what's parsed at startup vs on-demand.
* `lazy_load_esm_module` distinguishes the cache-hit path
  (`record_lazy_esm_cached`, graph-only) from the actual-load path
  (`record_lazy_esm`, graph + stderr).

Verified with `DENO_LOG_LAZY_LOAD=1`:
* `deno eval 'console.log(1)'`        -> 0 lazy loads at startup
* `deno run hello.js` (no imports)    -> 0 lazy loads
* `deno run` + `import "node:crypto"` -> 3 lazy loads (paid by users of crypto)
* `deno run` + `import "node:http"`   -> 9 lazy loads (paid by users of http)

Snapshot blob shrinks from 11,438,579 -> 9,964,016 bytes
(-1.47 MB / -12.9%) across the full commit stack.

Smoke-tested: hello.js, deno eval, node:stream Readable piping,
node:fs readFileSync, node:fs/promises readFile, node:zlib gzipSync,
node:crypto hash, node:tls, node:net, HTTP server + fetch,
process.stdout.write, process.stdin.isTTY.

…oad_esm_module

`lazy_load_esm_module` previously held `self.data.borrow()` across the
`module.evaluate(scope)` call in the cache-hit path. When the cached
module had been instantiated but not yet evaluated, that evaluate would
trigger V8 to recursively compile dependent modules, which calls back
into `new_module_from_js_source` and tries to `self.data.borrow_mut()`
at line 959, which panics with "RefCell already borrowed".

Pre-existing bug, but easier to hit now that more node-compat modules
go through the lazy ESM path at runtime. Repro: `deno run -A npm:rolldown`
on the lazified stack.

Fix: collect the cached handle inside a scoped borrow, drop the borrow,
then evaluate. Functionally identical to the old path otherwise.
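The fix is Rust-specific, but the failure mode can be modeled in TypeScript. This is a hypothetical model, not deno_core code: `RefCellModel` mimics RefCell's rule (many shared borrows or one mutable borrow, never both), and evaluating the cached entry re-enters the map for a mutable borrow, the way V8's recursive compile re-enters new_module_from_js_source. The specifiers are made up.

```typescript
// Tiny model of RefCell's dynamic borrow rules.
class RefCellModel<T> {
  private shared = 0;
  private exclusive = false;
  private readonly value: T;

  constructor(value: T) {
    this.value = value;
  }

  borrow<R>(f: (v: T) => R): R {
    if (this.exclusive) throw new Error("already mutably borrowed");
    this.shared++;
    try {
      return f(this.value);
    } finally {
      this.shared--; // the borrow ends when the closure returns
    }
  }

  borrowMut<R>(f: (v: T) => R): R {
    if (this.exclusive || this.shared > 0) throw new Error("RefCell already borrowed");
    this.exclusive = true;
    try {
      return f(this.value);
    } finally {
      this.exclusive = false;
    }
  }
}

type ModuleMap = Map<string, { evaluate: () => void }>;

function buildMap(): RefCellModel<ModuleMap> {
  const data = new RefCellModel<ModuleMap>(new Map());
  data.borrowMut((m) => {
    // Evaluating this cached module registers a dependency (mutable borrow).
    m.set("ext:lazy.js", {
      evaluate: () => data.borrowMut((mm) => { mm.set("ext:dep.js", { evaluate: () => {} }); }),
    });
  });
  return data;
}

// Old shape: evaluate() runs while the shared borrow is still active.
function evaluateHoldingBorrow(data: RefCellModel<ModuleMap>, id: string): void {
  data.borrow((m) => m.get(id)?.evaluate());
}

// Fixed shape: copy the handle out inside a scoped borrow, then evaluate.
function evaluateAfterDroppingBorrow(data: RefCellModel<ModuleMap>, id: string): void {
  const handle = data.borrow((m) => m.get(id));
  handle?.evaluate();
}
```

The old shape throws "RefCell already borrowed" on the re-entrant borrow; the fixed shape succeeds because no borrow is live when evaluate() runs.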

Lazify all globals in 98_global_scope_shared.js that pull the
web-streams polyfill (06_streams.js, 208 KB source):

- ReadableStream / WritableStream / TransformStream and their inner
  controllers / readers (13 stream classes)
- CompressionStream / DecompressionStream
- Request / Response / fetch / EventSource (chain via 22_body)
- Cache / CacheStorage / caches

Each global is converted from `core.propNonEnumerable(streams.X)` to
`core.propNonEnumerableLazyLoaded(s => s.X, lazyStreams)` so the
underlying ext file isn't loaded until first access. `fetch` keeps a
data descriptor whose value is a wrapper function (so node:test's
`mock.method` can still mock it via descriptor.value).
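The accessor shape can be sketched as follows. `createLazyLoaderSketch` and `propNonEnumerableLazyLoadedSketch` are hypothetical stand-ins; the real `core.propNonEnumerableLazyLoaded` may differ, in particular in how it handles assignment (the setter here is an assumption).

```typescript
// Runs `load` once and caches the result.
function createLazyLoaderSketch<M>(load: () => M): () => M {
  let cached: { value: M } | undefined;
  return () => {
    if (cached === undefined) cached = { value: load() };
    return cached.value;
  };
}

// Non-enumerable accessor that loads the backing module on first read.
// Assigning to the property stores an override without loading anything.
function propNonEnumerableLazyLoadedSketch<M, V>(
  pick: (mod: M) => V,
  lazyModule: () => M,
): PropertyDescriptor {
  let overridden = false;
  let override: unknown;
  return {
    enumerable: false, // invisible to Object.keys, for...in and object spread
    configurable: true,
    get() {
      return overridden ? override : pick(lazyModule());
    },
    set(value: unknown) {
      overridden = true;
      override = value;
    },
  };
}
```

Because the descriptor is non-enumerable, enumerating the global scope never fires the getter; only a direct read of the name loads the streams module.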

Side fixes:

- runtime/js/99_main.js: stop spreading denoNs with `{...denoNs}` -
  spread invokes every getter, defeating lazy descriptors. Use
  ObjectDefineProperties + getOwnPropertyDescriptors. Same for the
  unstable feature merge loop.
- 99_main.js: wrap the wasm-streaming callback and defer
  registerDeclarativeServer load to the addMainModuleHandler callback.
- ext/web/13_message_port.js: drop top-level streams import; move
  markNotSerializable registration into 06_streams.js itself
  (inverts the dep so message_port no longer drags streams).
- ext/node/polyfills/01_require.js: lazify internal/child_process
  (40_process -> 22_body chain) and stream/web (14_compression chain).
- ext/node/polyfills/internal/streams/fast-utf8-stream.js: replace
  static `import * as fs from 'node:fs'` with a lazy loader, since
  this module is loaded via the fs.Utf8Stream getter while node:fs
  is mid-evaluation; a static import re-enters node:fs and TDZ-traps
  on `lazyUtf8Stream().default`.
- ext/node/polyfills/internal/fs/{handle,promises}.ts: defer the
  top-level `promisify(lazyFs().X)` calls to first call. Same
  cycle: node:fs's `export const promises = mod.promises` line
  re-triggers `get promises` while `lazyInternalPromises().default`
  is in TDZ.

Snapshot: 9,980,849 -> 7,331,556 bytes (-2.65 MB, -26.5%). Verified
zero startup lazy-loads in both TTY and pipe modes.

@nathanwhit force-pushed the perf/lazy-load-snapshot branch from 49c79d6 to 4407118 (May 14, 2026 17:33)

@fibibot
Contributor

fibibot commented May 14, 2026

Failures look real across all platforms — the lazy-load polyfill change appears to alter snapshot-induced stack frames. specs::run::wasm_streaming_panic_test now emits at Object.handleWasmStreaming plus an extra 99_main.js:472 frame; unit::{globals,http,serve}_test and node_compat::parallel::test-inspector-* also fail. Not a flake.

Contributor

@fibibot left a comment


CI is red across 30 jobs (all 6 platforms × test unit / test specs / test libs / test node_compat / deno_core / wpt). Failures are caused by this PR — the function wrappers added to defer streams/fetch/serve initialization change stack-frame shape, breaking tests that assert on stack traces.

Concrete example: tests/specs/run/wasm_streaming_panic_test/wasm_streaming_panic_test.js.out expects:

`at handleWasmStreaming (ext:deno_fetch/26_fetch.js:[WILDCARD])`

After this PR the frame becomes at Object.handleWasmStreaming plus an extra 99_main.js:472 frame from runtime/js/99_main.js where the wasm-streaming callback is now wrapped. Same shape-change is the likely cause of unit::{globals,http,serve}_test and node_compat::parallel::test-inspector-* failures.

Two options: (1) make the wrappers preserve the function name and avoid adding a frame (e.g. Object.defineProperty(..., "name", ...); note that a tail call alone will not elide the wrapper frame, since V8 does not perform tail-call elimination), or (2) update the affected test expectations to match the new stack shape.
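Option (1) can be sketched as a forwarding wrapper that loads the real function on first call and carries its `name`. `lazyForward` is a hypothetical helper. Two caveats: V8 typically prefixes a frame with the receiver's type (as in `Object.handleWasmStreaming`) when a function is invoked as a method, which is why the sketch calls the loaded function as a plain function; and the wrapper still contributes its own frame, so tests asserting an exact frame list may need updating regardless.

```typescript
function lazyForward<A extends unknown[], R>(
  name: string,
  load: () => (...args: A) => R,
): (...args: A) => R {
  let impl: ((...args: A) => R) | undefined;
  const wrapper = (...args: A): R => {
    if (impl === undefined) impl = load();
    // Plain call, not `module.fn(...)`, so there is no method receiver.
    return impl(...args);
  };
  // Arrow functions take their name from the binding ("wrapper"); override it.
  Object.defineProperty(wrapper, "name", { value: name, configurable: true });
  return wrapper;
}
```

The same shape fits the `fetch` global: installing such a wrapper as the descriptor's `value` keeps it a data property that node:test's `mock.method` can replace.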

# Conflicts:
#	ext/node/polyfills/01_require.js
#	libs/core/modules/map.rs