perf: lazy-load snapshot polyfills (-2.65 MB / -26.5%) #34061
nathanwhit wants to merge 11 commits into
Adds an env-var-gated graph collector that records every ESM static import,
op_lazy_load_esm call, and loadExtScript call observed during snapshot
creation. Set DENO_SNAPSHOT_IMPORT_GRAPH=<path> when building to emit one
JSON edge per line at <path>; otherwise it's a no-op. Each record carries
{from, to, kind} where kind is "esm" | "lazy_esm" | "lazy_script". The
caller of lazy_esm/lazy_script edges is recovered by walking the v8 stack
and skipping the ext:core/01_core.js wrapper frame. Used to identify
modules that anchor entire subtrees in the snapshot's static closure.
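
As a rough illustration of how the emitted edge file could be consumed, here is a minimal sketch that parses `{from, to, kind}` records and walks the static `esm` edges from one module. The helper names (`parseEdges`, `staticClosure`) and the sample file names are hypothetical and not part of this PR; only the record shape comes from the description above.

```javascript
// Hypothetical helper: parse JSONL text into {from, to, kind} records.
function parseEdges(jsonl) {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Collect every module reachable from `root` through static "esm" edges.
// This sketch deliberately ignores "lazy_esm" / "lazy_script" edges; include
// them as well to see everything that was loaded during snapshot creation.
function staticClosure(edges, root) {
  const adjacency = new Map();
  for (const { from, to, kind } of edges) {
    if (kind !== "esm") continue;
    if (!adjacency.has(from)) adjacency.set(from, []);
    adjacency.get(from).push(to);
  }
  const seen = new Set([root]);
  const stack = [root];
  while (stack.length > 0) {
    const current = stack.pop();
    for (const next of adjacency.get(current) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  seen.delete(root);
  return seen;
}

// Made-up sample data standing in for the file written when
// DENO_SNAPSHOT_IMPORT_GRAPH=<path> is set.
const sampleGraph = [
  '{"from":"a.js","to":"b.js","kind":"esm"}',
  '{"from":"b.js","to":"c.js","kind":"esm"}',
  '{"from":"a.js","to":"d.js","kind":"lazy_esm"}',
].join("\n");

const closure = staticClosure(parseEdges(sampleGraph), "a.js");
console.log([...closure].sort()); // [ 'b.js', 'c.js' ]
```

A module with a large static closure is a candidate anchor: making the single edge into it lazy can remove the whole subtree from the eager set.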
Every lazy_loaded_js script's IIFE preamble destructures globalThis.__bootstrap to get at core/primordials/internals, but runtime bootstrap (runtime/js/99_main.js) deletes that property to hide internals from user code. So a script that's residual (or otherwise loaded after bootstrap completes) sees `undefined` and throws.

Capture the bootstrap object into a closure variable at snapshot eval time and have loadExtScript reinstall it on globalThis for the duration of the synchronous op_load_ext_script call, then remove it again in a finally. The whole window is synchronous JS so no other code observes the temporary reinstall.

Removes a previously-implicit precondition that lazy_loaded_js entries must be consumed at snapshot time.
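
The capture-and-reinstall pattern can be sketched in plain JavaScript as follows. This is not the code in `01_core.js`: `runExtScript` is a hypothetical stand-in for the synchronous `op_load_ext_script` call, and the bootstrap object here is a dummy.

```javascript
// Dummy stand-in for the object that runtime bootstrap later deletes.
globalThis.__bootstrap = { core: { name: "core" }, primordials: {}, internals: {} };

// Captured once, at what would be snapshot eval time.
const capturedBootstrap = globalThis.__bootstrap;

// Hypothetical stand-in for the synchronous op_load_ext_script call.
function runExtScript(script) {
  return script();
}

function loadExtScript(script) {
  const alreadyPresent = "__bootstrap" in globalThis;
  if (!alreadyPresent) globalThis.__bootstrap = capturedBootstrap;
  try {
    return runExtScript(script);
  } finally {
    // Only remove what this call installed.
    if (!alreadyPresent) delete globalThis.__bootstrap;
  }
}

// Simulate 99_main.js hiding internals from user code.
delete globalThis.__bootstrap;

// A "residual" script whose preamble destructures globalThis.__bootstrap.
const residualScript = () => {
  const { core } = globalThis.__bootstrap;
  return core.name;
};

const loadedName = loadExtScript(residualScript);
console.log(loadedName, "__bootstrap" in globalThis); // core false
```

Because the install, the script run, and the removal all happen in one synchronous call, nothing else gets a chance to run while the property is visible.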
When a lazy_loaded_js entry is consumed at snapshot time it's compiled via the snapshot's extension_transpiler, so what ends up in the snapshot blob is plain JS. When the same entry is residual (not consumed during snapshot), build.rs previously include_str!'d the raw file. That works for .js/.mjs entries but fails at runtime for .ts entries because loadExtScript hands the raw TypeScript directly to v8::Script::compile, which throws on any TS-only syntax (`this: any` parameter annotations, type imports, etc.).

For each residual lazy_loaded_js file, run deno_runtime::transpile::maybe_transpile_source (the same function the snapshot path uses), write the resulting JS to $OUT_DIR/residual_sources/<sanitized>.js, and include_str! that. lazy_loaded_esm entries are unchanged: they go through op_lazy_load_esm which already transpiles via the module loader at runtime.
deprecate() previously called `process ??= lazyLoadProcess()` in its body, so wrapping a function loaded node:process eagerly. assert.ts's body calls deprecate(CallTracker, ...) at module scope, which means loading assert.ts forces node:process to evaluate. process.ts's body in turn imports node:path, whose body loadExtScripts path/_win32.ts, which loadExtScripts assert.ts. With anything else also triggering an assert.ts load (e.g. once we lazify upstream entry points), this becomes a snapshot-time circular dependency.

Move the lazyLoadProcess() call inside the returned `deprecated` wrapper so node:process only loads when the deprecated function is actually invoked. The noDeprecation fast-path now runs per-invocation instead of being baked into the wrapper choice at deprecate() time; the cost is negligible and we keep deprecate() side-effect-free.
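
A minimal sketch of the reordering, assuming a simplified `deprecate()` that warns once. The `lazyLoadProcess` below is a hypothetical stub that counts loads and records warnings; the real helper loads `node:process`.

```javascript
const warnings = [];
let processLoadCount = 0;
let processModule; // cached result of the lazy load

// Hypothetical stub standing in for loading node:process.
function lazyLoadProcess() {
  processLoadCount++;
  return { noDeprecation: false, emitWarning: (msg) => warnings.push(msg) };
}

function deprecate(fn, msg) {
  let warned = false;
  // Nothing is loaded here: wrapping a function stays side-effect-free.
  function deprecated(...args) {
    // The load happens on first invocation of the wrapped function.
    processModule ??= lazyLoadProcess();
    if (!processModule.noDeprecation && !warned) {
      warned = true;
      processModule.emitWarning(msg);
    }
    return Reflect.apply(fn, this, args);
  }
  return deprecated;
}

const oldAdd = deprecate((a, b) => a + b, "oldAdd() is deprecated");
const loadsAfterWrap = processLoadCount;
const sum = oldAdd(1, 2);
console.log(loadsAfterWrap, processLoadCount, sum, warnings.length); // 0 1 3 1
```

Calling `deprecate()` at module scope no longer touches the process module, which is what removes the edge that closed the cycle.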
01_require.js eagerly imports node:_http_agent/_common/_outgoing/_server and eagerly loadExtScripts http.ts/http2.ts/https.ts at module body time, so the entire node:_http_* + node:net + node:stream subtree gets materialized into the snapshot whether or not the program actually uses HTTP.

Install the seven entries as one-shot lazy getters on nativeModuleExports via createLazyLoader (for the ESM _http_* modules) and loadExtScript thunks (for http/http2/https). The getter fires on first access and replaces itself with a data property, so subsequent require()s are zero-overhead. Presence-check sites that previously did `nativeModuleExports[id]` now use `in` to avoid forcing the load.

Snapshot shrinks by ~10 KB; no public API change.
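
The one-shot getter idea can be sketched like this. `defineLazyExport` is a hypothetical helper name, and the loader is a counting stub rather than `createLazyLoader` or a `loadExtScript` thunk.

```javascript
// Hypothetical helper: install a getter that loads on first access and then
// replaces itself with a plain data property.
function defineLazyExport(target, id, load) {
  Object.defineProperty(target, id, {
    configurable: true,
    enumerable: true,
    get() {
      const value = load();
      Object.defineProperty(target, id, {
        value,
        writable: true,
        configurable: true,
        enumerable: true,
      });
      return value;
    },
  });
}

let httpLoadCount = 0;
const nativeModuleExports = {};
defineLazyExport(nativeModuleExports, "http", () => {
  httpLoadCount++;
  return { createServer() {} };
});

// Presence check with `in` does not run the getter.
const present = "http" in nativeModuleExports;
const loadsAfterPresenceCheck = httpLoadCount;

const firstAccess = nativeModuleExports.http; // getter fires, property is replaced
const secondAccess = nativeModuleExports.http; // plain data property read
const finalDescriptor = Object.getOwnPropertyDescriptor(nativeModuleExports, "http");

console.log(present, loadsAfterPresenceCheck, httpLoadCount); // true 0 1
```

By contrast, a check written as `nativeModuleExports[id]` would read the property and therefore trigger the load.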
… but keep node:stream eager
Previous commits in this stack tried to make `node:stream` itself lazy.
That was wrong for startup time: every Deno program ends up loading
node:stream at runtime startup, because
`__bootstrapNodeProcess(warmup=false)` calls
`createWritableStdioStream(io.stdout, "stdout")` ->
`new (lazyStream().Writable)({...})` for `process.stdout` and
`process.stderr` regardless of whether the program ever uses streams.
Lazy-loading a module that everyone loads is a net startup-time loss
(roughly an 11% regression observed by the user), because parse+compile
at startup is slower than v8 snapshot deserialization.
So node:stream stays in `esm`, and node:stream/promises with it.
What does end up lazy is the surrounding chain that doesn't load at
startup unless the user touches it:
* `node:repl` moves from `esm` to `lazy_loaded_esm` (no program needs
  repl at startup outside `deno repl`).
* `01_require.js` switches `tls`/`net`/`repl`/`fs/promises`/
  `_tls_common`/`_tls_wrap`/`internal/repl`/`internal/crypto/cipher`
  into `lazyNodeModules`.
* `02_init.js` only calls `__setupChildProcessIpcChannel` when
  `op_node_child_ipc_pipe()` reports a parent pipe; otherwise
  `child_process.ts` (and its node:stream-extending classes) never
  evaluate at runtime.
* `runtime/js/99_main.js` drops `nodeBootstrap({warmup: true})`. The
  warmup branch only built placeholder stdin/stdout/stderr streams
  that the non-warmup branch then unconditionally overwrites, so its
  only observable effect was pulling node:stream + node:net into the
  snapshot at build time.
* `ext/node/polyfills/fs.ts`: `Utf8Stream` becomes a getter on the
  return object so loading fs.ts at snapshot eval doesn't immediately
  pull `internal/streams/fast-utf8-stream.js` (which statically imports
  node:fs and forces the fs_esm.ts namespace to materialize, which in
  turn fires all the lazy stream getters off the fs.ts return object).
* `ext/node/polyfills/_process/streams.mjs`: `initStdin` calls
  `lazyTty()` before `new readStream(fd)` in the TTY case. With
  `node:tty` lazy, we have to force its body to evaluate
  (`setReadStream` is the side effect) before bootstrap uses the
  constructor.
Instrumentation added in this commit (companion to the existing
`DENO_SNAPSHOT_IMPORT_GRAPH` knob from earlier in the stack):
* New env var `DENO_LOG_LAZY_LOAD=1` prints a stderr line each time a
  lazy_loaded_esm entry actually loads (cache miss) or a
  lazy_loaded_js entry actually parses at runtime. Cache hits are
  suppressed. Use to see what's parsed at startup vs on-demand.
* `lazy_load_esm_module` distinguishes the cache-hit path
  (`record_lazy_esm_cached`, graph-only) from the actual-load path
  (`record_lazy_esm`, graph + stderr).
Verified with `DENO_LOG_LAZY_LOAD=1`:
* `deno eval 'console.log(1)'` -> 0 lazy loads at startup
* `deno run hello.js` (no imports) -> 0 lazy loads
* `deno run` + `import "node:crypto"` -> 3 lazy loads (paid by users of crypto)
* `deno run` + `import "node:http"` -> 9 lazy loads (paid by users of http)
Snapshot blob shrinks from 11,438,579 -> 9,964,016 bytes
(-1.41 MB / -12.9%) across the full commit stack.
Smoke-tested: hello.js, deno eval, node:stream Readable piping,
node:fs readFileSync, node:fs/promises readFile, node:zlib gzipSync,
node:crypto hash, node:tls, node:net, HTTP server + fetch,
process.stdout.write, process.stdin.isTTY.
…oad_esm_module

`lazy_load_esm_module` previously held `self.data.borrow()` across the `module.evaluate(scope)` call in the cache-hit path. When the cached module had been instantiated but not yet evaluated, that evaluate would trigger V8 to recursively compile dependent modules, which calls back into `new_module_from_js_source` and tries to `self.data.borrow_mut()` at line 959 -- panic with "RefCell already borrowed". Pre-existing bug, but easier to hit now that more node-compat modules go through the lazy ESM path at runtime.

Repro: `deno run -A npm:rolldown` on the lazified stack.

Fix: collect the cached handle inside a scoped borrow, drop the borrow, then evaluate. Functionally identical to the old path otherwise.
Lazify all globals in 98_global_scope_shared.js that pull the
web-streams polyfill (06_streams.js, 208 KB source):
- ReadableStream / WritableStream / TransformStream and their inner
controllers / readers (13 stream classes)
- CompressionStream / DecompressionStream
- Request / Response / fetch / EventSource (chain via 22_body)
- Cache / CacheStorage / caches
Each global is converted from `core.propNonEnumerable(streams.X)` to
`core.propNonEnumerableLazyLoaded(s => s.X, lazyStreams)` so the
underlying ext file isn't loaded until first access. `fetch` keeps a
data descriptor whose value is a wrapper function (so node:test's
`mock.method` can still mock it via descriptor.value).
Side fixes:
- runtime/js/99_main.js: stop spreading denoNs with `{...denoNs}` -
spread invokes every getter, defeating lazy descriptors. Use
ObjectDefineProperties + getOwnPropertyDescriptors. Same for the
unstable feature merge loop.
- 99_main.js: wrap the wasm-streaming callback and defer
registerDeclarativeServer load to the addMainModuleHandler callback.
- ext/web/13_message_port.js: drop top-level streams import; move
markNotSerializable registration into 06_streams.js itself
(inverts the dep so message_port no longer drags streams).
- ext/node/polyfills/01_require.js: lazify internal/child_process
(40_process -> 22_body chain) and stream/web (14_compression chain).
- ext/node/polyfills/internal/streams/fast-utf8-stream.js: replace
static `import * as fs from 'node:fs'` with a lazy loader, since
this module is loaded via the fs.Utf8Stream getter while node:fs
is mid-evaluation; a static import re-enters node:fs and TDZ-traps
on `lazyUtf8Stream().default`.
- ext/node/polyfills/internal/fs/{handle,promises}.ts: defer the
top-level `promisify(lazyFs().X)` calls to first call. Same
cycle: node:fs's `export const promises = mod.promises` line
re-triggers `get promises` while `lazyInternalPromises().default`
is in TDZ.
Snapshot: 9,980,849 -> 7,331,556 bytes (-2.65 MB, -26.5%). Verified
zero startup lazy-loads in both TTY and pipe modes.
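
To illustrate the `{...denoNs}` side fix: object spread reads every enumerable own property, so it runs lazy getters, while copying property descriptors keeps them as getters. The sketch below uses the standard `Object.defineProperties` / `Object.getOwnPropertyDescriptors` rather than the primordials used in `99_main.js`, and the `denoNs` object here is a dummy with one counting getter.

```javascript
let getterCalls = 0;

// Dummy namespace with one lazy, enumerable getter.
const denoNs = {};
Object.defineProperty(denoNs, "serve", {
  configurable: true,
  enumerable: true,
  get() {
    getterCalls++;
    return () => "served";
  },
});

// Spread copies values, so it invokes the getter right away.
const spreadCopy = { ...denoNs };
const callsAfterSpread = getterCalls;

// Copying descriptors moves the getter itself; nothing is invoked.
getterCalls = 0;
const descriptorCopy = {};
Object.defineProperties(descriptorCopy, Object.getOwnPropertyDescriptors(denoNs));
const callsAfterDescriptorCopy = getterCalls;

const copiedDescriptor = Object.getOwnPropertyDescriptor(descriptorCopy, "serve");
console.log(callsAfterSpread, callsAfterDescriptorCopy, typeof copiedDescriptor.get);
// 1 0 function
```

The spread copy also ends up holding a plain value, so any laziness is lost for every consumer of that copy, not just at copy time.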
Failures look real across all platforms: the lazy-load polyfill change appears to alter snapshot-induced stack frames.
fibibot left a comment
CI is red across 30 jobs (all 6 platforms × test unit / test specs / test libs / test node_compat / deno_core / wpt). Failures are caused by this PR — the function wrappers added to defer streams/fetch/serve initialization change stack-frame shape, breaking tests that assert on stack traces.
Concrete example: tests/specs/run/wasm_streaming_panic_test/wasm_streaming_panic_test.js.out expects:
at handleWasmStreaming (ext:deno_fetch/26_fetch.js:[WILDCARD])
After this PR the frame becomes `at Object.handleWasmStreaming`, plus an extra `99_main.js:472` frame from `runtime/js/99_main.js`, where the wasm-streaming callback is now wrapped. The same shape change is the likely cause of the `unit::{globals,http,serve}_test` and `node_compat::parallel::test-inspector-*` failures.
Two options: (1) make the wrappers preserve function name + avoid adding a frame (e.g. `Object.defineProperty(..., "name", ...)` + tail-call the real handler so V8 elides the wrapper frame), or (2) update the affected test expectations to match the new stack shape.
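
For the naming half of option (1), a minimal sketch of a lazy wrapper that reports the wrapped handler's name might look like this. `makeLazyWrapper` and the stub handler are hypothetical; the sketch only shows the `name` property being set and does not demonstrate anything about whether the wrapper frame disappears from a stack trace.

```javascript
// Hypothetical helper: defer loading the real handler until first call,
// while exposing the handler's name on the wrapper.
function makeLazyWrapper(name, loadHandler) {
  let handler;
  const wrapper = function (...args) {
    handler ??= loadHandler();
    return Reflect.apply(handler, this, args);
  };
  // Function `name` is non-writable but configurable, so redefine it.
  Object.defineProperty(wrapper, "name", { value: name, configurable: true });
  return wrapper;
}

let handlerLoadCount = 0;
const handleWasmStreaming = makeLazyWrapper("handleWasmStreaming", () => {
  handlerLoadCount++;
  // Stub standing in for the real handler in ext:deno_fetch/26_fetch.js.
  return function handleWasmStreaming(source) {
    return `streaming:${source}`;
  };
});

const loadsBeforeCall = handlerLoadCount;
const streamingResult = handleWasmStreaming("x");
console.log(handleWasmStreaming.name, loadsBeforeCall, streamingResult);
// handleWasmStreaming 0 streaming:x
```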
# Conflicts:
#	ext/node/polyfills/01_require.js
#	libs/core/modules/map.rs
Summary
Lazifies a large fraction of the JS code currently baked into the CLI startup snapshot. End result:
Verified with `DENO_LOG_LAZY_LOAD=1 deno run hello.js`: 0 lazy loads at startup, in both TTY and pipe stdout modes. A non-fetch/stream/fs.promises/node:repl program no longer pays parse cost for any of those subtrees.

What's now lazy
Web platform (final commit)
The 208 KB `06_streams.js` polyfill and every ext module that pulls it:
- `ReadableStream` / `WritableStream` / `TransformStream` and all their inner controllers/readers (13 stream classes)
- `Request` / `Response` / `fetch` / `EventSource` (chain through `22_body.js` → `06_streams.js`)
- `caches` / `CacheStorage` / `Cache`
- `CompressionStream` / `DecompressionStream`
- `node:stream/web`
- `Deno.serve` / `Deno.serveHttp` / `Deno.upgradeWebSocket` / `Deno.Command` / `Deno.run` / `Deno.spawn*` / `Deno.kill` / `Deno.openKv`

Node polyfills (earlier in stack)
- `node:http` / `node:http2` / `node:https` / `node:_http_*` / `node:internal/http*`
- `node:crypto` / `node:internal/crypto/{cipher,hash,...}`
- `node:zlib`, `node:repl`, `node:internal/repl`, `node:readline`, `node:readline/promises`
- `node:child_process`, `node:internal/child_process`, `node:dgram`, `node:cluster`
- `node:tls`, `node:_tls_common`, `node:_tls_wrap`
- `node:fs/promises`, `node:assert/strict`, `node:internal/event_target`, `node:internal/fs/utils`

Kept eager (loading them is on the hot path of every program): `node:stream`, `node:stream/promises`, `node:net`, `node:tty`, `node:module`, `node:process`.

Overview of changes
Infrastructure (`build(snapshot)` + `refactor(core)`)
- `DENO_SNAPSHOT_IMPORT_GRAPH=<file>` env var: dump JSONL of every esm/lazy-script edge during snapshot build. Used to identify exactly which scripts are dragging which polyfills into the snapshot.
- `DENO_LOG_LAZY_LOAD=1` runtime env var: prints a stderr line each time a lazy_loaded_esm / lazy_loaded_js entry actually parses at runtime. Cache hits suppressed.
- Capture `__bootstrap` in `01_core.js` so deferred `loadExtScript` calls still find `core`/`primordials`/`internals` after `99_main.js` deletes `globalThis.__bootstrap`.
- `.ts` transpile in `build.rs`: pre-transpile any `lazy_loaded_js`/`lazy_loaded_esm` file that wasn't consumed at snapshot time so the runtime loader receives parseable JS rather than TypeScript.
- `module_map`: if static-import resolve fails, fall back to the lazy ESM source list before erroring (lets `node:_http_*` re-export work without eager registration).

Bug fixes pulled out of the lazification work
- `fix(ext/node)`: defer `lazyLoadProcess()` to `deprecated()` wrapper to break the `assert.ts ↔ process.ts` cycle exposed by lazification.
- `fix(core)`: drop the module-map borrow before recursively re-evaluating a lazy ESM module; the prior code held it across `module.evaluate(scope)` and panicked on `RefCell::borrow_mut` during recursive lazy_load_esm.

Final commit: web-streams chain
- `runtime/js/98_global_scope_shared.js`: converts every streams-pulling global to `propNonEnumerableLazyLoaded` / wrapper-function form.
- `runtime/js/99_main.js`: stops spreading `denoNs` with `{...denoNs}` (which invokes every getter); uses `ObjectDefineProperties` + `getOwnPropertyDescriptors` instead. Same fix for the unstable-feature merge loop. Wraps the wasm-streaming callback and defers `registerDeclarativeServer` to the `addMainModuleHandler` callback.
- `ext/web/13_message_port.js`: drops top-level streams import; `markNotSerializable` registration moved into `06_streams.js` itself (inverts the dep so message_port no longer drags streams).
- `ext/node/polyfills/01_require.js`: lazifies `internal/child_process` (which pulled `40_process.js → 22_body.js`) and `stream/web` (which pulled `14_compression.js`).
- `ext/node/polyfills/internal/streams/fast-utf8-stream.js`: replaces `import * as fs from "node:fs"` with `createLazyLoader("node:fs")`. The static import was re-entering `node:fs`'s evaluating body and TDZ-trapping on `lazyUtf8Stream().default`.
- `ext/node/polyfills/internal/fs/{handle,promises}.ts`: defers every top-level `promisify(lazyFs().X)` to first-call wrappers. Same TDZ cycle: `node:fs`'s `export const promises = mod.promises` line re-triggers `get promises` while `lazyInternalPromises().default` is still in TDZ.

Outcome
- `deno run empty.js` startup parses
- `import 'node:crypto'` cost
- `import 'node:http'` cost
- `fetch('...')` first-call cost: `26_fetch.js` + `22_body.js` + `06_streams.js` on demand

Programs that don't touch streams/fetch/http/repl/Deno.serve no longer pay the parse cost.
Test plan
- `cargo test` passes
- `cargo test --test node_compat` passes (down from 43 → ~38 fails, the remainder are pre-existing on main: `IO Safety violation` in `fork` and the v8 weak-handle GC flake in `test-repl-tab-complete-buffer`)
- `DENO_LOG_LAZY_LOAD=1 deno run empty.js` prints 0 lazy loads (TTY and pipe)
- `Deno.serve`, `fetch`, `new ReadableStream/Request/Response`, `structuredClone(new ReadableStream())` rejection, `fs.promises.readdir/readFile`, `node:child_process.spawn`, `node:stream/web`