Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use version of htmlrewriter which does not make use of asyncify, which looks to have a potential memory leak under high load #2721

Merged
merged 6 commits into from
Dec 18, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion deno.json
Original file line number Diff line number Diff line change
Expand Up @@ -6,5 +6,6 @@
},
"imports": {
"@netlify/edge-functions": "https://edge.netlify.com/v1/index.ts"
}
},
"importMap": "./edge-runtime/vendor/import_map.json"
}
2 changes: 1 addition & 1 deletion edge-runtime/lib/middleware.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
import type { Context } from '@netlify/edge-functions'

import { ElementHandlers } from '../vendor/deno.land/x/[email protected]/index.ts'
import type { ElementHandlers } from '../vendor/deno.land/x/[email protected]/src/index.ts'

type NextDataTransform = <T>(data: T) => T

Expand Down
7 changes: 5 additions & 2 deletions edge-runtime/lib/response.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
import type { Context } from '@netlify/edge-functions'
import { HTMLRewriter } from '../vendor/deno.land/x/[email protected]/index.ts'
import {
HTMLRewriter,
type TextChunk,
} from '../vendor/deno.land/x/[email protected]/src/index.ts'

import { updateModifiedHeaders } from './headers.ts'
import type { StructuredLogger } from './logging.ts'
Expand Down Expand Up @@ -79,7 +82,7 @@ export const buildResponse = async ({

if (response.dataTransforms.length > 0) {
rewriter.on('script[id="__NEXT_DATA__"]', {
text(textChunk) {
text(textChunk: TextChunk) {
// Grab all the chunks in the Next data script tag
buffer += textChunk.text
if (textChunk.lastInTextNode) {
Expand Down
2 changes: 1 addition & 1 deletion edge-runtime/vendor.ts
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,6 @@ import 'https://deno.land/[email protected]/node/util.ts'
import 'https://deno.land/[email protected]/path/mod.ts'

import 'https://deno.land/x/[email protected]/index.ts'
import 'https://deno.land/x/[email protected]/index.ts'
import 'https://deno.land/x/[email protected]/src/index.ts'

import 'https://v1-7-0--edge-utils.netlify.app/logger/mod.ts'
14 changes: 14 additions & 0 deletions src/build/functions/edge.ts
Original file line number Diff line number Diff line change
Expand Up @@ -85,13 +85,27 @@ const writeHandlerFile = async (ctx: PluginContext, { matchers, name }: NextDefi
JSON.stringify(minimalNextConfig),
)

const htmlRewriterWasm = await readFile(
join(
ctx.pluginDir,
'edge-runtime/vendor/deno.land/x/[email protected]/pkg/htmlrewriter_bg.wasm',
),
)

// Writing the function entry file. It wraps the middleware code with the
// compatibility layer mentioned above.
await writeFile(
join(handlerDirectory, `${handlerName}.js`),
`
import { decode as _base64Decode } from './edge-runtime/vendor/deno.land/[email protected]/encoding/base64.ts';
import { init as htmlRewriterInit } from './edge-runtime/vendor/deno.land/x/[email protected]/src/index.ts'
import {handleMiddleware} from './edge-runtime/middleware.ts';
import handler from './server/${name}.js';

await htmlRewriterInit({ module_or_path: _base64Decode(${JSON.stringify(
htmlRewriterWasm.toString('base64'),
)}).buffer });
Comment on lines +105 to +107
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use a uint8array instead of a base64 string here, it would save on memory usage by 33%

I did this in a similar package I made specifically for CSP support - I could do the same for the htmlrewriter package if we want:

await Deno.writeTextFile(
    "./pkg/embedded-wasm.ts",
    `export const wasmBinary = Uint8Array.from(${
      JSON.stringify(
        Array.from(
          await Deno.readFile(
            `./pkg/${wasmFile}`,
          ),
        ),
      )
    });`,
  );

https://github.com/netlify/csp_nonce_html_transformer/blob/main/scripts/build.ts#L13C3-L25


Comment on lines +100 to +108
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bundling wasm file as-is doesn't seem to work, so instead we inline it using same method we do for wasm modules used by users in middleware

parts.push(`import { decode as _base64Decode } from "${base64ModulePathRelativeToOutputFile}";`)
for (const wasmChunk of wasm ?? []) {
const data = await readFile(join(srcDir, wasmChunk.filePath))
parts.push(
`const ${wasmChunk.name} = _base64Decode(${JSON.stringify(
data.toString('base64'),
)}).buffer`,
)
}

and pass module to init function of htmlrewriter (argumentless init doesn't work after vendoring)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna have to just trust you on this one 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gist.github.com/pieh/b205a685e518ef62afaab392e61e064f this is standalone script using this method that can be tested.

You can also test that passing crap to init first arg like:

await htmlRewriterInit({
  module_or_path: 'wat',
})

would result in errors:

error: Uncaught (in promise) TypeError: Invalid URL: 'wat'
    module_or_path = fetch(module_or_path);
                     ^
    at getSerialization (ext:deno_url/00_url.js:98:11)
    at new URL (ext:deno_url/00_url.js:405:27)
    at new Request (ext:deno_fetch/23_request.js:329:25)
    at ext:deno_fetch/26_fetch.js:320:27
    at new Promise (<anonymous>)
    at fetch (ext:deno_fetch/26_fetch.js:316:18)
    at __wbg_init (https://deno.land/x/[email protected]/pkg/htmlrewriter.js:1207:22)
    at file:///Users/misiek/dev/pgs-next-runtime/edge-runtime/test.ts:11:7

so the argument has effect.

Commenting out init call also result in errors:

error: Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'htmlrewriter_new')
    const ret = wasm.htmlrewriter_new(
                     ^
    at new HTMLRewriter (https://deno.land/x/[email protected]/pkg/htmlrewriter.js:781:22)
    at Object.start (https://deno.land/x/[email protected]/src/index.ts:53:20)
    at Module.invokeCallbackFunction (ext:deno_webidl/00_webidl.js:981:16)
    at new TransformStream (ext:deno_web/06_streams.js:6214:16)
    at HTMLRewriter.transform (https://deno.land/x/[email protected]/src/index.ts:48:29)
    at rewriter (file:///Users/misiek/dev/pgs-next-runtime/edge-runtime/test.ts:39:6)
    at file:///Users/misiek/dev/pgs-next-runtime/edge-runtime/test.ts:43:27

export default (req, context) => handleMiddleware(req, context, handler);
`,
)
Expand Down
22 changes: 21 additions & 1 deletion tools/build.js
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
import { createWriteStream } from 'node:fs'
import { cp, rm } from 'node:fs/promises'
import { resolve, join } from 'node:path'
import { join, resolve } from 'node:path'
import { Readable } from 'stream'
import { finished } from 'stream/promises'

import { build, context } from 'esbuild'
import { execaCommand } from 'execa'
Expand Down Expand Up @@ -94,6 +97,23 @@ async function vendorDeno() {
console.log(`📦 Vendoring Deno modules into '${vendorDest}'...`)

await execaCommand(`deno vendor ${vendorSource} --output=${vendorDest} --force`)

// htmlrewriter contains wasm files and those don't currently work great with vendoring
// see https://github.com/denoland/deno/issues/14123
// to workaround this we copy the wasm files manually
Comment on lines +101 to +103
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢 I left a comment on that issue, as it seems like it should be reopened since Deno landed WASM support

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deno landed it, but in 2.1 - https://deno.com/blog/v2.1#first-class-wasm-support , so support is not guaranteed :( otherwise we could migrate to imports in https://github.com/netlify/htmlrewriter/blob/edb477af3d08359c8f25edd8d301df69ffcc8e4b/pkg/htmlrewriter.js#L1184 to make use of it

const filesToDownload = ['https://deno.land/x/[email protected]/pkg/htmlrewriter_bg.wasm']
await Promise.all(
filesToDownload.map(async (urlString) => {
const url = new URL(urlString)

const destination = join(vendorDest, url.hostname, url.pathname)

const res = await fetch(url)
if (!res.ok) throw new Error('Failed to fetch .wasm file to vendor', { cause: err })
const fileStream = createWriteStream(destination, { flags: 'wx' })
await finished(Readable.fromWeb(res.body).pipe(fileStream))
}),
)
}

const args = new Set(process.argv.slice(2))
Expand Down
Loading