Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: parallelize page setup #334

Merged
merged 2 commits into from
May 21, 2021
Merged

perf: parallelize page setup #334

merged 2 commits into from
May 21, 2021

Conversation

ascorbic
Copy link
Contributor

@ascorbic ascorbic commented May 20, 2021

This PR refactors the page setup so that each of the funcitons returns an array of jobs, rather than processing the pages at that time. This means we can run them in parallel, which we do with a queue. I don't have a large site to test this with, so it's hard to see what the effects are: the test sites are all under half a second already.

There is quite a bit of duplication still, with helpers that are almost identical, so there's scope for more refactoring.

Fixes #152

@ascorbic ascorbic requested a review from lindsaylevine May 20, 2021 12:51
@ascorbic ascorbic added the type: feature code contributing to the implementation of a feature and/or user facing functionality label May 20, 2021
Copy link

@lindsaylevine lindsaylevine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see any big question marks besides the one map in the promise.all. this is so dope!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! thanks so much for doing this and so quickly too!!!!!! 🤯

@@ -31,7 +31,7 @@ const copyDynamicImportChunks = async (functionPath) => {
// Chunks are copied into the nextPage directory, as a sibling to pages or api.
// This matches the Next output, so that imports work correctly
const copyPath = join(functionPath, 'nextPage', 'chunks')
logTitle('💼 Copying WB5 dynamic import chunks to', copyPath)
// logTitle('💼 Copying WB5 dynamic import chunks to', copyPath)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

y

jobs.push({ type: 'function', filePath, functionsPath })
}),
)
return jobs
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what's happening here - why do we have a map (which returns nothing) inside an awaited Promise.all, only to return jobs outside of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callback in the map is async (because it has to await isRouteWithFallback) , so it automatically returns a promise which resolves (to void) when it's complete. This means that the map returns an array of promises (one for each page). Passing these to an awaited Promise.all means that it waits until all of the functions are complete. With an async callback like that, the map will immediately return the array of promises, before the callback has executed for each of the elements. It would be somehting like this. Apologies if you know all this:

const result = pages.map(async () => {
await something()
})

// We get here immediately, before the callbacks execute
console.log(result)
// [ <unresolved Promise>, <unresolved Promise>, <unresolved Promise>, ...]

// ...the callbacks execute in parallel, and resolve one by one. we might wait here a while
const done = await Promise.all(result)
// once we're here, the jobs are all complete
console.log(result)

// [ <resolved Promise>, <resolved Promise>, <resolved Promise>, ...]

console.log(done)

// [void, void, void, ...]

return
// 

The reason I'm using the jobs array from outside of the map is so that I can conditionally add elements. If I returned them, I'd end up with an array with undefined elements for the pages we're skipping. We could then filter that array, but that means we'd need to loop through them again. Pushing onto an array outside means we only need to loop through once. I could use reduce for the same effect, but I generally find that's harder to read.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh ok i dont think i processed that the map callback is async. ok wow this is wild!!!

...(await withoutPropsSetup(publishPath)),
]

return runJobsQueue(jobs)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be returned? just curious

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because runJobsQueue returns a Promise, returning it means we can await the return value outside. It's similar to just awaiting runJobsQueue() before returning, but saves us creating another Promise.

const runJobsQueue = (jobs) => {
console.log(`Building ${jobs.length} pages`)

const queue = fastq.promise(processPage, 4)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the significance of 4 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None! I should make it a constant. I need to make a massive site so I can benchmark it properly and find a good value.

Copy link

@lindsaylevine lindsaylevine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!!!!!!!

@ascorbic ascorbic requested a review from lindsaylevine May 21, 2021 08:23
Copy link

@lindsaylevine lindsaylevine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!!!!!!!! x2

@ascorbic ascorbic merged commit 72d34ce into main May 21, 2021
@ascorbic ascorbic deleted the mk/page-function-worker branch May 21, 2021 11:03
serhalp pushed a commit that referenced this pull request Apr 5, 2024
* chore: run Next.js e2e tests again many versions

* chore: use version in task name

* chore: handle old test manifest format

* chore: fix syntax

* chore: cache playwright and allow prerelease for semver check

* chore: only publish results for main branch

* chore: name cache step

* chore: use js for e2e manifest

* chore: better playwright cache handling

* chore: annotate some skipped tests

* chore: don't cache playwright

* chore: use js filter file

* chore: notify Slack for test results

* chore: fix hashfiles target

* chore: update paths

* chore: enable junit for older Next.js

* chore: add retry manifest and fix syntax

* chore: format with prettier

* chore: manage concurrency

* chore: notify Slack

* chore: conditions for slack notify

* chore: split with timings

* chore: fix command

* chore: temporarily just run latest

* chore: fix slack command

* chore: fewer groups for scheduled runs

* chore: use bot

* chore: switch back to webhook

* chore: set total groups

---------

Co-authored-by: ascorbic <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature code contributing to the implementation of a feature and/or user facing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve speed by parallelizing I/O
2 participants