feat: introduce parallelization worker for chunked generation processing #507
Conversation
Codecov Report ❌ (patch coverage details and impacted files)

@@ Coverage Diff @@
## main #507 +/- ##
==========================================
+ Coverage 74.35% 75.71% +1.36%
==========================================
Files 112 115 +3
Lines 10747 11181 +434
Branches 722 756 +34
==========================================
+ Hits 7991 8466 +475
+ Misses 2753 2711 -42
- Partials 3 4 +1
Pull request overview
This PR introduces a comprehensive parallelization infrastructure for processing documentation generation tasks across multiple worker threads. The implementation enables both generator-level and chunk-level parallelization to improve performance when processing large documentation sets.
Key changes:
- Refactored WorkerPool to support configurable worker scripts and improved queue processing
- Introduced ParallelWorker abstraction for chunk-level parallelization within generators
- Updated generator interface with optional processChunk method for parallel processing (a rough sketch follows this list)
- Migrated multiple generators (metadata, legacy-json, legacy-html, jsx-ast, ast-js) to support chunked processing
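For orientation, here is a minimal sketch of what a generator with an optional per-chunk hook could look like. This is not code from the PR; the method signature, field names, and return shape are assumptions made for illustration:

```js
// Hypothetical generator shape - illustrative only, not the PR's actual interface.
export default {
  name: 'example-generator',

  // Assumed optional hook: receives one chunk of input entries plus the
  // generator options and returns the processed results for that chunk.
  // Because it only needs its own chunk, several chunks can run in parallel.
  async processChunk(chunk, options) {
    return chunk.map(entry => ({ entry, output: options?.output }));
  },

  // Assumed full-input entry point, used when chunked processing is not enabled.
  async generate(entries, options) {
    return this.processChunk(entries, options);
  },
};
```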
Reviewed changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 16 comments.
Summary per file:
| File | Description |
|---|---|
| src/threading/index.mjs | Refactored WorkerPool to accept configurable worker scripts and simplified API by moving thread count to constructor |
| src/threading/parallel.mjs | New module providing ParallelWorker abstraction for distributing work across chunk workers with optimal chunking strategy (see the sketch after this table) |
| src/threading/generator-worker.mjs | New worker script for executing full generators in worker threads, replacing the old worker.mjs |
| src/threading/chunk-worker.mjs | New worker script for processing individual chunks via generator's processChunk method |
| src/threading/worker.mjs | Removed - replaced by generator-worker.mjs and chunk-worker.mjs for more flexible parallelization |
| src/generators/types.d.ts | Added ParallelWorker interface, chunkSize option, and optional processChunk method to generator metadata |
| src/utils/generators.mjs | Extracted reusable utility functions (getHeadNodes, getSortedHeadNodes, buildDocPages) from individual generators |
| src/generators/metadata/index.mjs | Implemented processChunk for parallel processing of API doc files |
| src/generators/legacy-json/index.mjs | Refactored to support chunk-based parallel processing with worker threads |
| src/generators/legacy-html/index.mjs | Significant refactor introducing processNode helper, template utilities, and chunk-based parallelization |
| src/generators/legacy-html/utils/template.mjs | New utility module for template caching and replacement logic extraction |
| src/generators/jsx-ast/index.mjs | Refactored to support chunk processing and moved shared utilities to common module |
| src/generators/ast-js/index.mjs | Added chunk processing support for parallel JavaScript file parsing |
| src/generators/web/index.mjs | Updated with better comments explaining why it doesn't implement processChunk (requires all entries for bundling) |
| src/generators/web/utils/processing.mjs | Fixed JSDoc parameter documentation for options parameter |
| src/generators/web/utils/bundle.mjs | Added jsx-runtime aliases and node_modules resolution configuration for external execution |
| src/generators/legacy-html-all/index.mjs | Updated to use getRemarkRehype instead of getRemarkRehypeWithShiki |
| src/generators/orama-db/index.mjs | Minor formatting improvements (blank lines) |
| src/generators.mjs | Updated to create separate worker pools for generators and chunks, with ParallelWorker instances |
| bin/commands/generate.mjs | Added chunkSize parameter and improved thread count calculation for optimal performance |
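To make the chunk-level flow concrete, here is a rough sketch of splitting input entries into chunks and dispatching them to a pool of chunk workers. The helper names, the `pool.run` call, and the default chunk size are assumptions for illustration; the PR's actual `ParallelWorker` and `WorkerPool` APIs may differ:

```js
// Illustrative sketch only - not the PR's parallel.mjs implementation.

// Split the input entries into fixed-size chunks.
const toChunks = (entries, chunkSize) => {
  const chunks = [];
  for (let i = 0; i < entries.length; i += chunkSize) {
    chunks.push(entries.slice(i, i + chunkSize));
  }
  return chunks;
};

// Hypothetical dispatch: each chunk is handed to the pool, which forwards it
// to a chunk worker that calls the generator's processChunk; the per-chunk
// results are then flattened back into a single array.
export const runChunked = async (pool, generatorName, entries, options) => {
  const chunks = toChunks(entries, options.chunkSize ?? 50);
  const results = await Promise.all(
    chunks.map(chunk => pool.run({ generatorName, chunk, options }))
  );
  return results.flat();
};
```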
(Blocking) This PR contains a lot of unrelated changes, which makes it hard to identify what's part of the chunking and what's not. While I'm admittedly a victim of making this same mistake in my PRs, it would be nice to isolate this to parallelization.
(Non-blocking, at the moment) Additionally, I feel like this creates a lot of workers,¹ and in the current format (if I understand correctly), every worker receives the entire input's AST structure and options. That's a lot of serialization.²
It might be more memory-efficient to use a batch promise for the chunks (i.e. p-limit), which keeps them on the same worker and avoids the serialization; a rough sketch follows after the footnotes.
Footnotes
1. According to the current configuration, on my machine, up to 100 workers. ↩
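For comparison, here is a minimal sketch of the p-limit approach suggested above. It is not code from the PR; the helper name and the `generator.processChunk` call are assumptions:

```js
import os from 'node:os';
import pLimit from 'p-limit';

// Hypothetical helper: process all chunks on the main thread with bounded
// concurrency, so the AST and options never cross a worker-thread boundary.
export const processChunksInProcess = async (generator, chunks, options) => {
  const limit = pLimit(os.availableParallelism());

  return Promise.all(
    chunks.map(chunk => limit(() => generator.processChunk(chunk, options)))
  );
};
```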
Could you point out what these unrelated changes are? All changes here are purely related to the parallelization.
That's exactly what this PR is doing: it has batch sizes. Also, the parallelization will never run more workers than the allowed number of threads. I have no idea where you got 100 "workers".
Appreciate you pointing these out, and I acknowledge that. That's one of the issues with asking AI to simplify the changes: it touches unrelated things. Let me work on that.
@avivkeller re-review? 🙇
There are still a fair number of unrelated changes, but regarding the related changes: looks good, with a few non-blocking notes.
Almost there!
If and when I use AI, I usually tell it that if a simplification change can stand on its own, it's likely unrelated, and the model should reconsider whether it's needed.
Yeah, on my work MacBook I have system prompts and other settings for that; this work was done on my personal desktop. Gotta configure ol' VS Code to not do unrelated things.
This PR introduces a new implementation for parallelization of input processing per worker:
This PR is still missing unit tests as I believe there will be a few rounds of feedback before we reach a final version.
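For readers less familiar with the underlying mechanics, here is a minimal sketch of running a whole generator in a worker thread with node:worker_threads. It is not the PR's generator-worker.mjs; file names, the message shape, and the generator lookup are assumptions:

```js
// Illustrative sketch only - not the PR's actual generator-worker.mjs.

// --- main thread (e.g. a worker pool's dispatch function) ---
import { Worker } from 'node:worker_threads';

export const runGeneratorInWorker = (generatorName, input, options) =>
  new Promise((resolve, reject) => {
    const worker = new Worker(
      new URL('./generator-worker.mjs', import.meta.url),
      // workerData is structured-cloned into the worker, which is the
      // serialization cost discussed in the review above.
      { workerData: { generatorName, input, options } }
    );
    worker.once('message', resolve);
    worker.once('error', reject);
  });

// --- worker thread (generator-worker.mjs), shown here as comments ---
// import { parentPort, workerData } from 'node:worker_threads';
//
// const { generatorName, input, options } = workerData;
// const { default: generator } = await import(`./generators/${generatorName}/index.mjs`);
// parentPort.postMessage(await generator.generate(input, options));
```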