Skip to content

Conversation

@ovflowd
Copy link
Member

@ovflowd ovflowd commented Dec 4, 2025

This PR introduces a new implementation for parallelization of input processing per worker:

  • Updated the WorkerPool to accept specific child worker entry points
  • Introduced a new child worker for processing chunked/partitions of input for a given generator
  • Updated the generator interface to support chunk processing (optional)
  • Updated generator entrypoint for handling parallelization
  • Updated numerous generators to support chunked processing

This PR is still missing unit tests as I believe there will be a few rounds of feedback before we reach a final version.

Copilot AI review requested due to automatic review settings December 4, 2025 15:18
@ovflowd ovflowd requested a review from a team as a code owner December 4, 2025 15:18
@vercel
Copy link

vercel bot commented Dec 4, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Updated (UTC)
api-docs-tooling Ready Ready Preview Dec 5, 2025 1:27pm

@codecov
Copy link

codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 68.79756% with 205 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.71%. Comparing base (260cc1d) to head (b933b0a).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/generators/legacy-html/index.mjs 27.94% 49 Missing ⚠️
src/threading/parallel.mjs 61.81% 41 Missing and 1 partial ⚠️
src/generators/jsx-ast/index.mjs 35.93% 41 Missing ⚠️
src/generators.mjs 3.12% 31 Missing ⚠️
src/generators/legacy-json/index.mjs 42.85% 16 Missing ⚠️
src/generators/metadata/index.mjs 54.54% 10 Missing ⚠️
src/generators/web/utils/bundle.mjs 50.00% 9 Missing ⚠️
src/threading/index.mjs 92.10% 3 Missing ⚠️
src/generators/orama-db/index.mjs 0.00% 2 Missing ⚠️
bin/commands/generate.mjs 94.11% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #507      +/-   ##
==========================================
+ Coverage   74.35%   75.71%   +1.36%     
==========================================
  Files         112      115       +3     
  Lines       10747    11181     +434     
  Branches      722      756      +34     
==========================================
+ Hits         7991     8466     +475     
+ Misses       2753     2711      -42     
- Partials        3        4       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot finished reviewing on behalf of ovflowd December 4, 2025 15:21
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a comprehensive parallelization infrastructure for processing documentation generation tasks across multiple worker threads. The implementation enables both generator-level and chunk-level parallelization to improve performance when processing large documentation sets.

Key changes:

  • Refactored WorkerPool to support configurable worker scripts and improved queue processing
  • Introduced ParallelWorker abstraction for chunk-level parallelization within generators
  • Updated generator interface with optional processChunk method for parallel processing
  • Migrated multiple generators (metadata, legacy-json, legacy-html, jsx-ast, ast-js) to support chunked processing

Reviewed changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
src/threading/index.mjs Refactored WorkerPool to accept configurable worker scripts and simplified API by moving thread count to constructor
src/threading/parallel.mjs New module providing ParallelWorker abstraction for distributing work across chunk workers with optimal chunking strategy
src/threading/generator-worker.mjs New worker script for executing full generators in worker threads, replacing the old worker.mjs
src/threading/chunk-worker.mjs New worker script for processing individual chunks via generator's processChunk method
src/threading/worker.mjs Removed - replaced by generator-worker.mjs and chunk-worker.mjs for more flexible parallelization
src/generators/types.d.ts Added ParallelWorker interface, chunkSize option, and optional processChunk method to generator metadata
src/utils/generators.mjs Extracted reusable utility functions (getHeadNodes, getSortedHeadNodes, buildDocPages) from individual generators
src/generators/metadata/index.mjs Implemented processChunk for parallel processing of API doc files
src/generators/legacy-json/index.mjs Refactored to support chunk-based parallel processing with worker threads
src/generators/legacy-html/index.mjs Significant refactor introducing processNode helper, template utilities, and chunk-based parallelization
src/generators/legacy-html/utils/template.mjs New utility module for template caching and replacement logic extraction
src/generators/jsx-ast/index.mjs Refactored to support chunk processing and moved shared utilities to common module
src/generators/ast-js/index.mjs Added chunk processing support for parallel JavaScript file parsing
src/generators/web/index.mjs Updated with better comments explaining why it doesn't implement processChunk (requires all entries for bundling)
src/generators/web/utils/processing.mjs Fixed JSDoc parameter documentation for options parameter
src/generators/web/utils/bundle.mjs Added jsx-runtime aliases and node_modules resolution configuration for external execution
src/generators/legacy-html-all/index.mjs Updated to use getRemarkRehype instead of getRemarkRehypeWithShiki
src/generators/orama-db/index.mjs Minor formatting improvements (blank lines)
src/generators.mjs Updated to create separate worker pools for generators and chunks, with ParallelWorker instances
bin/commands/generate.mjs Added chunkSize parameter and improved thread count calculation for optimal performance

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@avivkeller avivkeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Blocking) This PR contains a lot of unrelated changes, which makes it hard to identify what's part of the chunking and what's not. While I'm admittedly a victim of making this same mistake my PRs, it would be nice to isolate this to parallelization.

(Non-blocking, at the moment) Additionally, I feel like this creates a lot of workers1, and in the current format (if I understand correctly), every worker receives the entire inputs' AST structure and options. That's a lot of serialization[^2].

It might be more memory efficient to use a batch promise for the chunks (i.e. p-limit), which keeps them on the same worker, and avoids the serialization.

Footnotes

  1. According to the current configuration, on my machine, up to 100 workers

@ovflowd
Copy link
Member Author

ovflowd commented Dec 4, 2025

(Blocking) This PR contains a lot of unrelated changes, which makes it hard to identify what's part of the chunking and what's not. While I'm admittedly a victim of making this same mistake my PRs, it would be nice to isolate this to parallelization.

Could you point out what these unrelated changes are? All changes here are purely related to the parallelization.

@ovflowd
Copy link
Member Author

ovflowd commented Dec 4, 2025

It might be more memory efficient to use a batch promise for the chunks (i.e. p-limit), which keeps them on the same worker, and avoids the serialization.

That's exactly what this PR is doing, it has batch sizes. Also the parallelization will never run more workers than the allowed amount of threads. I have no idea where you got 100 "workers"

@avivkeller
Copy link
Member

Could you point out what these unrelated changes are? All changes here are purely related to the parallelization.

That's exactly what this PR is doing, it has batch sizes. Also the parallelization will never run more workers than the allowed amount of threads. I have no idea where you got 100 "workers"

Ahh, I got the chunk size with the worker thread confused (thus I did 5x20 = 100)

@ovflowd
Copy link
Member Author

ovflowd commented Dec 4, 2025

Appreciate you pointing these out, I acknowledge that, the issues of asking AI to simplify the changes, it touches unrelated things. Let me work on that.

@ovflowd
Copy link
Member Author

ovflowd commented Dec 4, 2025

@avivkeller re-review? 🙇

Copy link
Member

@avivkeller avivkeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still a fair amount of unrelated changes, but regarding the related changes: looks good with a few non-blocking notes

Almost there!

@avivkeller
Copy link
Member

  • #507 (files) contains a lot of refactors (i.e. adding the Promise.all)
  • And similar

Appreciate you pointing these out, I acknowledge that, the issues of asking AI to simplify the changes, it touches unrelated things. Let me work on that.

If and when I use AI, I usually tell it that if a simplification change can stand on its own, it's likely unrelated, and the model should reconsider whether or not it's needed.

@ovflowd
Copy link
Member Author

ovflowd commented Dec 5, 2025

  • #507 (files) contains a lot of refactors (i.e. adding the Promise.all)
  • And similar

Appreciate you pointing these out, I acknowledge that, the issues of asking AI to simplify the changes, it touches unrelated things. Let me work on that.

If and when I use AI, I usually tell it that if a simplification change can stand on its own, it's likely unrelated, and the model should reconsider whether or not it's needed.

Yeah, on my work MacBook I have system prompts and other settings for that, this work was done on my personal Desktop, gotta configure ol' VS Code to not do unrelated things.

@ovflowd ovflowd enabled auto-merge (squash) December 5, 2025 13:28
@ovflowd ovflowd merged commit 26d5760 into main Dec 5, 2025
19 checks passed
@ovflowd ovflowd deleted the feat/parallelization-worker branch December 5, 2025 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants