feat(search): Scalable Local Search Architecture (Worker offload + Binary Indexing) for large documentation sites #5077

@seekskyworld

Description

Is your feature request related to a problem? Please describe.

Currently, the built-in local search (based on MiniSearch) faces significant performance bottlenecks when scaling to larger documentation sites (e.g., > 16,000 pages or large content volume).

Key Pain Points:

1. Main Thread Blocking: Index construction and query execution both run on the main thread, causing UI freezes/jank during initialization and while typing.

2. High Memory Usage: The index is loaded as one large JSON object. For large sites this can consume 1 GB+ of RAM, leading to browser crashes on mobile devices.

3. Slow Initialization: Parsing huge JSON files and building the index at runtime results in a noticeable delay before the search becomes usable.

4. Network Overhead: Transferring full-text data as JSON is inefficient compared to optimized binary formats.

Describe the solution you'd like

I propose (and have implemented a proof-of-concept for) an optimized local search architecture designed for high-performance scenarios. This could be an advanced configuration option or a potential future replacement for the current implementation.

Proposed Architecture:

1. Web Worker Offloading: Move the heavy lifting (index parsing and fuzzy matching) into a Web Worker so the UI thread stays responsive at 60 fps regardless of dataset size.

2. Static Pre-indexing: Instead of building the index at runtime in the browser, generate the index (e.g., using FlexSearch) during the Node.js build process.

3. Compact Binary/Array Format: Replace verbose JSON objects with a row-based "array of arrays" structure plus dictionary encoding for URLs. In my benchmark, this reduced memory usage from 1.7 GB to ~510 MB for the same dataset.

4. Artifact Splitting: Split the index into a core part (titles/headers) and a content part (full text). Load the core immediately for instant interactivity, and lazy-load the content in the background.

5. Native Intl.Segmenter: Use the browser's native Intl.Segmenter for CJK (Chinese/Japanese/Korean) tokenization, removing the dependency on heavy third-party libraries like jieba.

I have implemented this architecture in a private project with excellent results (≈0 ms search latency, < 50 ms TTI). I am willing to share more technical details or contribute a PR if the team is interested in this direction. Rough sketches of the main pieces are included below.
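
To make the Web Worker offloading concrete, here is a minimal sketch. The file name `search.worker.ts`, the `init`/`query` message shapes, and `renderResults` are illustrative assumptions, not existing VitePress APIs; MiniSearch appears here only because the built-in search already ships it, and FlexSearch could slot in the same way.

```ts
// search.worker.ts — hypothetical worker entry; message shapes are assumptions.
import MiniSearch from 'minisearch'

let index: MiniSearch | null = null

self.onmessage = async (e: MessageEvent) => {
  const msg = e.data
  if (msg.type === 'init') {
    // Download, JSON parsing and index deserialization all happen off the main thread.
    const raw = await (await fetch(msg.indexUrl)).text()
    index = MiniSearch.loadJSON(raw, { fields: ['title', 'text'], storeFields: ['title', 'url'] })
    self.postMessage({ type: 'ready' })
  } else if (msg.type === 'query' && index) {
    // Fuzzy matching runs here too; only a small result list crosses the thread boundary.
    const results = index.search(msg.text, { prefix: true, fuzzy: 0.2 }).slice(0, 20)
    self.postMessage({ type: 'results', id: msg.id, results })
  }
}
```

```ts
// main thread — thin proxy around the worker (sketch)
const worker = new Worker(new URL('./search.worker.ts', import.meta.url), { type: 'module' })
worker.postMessage({ type: 'init', indexUrl: '/search-index.json' })

const renderResults = (results: unknown) => console.log(results) // app-specific in practice
worker.onmessage = (e) => {
  if (e.data.type === 'results') renderResults(e.data.results)
}

export function search(text: string) {
  worker.postMessage({ type: 'query', id: Date.now(), text })
}
```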
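
A rough sketch of the build-time side (items 2 and 3), assuming a hypothetical `collectSections()` extraction step and placeholder output paths. The point is the row-based "array of arrays" layout plus a URL dictionary: object keys are never repeated in the artifact, and each page URL is stored exactly once.

```ts
// scripts/build-search-index.ts — hypothetical Node build step.
import { mkdir, writeFile } from 'node:fs/promises'

interface Section { url: string; anchor: string; title: string; text: string }

// Placeholder for the existing markdown extraction; real data comes from the site build.
async function collectSections(): Promise<Section[]> {
  return [{ url: '/guide/intro', anchor: '#setup', title: 'Setup', text: 'Install and configure…' }]
}

async function main() {
  const sections = await collectSections()

  // Dictionary encoding: each page URL is stored once and referenced by index.
  const urls: string[] = []
  const urlId = new Map<string, number>()

  // Row layout: [urlIndex, anchor, title, text] — no repeated object keys.
  const rows: [number, string, string, string][] = []
  for (const s of sections) {
    let id = urlId.get(s.url)
    if (id === undefined) {
      id = urls.push(s.url) - 1
      urlId.set(s.url, id)
    }
    rows.push([id, s.anchor, s.title, s.text])
  }

  // Split artifacts: "core" (titles/headers) loads first, "content" lazily.
  const core = rows.map(([u, a, t]) => [u, a, t])
  const content = rows.map((r) => r[3])

  await mkdir('dist', { recursive: true })
  await writeFile('dist/search-core.json', JSON.stringify({ urls, rows: core }))
  await writeFile('dist/search-content.json', JSON.stringify(content))
}

main()
```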
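
For artifact splitting (item 4), the worker could index the small core artifact first and upgrade documents with full text afterwards. This sketch assumes the two files produced above and relies on MiniSearch's `addAll`/`replace` methods; it is not the current implementation.

```ts
// worker side (sketch): become interactive on the core artifact, then enrich in place.
import MiniSearch from 'minisearch'

const index = new MiniSearch({ fields: ['title', 'text'], storeFields: ['url', 'title'] })

async function init() {
  // Core (titles/headers) is small: index it immediately so the first keystrokes work.
  const core = await (await fetch('/search-core.json')).json()
  const docs = core.rows.map(([u, anchor, title]: [number, string, string], i: number) => ({
    id: i,
    title,
    text: '',
    url: core.urls[u] + anchor
  }))
  index.addAll(docs)
  self.postMessage({ type: 'ready' })

  // Full text streams in afterwards and upgrades existing documents.
  const content: string[] = await (await fetch('/search-content.json')).json()
  for (let i = 0; i < content.length; i++) {
    index.replace({ ...docs[i], text: content[i] })
  }
  self.postMessage({ type: 'full-index-ready' })
}

init()
```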
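
Finally, a tokenizer sketch for item 5 using the standard `Intl.Segmenter` API, with a plain whitespace/punctuation fallback where the API is unavailable; hooking it into the indexer via a `tokenize` option is shown only as a comment and depends on the engine chosen.

```ts
// CJK-aware tokenizer using the native segmenter; no third-party dictionary needed.
const segmenter =
  typeof Intl !== 'undefined' && 'Segmenter' in Intl
    ? new Intl.Segmenter(['zh', 'ja', 'ko', 'en'], { granularity: 'word' })
    : null

export function tokenize(text: string): string[] {
  // Fallback: split on whitespace and punctuation (works for space-delimited languages only).
  if (!segmenter) return text.split(/[\s\p{P}]+/u).filter(Boolean)

  const tokens: string[] = []
  for (const seg of segmenter.segment(text)) {
    // Keep only word-like segments; whitespace and punctuation are dropped.
    if (seg.isWordLike) tokens.push(seg.segment.toLowerCase())
  }
  return tokens
}

// Usage idea: pass as the indexer's tokenizer, e.g. new MiniSearch({ fields: [...], tokenize }).
```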

Describe alternatives you've considered

1. Algolia / DocSearch: Excellent performance, but it requires a paid subscription for commercial closed-source projects and raises data-privacy concerns (content must be uploaded to a third party).

2. Server-side Search (Elasticsearch/Meilisearch): Requires deploying and maintaining a backend service, losing the simplicity of "static site" deployment.

3. Tuning MiniSearch: I tried tuning MiniSearch options, but the fundamental bottleneck of main-thread JSON parsing remains hard to overcome for very large datasets.

Additional context

No response
