Description
Is your feature request related to a problem? Please describe.
Currently, the built-in local search (based on MiniSearch) hits significant performance bottlenecks when scaled to large documentation sites (e.g., more than 16,000 pages or a comparably large content volume).
Key Pain Points:
1. Main Thread Blocking: The search index construction and query execution happen on the main thread, causing UI freezes/jank during initialization or typing.
2. High Memory Usage: The index is loaded as a large JSON object. For large sites, this can consume 1 GB+ of RAM, leading to browser crashes on mobile devices.
3. Slow Initialization: Parsing huge JSON files and building the index at runtime results in a noticeable delay before the search becomes usable.
4. Network Overhead: Transferring full-text data as JSON is inefficient compared to optimized binary formats.
Describe the solution you'd like
I propose (and have implemented a proof-of-concept for) an optimized local search architecture designed for high-performance scenarios. This could be an advanced configuration option or a potential future replacement for the current implementation.
Proposed Architecture:
1. Web Worker Offloading: Move the heavy lifting (index parsing and fuzzy matching) to a Web Worker. This keeps the UI thread responsive at 60 fps regardless of dataset size (see the first sketch after this list).
2. Static Pre-indexing: Instead of building the index at runtime in the browser, generate it (e.g., using FlexSearch) during the Node.js build process (second sketch below).
3. Compact Binary/Array Format: Replace verbose JSON objects with a row-based "array of arrays" structure plus dictionary encoding for URLs (third sketch below).
Benchmark from my usage: memory dropped from 1.7 GB to ~510 MB for the same dataset.
4. Artifact Splitting: Split the index into core (titles/headers) and content (full text). Load the core immediately for instant interactivity, and lazy-load the content in the background (fourth sketch below).
5. Native Intl.Segmenter: Use the browser's native Intl.Segmenter for CJK (Chinese/Japanese/Korean) tokenization, removing the dependency on heavy third-party libraries like jieba (fifth sketch below).
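To illustrate item 1, here is a minimal sketch of the worker-based setup. The `search.worker.ts` entry, the `init`/`query` message protocol, and `renderResults` are all hypothetical names; MiniSearch appears only because the current implementation uses it — any engine works once it runs off the main thread.

```ts
// search.worker.ts — all heavy work (index parsing, fuzzy matching) runs here,
// off the main thread. Message names ('init'/'query') are illustrative.
import MiniSearch from 'minisearch'

let index: MiniSearch | null = null

self.onmessage = (e: MessageEvent) => {
  const { type, payload } = e.data
  if (type === 'init') {
    // Parsing the (potentially huge) serialized index happens in the worker,
    // so the UI thread never janks during initialization.
    index = MiniSearch.loadJSON(payload.serializedIndex, payload.options)
    self.postMessage({ type: 'ready' })
  } else if (type === 'query') {
    const results = index?.search(payload.text, { fuzzy: 0.2 }) ?? []
    self.postMessage({ type: 'results', payload: results })
  }
}

// --- main thread (e.g., the search component) ---
// serializedIndex/options come from however the index is shipped (see sketches 2-4).
const worker = new Worker(new URL('./search.worker.ts', import.meta.url), { type: 'module' })
worker.postMessage({ type: 'init', payload: { serializedIndex, options } })
worker.onmessage = (e) => {
  if (e.data.type === 'results') renderResults(e.data.payload) // renderResults: app-specific
}
```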
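For item 2, a sketch of what build-time pre-indexing could look like. It assumes FlexSearch 0.7's callback-based `export()`/`import()` API and a hypothetical `loadPages()` helper from the site's build pipeline; the `setTimeout` flush is there because `export()` invokes its callback asynchronously with no promise to await.

```ts
// build-index.mts — runs in Node.js during the build, not in the browser
import { writeFile } from 'node:fs/promises'
import { Index } from 'flexsearch'
import { loadPages } from './build-pipeline' // hypothetical: yields { id, text } per page

const index = new Index({ tokenize: 'forward' })
for (const page of await loadPages()) index.add(page.id, page.text)

// FlexSearch serializes the index as several key/data chunks via a callback.
const chunks: Record<string, string> = {}
index.export((key, data) => {
  chunks[String(key)] = data as string
})

// export() fires its callbacks asynchronously; give them a tick to flush.
setTimeout(() => {
  void writeFile('dist/search-index.json', JSON.stringify(chunks))
}, 0)

// At runtime the browser (or worker) restores the index without re-indexing:
//   const idx = new Index({ tokenize: 'forward' })
//   for (const [key, data] of Object.entries(chunks)) idx.import(key, data)
```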
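For item 3, a self-contained sketch of the row-based layout with a URL dictionary (the field order is hypothetical). Each unique URL path is stored once and referenced by integer, which is where much of the memory saving comes from on sites with thousands of pages.

```ts
// Verbose per-record JSON (roughly what gets shipped today):
//   [{ "id": 1, "title": "Getting Started", "url": "/guide/getting-started" }, ...]
// Row-based "array of arrays" with a URL dictionary:
type PackedIndex = {
  urls: string[]                   // dictionary: each unique URL stored once
  rows: [number, string, number][] // [id, title, urlDictIndex]
}

function pack(records: { id: number; title: string; url: string }[]): PackedIndex {
  const urls: string[] = []
  const urlIds = new Map<string, number>()
  const rows = records.map((r): [number, string, number] => {
    let u = urlIds.get(r.url)
    if (u === undefined) {
      u = urls.push(r.url) - 1
      urlIds.set(r.url, u)
    }
    return [r.id, r.title, u]
  })
  return { urls, rows }
}

function unpack(p: PackedIndex) {
  return p.rows.map(([id, title, u]) => ({ id, title, url: p.urls[u] }))
}
```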
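For item 4, a sketch of the two-artifact loading strategy, assuming hypothetical file names `search-core.json` and `search-content.json` and the worker message protocol from the first sketch.

```ts
// Load the small core index (titles/headers) immediately, then lazy-load
// the large full-text index when the browser is idle.
async function initSearch(worker: Worker): Promise<void> {
  const core = await (await fetch('/search-core.json')).json()
  worker.postMessage({ type: 'init-core', payload: core }) // title search is usable now

  // requestIdleCallback is missing in some browsers (e.g., Safari), so fall back.
  const whenIdle = (cb: () => void) =>
    'requestIdleCallback' in window ? requestIdleCallback(cb) : setTimeout(cb, 200)

  whenIdle(async () => {
    const content = await (await fetch('/search-content.json')).json()
    worker.postMessage({ type: 'init-content', payload: content }) // full text available
  })
}
```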
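For item 5, a sketch of a CJK-aware tokenizer on top of the native `Intl.Segmenter`, with a whitespace fallback for engines that lack it; the exact segments returned for a given string are engine-dependent.

```ts
// Tokenize text using the browser's built-in segmenter — no jieba needed.
function tokenize(text: string, locale = 'zh'): string[] {
  if (typeof Intl !== 'undefined' && 'Segmenter' in Intl) {
    const seg = new Intl.Segmenter(locale, { granularity: 'word' })
    return Array.from(seg.segment(text))
      .filter(s => s.isWordLike) // drop punctuation/whitespace segments
      .map(s => s.segment)
  }
  // Fallback: plain whitespace splitting (adequate for non-CJK text only).
  return text.split(/\s+/).filter(Boolean)
}

// Example (output is engine-dependent): tokenize('高性能本地搜索')
```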
I have implemented this architecture in a private project with excellent results (~0 ms perceived search latency, <50 ms TTI). I am willing to share more technical details or contribute a PR if the team is interested in this direction.
Describe alternatives you've considered
1. Algolia / DocSearch: Excellent performance, but it requires a paid subscription for commercial closed-source projects and raises data-privacy concerns (content must be uploaded).
2. Server-side Search (Elasticsearch/Meilisearch): Requires deploying and maintaining backend services, losing the simplicity of static-site deployment.
3. Optimization of MiniSearch: Tried tuning MiniSearch options, but the fundamental bottleneck of main-thread JSON parsing remains hard to overcome for very large datasets.
Additional context
No response
Validations
- Follow our Code of Conduct
- Read the docs.
- Read the Contributing Guidelines.
- Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.