Generate en docs

Milvus-doc-bot · Milvus-doc-bot · commit edd9c7f56ece · 2024-12-03T02:09:02.000Z
diff --git a/localization/v2.5.x/site/en/release_notes.md b/localization/v2.5.x/site/en/release_notes.md
@@ -45,12 +45,12 @@ title: Release Notes
 </table>
 <p>Milvus 2.5.0-beta brings significant advancements to enhance usability, scalability, and performance for users dealing with vector search and large-scale data management. With this release, Milvus integrates powerful new features like term-based search, clustering compaction for optimized queries, and versatile support for sparse and dense vector search methods. Enhancements in cluster management, indexing, and data handling introduce new levels of flexibility and ease of use, making Milvus an even more robust and user-friendly vector database.</p>
 <h3 id="Key-Features" class="common-anchor-header">Key Features</h3><h4 id="Full-Text-Search" class="common-anchor-header">Full Text Search</h4><p>Milvus 2.5 supports full text search implemented with Sparse-BM25! This feature is an important complement to Milvus’s strong semantic search capabilities, especially in scenarios involving rare words or technical terms. In previous versions, Milvus supported sparse vectors to assist with keyword search scenarios. These sparse vectors were generated outside of Milvus by neural models like SPLADEv2/BGE-M3 or statistical models such as the BM25 algorithm.</p>
-<p>Milvus 2.5 has built-in tokenization and sparse vector extraction, extending the API from only receiving vectors as input to directly accepting text. BM25 statistical information is updated in real time as data is inserted, enhancing usability and accuracy. Additionally, sparse vectors based on approximate nearest neighbor (ANN) algorithms offer more powerful performance than standard keyword search systems.</p>
-<p>For details, refer to <a href="/docs/full-text-search.md">Full Text Search</a>.</p>
+<p>Powered by <a href="https://github.com/quickwit-oss/tantivy">Tantivy</a>, Milvus 2.5 has built-in analyzers and sparse vector extraction, extending the API from only receiving vectors as input to directly accepting text. BM25 statistical information is updated in real time as data is inserted, enhancing usability and accuracy. Additionally, sparse vectors based on approximate nearest neighbor (ANN) algorithms offer more powerful performance than standard keyword search systems.</p>
+<p>For details, refer to <a href="/docs/analyzer-overview.md">Analyzer Overview</a> and <a href="/docs/full-text-search.md">Full Text Search</a>.</p>
 <h4 id="Cluster-Management-WebUI-Beta" class="common-anchor-header">Cluster Management WebUI (Beta)</h4><p>To better support massive data and rich features, Milvus’s sophisticated design includes various dependencies, numerous node roles, complex data structures, and more. These aspects can pose challenges for usage and maintenance.</p>
 <p>Milvus 2.5 introduces a built-in Cluster Management WebUI, reducing system maintenance difficulty by visualizing Milvus’s complex runtime environment information. This includes details of databases and collections, segments, channels, dependencies, node health status, task information, slow queries, and more.</p>
-<h4 id="Text-Match" class="common-anchor-header">Text Match</h4><p>Milvus 2.5 leverages analyzers and indexing from Tantivy for text preprocessing and index building, supporting precise natural language matching of text data based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.</p>
-<p>For details, refer to <a href="/docs/keyword-match.md">Text Match</a>.</p>
+<h4 id="Text-Match" class="common-anchor-header">Text Match</h4><p>Milvus 2.5 leverages analyzers and indexing from <a href="https://github.com/quickwit-oss/tantivy">Tantivy</a> for text preprocessing and index building, supporting precise natural language matching of text data based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.</p>
+<p>For details, refer to <a href="/docs/analyzer-overview.md">Analyzer Overview</a> and <a href="/docs/keyword-match.md">Text Match</a>.</p>
 <h4 id="Bitmap-Index" class="common-anchor-header">Bitmap Index</h4><p>A new scalar data index has been added to the Milvus family. The Bitmap index uses an array of bits, equal in length to the number of rows, to represent the existence of values and accelerate searches.</p>
 <p>Bitmap indexes have traditionally been effective for low-cardinality fields, which have a modest number of distinct values—for example, a column containing gender information with only two possible values: male and female.</p>
 <p>For details, refer to <a href="/docs/bitmap.md">Bitmap Index</a>.</p>
diff --git a/localization/v2.5.x/site/en/userGuide/schema/analyzer/analyzer-overview.md b/localization/v2.5.x/site/en/userGuide/schema/analyzer/analyzer-overview.md
@@ -24,7 +24,7 @@ summary: >-
         ></path>
       </svg>
     </button></h1><p>In text processing, an <strong>analyzer</strong> is a crucial component that converts raw text into a structured, searchable format. Each analyzer typically consists of two core elements: <strong>tokenizer</strong> and <strong>filter</strong>. Together, they transform input text into tokens, refine these tokens, and prepare them for efficient indexing and retrieval.</p>
-<p>In Milvus, analyzers are configured during collection creation when you add <code translate="no">VARCHAR</code> fields to the collection schema. Tokens produced by an analyzer can be used to build an index for text matching or converted into sparse embeddings for full text search. For more information, refer to <a href="/docs/keyword-match.md">Text Match</a> or <a href="/docs/full-text-search.md">​Full Text Search</a>.​</p>
+<p>Powered by <a href="https://github.com/quickwit-oss/tantivy">Tantivy</a>, analyzers in Milvus are configured during collection creation when you add <code translate="no">VARCHAR</code> fields to the collection schema. Tokens produced by an analyzer can be used to build an index for text matching or converted into sparse embeddings for full text search. For more information, refer to <a href="/docs/keyword-match.md">Text Match</a> or <a href="/docs/full-text-search.md">Full Text Search</a>.</p>
 <div class="alert note">
 <p>The use of analyzers may impact performance:</p>
 <ul>
diff --git a/localization/v2.5.x/site/en/userGuide/search-query-get/full-text-search.md b/localization/v2.5.x/site/en/userGuide/search-query-get/full-text-search.md
@@ -45,7 +45,7 @@ summary: >-
     </button></h2><p>Full text search simplifies the process of text-based searching by eliminating the need for manual embedding. This feature operates through the following workflow:</p>
 <ol>
 <li><p><strong>Text input</strong>: You insert raw text documents or provide query text without any need for manual embedding.</p></li>
-<li><p><strong>Text analysis</strong>: Milvus uses an analyzer to tokenize input text into individual, searchable terms.​</p></li>
+<li><p><strong>Text analysis</strong>: Milvus uses an analyzer to tokenize input text into individual, searchable terms. For more information on analyzers, refer to <a href="/docs/analyzer-overview.md">Analyzer Overview</a>.</p></li>
 <li><p><strong>Function processing</strong>: The built-in function receives tokenized terms and converts them into sparse vector representations.</p></li>
 <li><p><strong>Collection store</strong>: Milvus stores these sparse embeddings in a collection for efficient retrieval.</p></li>
 <li><p><strong>BM25 scoring</strong>: During a search, Milvus applies the BM25 algorithm to calculate scores for the stored documents and ranks matched results based on relevance to the query text.</p></li>
diff --git a/localization/v2.5.x/site/en/userGuide/search-query-get/metric.md b/localization/v2.5.x/site/en/userGuide/search-query-get/metric.md
@@ -253,7 +253,7 @@ title: Metric Types
 <ul>
 <li><p><code translate="no">Q</code>: The query text provided by the user.</p></li>
 <li><p><code translate="no">D</code>: The document being evaluated.</p></li>
-<li><p><code translate="no">​TF(qi​,D)</code>: Term frequency, representing how often term ​qi​appears in document ​D.​</p></li>
+<li><p><code translate="no">TF(qi,D)</code>: Term frequency, representing how often term <code translate="no">qi</code> appears in document <code translate="no">D</code>.</p></li>
 <li><p><code translate="no">IDF(qi)</code>: Inverse document frequency, calculated as:</p>
 <p>
   <span class="img-wrapper">
