v2.6.x/site/en/userGuide/storage-optimization/eviction.md (13 additions, 67 deletions)
@@ -29,34 +29,36 @@ Milvus supports two complementary eviction modes (**sync** and **async**) that w
</tr>
<tr>
<td><p>Trigger</p></td>
- <td><p>During query or search when memory/disk usage exceeds internal limits.</p></td>
- <td><p>Background thread periodically checks usage and triggers eviction when high watermark is exceeded.</p></td>
+ <td><p>Occurs during query or search when memory or disk usage exceeds internal limits.</p></td>
+ <td><p>Triggered by a background thread when usage exceeds the high watermark or when cached data reaches its time-to-live (TTL).</p></td>
</tr>
<tr>
<td><p>Behavior</p></td>
- <td><p>Query execution pauses while cache is reclaimed. Eviction continues until usage drops below the low watermark.</p></td>
- <td><p>Runs continuously in the background; removes data when usage exceeds high watermark until it falls below the low watermark. Queries are not blocked.</p></td>
+ <td><p>Query or search operations pause temporarily while the QueryNode reclaims cache space. Eviction continues until usage drops below the low watermark or a timeout occurs. If the timeout is reached and insufficient data can be reclaimed, the query or search may fail.</p></td>
+ <td><p>Runs periodically in the background, proactively evicting cached data when usage exceeds the high watermark or when data expires based on TTL. Eviction continues until usage drops below the low watermark. Queries are not blocked.</p></td>
</tr>
<tr>
<td><p>Best For</p></td>
- <td><p>Workloads that can tolerate brief latency spikes or when async eviction cannot reclaim space fast enough.</p></td>
- <td><p>Latency-sensitive workloads requiring smooth performance. Ideal for proactive resource management.</p></td>
+ <td><p>Workloads that can tolerate brief latency spikes or temporary pauses during peak usage. Useful when async eviction cannot reclaim space fast enough.</p></td>
+ <td><p>Latency-sensitive workloads that require smooth and predictable query performance. Ideal for proactive resource management.</p></td>
</tr>
<tr>
<td><p>Cautions</p></td>
- <td><p>Adds latency to ongoing queries. May cause timeouts if insufficient reclaimable data.</p></td>
+ <td><p>Can cause short query delays or timeouts if insufficient evictable data is available.</p></td>
+ <td><p>Requires properly tuned high/low watermarks and TTL settings. Slight overhead from the background thread.</p></td>
</tr>
<tr>
<td><p>Configuration</p></td>
<td><p>Enabled via <code>evictionEnabled: true</code></p></td>
- <td><p>Enabled via <code>backgroundEvictionEnabled: true</code> (requires <code>evictionEnabled: true</code>)</p></td>
+ <td><p>Enabled via <code>backgroundEvictionEnabled: true</code> (also requires <code>evictionEnabled: true</code>)</p></td>
</tr>
</table>

**Recommended setup**:

- Enable both modes for optimal balance. Async eviction manages cache usage proactively, while sync eviction acts as a safety fallback when resources are nearly exhausted.
+ - Both eviction modes can be enabled together for optimal balance, provided your workload benefits from Tiered Storage and can tolerate eviction-related fetch latency, as sketched in the YAML after this list.
+
+ - For performance testing or latency-critical scenarios, consider disabling eviction entirely to avoid network fetch overhead after eviction.
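A minimal `milvus.yaml` sketch for enabling both modes together; the two keys are the ones named in the Configuration row of the table above, and all other tiered-storage settings are omitted here and should be tuned separately:

```yaml
queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true            # sync eviction; also a prerequisite for the background mode
      backgroundEvictionEnabled: true  # async eviction performed by the background thread
```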

<div class="alert note">

@@ -104,7 +106,7 @@ queryNode:

Watermarks define when cache eviction begins and ends for both memory and disk. Each resource type has two thresholds:

- - **High watermark**: Async eviction starts when usage exceeds this value.
+ - **High watermark**: Eviction starts when usage exceeds this value.

- **Low watermark**: Eviction continues until usage falls below this value.

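To make the two thresholds concrete, here is an illustrative sketch only: the key names below are placeholders rather than the exact `milvus.yaml` fields (refer to the watermark settings documented in this guide), and the values are not defaults.

```yaml
queryNode:
  segcore:
    tieredStorage:
      # Placeholder names for illustration -- check this guide for the actual watermark fields.
      memoryHighWatermarkRatio: 0.9   # eviction starts once memory usage exceeds 90% of the budget
      memoryLowWatermarkRatio: 0.75   # eviction continues until usage falls back below 75%
      diskHighWatermarkRatio: 0.9
      diskLowWatermarkRatio: 0.75
```

Keeping a reasonable gap between the high and low watermarks helps avoid back-to-back eviction cycles when usage hovers near the threshold.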
@@ -212,59 +214,3 @@ queryNode:
<td><p>Use a short TTL (hours) for highly dynamic data; use a long TTL (days) for stable datasets. Set 0 to disable time-based expiration.</p></td>
</tr>
</table>
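For example, a sketch of TTL-based expiration (values are illustrative; `cacheTtl` is expressed in seconds, and per the eviction-modes table above, TTL expiration is applied by the background eviction thread):

```yaml
queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true
      backgroundEvictionEnabled: true  # TTL expiration is handled by async (background) eviction
      cacheTtl: 604800                 # 7 days; use e.g. 3600 (1 hour) for dynamic data, 0 to disable
```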
-
- ## Configure overcommit ratio
-
- Overcommit ratios define how much of the cache is reserved as evictable, allowing QueryNodes to temporarily exceed normal capacity before eviction intensifies.
-
- <div class="alert note">
-
- This configuration takes effect only when [eviction is enabled](eviction.md#Enable-eviction).
-
- </div>
-
- **Example YAML**:
-
- ```yaml
- queryNode:
-   segcore:
-     tieredStorage:
-       evictionEnabled: true
-       # Evictable Memory Cache Ratio: 30%
-       # (30% of physical memory is reserved for storing evictable data)
-       evictableMemoryCacheRatio: 0.3
-       # Evictable Disk Cache Ratio: 30%
-       # (30% of disk capacity is reserved for storing evictable data)
v2.6.x/site/en/userGuide/storage-optimization/tiered-storage-overview.md (10 additions, 18 deletions)
@@ -9,7 +9,7 @@ beta: Milvus 2.6.4+

In Milvus, the traditional *full-load* mode requires each QueryNode to load all data fields and indexes of a [segment](glossary.md#Segment) at initialization, even data that may never be accessed. This ensures immediate data availability but often leads to wasted resources, including high memory usage, heavy disk activity, and significant I/O overhead, especially when handling large-scale datasets.

- *Tiered Storage* addresses this challenge by decoupling data caching from segment loading. Instead of loading all data at once, Milvus introduces a caching layer that distinguishes between hot data (cached locally) and cold data (stored remotely). The QueryNode now loads only lightweight *metadata* initially and dynamically pulls or evicts data on demand. This significantly reduces load time, optimizes local resource utilization, and enables QueryNodes to process datasets that far exceed their physical memory or disk capacity.
+ *Tiered Storage* addresses this challenge by decoupling data caching from segment loading. Instead of loading all data at once, Milvus introduces a caching layer that distinguishes between hot data (cached locally) and cold data (stored remotely). The QueryNode now loads only lightweight *metadata* initially and dynamically pulls or evicts field data on demand. This significantly reduces load time, optimizes local resource utilization, and enables QueryNodes to process datasets that far exceed their physical memory or disk capacity.

Consider enabling Tiered Storage in scenarios such as:

@@ -47,7 +47,7 @@ The diagram below shows these differences.

Under Tiered Storage, the workflow has these phases:
@@ -59,17 +59,19 @@ Because field data and index files remain in remote storage until first accessed

**Configuration**

- Automatically applied when Tiered Storage is enabled. No other manual setting is required.
+ Automatically applied when Tiered Storage is enabled. No manual setting is required.

#### Phase 2: Warm up

- To reduce the first-hit latency introduced by [lazy load](tiered-storage-overview.md#Phase-1-Lazy-load), Milvus provides a *Warm Up mechanism.
+ To reduce the first-hit latency introduced by [lazy load](tiered-storage-overview.md#Phase-1-Lazy-load), Milvus provides a *Warm Up* mechanism.

Before a segment becomes queryable, Milvus can proactively fetch and cache specific fields or indexes from object storage, ensuring that the first query directly hits cached data instead of triggering on-demand loading.

+ During warm-up, fields are preloaded at the chunk level, while indexes are preloaded at the segment level.
+
**Configuration**

- Warm Up settings are defined in the Tiered Storage section of **milvus.yaml**. You can enable or disable preloading for each field or index type and specify the preferred strategy. See [Warm Up](warm-up.md) for configuration examples.
+ Warm Up settings are defined in the Tiered Storage section of `milvus.yaml`. You can enable or disable preloading for each field or index type and specify the preferred strategy. See [Warm Up](warm-up.md) for detailed configurations.

#### Phase 3: Partial load

@@ -85,7 +87,7 @@ Partial load is automatically applied when Tiered Storage is enabled. No manual

#### Phase 4: Eviction

- To maintain healthy resource usage, Milvus automatically releases unused cached data when thresholds are reached.
+ To maintain healthy resource usage, Milvus automatically releases unused cached data when specific thresholds are reached.

Eviction follows a [Least Recently Used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies) policy, ensuring that infrequently accessed data is removed first while active data remains in cache.

@@ -95,8 +97,6 @@ Eviction is governed by the following configurable items:

- **Cache TTL**: Removes stale cached data after a defined duration of inactivity.

Enable and tune eviction parameters in **milvus.yaml**. See [Eviction](eviction.md) for detailed configuration.
@@ -147,10 +147,6 @@ queryNode:

      # Cache TTL (7 days)
      cacheTtl: 604800
-
-       # Overcommit Ratios
-       evictableMemoryCacheRatio: 0.3
-       evictableDiskCacheRatio: 0.3
```

### Next steps
@@ -185,6 +181,8 @@ Two common causes:

- QueryNode resources are shared with other workloads, so Tiered Storage cannot correctly assess actual available capacity.

+ To resolve this, we recommend allocating dedicated resources to QueryNodes.
+
### Why do some queries fail under high concurrency?

If too many queries hit hot data at the same time, QueryNode resource limits may still be exceeded. Some threads may fail due to resource reservation timeouts. Retrying after the load decreases, or allocating more resources, can resolve this.
@@ -195,11 +193,5 @@ Possible causes include:

- Frequent queries to cold data, which must be fetched from storage.

- - An overcommit ratio that is too high, leading to frequent eviction.
-
- Watermarks set too close together, causing frequent synchronous eviction.

- ### Can Tiered Storage handle unlimited data by overcommitting cache?
-
- No. Overcommit ratios allow QueryNodes to work with more segments than physical memory permits, but excessively high ratios can lead to frequent eviction, cache thrashing, or query failures.
v2.6.x/site/en/userGuide/storage-optimization/warm-up.md (6 additions, 6 deletions)
@@ -1,13 +1,13 @@
---
id: warm-up.md
title: "Warm Up"
- summary: "In Milvus, Warm Up complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading."
+ summary: "In Milvus, Warm Up complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected types of fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading."
beta: Milvus 2.6.4+
---

# Warm Up

- In Milvus, **Warm Up** complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading.
+ In Milvus, **Warm Up** complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected types of fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading.

## Why warm up

@@ -36,7 +36,7 @@ Warm Up is controlled under `queryNode.segcore.tieredStorage.warmup` in `milvus.
<th><p>Typical scenario</p></th>
</tr>
<tr>
- <td><p><code>sync</code> (default)</p></td>
+ <td><p><code>sync</code></p></td>
<td><p>Preload before the segment becomes queryable. Load time increases slightly, but the first query incurs no latency.</p></td>
<td><p>Use for performance-critical data that must be immediately available, such as high-frequency scalar indexes or key vector indexes used in search.</p></td>
</tr>
@@ -56,9 +56,9 @@ queryNode:
      warmup:
        # options: sync, disable.
        # Specifies the timing for warming up the Tiered Storage cache.
-         # - "sync": data will be loaded into the cache before a segment is considered loaded.
-         # - "disable": data will not be proactively loaded into the cache, and loaded only if needed by search/query tasks.
-         # Defaults to "sync", except for vector field which defaults to "disable".
+         # - `sync`: data will be loaded into the cache before a segment is considered loaded.
+         # - `disable`: data will not be proactively loaded into the cache, and loaded only if needed by search/query tasks.
+         # Defaults to `sync`, except for vector field which defaults to `disable`.
        scalarField: sync
        scalarIndex: sync
        vectorField: disable # cache warmup for vector field raw data is by default disabled.