
Commit ecf20e4

Milvus-doc-bot authored and committed
Release new docs to master
1 parent ce30828 · commit ecf20e4

File tree

6 files changed (+34, -91 lines)

v2.6.x/scripts/config.json

Lines changed: 5 additions & 0 deletions
@@ -514,6 +514,11 @@
       "token": "IbpqwpTFghdA37bU6dDc5eCUnfh",
       "type": "board",
       "alt_text": "load-workflow"
+    },
+    {
+      "token": "SAO6wxDUYhoqqtbRwYecjjFUnIf",
+      "type": "board",
+      "alt_text": "querynode-load-workflow"
     }
   ]
 }

v2.6.x/site/en/userGuide/storage-optimization/eviction.md

Lines changed: 13 additions & 67 deletions
@@ -29,34 +29,36 @@ Milvus supports two complementary eviction modes (**sync** and **async**) that w
 </tr>
 <tr>
 <td><p>Trigger</p></td>
-<td><p>During query or search when memory/disk usage exceeds internal limits.</p></td>
-<td><p>Background thread periodically checks usage and triggers eviction when high watermark is exceeded.</p></td>
+<td><p>Occurs during query or search when memory or disk usage exceeds internal limits.</p></td>
+<td><p>Triggered by a background thread when usage exceeds the high watermark or when cached data reaches its time-to-live (TTL).</p></td>
 </tr>
 <tr>
 <td><p>Behavior</p></td>
-<td><p>Query execution pauses while cache is reclaimed. Eviction continues until usage drops below the low watermark.</p></td>
-<td><p>Runs continuously in the background; removes data when usage exceeds high watermark until it falls below the low watermark. Queries are not blocked.</p></td>
+<td><p>Query or search operations pause temporarily while the QueryNode reclaims cache space. Eviction continues until usage drops below the low watermark or a timeout occurs. If timeout is reached and insufficient data can be reclaimed, the query or search may fail.</p></td>
+<td><p>Runs periodically in the background, proactively evicting cached data when usage exceeds the high watermark or when data expires based on TTL. Eviction continues until usage drops below the low watermark. Queries are not blocked.</p></td>
 </tr>
 <tr>
 <td><p>Best For</p></td>
-<td><p>Workloads that can tolerate brief latency spikes or when async eviction cannot reclaim space fast enough.</p></td>
-<td><p>Latency-sensitive workloads requiring smooth performance. Ideal for proactive resource management.</p></td>
+<td><p>Workloads that can tolerate brief latency spikes or temporary pauses during peak usage. Useful when async eviction cannot reclaim space fast enough.</p></td>
+<td><p>Latency-sensitive workloads that require smooth and predictable query performance. Ideal for proactive resource management.</p></td>
 </tr>
 <tr>
 <td><p>Cautions</p></td>
-<td><p>Adds latency to ongoing queries. May cause timeouts if insufficient reclaimable data.</p></td>
-<td><p>Requires properly tuned watermarks. Slight background resource overhead.</p></td>
+<td><p>Can cause short query delays or timeouts if insufficient evictable data is available.</p></td>
+<td><p>Requires properly tuned high/low watermarks and TTL settings. Slight overhead from the background thread.</p></td>
 </tr>
 <tr>
 <td><p>Configuration</p></td>
 <td><p>Enabled via <code>evictionEnabled: true</code></p></td>
-<td><p>Enabled via <code>backgroundEvictionEnabled: true</code> (requires <code>evictionEnabled: true</code>)</p></td>
+<td><p>Enabled via <code>backgroundEvictionEnabled: true</code> (requires <code>evictionEnabled: true</code> at the same time)</p></td>
 </tr>
 </table>

 **Recommended setup**:

-Enable both modes for optimal balance. Async eviction manages cache usage proactively, while sync eviction acts as a safety fallback when resources are nearly exhausted.
+- Both eviction modes can be enabled together for optimal balance, provided your workload benefits from Tiered Storage and can tolerate eviction-related fetch latency.
+
+- For performance testing or latency-critical scenarios, consider disabling eviction entirely to avoid network fetch overhead after eviction.

 <div class="alert note">
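
For reference, a minimal sketch of how the two flags described above could be combined in `milvus.yaml`, assuming the `queryNode.segcore.tieredStorage` path used elsewhere in this commit; the exact placement of `backgroundEvictionEnabled` is not shown in the diff and is assumed here:

```yaml
queryNode:
  segcore:
    tieredStorage:
      # Sync eviction: reclaim cache during query/search when limits are hit.
      evictionEnabled: true
      # Async eviction: a background thread evicts above the high watermark
      # and on TTL expiry. Requires evictionEnabled: true as well.
      # (Placement under tieredStorage is assumed, not shown in this diff.)
      backgroundEvictionEnabled: true
```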

@@ -104,7 +106,7 @@ queryNode:

 Watermarks define when cache eviction begins and ends for both memory and disk. Each resource type has two thresholds:

-- **High watermark**: Async eviction starts when usage exceeds this value.
+- **High watermark**: Eviction starts when usage exceeds this value.

 - **Low watermark**: Eviction continues until usage falls below this value.

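The hunk context (`queryNode:`) suggests these thresholds are set in a YAML block just above the lines shown, which this diff does not include. Purely to illustrate the high/low pattern, a sketch with hypothetical key names could look like the following; the real parameter names should be taken from `milvus.yaml` or the Eviction page:

```yaml
queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true
      # Hypothetical key names, for illustration only.
      # Eviction starts once memory usage exceeds 90% of capacity...
      memoryHighWatermarkRatio: 0.9
      # ...and continues until usage falls back below 75%.
      memoryLowWatermarkRatio: 0.75
```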
@@ -212,59 +214,3 @@ queryNode:
 <td><p>Use a short TTL (hours) for highly dynamic data; use a long TTL (days) for stable datasets. Set 0 to disable time-based expiration.</p></td>
 </tr>
 </table>
-
-## Configure overcommit ratio
-
-Overcommit ratios define how much of the cache is reserved as evictable, allowing QueryNodes to temporarily exceed normal capacity before eviction intensifies.
-
-<div class="alert note">
-
-This configuration takes effect only when [eviction is enabled](eviction.md#Enable-eviction).
-
-</div>
-
-**Example YAML**:
-
-```yaml
-queryNode:
-  segcore:
-    tieredStorage:
-      evictionEnabled: true
-      # Evictable Memory Cache Ratio: 30%
-      # (30% of physical memory is reserved for storing evictable data)
-      evictableMemoryCacheRatio: 0.3
-      # Evictable Disk Cache Ratio: 30%
-      # (30% of disk capacity is reserved for storing evictable data)
-      evictableDiskCacheRatio: 0.3
-```
-
-<table>
-<tr>
-<th><p>Parameter</p></th>
-<th><p>Type</p></th>
-<th><p>Range</p></th>
-<th><p>Description</p></th>
-<th><p>Recommended use case</p></th>
-</tr>
-<tr>
-<td><p><code>evictableMemoryCacheRatio</code></p></td>
-<td><p>float</p></td>
-<td><p>[0.0, 1.0]</p></td>
-<td><p>Portion of memory cache allocated for evictable data.</p></td>
-<td><p>Start at <code>0.3</code>. Increase (0.5–0.7) for lower eviction frequency; decrease (0.1–0.2) for higher segment capacity.</p></td>
-</tr>
-<tr>
-<td><p><code>evictableDiskCacheRatio</code></p></td>
-<td><p>float</p></td>
-<td><p>[0.0, 1.0]</p></td>
-<td><p>Portion of disk cache allocated for evictable data.</p></td>
-<td><p>Use similar ratios to memory unless disk I/O becomes a bottleneck.</p></td>
-</tr>
-</table>
-
-**Boundary behavior**:
-
-- `1.0`: All cache is evictable — eviction rarely triggers, but fewer segments fit per QueryNode.
-
-- `0.0`: No evictable cache — eviction occurs frequently; more segments fit, but latency may increase.
-

v2.6.x/site/en/userGuide/storage-optimization/tiered-storage-overview.md

Lines changed: 10 additions & 18 deletions
@@ -9,7 +9,7 @@ beta: Milvus 2.6.4+

 In Milvus, the traditional *full-load* mode requires each QueryNode to load all data fields and indexes of a [segment](glossary.md#Segment) at initialization, even data that may never be accessed. This ensures immediate data availability but often leads to wasted resources, including high memory usage, heavy disk activity, and significant I/O overhead, especially when handling large-scale datasets.

-*Tiered Storage* addresses this challenge by decoupling data caching from segment loading. Instead of loading all data at once, Milvus introduces a caching layer that distinguishes between hot data (cached locally) and cold data (stored remotely). The QueryNode now loads only lightweight *metadata* initially and dynamically pulls or evicts data on demand. This significantly reduces load time, optimizes local resource utilization, and enables QueryNodes to process datasets that far exceed their physical memory or disk capacity.
+*Tiered Storage* addresses this challenge by decoupling data caching from segment loading. Instead of loading all data at once, Milvus introduces a caching layer that distinguishes between hot data (cached locally) and cold data (stored remotely). The QueryNode now loads only lightweight *metadata* initially and dynamically pulls or evicts field data on demand. This significantly reduces load time, optimizes local resource utilization, and enables QueryNodes to process datasets that far exceed their physical memory or disk capacity.

 Consider enabling Tiered Storage in scenarios such as:

@@ -47,7 +47,7 @@ The diagram below shows these differences.

 Under Tiered Storage, the workflow has these phases:

-![Load Workflow](../../../../assets/load-workflow.png)
+![Querynode Load Workflow](../../../../assets/querynode-load-workflow.png)

 #### Phase 1: Lazy load

@@ -59,17 +59,19 @@ Because field data and index files remain in remote storage until first accessed

 **Configuration**

-Automatically applied when Tiered Storage is enabled. No other manual setting is required.
+Automatically applied when Tiered Storage is enabled. No manual setting is required.

 #### Phase 2: Warm up

-To reduce the first-hit latency introduced by [lazy load](tiered-storage-overview.md#Phase-1-Lazy-load), Milvus provides a *Warm Up mechanism.
+To reduce the first-hit latency introduced by [lazy load](tiered-storage-overview.md#Phase-1-Lazy-load), Milvus provides a *Warm Up* mechanism.

 Before a segment becomes queryable, Milvus can proactively fetch and cache specific fields or indexes from object storage, ensuring that the first query directly hits cached data instead of triggering on-demand loading.

+During warmup, fields will be preloaded at the chunk level, while indexes will be preloaded at the segment level.
+
 **Configuration**

-Warm Up settings are defined in the Tiered Storage section of **milvus.yaml**. You can enable or disable preloading for each field or index type and specify the preferred strategy. See [Warm Up](warm-up.md) for configuration examples.
+Warm Up settings are defined in the Tiered Storage section of `milvus.yaml`. You can enable or disable preloading for each field or index type and specify the preferred strategy. See [Warm Up](warm-up.md) for detailed configurations.

 #### Phase 3: Partial load

@@ -85,7 +87,7 @@ Partial load is automatically applied when Tiered Storage is enabled. No manual

 #### Phase 4: Eviction

-To maintain healthy resource usage, Milvus automatically releases unused cached data when thresholds are reached.
+To maintain healthy resource usage, Milvus automatically releases unused cached data when specific thresholds are reached.

 Eviction follows a [Least Recently Used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies) policy, ensuring that infrequently accessed data is removed first while active data remains in cache.

@@ -95,8 +97,6 @@ Eviction is governed by the following configurable items:

 - **Cache TTL**: Removes stale cached data after a defined duration of inactivity.

-- **Overcommit ratio**: Allows temporary cache oversubscription before aggressive eviction begins, helping absorb short-term workload spikes.
-
 **Configuration**

 Enable and tune eviction parameters in **milvus.yaml**. See [Eviction](eviction.md) for detailed configuration.
@@ -147,10 +147,6 @@ queryNode:

       # Cache TTL (7 days)
       cacheTtl: 604800
-
-      # Overcommit Ratios
-      evictableMemoryCacheRatio: 0.3
-      evictableDiskCacheRatio: 0.3
 ```

 ### Next steps
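
With the overcommit lines removed, the eviction-related part of this example reduces to the flag and TTL left in place; a minimal sketch of the resulting snippet, assuming the same `queryNode.segcore.tieredStorage` nesting used elsewhere in this commit:

```yaml
queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true
      # Cache TTL (7 days), in seconds
      cacheTtl: 604800
```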
@@ -185,6 +181,8 @@ Two common causes:

 - QueryNode resources are shared with other workloads, so Tiered Storage cannot correctly assess actual available capacity.

+To resolve this, we recommend you allocate dedicated resources for QueryNodes.
+
 ### Why do some queries fail under high concurrency?

 If too many queries hit hot data at the same time, QueryNode resource limits may still be exceeded. Some threads may fail due to resource reservation timeouts. Retrying after the load decreases, or allocating more resources, can resolve this.
@@ -195,11 +193,5 @@ Possible causes include:

 - Frequent queries to cold data, which must be fetched from storage.

-- An overcommit ratio that is too high, leading to frequent eviction.
-
 - Watermarks set too close together, causing frequent synchronous eviction.

-### Can Tiered Storage handle unlimited data by overcommitting cache?
-
-No. Overcommit ratios allow QueryNodes to work with more segments than physical memory permits, but excessively high ratios can lead to frequent eviction, cache thrashing, or query failures.
-

v2.6.x/site/en/userGuide/storage-optimization/warm-up.md

Lines changed: 6 additions & 6 deletions
@@ -1,13 +1,13 @@
 ---
 id: warm-up.md
 title: "Warm Up"
-summary: "In Milvus, Warm Up complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading."
+summary: "In Milvus, Warm Up complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected types of fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading."
 beta: Milvus 2.6.4+
 ---

 # Warm Up

-In Milvus, **Warm Up** complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading.
+In Milvus, **Warm Up** complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected types of fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading.

 ## Why warm up

@@ -36,7 +36,7 @@ Warm Up is controlled under `queryNode.segcore.tieredStorage.warmup` in `milvus.
 <th><p>Typical scenario</p></th>
 </tr>
 <tr>
-<td><p><code>sync</code> (default)</p></td>
+<td><p><code>sync</code></p></td>
 <td><p>Preload before the segment becomes queryable. Load time increases slightly, but the first query incurs no latency.</p></td>
 <td><p>Use for performance-critical data that must be immediately available, such as high-frequency scalar indexes or key vector indexes used in search.</p></td>
 </tr>
@@ -56,9 +56,9 @@ queryNode:
       warmup:
         # options: sync, disable.
         # Specifies the timing for warming up the Tiered Storage cache.
-        # - "sync": data will be loaded into the cache before a segment is considered loaded.
-        # - "disable": data will not be proactively loaded into the cache, and loaded only if needed by search/query tasks.
-        # Defaults to "sync", except for vector field which defaults to "disable".
+        # - `sync`: data will be loaded into the cache before a segment is considered loaded.
+        # - `disable`: data will not be proactively loaded into the cache, and loaded only if needed by search/query tasks.
+        # Defaults to `sync`, except for vector field which defaults to `disable`.
         scalarField: sync
         scalarIndex: sync
         vectorField: disable # cache warmup for vector field raw data is by default disabled.
