v2.6.x/site/en/userGuide/storage-optimization/eviction.md (13 additions, 67 deletions)
@@ -29,34 +29,36 @@ Milvus supports two complementary eviction modes (**sync** and **async**) that w
</tr>
<tr>
<td><p>Trigger</p></td>
- <td><p>During query or search when memory/disk usage exceeds internal limits.</p></td>
- <td><p>Background thread periodically checks usage and triggers eviction when high watermark is exceeded.</p></td>
+ <td><p>Occurs during query or search when memory or disk usage exceeds internal limits.</p></td>
+ <td><p>Triggered by a background thread when usage exceeds the high watermark or when cached data reaches its time-to-live (TTL).</p></td>
</tr>
<tr>
<td><p>Behavior</p></td>
- <td><p>Query execution pauses while cache is reclaimed. Eviction continues until usage drops below the low watermark.</p></td>
- <td><p>Runs continuously in the background; removes data when usage exceeds high watermark until it falls below the low watermark. Queries are not blocked.</p></td>
+ <td><p>Query or search operations pause temporarily while the QueryNode reclaims cache space. Eviction continues until usage drops below the low watermark or a timeout occurs. If the timeout is reached and insufficient data can be reclaimed, the query or search may fail.</p></td>
+ <td><p>Runs periodically in the background, proactively evicting cached data when usage exceeds the high watermark or when data expires based on TTL. Eviction continues until usage drops below the low watermark. Queries are not blocked.</p></td>
</tr>
<tr>
<td><p>Best For</p></td>
- <td><p>Workloads that can tolerate brief latency spikes or when async eviction cannot reclaim space fast enough.</p></td>
- <td><p>Latency-sensitive workloads requiring smooth performance. Ideal for proactive resource management.</p></td>
+ <td><p>Workloads that can tolerate brief latency spikes or temporary pauses during peak usage. Useful when async eviction cannot reclaim space fast enough.</p></td>
+ <td><p>Latency-sensitive workloads that require smooth and predictable query performance. Ideal for proactive resource management.</p></td>
</tr>
<tr>
<td><p>Cautions</p></td>
- <td><p>Adds latency to ongoing queries. May cause timeouts if insufficient reclaimable data.</p></td>
+ <td><p>Can cause short query delays or timeouts if insufficient evictable data is available.</p></td>
+ <td><p>Requires properly tuned high/low watermarks and TTL settings. Slight overhead from the background thread.</p></td>
</tr>
<tr>
<td><p>Configuration</p></td>
<td><p>Enabled via <code>evictionEnabled: true</code></p></td>
- <td><p>Enabled via <code>backgroundEvictionEnabled: true</code> (requires <code>evictionEnabled: true</code>)</p></td>
+ <td><p>Enabled via <code>backgroundEvictionEnabled: true</code> (also requires <code>evictionEnabled: true</code>)</p></td>
</tr>
</table>

**Recommended setup**:

- Enable both modes for optimal balance. Async eviction manages cache usage proactively, while sync eviction acts as a safety fallback when resources are nearly exhausted.
+ - Both eviction modes can be enabled together for optimal balance, provided your workload benefits from Tiered Storage and can tolerate eviction-related fetch latency, as sketched in the YAML after this list.
+
+ - For performance testing or latency-critical scenarios, consider disabling eviction entirely to avoid network fetch overhead after eviction.
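A minimal `milvus.yaml` sketch for enabling both modes together; the two keys are the ones named in the Configuration row of the table above, and all other tiered-storage settings are omitted here and should be tuned separately:

```yaml
queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true            # sync eviction; also a prerequisite for the background mode
      backgroundEvictionEnabled: true  # async eviction performed by the background thread
```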

<div class="alert note">

@@ -104,7 +106,7 @@ queryNode:

Watermarks define when cache eviction begins and ends for both memory and disk. Each resource type has two thresholds:

- - **High watermark**: Async eviction starts when usage exceeds this value.
+ - **High watermark**: Eviction starts when usage exceeds this value.

- **Low watermark**: Eviction continues until usage falls below this value.

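To make the two thresholds concrete, here is an illustrative sketch only: the key names below are placeholders rather than the exact `milvus.yaml` fields (refer to the watermark settings documented in this guide), and the values are not defaults.

```yaml
queryNode:
  segcore:
    tieredStorage:
      # Placeholder names for illustration -- check this guide for the actual watermark fields.
      memoryHighWatermarkRatio: 0.9   # eviction starts once memory usage exceeds 90% of the budget
      memoryLowWatermarkRatio: 0.75   # eviction continues until usage falls back below 75%
      diskHighWatermarkRatio: 0.9
      diskLowWatermarkRatio: 0.75
```

Keeping a reasonable gap between the high and low watermarks helps avoid back-to-back eviction cycles when usage hovers near the threshold.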
@@ -212,59 +214,3 @@ queryNode:
<td><p>Use a short TTL (hours) for highly dynamic data; use a long TTL (days) for stable datasets. Set 0 to disable time-based expiration.</p></td>
</tr>
</table>
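For example, a sketch of TTL-based expiration (values are illustrative; `cacheTtl` is expressed in seconds, and per the eviction-modes table above, TTL expiration is applied by the background eviction thread):

```yaml
queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true
      backgroundEvictionEnabled: true  # TTL expiration is handled by async (background) eviction
      cacheTtl: 604800                 # 7 days; use e.g. 3600 (1 hour) for dynamic data, 0 to disable
```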
-
- ## Configure overcommit ratio
-
- Overcommit ratios define how much of the cache is reserved as evictable, allowing QueryNodes to temporarily exceed normal capacity before eviction intensifies.
-
- <div class="alert note">
-
- This configuration takes effect only when [eviction is enabled](eviction.md#Enable-eviction).
-
- </div>
-
- **Example YAML**:
-
- ```yaml
- queryNode:
-   segcore:
-     tieredStorage:
-       evictionEnabled: true
-       # Evictable Memory Cache Ratio: 30%
-       # (30% of physical memory is reserved for storing evictable data)
-       evictableMemoryCacheRatio: 0.3
-       # Evictable Disk Cache Ratio: 30%
-       # (30% of disk capacity is reserved for storing evictable data)
v2.6.x/site/en/userGuide/storage-optimization/tiered-storage-overview.md (10 additions, 18 deletions)
@@ -9,7 +9,7 @@ beta: Milvus 2.6.4+

In Milvus, the traditional *full-load* mode requires each QueryNode to load all data fields and indexes of a [segment](glossary.md#Segment) at initialization, even data that may never be accessed. This ensures immediate data availability but often leads to wasted resources, including high memory usage, heavy disk activity, and significant I/O overhead, especially when handling large-scale datasets.

- *Tiered Storage* addresses this challenge by decoupling data caching from segment loading. Instead of loading all data at once, Milvus introduces a caching layer that distinguishes between hot data (cached locally) and cold data (stored remotely). The QueryNode now loads only lightweight *metadata* initially and dynamically pulls or evicts data on demand. This significantly reduces load time, optimizes local resource utilization, and enables QueryNodes to process datasets that far exceed their physical memory or disk capacity.
+ *Tiered Storage* addresses this challenge by decoupling data caching from segment loading. Instead of loading all data at once, Milvus introduces a caching layer that distinguishes between hot data (cached locally) and cold data (stored remotely). The QueryNode now loads only lightweight *metadata* initially and dynamically pulls or evicts field data on demand. This significantly reduces load time, optimizes local resource utilization, and enables QueryNodes to process datasets that far exceed their physical memory or disk capacity.

Consider enabling Tiered Storage in scenarios such as:

@@ -47,7 +47,7 @@ The diagram below shows these differences.

Under Tiered Storage, the workflow has these phases:
@@ -59,17 +59,19 @@ Because field data and index files remain in remote storage until first accessed

**Configuration**

- Automatically applied when Tiered Storage is enabled. No other manual setting is required.
+ Automatically applied when Tiered Storage is enabled. No manual setting is required.

#### Phase 2: Warm up

- To reduce the first-hit latency introduced by [lazy load](tiered-storage-overview.md#Phase-1-Lazy-load), Milvus provides a *Warm Up mechanism.
+ To reduce the first-hit latency introduced by [lazy load](tiered-storage-overview.md#Phase-1-Lazy-load), Milvus provides a *Warm Up* mechanism.

Before a segment becomes queryable, Milvus can proactively fetch and cache specific fields or indexes from object storage, ensuring that the first query directly hits cached data instead of triggering on-demand loading.

+ During warm-up, fields are preloaded at the chunk level, while indexes are preloaded at the segment level.
+
**Configuration**

- Warm Up settings are defined in the Tiered Storage section of **milvus.yaml**. You can enable or disable preloading for each field or index type and specify the preferred strategy. See [Warm Up](warm-up.md) for configuration examples.
+ Warm Up settings are defined in the Tiered Storage section of `milvus.yaml`. You can enable or disable preloading for each field or index type and specify the preferred strategy. See [Warm Up](warm-up.md) for detailed configurations.

#### Phase 3: Partial load

@@ -85,7 +87,7 @@ Partial load is automatically applied when Tiered Storage is enabled. No manual

#### Phase 4: Eviction

- To maintain healthy resource usage, Milvus automatically releases unused cached data when thresholds are reached.
+ To maintain healthy resource usage, Milvus automatically releases unused cached data when specific thresholds are reached.

Eviction follows a [Least Recently Used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies) policy, ensuring that infrequently accessed data is removed first while active data remains in cache.

@@ -95,8 +97,6 @@ Eviction is governed by the following configurable items:

- **Cache TTL**: Removes stale cached data after a defined duration of inactivity.

Enable and tune eviction parameters in **milvus.yaml**. See [Eviction](eviction.md) for detailed configuration.
@@ -147,10 +147,6 @@ queryNode:

      # Cache TTL (7 days)
      cacheTtl: 604800
-
-       # Overcommit Ratios
-       evictableMemoryCacheRatio: 0.3
-       evictableDiskCacheRatio: 0.3
```

### Next steps
@@ -185,6 +181,8 @@ Two common causes:

- QueryNode resources are shared with other workloads, so Tiered Storage cannot correctly assess actual available capacity.

+ To resolve this, we recommend allocating dedicated resources to QueryNodes.
+
### Why do some queries fail under high concurrency?

If too many queries hit hot data at the same time, QueryNode resource limits may still be exceeded. Some threads may fail due to resource reservation timeouts. Retrying after the load decreases, or allocating more resources, can resolve this.
@@ -195,11 +193,5 @@ Possible causes include:

- Frequent queries to cold data, which must be fetched from storage.

- - An overcommit ratio that is too high, leading to frequent eviction.
-
- Watermarks set too close together, causing frequent synchronous eviction.

- ### Can Tiered Storage handle unlimited data by overcommitting cache?
-
- No. Overcommit ratios allow QueryNodes to work with more segments than physical memory permits, but excessively high ratios can lead to frequent eviction, cache thrashing, or query failures.
v2.6.x/site/en/userGuide/storage-optimization/warm-up.md (6 additions, 6 deletions)
@@ -1,13 +1,13 @@
---
id: warm-up.md
title: "Warm Up"
- summary: "In Milvus, Warm Up complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading."
+ summary: "In Milvus, Warm Up complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected types of fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading."
beta: Milvus 2.6.4+
---

# Warm Up

- In Milvus, **Warm Up** complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading.
+ In Milvus, **Warm Up** complements Tiered Storage by alleviating first-hit latency that occurs when cold data is accessed for the first time. Once configured, Warm Up preloads selected types of fields or indexes into the cache before a segment becomes queryable, ensuring that frequently accessed data is available immediately after loading.

## Why warm up

@@ -36,7 +36,7 @@ Warm Up is controlled under `queryNode.segcore.tieredStorage.warmup` in `milvus.
<th><p>Typical scenario</p></th>
</tr>
<tr>
- <td><p><code>sync</code> (default)</p></td>
+ <td><p><code>sync</code></p></td>
<td><p>Preload before the segment becomes queryable. Load time increases slightly, but the first query incurs no latency.</p></td>
<td><p>Use for performance-critical data that must be immediately available, such as high-frequency scalar indexes or key vector indexes used in search.</p></td>
</tr>
@@ -56,9 +56,9 @@ queryNode:
      warmup:
        # options: sync, disable.
        # Specifies the timing for warming up the Tiered Storage cache.
-         # - "sync": data will be loaded into the cache before a segment is considered loaded.
-         # - "disable": data will not be proactively loaded into the cache, and loaded only if needed by search/query tasks.
-         # Defaults to "sync", except for vector field which defaults to "disable".
+         # - `sync`: data will be loaded into the cache before a segment is considered loaded.
+         # - `disable`: data will not be proactively loaded into the cache, and loaded only if needed by search/query tasks.
+         # Defaults to `sync`, except for vector field which defaults to `disable`.
        scalarField: sync
        scalarIndex: sync
        vectorField: disable # cache warmup for vector field raw data is by default disabled.