Commit 5b94c89

applied reviews by Tim
1 parent ea4f475 commit 5b94c89

qdrant-landing/content/articles/indexing-optimization.md

Lines changed: 56 additions & 30 deletions

preview_dir: /articles_data/indexing-optimization/preview-4
social_preview_image: /articles_data/indexing-optimization/social-preview.png
weight: -155
author: Sabrina Aquino
date: 2025-02-13T00:00:00.000Z
category: vector-search-manuals
---

Let’s take a look at the best practices and recommendations to help you optimize memory usage during bulk uploads in Qdrant. We’ll cover scenarios with both **dense** and **sparse** vectors, helping your deployments remain performant even under high load and avoid out-of-memory errors.

## Indexing for dense vs. sparse vectors

**Dense vectors**

Qdrant employs an **HNSW-based index** for fast similarity search on dense vectors. By default, HNSW is built or updated once the amount of **unindexed** vector data in a segment exceeds a set `indexing_threshold`. Although it delivers excellent query speed, building or updating the HNSW graph can be **resource-intensive** if it occurs frequently or across many small segments.

**Sparse vectors**

Sparse vectors use an **inverted index**. This index is updated at the **time of upsertion**, meaning you cannot disable or postpone it for sparse vectors. In most cases, its overhead is smaller than that of building an HNSW graph, but you should still be aware that each upsert triggers a sparse index update.
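
For instance, every upsert that carries a sparse vector updates the inverted index inline. A minimal sketch, assuming a collection with a sparse vector field named `text` (the same field name used in the examples below):

```json
PUT /collections/your_collection/points
{
    "points": [
        {
            "id": 1,
            "vector": {
                "text": {
                    "indices": [31, 532, 5390],
                    "values": [0.42, 0.17, 0.89]
                }
            }
        }
    ]
}
```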

## Disabling vs. deferring dense indexing

**`indexing_threshold=0`**

Disables HNSW index creation for dense vectors. Qdrant will not build the HNSW graph for those vectors, letting you upload large volumes of data without incurring the memory cost of index creation.

**`indexing_threshold>0`**

A positive threshold tells Qdrant how many kilobytes of unindexed dense vectors can accumulate in a segment before building the HNSW graph. Small thresholds (e.g., 100 KB) mean more frequent indexing with less data each time, which can still be costly if many segments exist. Larger thresholds (e.g., 10000 KB) delay indexing to batch more vectors at once, potentially using more RAM at the moment of index build but requiring fewer builds overall.

The following operation can be used to [update](https://qdrant.tech/documentation/concepts/collections/#update-collection-parameters) the indexing threshold in your existing collection:

```json
PATCH /collections/your_collection
{
    "optimizers_config": {
        "indexing_threshold": 10000
    }
}
```

---

## The `"m"` parameter

For dense vectors, the `m` parameter defines how many edges each node in the HNSW graph can have. Setting `"m": 0` effectively **disables the HNSW graph**, so no dense vector index will be built, no matter the `indexing_threshold`. This can be helpful during massive ingestion if you don’t need immediate searchability.
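
If you already know a collection will be bulk-loaded, the graph can be disabled at creation time rather than patched afterwards. A minimal sketch; the vector size and distance are placeholder values:

```json
PUT /collections/your_collection
{
    "vectors": {
        "size": 768,
        "distance": "Cosine"
    },
    "hnsw_config": {
        "m": 0
    }
}
```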

---

## On-disk storage in Qdrant

By default, Qdrant keeps **vectors**, **payload data**, and **indexes** in memory to ensure low-latency queries. However, in large-scale or memory-constrained scenarios, you can configure some or all of them to be stored on disk. This helps reduce RAM usage at the cost of potential increases in query latency, particularly for cold reads.

**When to use on-disk**:
- You have **very large** or **rarely used** payload data or indexes, and freeing up RAM is worth the potential I/O overhead (see the payload index sketch after this list).
- Your dataset doesn’t fit comfortably in available memory.
- You want to reduce memory pressure.
- You can tolerate slower queries if it ensures the system remains stable under heavy loads.
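
As an illustration of the first point, recent Qdrant versions let you place a payload index on disk when you create it. A sketch, assuming a hypothetical keyword field named `category`:

```json
PUT /collections/your_collection/index
{
    "field_name": "category",
    "field_schema": {
        "type": "keyword",
        "on_disk": true
    }
}
```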

---

## Memmap storage and segmentation

75-
Qdrant uses **memory-mapped files** (segments) to store data on-disk. Rather than loading all vectors into RAM, Qdrant maps each segment into its address space, paging data in and out on demand. This helps keep the active RAM footprint lower, but each segment still incurs overhead (metadata, page table entries, etc.).
76+
Qdrant uses **memory-mapped files** (segments) to store data on-disk. Rather than loading all vectors into RAM, Qdrant maps each segment into its address space, paging data in and out on demand. This helps keep the active RAM footprint lower, because data can be paged out if memory pressure is high. But each segment still incurs overhead (metadata, page table entries, etc.).
7677

7778
During **high-volume ingestion**, you can accumulate dozens of small segments. Qdrant’s **optimizer** can later merge these into fewer, larger segments, reducing per-segment overhead and lowering total memory usage.
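
Segment merging is governed by the collection’s optimizer settings. A rough sketch of how you might nudge the optimizer toward fewer, larger segments (the values are illustrative, not recommendations; `max_segment_size` is measured in kilobytes):

```json
PATCH /collections/your_collection
{
    "optimizers_config": {
        "default_segment_number": 2,
        "max_segment_size": 200000
    }
}
```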

To take advantage of this during ingestion, you can send vector data straight to memory-mapped storage:

```json
PATCH /collections/your_collection
{
    "vectors": {
        "on_disk": true
    }
}
```

This approach immediately places all incoming vectors on disk, which can be very efficient for bulk ingestion.

However, **vector data and indexes are stored separately**, so enabling `on_disk` for vectors does not automatically store their indexes on disk. To fully optimize memory usage, you may need to configure **both vector storage and index storage** independently.

For dense vectors, you can enable on-disk storage for both the **vector data** and the **HNSW index**:

```json
PATCH /collections/your_collection
{
    "vectors": {
        "on_disk": true
    },
    "hnsw_config": {
        "on_disk": true
    }
}
```

For sparse vectors, you need to enable `on_disk` for both the vector data and the sparse index separately:

```json
PATCH /collections/your_collection
{
    "sparse_vectors": {
        "text": {
            "on_disk": true,
            "index": {
                "on_disk": true
            }
        }
    }
}
```

---

## Best practices for high-volume vector ingestion

Bulk ingestion can lead to high memory consumption and even out-of-memory (OOM) errors. **If you’re experiencing out-of-memory errors with your current setup**, scaling up temporarily (increasing available RAM) will provide a buffer while you adjust Qdrant’s configuration for more efficient data ingestion.

The key here is to control indexing overhead. Let’s walk through the best practices for high-volume vector ingestion in a memory-constrained environment.

### 1. Store vector data on disk immediately

The most effective way to reduce memory usage is to store vector data on disk right from the start using `on_disk: true`. This prevents RAM from being overloaded with raw vectors before optimization kicks in.

```json
PATCH /collections/your_collection
{
    "vectors": {
        "on_disk": true
    }
}
```

Previously, vector data had to be held in RAM until optimizers could move it to disk, which caused significant memory pressure. Now, by writing vectors to disk directly, memory overhead is significantly reduced, making bulk ingestion much more efficient.

<aside role="status">If your collection already contains a large number of vectors, changing these parameters will trigger a full index reconstruction, potentially causing slight performance degradation.</aside>

### 2. Disable HNSW for dense vectors (`m=0`)

During an **initial bulk load**, you can **disable** dense indexing by setting `"m": 0`. This ensures Qdrant won’t build an HNSW graph for incoming vectors, avoiding unnecessary memory and CPU usage.

```json
PATCH /collections/your_collection
{
    "hnsw_config": {
        "m": 0
    }
}
```

<aside role="status">If your collection already contains a large number of vectors, changing these parameters will trigger a full index reconstruction, potentially causing slight performance degradation.</aside>

### 3. Let the optimizer run after bulk uploads

Qdrant’s optimizers continuously restructure data to improve search efficiency. However, during a bulk upload, this can lead to excessive data movement and overhead as segments are constantly reorganized while new data is still arriving.

To avoid this, **upload all data first**, then allow the optimizer to process everything in one go. This minimizes redundant operations and ensures a more efficient segment structure.

### 4. Wait for indexation to clear up memory

Before performing additional operations, **allow Qdrant to finish any ongoing indexing**. Large indexing jobs can keep memory usage high until they fully complete.

Monitor Qdrant logs or metrics to confirm when indexing finishes. Once it does, memory consumption should drop as intermediate data structures are freed.
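
One way to confirm this is the collection info endpoint:

```json
GET /collections/your_collection
```

In the response, a `result.status` of `"green"` means optimizations and indexing have caught up, while `"yellow"` means they are still running; `result.indexed_vectors_count` should approach the collection’s point count as the index fills in. (Exact fields may vary by Qdrant version.)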

### 5. Re-enable HNSW post-ingestion

After the ingestion phase is over and memory usage has stabilized, re-enable HNSW for dense vectors by setting `m` back to a production value (commonly `16` or `32`):

```json
PATCH /collections/your_collection
{
    "hnsw_config": {
        "m": 16
    }
}
```

<aside role="status">If you’re planning to use quantization, it’s best to enable it before re-enabling indexing, to avoid running additional optimizations later. Ideally, you can set both indexing and quantization in the same update call for efficiency.</aside>

### 6. Enable quantization

If you had planned to store all dense vectors on disk, be aware that searches can slow down drastically due to frequent disk I/O while memory pressure is high. A more balanced approach is **scalar quantization**: compress vectors (e.g., to `int8`) so they fit in RAM without occupying as much space as full floating-point values.

```json
PATCH /collections/your_collection
{
    "quantization_config": {
        "scalar": {
            "type": "int8",
            "always_ram": true
        }
    }
}
```