7bedece
needed to add info about sizing but realized to do that we needed a b…
john-wagster Nov 21, 2025
4583290
Merge branch 'main' into diskbbq_updates
john-wagster Nov 21, 2025
862a91d
tried to clarify / ground the DiskBBQ memory usage and fixed a *2 bug…
john-wagster Nov 21, 2025
51a3d9b
Merge branch 'diskbbq_updates' of github.com:elastic/docs-content int…
john-wagster Nov 21, 2025
a4bbee5
Update solutions/search/vector/knn.md
john-wagster Nov 24, 2025
494cff4
Update solutions/search/vector/knn.md
john-wagster Nov 24, 2025
c1f575f
Update solutions/search/vector/knn.md
john-wagster Nov 24, 2025
932e654
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 24, 2025
4d48690
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 24, 2025
f7bfc84
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 24, 2025
63848b0
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 24, 2025
0f439c9
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 24, 2025
ac1f1c7
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 25, 2025
74c05cc
reword
john-wagster Nov 25, 2025
a48779d
added context
john-wagster Nov 25, 2025
223ab9b
Merge branch 'main' into diskbbq_updates
john-wagster Nov 25, 2025
bccb811
added latex equations attempt
john-wagster Nov 26, 2025
cc7facd
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 26, 2025
ac19151
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 26, 2025
13ed394
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 26, 2025
bec850a
added latex equations; remove newlines
john-wagster Nov 26, 2025
cb58619
adding in suggestions to clean up some lang
john-wagster Nov 26, 2025
11fed20
switched to table
john-wagster Nov 26, 2025
f5718a8
Merge branch 'main' into diskbbq_updates
john-wagster Nov 26, 2025
d749699
latex iter
john-wagster Nov 26, 2025
a79adb2
latex iter
john-wagster Nov 26, 2025
9ea1f2c
latex iter
john-wagster Nov 26, 2025
ffb69e4
latex iter
john-wagster Nov 26, 2025
535040e
latex iter
john-wagster Nov 26, 2025
eb0def0
latex iter
john-wagster Nov 26, 2025
115bd30
latex iter
john-wagster Nov 26, 2025
f688cb0
latex iter
john-wagster Nov 26, 2025
68eb0c0
latex iter
john-wagster Nov 26, 2025
a4920a6
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 30, 2025
7eae37f
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 30, 2025
366ecf4
Update deploy-manage/production-guidance/optimize-performance/approxi…
john-wagster Nov 30, 2025
18c1c5a
Merge branch 'main' into diskbbq_updates
john-wagster Nov 30, 2025
fc63931
wording
john-wagster Nov 30, 2025
d3072e3
Merge branch 'main' into diskbbq_updates
john-wagster Dec 1, 2025
@@ -46,7 +46,13 @@

## Ensure data nodes have enough memory [_ensure_data_nodes_have_enough_memory]

{{es}} uses the [HNSW](https://arxiv.org/abs/1603.09320) algorithm for approximate kNN search. HNSW is a graph-based algorithm which only works efficiently when most vector data is held in memory. You should ensure that data nodes have at least enough RAM to hold the vector data and index structures. To check the size of the vector data, you can use the [Analyze index disk usage](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-disk-usage) API.
{{es}} uses either the [HNSW](https://arxiv.org/abs/1603.09320) algorithm or the [DiskBBQ](https://www.elastic.co/search-labs/blog/diskbbq-elasticsearch-introduction) algorithm for approximate kNN search.

HNSW is a graph-based algorithm which only works efficiently when most vector data is held in memory. You should ensure that data nodes have at least enough RAM to hold the vector data and index structures.

DiskBBQ is a clustering algorithm that can scale efficiently, often on less memory than HNSW. HNSW typically performs poorly without sufficient memory to fit the entire structure in RAM, whereas DiskBBQ degrades linearly when less memory than the total index size is available. You can start with enough RAM to hold the vector data and index structures, but you can likely use less than this and still maintain good performance. In testing, we find this to be between 1% and 5% of the index structure size (centroids and quantized vector data) per unique query, where unique queries access non-overlapping clusters.
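As a rough worked example of the 1–5% range (an empirical observation, not a guarantee; the sizes below are illustrative):

```python
# Rough DiskBBQ working-set estimate based on the empirical 1-5% range above.
# index_structure_bytes covers the centroids and quantized vector data.

def diskbbq_working_set(index_structure_bytes: int, fraction: float) -> float:
    """Estimated RAM touched per unique query."""
    return index_structure_bytes * fraction

structure = 10 * 1024**3  # e.g. 10 GiB of centroids + quantized vectors
low = diskbbq_working_set(structure, 0.01)   # ~102 MiB per unique query
high = diskbbq_working_set(structure, 0.05)  # ~512 MiB per unique query
```

Actual figures depend on your data distribution and how much query traffic overlaps the same clusters.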

To check the size of the vector data, you can use the [Analyze index disk usage](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-disk-usage) API.

Here are estimates for different element types and quantization levels:

@@ -59,6 +65,8 @@

If utilizing HNSW, the graph must also be in memory. To estimate the required bytes, use `num_vectors * 4 * HNSW.m`. The default value for `HNSW.m` is 16, so by default `num_vectors * 4 * 16`.
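For instance, the graph estimate can be computed directly (a sketch; `m` here stands for the `HNSW.m` setting):

```python
# Sketch of the HNSW graph memory estimate: num_vectors * 4 * HNSW.m.

def hnsw_graph_bytes(num_vectors: int, m: int = 16) -> int:
    """Estimated bytes needed to hold the HNSW graph in memory."""
    return num_vectors * 4 * m

# Example: 1 million vectors with the default m of 16 needs ~64 MB.
print(hnsw_graph_bytes(1_000_000))  # 64000000
```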

If utilizing DiskBBQ, a fraction of the clusters and centroids needs to be in memory. When doing this estimation, it makes more sense to include both the index structure and the quantized vectors together, as the structures are dependent. To estimate the total bytes, compute the cost of the centroids as `num_clusters * num_dimensions * 4 + num_clusters * (num_dimensions + 14)` plus the cost of the quantized vectors within the clusters as `num_vectors * ((num_dimensions/8 + 14 + 2) * 2)`, where `num_clusters` is defined as `num_vectors / vectors_per_cluster`, which by default is `num_vectors / 384`.
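The calculations above can be sketched in a few lines (the vector count and dimensions in the example are illustrative only):

```python
# Sketch of the DiskBBQ memory estimate described above.

def diskbbq_bytes(num_vectors: int, num_dimensions: int,
                  vectors_per_cluster: int = 384) -> int:
    """Estimated bytes for DiskBBQ centroids plus quantized vectors."""
    num_clusters = num_vectors / vectors_per_cluster
    # Cost of the centroids.
    centroids = (num_clusters * num_dimensions * 4
                 + num_clusters * (num_dimensions + 14))
    # Cost of the quantized vectors within the clusters.
    quantized = num_vectors * ((num_dimensions / 8 + 14 + 2) * 2)
    return int(centroids + quantized)

# Example: 1 million 1024-dimensional vectors, default cluster size (~301 MB).
print(diskbbq_bytes(1_000_000, 1024))
```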

Collaborator

@shainaraskas Nov 24, 2025


this is a lot of inline math. consider breaking these calculations into sub-categories with their own headings, and introducing them in the opening of this section (with links down). e.g.

To calculate your memory needs, perform the following calculations based on your element type, algorithm, and vector data size:

  • Base RAM for your element type and quantization method
  • HNSW only: additional RAM for the HNSW graph
  • DiskBBQ only: RAM for clusters and centroids in memory
  • A buffer for additional RAM needs

that will allow you to break down your math in a more readable way ... something like ...

Suggested change
If utilizing DiskBBQ, a fraction of the clusters and centroids will need to be in memory. When doing this estimation it makes more sense to include both the index structure and the quantized vectors together as the structures are dependent. To estimate the total bytes we compute the cost of the centroids as `num_clusters * num_dimensions * 4 + num_clusters * (num_dimensions + 14)` plus the cost of the quantized vectors within the clusters as `num_vectors * ((num_dimensions/8 + 14 + 2) * 2)` where `num_clusters` is defined as `num_vectors / vectors_per_cluster` which by default will be `num_vectors / 384`
### DiskBBQ only: RAM for clusters and centroids in memory
If you're using DiskBBQ, a fraction of the clusters and centroids need to be in memory. When doing this estimation, it makes more sense to include both the index structure and the quantized vectors together, as the structures are dependent. To estimate the total bytes, compute the following:
* The cost of the centroids: `num_clusters * num_dimensions * 4 + num_clusters * (num_dimensions + 14)`
* The cost of the quantized vectors within the clusters: `num_vectors * ((num_dimensions/8 + 14 + 2) * 2)`
Then add them together: `centroids`+`quantized_vectors`
`num_clusters` is defined as `num_vectors / vectors_per_cluster`, which by default will be `num_vectors / 384`.

Contributor


Drive-by math related FYI, you can now do LaTeX in our docs: https://elastic.github.io/docs-builder/syntax/math/

Might be overkill for arithmetic but just sharing for info

Contributor Author


good comment is good; when I get a min I'll clean up both calcs with LaTeX and see if making them more readable in that sense helps.

Contributor Author


made an attempt to make this easier to read; let me know what y'all think.


Note that the required RAM is for the filesystem cache, which is separate from the Java heap.

The data nodes should also leave a buffer for other ways that RAM is needed. For example, your index might also include text fields and numerics, which also benefit from using the filesystem cache. It’s recommended to run benchmarks with your specific dataset to ensure there’s a sufficient amount of memory for good search performance. Some examples of datasets and configurations that we use for our nightly benchmarks are available [here](https://elasticsearch-benchmarks.elastic.co/#tracks/so_vector) and [here](https://elasticsearch-benchmarks.elastic.co/#tracks/dense_vector).
@@ -72,16 +80,53 @@
Loading data into the filesystem cache eagerly on too many indices or too many files will make search *slower* if the filesystem cache is not large enough to hold all the data. Use with caution.
::::


The following file extensions are used for approximate kNN search, broken down by quantization type:

* {applies_to}`stack: ga 9.3` `cenivf` for DiskBBQ to store centroids
* {applies_to}`stack: ga 9.3` `clivf` for DiskBBQ to store clusters of quantized vectors
* `vex` for the HNSW graph
* `vec` for all non-quantized vector values. This includes all element types: `float`, `byte`, and `bit`.
* `veq` for quantized vectors indexed with [`quantization`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization): `int4` or `int8`
* `veb` for binary vectors indexed with [`quantization`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization): `bbq`
* `vem`, `vemf`, `vemq`, and `vemb` for metadata, usually small and not a concern for preloading

Generally, if you are using a quantized index, you should only preload the relevant quantized values and the HNSW graph. Preloading the raw vectors is not necessary and might be counterproductive.
Generally, if you are using a quantized index, you should only preload the relevant quantized values and index structures such as the HNSW graph. Preloading the raw vectors is not necessary and might be counterproductive.
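For example, to preload only the quantized vectors and the HNSW graph, you can set `index.store.preload` when creating an index (the index name and the exact extension list here are illustrative; pick the extensions that match your quantization and algorithm):

[source,console]
----
PUT my_index
{
  "settings": {
    "index.store.preload": ["veq", "vex"]
  }
}
----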

Additional detail about the specific files can be gleaned by using the [stats endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-stats), which displays information about the index and fields. For example, for DiskBBQ you might see something like this:

[source,console]
----
GET my_index/_stats?filter_path=indices.my_index.primaries.dense_vector

Example Response:
{
  "indices": {
    "my_index": {
      "primaries": {
        "dense_vector": {
          "value_count": 3,
          "off_heap": {
            "total_size_bytes": 249,
            "total_veb_size_bytes": 0,
            "total_vec_size_bytes": 36,
            "total_veq_size_bytes": 0,
            "total_vex_size_bytes": 0,
            "total_cenivf_size_bytes": 111,
            "total_clivf_size_bytes": 102,
            "fielddata": {
              "my_vector": {
                "cenivf_size_bytes": 111,
                "clivf_size_bytes": 102,
                "vec_size_bytes": 36
              }
            }
          }
        }
      }
    }
  }
}
----
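As an illustration of using this output (a sketch; the field names match the example response above), you could sum the DiskBBQ file sizes to gauge how much data preloading `cenivf` and `clivf` would pull into the filesystem cache:

```python
# Sum the DiskBBQ file sizes from an _stats response like the example above.
stats = {
    "indices": {
        "my_index": {
            "primaries": {
                "dense_vector": {
                    "off_heap": {
                        "total_cenivf_size_bytes": 111,
                        "total_clivf_size_bytes": 102,
                    }
                }
            }
        }
    }
}

off_heap = stats["indices"]["my_index"]["primaries"]["dense_vector"]["off_heap"]
diskbbq_file_bytes = (off_heap["total_cenivf_size_bytes"]
                      + off_heap["total_clivf_size_bytes"])
print(diskbbq_file_bytes)  # 213
```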


## Reduce the number of index segments [_reduce_the_number_of_index_segments]
11 changes: 8 additions & 3 deletions solutions/search/vector/knn.md
@@ -61,7 +61,7 @@ Approximate kNN offers low latency and good accuracy, while exact kNN guarantees
## Approximate kNN search [approximate-knn]

::::{warning}
Approximate kNN search has specific resource requirements. All vector data must fit in the node’s page cache for efficient performance. Refer to the [approximate kNN tuning guide](/deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md) for configuration tips.
Approximate kNN search has specific resource requirements. For instance, with HNSW, all vector data must fit in the node’s page cache for efficient performance. Refer to the [approximate kNN tuning guide](/deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md) for configuration tips.
::::

To run an approximate kNN search:
@@ -132,9 +132,10 @@ Support for approximate kNN search was added in version 8.0. Before 8.0, `dense_

### Indexing considerations for approximate kNN search [knn-indexing-considerations]

For approximate kNN, {{es}} stores dense vector values per segment as an [HNSW graph](https://arxiv.org/abs/1603.09320). Building HNSW graphs is compute-intensive, so indexing vectors can take time; you may need to increase client request timeouts for index and bulk operations. The [approximate kNN tuning guide](/deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md) covers indexing performance, sizing, and configuration trade-offs that affect search performance.

In addition to search-time parameters, HNSW exposes index-time settings that balance graph build cost, search speed, and accuracy. When defining your `dense_vector` mapping, use [`index_options`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) to set these parameters:
For approximate kNN, {{es}} stores dense vector values per segment, either as an [HNSW graph](https://arxiv.org/abs/1603.09320) or as clusters using [DiskBBQ](https://www.elastic.co/search-labs/blog/diskbbq-elasticsearch-introduction). Building these approximate kNN structures is compute-intensive, so indexing vectors can take time; you may need to increase client request timeouts for index and bulk operations. The [approximate kNN tuning guide](/deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md) covers indexing performance, sizing, and configuration trade-offs that affect search performance.

In addition to search-time parameters, HNSW and DiskBBQ expose index-time settings that balance index build cost, search speed, and accuracy. When defining your `dense_vector` mapping, use [`index_options`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) to set these parameters:

```console
PUT image-index
@@ -156,6 +157,10 @@
}
```

::::{note}
Support for DiskBBQ was introduced in version 9.2.0.
::::

### Tune approximate kNN for speed or accuracy [tune-approximate-knn-for-speed-accuracy]

To gather results, the kNN API first finds a `num_candidates` number of approximate neighbors per shard, computes similarity to the query vector, selects the top `k` per shard, and merges them into the global top `k` nearest neighbors.