feat: Update benchmarks (#386)
* feat: Update benchmarks

* fix: Improve blog

* feat: Add result files with 1 and 100 parallel clients

* fix: Add elasticsearch benchmark numbers from dbpedia 1M openai embeddings

* feat: Make mean time (latency) the default metric for the 2nd plot

* feat: Improve conclusions based on es results

* fix: Make charts work on changing dataset

* fix: Typo in table

* feat: Update results with quantization

* fix: Show a single graph to keep things simple

* fix: Improve words

* review fixes

* add link to open-source

* feat: Improve units to make results more readable

* fix: Small grammatical mistake

* fix: Make search threads value constant based on plot metric

* feat: Make dataset num vectors more readable

* fix: Spacing in table

* feat: Add results from the latest Redis benchmarks

* feat: Update benchmarks page

* feat: Update date and some of the points

* feat: Update benchmarks and content

* chores: Improve observations

* feat: Improve observations and put it after the graph

---------

Co-authored-by: generall <[email protected]>
KShivendu and generall authored Jan 11, 2024
1 parent 1ac5118 commit 80f2b6f
Showing 12 changed files with 42,501 additions and 147 deletions.
43 changes: 20 additions & 23 deletions qdrant-landing/content/benchmarks/benchmark-faq.md
@@ -5,16 +5,15 @@ title: Benchmarks F.A.Q.
 weight: 10
 ---
 
-
 # Benchmarks F.A.Q.
 
 ## Are we biased?
 
-Of course, we are! Even if we try to be objective, we are not experts in using all the existing vector databases.
-We develop Qdrant and try to make it stand out from the crowd.
-Due to that, we could have missed some important tweaks in different engines.
+Probably, yes. Even if we try to be objective, we are not experts in using all the existing vector databases.
+We build Qdrant and know the most about it.
+Due to that, we could have missed some important tweaks in different vector search engines.
 
-We tried our best, kept scrolling the docs up and down, and experimented with different configurations to get the most out of the tools. However, we believe you can do it better than us, so all **benchmarks are fully open-sourced, and contributions are welcome**!
+However, we tried our best, kept scrolling the docs up and down, experimented with combinations of different configurations, and gave all of them an equal chance to stand out. If you believe you can do it better than us, our **benchmarks are fully [open-sourced](https://github.com/qdrant/vector-db-benchmark), and contributions are welcome**!
 
 ## What do we measure?
@@ -23,16 +22,13 @@
 There are several factors considered while deciding on which database to use.
 Of course, some of them support a different subset of functionalities, and those might be a key factor to make the decision.
 But in general, we all care about the search precision, speed, and resources required to achieve it.
 
-There is one important thing - **the speed of the engines has to be compared only if they achieve the same precision**. Otherwise, they could maximize the speed factors by providing inaccurate results, which everybody would rather avoid. Thus, our benchmark results are compared only at a specific search precision threshold.
-
-We currently have planned measurements in several scenarios, from the most standard - single node deployment to a distributed cluster.
-
+There is one important thing - **the speed of the vector databases should be compared only if they achieve the same precision**. Otherwise, they could maximize the speed factors by providing inaccurate results, which everybody would rather avoid. Thus, our benchmark results are compared only at a specific search precision threshold.
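
As an aside, the precision measurement itself is simple: compare the engine's approximate top-k answer against the exact top-k computed by brute force. Below is a minimal Python sketch of that metric; the ids and the value of k are made up for illustration.

```python
# Precision@k: the fraction of the true top-k nearest neighbours that the
# ANN engine actually returned. Engines are only compared against each other
# at (roughly) equal values of this metric.

def precision_at_k(ann_ids: list[int], exact_ids: list[int], k: int) -> float:
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k

# Toy example: the engine found 9 of the 10 true nearest neighbours.
ann = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
assert precision_at_k(ann, exact, k=10) == 0.9
```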

 ## How do we select hardware?
 
 In our experiments, we are not focusing on the absolute values of the metrics but rather on a relative comparison of different engines.
 What is important is the fact that we used the same machine for all the tests.
 It was just wiped clean between launching different engines.
 
 We selected an average machine, which you can easily rent from almost any cloud provider. No extra quota or custom configuration is required.

@@ -42,19 +38,22 @@
 Libraries like FAISS provide a great tool to do experiments with vector search. But they are far away from real usage in production environments.
 If you are using FAISS in production, in the best case, you never need to update it in real-time. In the worst case, you have to create your custom wrapper around it to support CRUD, high availability, horizontal scalability, concurrent access, and so on.
 
-Some vector search engines even use FAISS under the hood, but the search engine is much more than just an indexing algorithm.
+Some vector search engines even use FAISS under the hood, but a search engine is much more than just an indexing algorithm.
 
 We do, however, use the same benchmark datasets as the famous [ann-benchmarks project](https://github.com/erikbern/ann-benchmarks), so you can align your expectations for any practical reasons.
 
-## Why are you using Python client?
-
-There is no consensus in the world of vector databases when it comes to the best technology to implement such a tool.
-You’re free to choose Go, Java or Rust-based systems.
-But you’re most likely to generate your embeddings using Python with PyTorch or Tensorflow, as according to stats it is the most commonly used language for Deep Learning.
-Thus, you’re probably going to use Python to put the created vectors in the database of your choice either way.
-For that reason, using Go, Java or Rust clients will rarely happen in the typical pipeline - although, we encourage you to adopt Rust stack if you care about the performance of your application.
-Python clients are also the most popular clients among all the engines, just by looking at the number of GitHub stars.
+### Why we decided to test with the Python client
+
+There is no consensus when it comes to the best technology to run benchmarks. You’re free to choose Go, Java or Rust-based systems. But there are two main reasons for us to use Python for this:
+1. While generating embeddings, you are most likely going to use Python and Python-based ML frameworks.
+2. Based on GitHub stars, Python clients are among the most popular clients across all the engines.
 
 From the user’s perspective, the crucial thing is the latency perceived while using a specific library - in most cases a Python client.
 Nobody can, nor should, redefine the whole technology stack just because of a single search tool.
 That’s why we decided to focus primarily on official Python libraries, provided by the database authors.
 Those may use some different protocols under the hood, but at the end of the day, we do not care how the data is transferred, as long as it ends up in the target location.
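
To make the perceived-latency point concrete, here is a minimal sketch of a client-side measurement through the official `qdrant-client` package. The collection name, vector size, point count, and the in-memory instance are assumptions for illustration; the real benchmark additionally tracks percentiles and parallel clients.

```python
# Measure search latency the way a user perceives it: wall-clock time around
# a call to the official Python client.
import time

import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # swap for QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="bench",
    vectors_config=VectorParams(size=128, distance=Distance.COSINE),
)
client.upsert(
    collection_name="bench",
    points=[PointStruct(id=i, vector=np.random.rand(128).tolist()) for i in range(1_000)],
)

latencies = []
for _ in range(100):
    query = np.random.rand(128).tolist()
    start = time.perf_counter()
    client.search(collection_name="bench", query_vector=query, limit=10)
    latencies.append(time.perf_counter() - start)

print(f"mean latency: {1000 * sum(latencies) / len(latencies):.2f} ms")
```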


@@ -63,13 +62,11 @@
 ## What about closed-source SaaS platforms?
 
 There are some vector databases available as SaaS only, so we couldn’t test them on the same machine as the Open Source ones.
 That makes the comparison unfair. That’s why we purely focused on testing the Open Source vector databases, so everybody may reproduce the benchmarks easily.
 
 This is not the final list, and we’ll continue benchmarking as many different engines as possible.
+Some applications do not support the full list of features needed for any particular benchmark, in which case we will exclude them from the list.
 
 ## How to reproduce the benchmark?
 
-The source code is available on [Github](https://github.com/qdrant/vector-db-benchmark) and has a README file describing the process of running the benchmark for a specific engine.
+The source code is available on [Github](https://github.com/qdrant/vector-db-benchmark) and has a `README.md` file describing the process of running the benchmark for a specific engine.
 
 ## How to contribute?
 
-We made the benchmark Open Source because we believe that it has to be transparent. We could have misconfigured one of the engines or just done it inefficiently. If you feel like you could help us out, check out the [benchmark repository](https://github.com/qdrant/vector-db-benchmark).
+We made the benchmark Open Source because we believe that it has to be transparent. We could have misconfigured one of the engines or just done it inefficiently. If you feel like you could help us out, check out our [benchmark repository](https://github.com/qdrant/vector-db-benchmark).
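
For orientation, here is a hypothetical way to drive the runner from Python. The `run.py` entry point and the glob-style `--engines`/`--datasets` flags are assumptions; the repository’s `README.md` is the authoritative reference for the exact invocation.

```python
# Hypothetical invocation of the vector-db-benchmark runner via subprocess;
# the flag names are assumptions -- check the repository README.md.
import subprocess

subprocess.run(
    [
        "python", "run.py",
        "--engines", "qdrant-*",   # glob over engine configurations (assumed flag)
        "--datasets", "glove-*",   # glob over benchmark datasets (assumed flag)
    ],
    check=True,  # fail loudly if the benchmark run errors out
)
```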
34 changes: 18 additions & 16 deletions qdrant-landing/content/benchmarks/benchmarks-intro.md
@@ -1,30 +1,32 @@
 ---
 draft: false
 id: 2
-title: How vector search databases should be tested?
+title: How should vector search be benchmarked?
 weight: 1
 ---
 
-# Benchmarking Vector Search Engines
+# Benchmarking Vector Databases
 
-As an Open Source vector search engine, we are often compared to the competitors and asked about our performance vs the other tools.
-But the answer was never simple, as the world of vector databases lacked a unified open benchmark that would show the differences.
-So we created one, making some bold assumptions about how it should be done.
-Here we describe why we think that’s the best way.
+At Qdrant, performance is the top priority. We always make sure that we use system resources efficiently so you get the **fastest and most accurate results at the cheapest cloud costs**. All of our decisions, from [choosing Rust](/articles/why-rust) and [io optimisations](/articles/io_uring) to [serverless support](/articles/serverless), [binary quantization](/articles/binary-quantization), and our [fastembed library](/articles/fastembed), are based on this principle. In this article, we will compare how Qdrant performs against the other vector search engines.
 
-That is why we perform our benchmarks on exactly the same hardware, which you can rent from any cloud provider.
-It does not guarantee the best performance, making the whole process affordable and reproducible, so you can easily repeat it yourself.
-So in our benchmarks, we **focus on the relative numbers**, so it is possible to **compare** the performance of different engines given equal resources.
+Here are the principles we followed while designing these benchmarks:
 
-The list will be updated:
+- We do comparative benchmarks, which means we focus on **relative numbers** rather than absolute numbers.
+- We use affordable hardware, so that you can reproduce the results easily.
+- We run benchmarks on the exact same machines to avoid any possible hardware bias.
+- All the benchmarks are [open-sourced](https://github.com/qdrant/vector-db-benchmark), so you can contribute and improve them.
 
-* Upload & Search speed on single node - [Benchmark](/benchmarks/single-node-speed-benchmark/)
-* Filtered search benchmark - [Benchmark](/benchmarks/#filtered-search-benchmark)
-* Memory consumption benchmark - TBD
-* Cluster mode benchmark - TBD
+<details>
+<summary> Scenarios we tested </summary>
+
+1. Upload & Search benchmark on a single node - [Benchmark](/benchmarks/single-node-speed-benchmark/)
+2. Filtered search benchmark - [Benchmark](/benchmarks/#filtered-search-benchmark)
+3. Memory consumption benchmark - Coming soon
+4. Cluster mode benchmark - Coming soon
+
+</details>
+
+<br/>
+
-Some of our experiment design decisions are described at [F.A.Q Section](/benchmarks/#benchmarks-faq).
-
-Suggest your variants of what you want to test in our [Discord channel](https://qdrant.to/discord)!
+Some of our experiment design decisions are described in the [F.A.Q Section](/benchmarks/#benchmarks-faq).
+Reach out to us on our [Discord channel](https://qdrant.to/discord) if you want to discuss anything related to Qdrant or these benchmarks.
@@ -1,8 +1,8 @@
 ---
 draft: false
 id: 5
 title:
-description:
+description: '<b> Updated: Feb 2023 </b>'
 
 filter_data: /benchmarks/filter-result-2023-02-03.json
 date: 2023-02-13
8 changes: 4 additions & 4 deletions qdrant-landing/content/benchmarks/filtered-search-intro.md
@@ -2,7 +2,7 @@
 draft: false
 id: 4
 title: Filtered search benchmark
 description:
 
 date: 2023-02-13
 weight: 3
@@ -13,7 +13,7 @@
 Applying filters to search results brings a whole new level of complexity.
 It is no longer enough to apply one algorithm to plain data. With filtering, it becomes a matter of the _cross-integration_ of the different indices.
 
-To measure how well different engines perform in this scenario, we have prepared a set of **Filtered ANN Benchmark Datasets** -
+To measure how well different search engines perform in this scenario, we have prepared a set of **Filtered ANN Benchmark Datasets** -
 https://github.com/qdrant/ann-filtering-benchmark-datasets
@@ -27,8 +27,8 @@
 HNSW is one of the few of them, but search engines approach its integration in different ways:
 - Some use **post-filtering**, which applies filters after ANN search. It doesn't scale well as it either loses results or requires many candidates on the first stage.
 - Others use **pre-filtering**, which requires a binary mask of the whole dataset to be passed into the ANN algorithm. It is also not scalable, as the mask size grows linearly with the dataset size.
 
 On top of it, there is also a problem with search accuracy.
 It appears when too many vectors are filtered out and the HNSW graph becomes disconnected.
 
 Qdrant uses a different approach, not requiring pre- or post-filtering while addressing the accuracy problem.
 Read more about the Qdrant approach in our [Filtrable HNSW](/articles/filtrable-hnsw/) article.
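
To illustrate what a filtered query looks like from the client side, here is a minimal sketch using the official `qdrant-client` package. The collection layout, payload field, and filter value are placeholders for illustration, not the benchmark’s actual datasets.

```python
# A toy filtered vector search: for Qdrant, the payload filter is evaluated
# as part of the ANN search itself, rather than as a pre-computed mask or a
# post-hoc cut-off of the candidate list.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams,
)

client = QdrantClient(":memory:")  # swap for QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="demo",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="demo",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"category": "laptops"}),
        PointStruct(id=2, vector=[0.4, 0.3, 0.2, 0.1], payload={"category": "phones"}),
    ],
)

hits = client.search(
    collection_name="demo",
    query_vector=[0.1, 0.2, 0.3, 0.4],
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="laptops"))]
    ),
    limit=10,
)
print([hit.id for hit in hits])  # -> [1]
```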
@@ -0,0 +1,13 @@
+---
+draft: false
+id: 1
+title: Single node benchmarks (2022)
+single_node_title: Single node benchmarks
+single_node_data: /benchmarks/result-2022-08-10.json
+preview_image: /benchmarks/benchmark-1.png
+date: 2022-08-23
+weight: 2
+Unlisted: true
+---
+
+This is an archived version of Single node benchmarks. Please refer to the new version [here](/benchmarks/single-node-speed-benchmark/).