Hi Qdrant team,
I'm running performance benchmarks on Qdrant using a LAION dataset with 10 million vectors. I wanted to report an issue (or unexpected behavior) regarding QPS when reducing the number of segments.
Benchmark Setup:
Qdrant version: qdrant:v1.14.0-gpu-nvidia docker image
Deployment: Docker container with access to 16 CPU cores and 64 GB RAM
Distance metric: Cosine
Quantization: Not used
Search params: top_k = 10
Client load: 1000 queries using Python's ThreadPoolExecutor with 10 threads
Index configuration: HNSW index with m=32; tested with both 16 segments and 2 segments
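For reference, the client-side load pattern described above can be sketched roughly as follows. This is a minimal sketch, not my exact script: `measure_qps` and `fake_search` are hypothetical names, and `fake_search` is only a stand-in for the real Qdrant search call (top_k = 10) so that the snippet runs without a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_qps(search_fn, queries, num_threads=10):
    """Run search_fn over all queries on a thread pool and return queries/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map blocks until every query has completed
        results = list(pool.map(search_fn, queries))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


def fake_search(query):
    """Hypothetical stand-in for one search request returning 10 hits."""
    time.sleep(0.001)  # simulated request latency, not a measured value
    return [0] * 10


if __name__ == "__main__":
    # 1000 queries, 10 threads, as in the setup above
    qps = measure_qps(fake_search, list(range(1000)), num_threads=10)
    print(f"QPS: {qps:.1f}")
```

Because this is a closed loop with 10 threads, at most 10 requests are in flight at any moment, regardless of how many cores the server has.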
Observations:
With 16 segments, the system achieves ~200 QPS, and all 16 CPU cores are utilized during the benchmark.
With 2 segments, the QPS remains roughly the same (~200), but only 2–4 CPU cores are utilized during the benchmark.
I also tested 8 segments, but the QPS remained the same.
I also set QDRANT__STORAGE__PERFORMANCE__ASYNC_SCORER=true, but observed no improvement.
Concern:
Although the Qdrant documentation and community guidance suggest that fewer segments should improve performance, in my case reducing the segment count produced no gain in QPS.
In addition, the available CPU cores are not fully utilized with fewer segments, which suggests a bottleneck somewhere.
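As a rough sanity check on these numbers, Little's law (concurrency = throughput × latency) gives the average end-to-end latency per query implied by the client setup. This is only arithmetic on the figures reported above, assuming all 10 client threads stay busy for the whole run; `mean_latency_ms` is a hypothetical helper name, and the result does not by itself identify where the bottleneck is.

```python
def mean_latency_ms(concurrency, qps):
    """Average per-request latency in ms implied by Little's law."""
    return 1000.0 * concurrency / qps


if __name__ == "__main__":
    # 10 client threads at ~200 QPS
    print(mean_latency_ms(10, 200))  # 50.0
```

So each query takes about 50 ms on average in both the 16-segment and the 2-segment configuration.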
Questions:
Is this behavior expected under the current architecture?
Can you recommend a way to use all available CPU cores during search in order to reach a higher QPS?
Thanks for your great work on Qdrant!