Feature: add Turbo Quant #8509

1995chen · 2026-03-26T07:37:53Z

Google Research just posted a blog and paper about a new algorithm that allows quantizing the KV cache down to under 3 bits with close to zero accuracy loss.

Blog: https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/

Paper: https://arxiv.org/pdf/2504.19874

Website: https://turboquant.net

This could be huge if their claims hold up, and MLX developers are already jumping on it:

https://x.com/Prince_Canuma/status/2036611007523512397

Thought I'd share the news here to see if the Qdrant developers would be interested in adding this feature.

@timvisee @generall

smqd19 · 2026-04-20T10:09:08Z

This is an interesting development, especially for systems like Qdrant that rely heavily on vector search over large embedding collections. If TurboQuant delivers on its claims (roughly 3-bit quantization of KV caches with nearly zero accuracy loss), it could significantly reduce memory overhead for ANN search and make in-memory vector databases much more efficient.
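
To put "significantly reduce memory overhead" in numbers, here is a back-of-the-envelope sketch. The collection size (1M vectors) and dimensionality (1536) are hypothetical examples, and the helper `vector_storage_bytes` is made up for illustration: it counts only the raw bit-packed vector data and ignores per-vector scales or norms, HNSW graph links, and payloads.

```python
import math


def vector_storage_bytes(num_vectors: int, dim: int, bits_per_dim: int) -> int:
    """Raw storage for num_vectors vectors, each bit-packed and rounded up to whole bytes.

    Hypothetical helper for illustration: ignores per-vector metadata
    (norms, scales), index structures such as HNSW links, and payloads.
    """
    bytes_per_vector = math.ceil(dim * bits_per_dim / 8)
    return num_vectors * bytes_per_vector


# Hypothetical collection: 1M embeddings with 1536 dimensions.
n, d = 1_000_000, 1536
fp32 = vector_storage_bytes(n, d, 32)

for label, bits in [("float32", 32), ("int8", 8), ("3-bit", 3)]:
    size = vector_storage_bytes(n, d, bits)
    print(f"{label:>7}: {size / 1e9:.3f} GB, compression vs float32: {fp32 / size:.1f}x")
```

Under these assumptions the raw vectors shrink from about 6.14 GB (float32) to 1.54 GB (int8) and 0.58 GB (3-bit), roughly a 10.7x reduction versus float32, before accounting for any index overhead.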

That said, implementing something like this would require careful evaluation. First, the headline results are about transformer models' KV caches (the paper also reports nearest-neighbor search experiments), and those results may not directly translate to the embeddings stored in a vector database. We'd need to assess whether the same quantization approach holds up on static embeddings or whether its benefits are limited to dynamic, attention-based workloads.

From a production standpoint, ultra-low-bit quantization can introduce hardware-specific constraints. For instance, 3-bit values might require custom CUDA kernels or other hardware-specific optimizations, since most GPUs are optimized for 8-bit or 16-bit operations. If we were to integrate this into Qdrant, we'd need to check whether it's compatible with common SIMD optimizations, or with libraries like Faiss that rely on AVX instructions.
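
Part of why 3-bit codes are awkward for hardware built around 8- and 16-bit operations is that they don't align to byte boundaries. The minimal sketch below bit-packs 3-bit codes into a little-endian bitstream; the function names and bit layout are hypothetical and are not taken from Qdrant or the TurboQuant paper. Every 8 codes fit in exactly 3 bytes, but an individual code can straddle two bytes, which is the extra shifting and masking that byte-aligned SIMD kernels normally avoid.

```python
def pack_3bit(codes: list[int]) -> bytes:
    """Pack 3-bit codes (0..7) into a little-endian bitstream (hypothetical layout)."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for c in codes:
        if not 0 <= c < 8:
            raise ValueError(f"code {c} does not fit in 3 bits")
        acc |= c << nbits
        nbits += 3
        while nbits >= 8:  # emit every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # flush the last partial byte, zero-padded
    return bytes(out)


def unpack_3bit(data: bytes, count: int) -> list[int]:
    """Inverse of pack_3bit: recover `count` 3-bit codes."""
    acc = 0
    nbits = 0
    pos = 0
    codes: list[int] = []
    while len(codes) < count:
        if nbits < 3:  # the next code may straddle a byte boundary
            if pos >= len(data):
                raise ValueError("not enough bytes for the requested count")
            acc |= data[pos] << nbits
            pos += 1
            nbits += 8
        codes.append(acc & 0b111)
        acc >>= 3
        nbits -= 3
    return codes


codes = [0, 1, 2, 3, 4, 5, 6, 7]
packed = pack_3bit(codes)
print(len(packed), unpack_3bit(packed, len(codes)) == codes)
```

A 1536-dimensional vector packs into 576 bytes this way. The Python version only illustrates the data layout; a production kernel would do the same unpacking with vectorized shifts and masks.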

It could be worth testing this in a controlled experiment: quantize embeddings to 3 bits using their method, then benchmark Qdrant's recall and query latency. If it really achieves minimal accuracy loss alongside significant memory savings, we could then evaluate integrating it as a configurable feature. Has anyone here tested TurboQuant in an embedding-heavy workflow yet?
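
The recall side of the experiment suggested above can be prototyped before touching Qdrant. This is a simplified stand-in, not the paper's algorithm. According to the paper's abstract, TurboQuant randomly rotates vectors, applies optimal per-coordinate scalar quantizers to the rotated coordinates, and adds a 1-bit QJL step on the residual for unbiased inner products. The sketch keeps only the random rotation and uses a plain uniform quantizer with a hypothetical clipping range of 2.5/sqrt(dim), synthetic Gaussian data, and brute-force search. All function names and parameters are made up for illustration.

```python
import math
import random


def unit(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def random_rotation(dim: int, rng: random.Random) -> list[list[float]]:
    """Random orthogonal matrix: Gram-Schmidt over Gaussian row vectors."""
    rows: list[list[float]] = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove components along already-chosen rows
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip (extremely unlikely) degenerate draws
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix: list[list[float]], v: list[float]) -> list[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def quantize(v: list[float], bits: int, clip: float) -> list[float]:
    """Uniform scalar quantization of each coordinate to 2**bits levels on
    [-clip, clip]; returns the reconstructed (bin-midpoint) values."""
    levels = 2 ** bits
    step = 2 * clip / levels
    out = []
    for x in v:
        x = min(max(x, -clip), clip)
        idx = min(int((x + clip) / step), levels - 1)
        out.append(-clip + (idx + 0.5) * step)
    return out


def top_k(query: list[float], vectors: list[list[float]], k: int) -> set[int]:
    """Brute-force top-k by inner product."""
    scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    return set(sorted(range(len(vectors)), key=lambda i: -scores[i])[:k])


def recall_at_k(n=500, dim=32, n_queries=20, k=10, bits=3, seed=0) -> float:
    rng = random.Random(seed)
    data = [unit([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]
    queries = [unit([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n_queries)]
    rot = random_rotation(dim, rng)
    clip = 2.5 / math.sqrt(dim)  # hypothetical range: ~2.5 std devs per coordinate
    # Only rotated + quantized vectors are stored; queries are rotated but
    # kept in full precision. Rotation preserves inner products.
    stored = [quantize(rotate(rot, v), bits, clip) for v in data]
    hits = 0
    for q in queries:
        exact = top_k(q, data, k)
        approx = top_k(rotate(rot, q), stored, k)
        hits += len(exact & approx)
    return hits / (k * n_queries)


if __name__ == "__main__":
    for bits in (1, 2, 3, 4, 8):
        print(f"{bits}-bit recall@10: {recall_at_k(bits=bits):.2f}")
```

With isotropic synthetic data the rotation changes little, since the coordinates are already evenly spread. Real embeddings with skewed per-dimension variance are where the rotation step should matter, so a meaningful benchmark would swap in a real dataset and Qdrant's own search path.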

timvisee · 2026-04-20T10:22:37Z · Maintainer

We're actively working on implementing TurboQuant along with some Qdrant specific enhancements.

You can find a tracking issue here: #8670

More discussion here: #8524

Reply from timvisee · May 11, 2026 · Maintainer

We've released Qdrant 1.18.0, which adds support for TurboQuant. You can read more about it in our release blog here.
