
Inconsistent Results with Cosine LSH KNN Algorithm on Large Indices #746

@akhil-bot



Background

I am running a 3-node Elasticsearch cluster on AWS servers. Each index contains between 100,000 and 1 million documents, with potential for further growth.

Bug

I am encountering an issue where identical queries return different sets of documents on repeated executions, leading to inconsistent results for end-users. This inconsistency is negatively impacting the user experience.

Configuration:
I am currently using Cosine LSH for dense vector search with the following mapping:

"chunkVector_1024": {
  "type": "elastiknn_dense_float_vector",
  "elastiknn": {
    "model": "lsh",
    "similarity": "angular",
    "dims": 1024,
    "L": 99,
    "k": 1
  }
}
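For readers unfamiliar with these parameters, here is a minimal sketch of textbook random-hyperplane (cosine) LSH. It is not Elastiknn's actual implementation. It assumes `L` is the number of hash tables and `k` the number of hyperplanes per table, as in the mapping above. The function names, the reduced dimensionality (16 instead of 1024), and the 500 random documents are hypothetical and chosen only to keep the example fast.

```python
import random


def make_hyperplanes(dims, L, k, seed=0):
    """Draw L tables, each holding k random Gaussian hyperplanes of length dims."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(k)]
            for _ in range(L)]


def hash_vector(vec, tables):
    """Return one hash per table: a tuple of k sign bits (1 if dot product >= 0)."""
    return [
        tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in table)
        for table in tables
    ]


def match_count(hashes_a, hashes_b):
    """Count the tables in which two vectors fall into the same bucket."""
    return sum(a == b for a, b in zip(hashes_a, hashes_b))


def demo(dims=16, L=99, k=1, n_docs=500, seed=1):
    """Hash one random query and n_docs random documents; return match counts."""
    rng = random.Random(seed)
    tables = make_hyperplanes(dims, L, k)

    def rand_vec():
        return [rng.gauss(0.0, 1.0) for _ in range(dims)]

    query_hashes = hash_vector(rand_vec(), tables)
    return [match_count(query_hashes, hash_vector(rand_vec(), tables))
            for _ in range(n_docs)]


if __name__ == "__main__":
    counts = demo()
    print("distinct match counts:", len(set(counts)), "for", len(counts), "documents")
```

With `k = 1` each table has only two buckets, so a document's match count is an integer between 0 and `L`. Once an index holds more documents than there are distinct counts, ties are unavoidable. Whether tie-breaking near the `candidates` cutoff contributes to the variation reported here is unconfirmed and would need to be checked against the plugin itself.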

Query:

{
  "elastiknn_nearest_neighbors": {
    "field": "chunkVector_1024",
    "vec": {"values": {{vector_values}}},
    "model": "lsh",
    "similarity": "angular",
    "candidates": 100
  }
}

Observed Behavior:
The results for the same query vary with each attempt, making the responses unpredictable.
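To quantify the variation, a small helper like the following can tally how many distinct result lists come back for one query. This is a sketch: `count_distinct_results` and the `run_query` callable are hypothetical names, and `run_query` is assumed to execute the query above against the cluster and return the hit IDs in ranked order. The stand-in functions only simulate stable and unstable responses.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List


def count_distinct_results(run_query: Callable[[], List[str]],
                           attempts: int = 20) -> Counter:
    """Run the same query repeatedly and tally each distinct ordered ID list."""
    seen: Counter = Counter()
    for _ in range(attempts):
        # Tuples are hashable and keep rank order, so a reordering of the
        # same documents is counted as a different result.
        seen[tuple(run_query())] += 1
    return seen


# Stand-ins for a real search call (assumption: no cluster is contacted here).
def stable_query() -> List[str]:
    return ["doc1", "doc2", "doc3"]


_alternating = cycle([["doc1", "doc2", "doc3"], ["doc1", "doc4", "doc2"]])


def unstable_query() -> List[str]:
    return next(_alternating)


if __name__ == "__main__":
    print(len(count_distinct_results(stable_query)), "distinct result list(s)")
    print(len(count_distinct_results(unstable_query)), "distinct result list(s)")
```

A tally with a single key means the query was reproducible over the sampled attempts. More than one key gives concrete result sets to attach to a bug report.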

Investigation and Benchmarking:
Switching to the exact kNN query (as documented for the ElastiKNN plugin) resolves the inconsistency but increases latency to approximately double that of the Cosine LSH method.

Latency Comparison:
The benchmarks below were run on an index with 1 shard and 1 replica, containing ~15k documents.

| Query | Avg Response Time, Cosine LSH (ms) | Avg Response Time, Exact kNN (ms) |
|---|---|---|
| What is mutual fund? | 10.97 | 20.38 |
| How can I invest in NPS? | 10.29 | 18.70 |
| Advantages of mutual funds? | 8.24 | 19.58 |
| How to open savings account? | 10.27 | 19.41 |
| What are debt funds? | 10.96 | 18.22 |
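The "approximately double" figure can be checked directly from the five rows above. The snippet below uses only the reported numbers; the variable names are illustrative.

```python
from statistics import mean

# (Cosine LSH ms, Exact kNN ms) per query, copied from the benchmark above.
latencies_ms = {
    "What is mutual fund?": (10.97, 20.38),
    "How can I invest in NPS?": (10.29, 18.70),
    "Advantages of mutual funds?": (8.24, 19.58),
    "How to open savings account?": (10.27, 19.41),
    "What are debt funds?": (10.96, 18.22),
}

lsh_avg = mean(lsh for lsh, _ in latencies_ms.values())
exact_avg = mean(exact for _, exact in latencies_ms.values())
ratio = exact_avg / lsh_avg

if __name__ == "__main__":
    print(f"Cosine LSH mean: {lsh_avg:.2f} ms")
    print(f"Exact kNN mean:  {exact_avg:.2f} ms")
    print(f"Exact / LSH:     {ratio:.2f}x")
```

Across these five queries the exact query averages roughly 1.9 times the LSH latency. The measurements come from the ~15k-document index, so they may not carry over to indices of 100,000 to 1 million documents.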

Request for Recommendations:
Given the large indices and the need for low latency, how can I optimize the Cosine LSH setup to ensure consistent results while maintaining performance? Are there any adjustments or alternative configurations you would recommend? I would be happy to provide more details if needed.
@alexklibisz

Elastiknn Version

7.17.7

Platform

AWS servers

Steps to reproduce

No response

Additional info

No response


Labels

bug (Something isn't working)
