Integrate vector databases with RetrievalResults #586

@AlekseySh

Description

We already have an example where the gallery set is fixed but queries arrive online in batches: link.

It would be great to improve its performance with vector databases.
The reworked example might look something like this:

...

# gallery is huge and fixed, so we only process it once
dataset_gallery = ImageBaseDataset(galleries, transform=transform)
embeddings_gallery = inference(extractor, dataset_gallery, batch_size=4, num_workers=0)

# ONE OF:
index = SklearnKNNIndex(embeddings_gallery)  # a child of IVectorIndex
index = FaissIndex(embeddings_gallery)  # a child of IVectorIndex
index = QdrantIndex(embeddings_gallery)  # a child of IVectorIndex

for queries in [queries1, queries2]:
    dataset_query = ImageBaseDataset(queries, transform=transform)
    embeddings_query = inference(extractor, dataset_query, batch_size=4, num_workers=0)

    rr = RetrievalResults.from_index(
        index=index,
        embeddings_query=embeddings_query,
        dataset_query=dataset_query,
        dataset_gallery=dataset_gallery,
    )
    rr = ConstantThresholding(th=80).process(rr)
    rr.visualize_qg([0, 1], dataset_query=dataset_query, dataset_gallery=dataset_gallery, show=True)
    print(rr)

I think we should start by working out what the IVectorIndex interface should include so that it can handle different backends.
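To make the discussion concrete, here is a minimal sketch of what such an interface could look like. Everything in it is an assumption for illustration, not an existing API: the method names `add` and `search`, their signatures, and the `BruteForceIndex` class are all hypothetical. Embeddings are plain lists of floats and the distance is L2 only to keep the sketch dependency-free; a real backend would most likely accept tensors and delegate the search to sklearn, Faiss, or Qdrant.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

Vector = Sequence[float]


class IVectorIndex(ABC):
    """Hypothetical interface; names and signatures are assumptions."""

    @abstractmethod
    def add(self, embeddings: Sequence[Vector]) -> None:
        """Append gallery embeddings to the index."""

    @abstractmethod
    def search(
        self, queries: Sequence[Vector], top_n: int
    ) -> Tuple[List[List[float]], List[List[int]]]:
        """Return per-query distances and gallery ids, closest first."""

    @abstractmethod
    def __len__(self) -> int:
        """Number of gallery embeddings stored."""


class BruteForceIndex(IVectorIndex):
    """Exact L2 search in pure Python; a stand-in for a real backend."""

    def __init__(self, embeddings: Sequence[Vector] = ()) -> None:
        self._vectors: List[List[float]] = []
        self.add(embeddings)

    def add(self, embeddings: Sequence[Vector]) -> None:
        self._vectors.extend([float(x) for x in v] for v in embeddings)

    def search(self, queries, top_n):
        top_n = min(top_n, len(self._vectors))
        distances, ids = [], []
        for q in queries:
            # Score every gallery vector, then keep the top_n closest ones.
            scored = sorted(
                (math.dist(q, g), i) for i, g in enumerate(self._vectors)
            )[:top_n]
            distances.append([d for d, _ in scored])
            ids.append([i for _, i in scored])
        return distances, ids

    def __len__(self) -> int:
        return len(self._vectors)


# The gallery is indexed once; each query batch only calls search().
index = BruteForceIndex([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
distances, ids = index.search([[0.9, 0.0]], top_n=2)
```

Under this sketch, something like `RetrievalResults.from_index` would only need the distances and gallery ids returned by `search`, so each backend could stay a thin adapter. Open questions it leaves out include which metrics to support, whether items can be removed, and how an index is persisted.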
