We already have an example where the gallery set is fixed, but queries arrive online in batches: link.
It would be great to improve its performance with vector databases.
The reworked example might look something like this:
...
# gallery is huge and fixed, so we only process it once
dataset_gallery = ImageBaseDataset(galleries, transform=transform)
embeddings_gallery = inference(extractor, dataset_gallery, batch_size=4, num_workers=0)
# ONE OF:
index = SklearnKNNIndex(embeddings_gallery) # a child of IVectorIndex
index = FaissIndex(embeddings_gallery) # a child of IVectorIndex
index = QdrantIndex(embeddings_gallery) # a child of IVectorIndex
for queries in [queries1, queries2]:
    dataset_query = ImageBaseDataset(queries, transform=transform)
    embeddings_query = inference(extractor, dataset_query, batch_size=4, num_workers=0)
    rr = RetrievalResults.from_index(
        index=index, embeddings_query=embeddings_query,
        dataset_query=dataset_query, dataset_gallery=dataset_gallery
    )
    rr = ConstantThresholding(th=80).process(rr)
    rr.visualize_qg([0, 1], dataset_query=dataset_query, dataset_gallery=dataset_gallery, show=True)
    print(rr)

I think we should start by working out what the IVectorIndex interface should include so that it can handle different backends.
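As a starting point for that discussion, here is a minimal sketch of what such an interface could look like. Everything in it is an assumption made for illustration: the method names (`add`, `search`, `__len__`), their signatures, and the `BruteForceIndex` backend are hypothetical and not part of the existing API. Embeddings are plain Python lists to keep the sketch dependency-free; a real backend would more likely accept tensors or arrays.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

Vector = Sequence[float]


class IVectorIndex(ABC):
    """Hypothetical common interface for vector index backends."""

    @abstractmethod
    def add(self, embeddings: Sequence[Vector]) -> None:
        """Append gallery embeddings to the index."""

    @abstractmethod
    def search(
        self, queries: Sequence[Vector], top_n: int
    ) -> Tuple[List[List[float]], List[List[int]]]:
        """Return (distances, gallery ids) per query, nearest first."""

    @abstractmethod
    def __len__(self) -> int:
        """Number of gallery vectors stored in the index."""


class BruteForceIndex(IVectorIndex):
    """Illustrative exact-search backend; a stand-in for sklearn, Faiss or Qdrant."""

    def __init__(self, embeddings: Sequence[Vector] = ()) -> None:
        self._vectors: List[List[float]] = []
        self.add(embeddings)

    def add(self, embeddings: Sequence[Vector]) -> None:
        self._vectors.extend([float(x) for x in v] for v in embeddings)

    def search(self, queries, top_n):
        all_distances, all_ids = [], []
        for query in queries:
            # Euclidean distance to every gallery vector, then keep the closest top_n.
            scored = sorted(
                (math.dist(query, g), i) for i, g in enumerate(self._vectors)
            )[:top_n]
            all_distances.append([d for d, _ in scored])
            all_ids.append([i for _, i in scored])
        return all_distances, all_ids

    def __len__(self) -> int:
        return len(self._vectors)


# The gallery is indexed once; each query batch only calls search().
index = BruteForceIndex([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
distances, ids = index.search([[0.9, 0.1]], top_n=2)
```

If the interface stays this small, something like `RetrievalResults.from_index` would only need `search`, while backend-specific concerns (persistence, approximate search parameters, remote connections) could live in each implementation's constructor. Whether `add` belongs in the interface at all is an open question, since the gallery in this example is fixed.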
deepslug