Introducing container for storing Retrieval Results #544

AlekseySh · 2024-04-22T12:27:30Z

No description provided.

dapladoc · 2024-04-24T16:12:51Z

oml/functional/knn.py

+    assert (ids_query.ndim == 1) and (ids_gallery.ndim == 1) and (embeddings.ndim == 2)
+    assert len(embeddings) <= len(ids_query) + len(ids_gallery)
+    assert (sequence_ids is None) or (len(sequence_ids) == len(embeddings) and (sequence_ids.ndim == 1))
+    assert (labels_gt is None) or (len(labels_gt) <= len(ids_query) + len(ids_gallery) and (labels_gt.ndim == 1))


or len(labels_gt) == embeddings.shape[0]?

dapladoc · 2024-04-24T16:18:44Z

oml/functional/knn.py

+        if labels_gt is not None:
+            mask_gt_b = labels_gt[ids_query_b][..., None] == labels_gt[ids_gallery][None, ...]
+            mask_gt_b[mask_to_ignore_b] = False
+            gt_ids.extend([LongTensor(row.nonzero()).view(-1) for row in mask_gt_b])  # type: ignore


If labels_gt is not None we can allocate memory for gt_ids before the for-loop.

I think we cannot, because every query has an arbitrary number of gt items, so the result is a list of tensors.
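
To illustrate the reply: the number of ground-truth gallery items differs from query to query, so the result is naturally a ragged list rather than one rectangular block that could be allocated up front. Below is a minimal pure-Python sketch with made-up labels. The helper `collect_gt_ids` is hypothetical and only ignores self-matches; it is not the OML implementation, which works on tensors in batches and uses its own ignore mask.

```python
from typing import List, Sequence


def collect_gt_ids(
    labels: Sequence[int], ids_query: Sequence[int], ids_gallery: Sequence[int]
) -> List[List[int]]:
    """For each query, list the gallery positions that share its label.

    Assumption for this sketch: a query is never matched with itself.
    """
    gt_ids = []
    for q in ids_query:
        row = [
            pos
            for pos, g in enumerate(ids_gallery)
            if labels[g] == labels[q] and g != q
        ]
        gt_ids.append(row)
    return gt_ids


# Made-up data: 5 items, every item acts as both query and gallery.
labels = [0, 0, 0, 1, 1]
ids = [0, 1, 2, 3, 4]
gt_ids = collect_gt_ids(labels, ids_query=ids, ids_gallery=ids)

# Row lengths differ between queries, hence a list of per-query results.
row_lengths = [len(row) for row in gt_ids]
```

Here the first three queries have two ground truths each and the last two have one each, so no single fixed row width fits all queries.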

dapladoc · 2024-04-24T16:21:40Z

oml/retrieval/retrieval_results.py

+
+        if gt_ids is not None:
+            assert distances.shape[0] == len(gt_ids)
+            if not all(len(x) > 0 for x in gt_ids):


Wouldn't it be faster to evaluate any(len(x) == 0 for x in gt_ids)?

i think you are right
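
For reference, the two forms are logically equivalent, and both `all` and `any` short-circuit on the first empty entry, so the difference is probably more about readability than speed. A small sketch with made-up data; the `counting` helper is hypothetical and only exists to show how many elements each form inspects:

```python
from typing import Iterable, Iterator, List


def counting(items: Iterable[List[int]], counter: List[int]) -> Iterator[List[int]]:
    """Yield items while counting how many were consumed."""
    for item in items:
        counter[0] += 1
        yield item


# Made-up data: the second entry is empty.
gt_ids = [[3, 7], [], [1], [4, 5]]

seen_all = [0]
has_empty_via_all = not all(len(x) > 0 for x in counting(gt_ids, seen_all))

seen_any = [0]
has_empty_via_any = any(len(x) == 0 for x in counting(gt_ids, seen_any))
```

Both expressions stop at the second element and give the same answer.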

dapladoc · 2024-04-24T16:24:26Z

oml/retrieval/retrieval_results.py

+        cls,
+        embeddings: FloatTensor,
+        dataset: IQueryGalleryDataset,
+        n_items_to_retrieve: int = 1_000,


I'd suggest setting the default value of n_items_to_retrieve to 10 or even 5. 1000 looks too big.

Let it be in the middle, I set 100 :)

dapladoc · 2024-04-24T16:30:21Z

oml/retrieval/retrieval_results.py

+
+        return RetrievalResults(distances=distances, retrieved_ids=retrieved_ids, gt_ids=gt_ids)
+
+    def __repr__(self) -> str:


For me it looks like a violation of the agreement about the purpose of __repr__ function.

Called by the repr() built-in function to compute the “official” string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned.

https://docs.python.org/3/reference/datamodel.html#object.__repr__

Maybe it would be better to use __str__ for this message?
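
A minimal sketch of the split suggested here, assuming a toy container. `ResultsSketch` and its fields are hypothetical stand-ins, not the actual `RetrievalResults` class: `__repr__` returns something that looks like a constructor call, while `__str__` carries the human-readable summary.

```python
from typing import List, Optional


class ResultsSketch:
    """Hypothetical stand-in for a results container; not the OML class."""

    def __init__(self, distances: List[List[float]], gt_ids: Optional[List[List[int]]] = None):
        self.distances = distances
        self.gt_ids = gt_ids

    def __repr__(self) -> str:
        # Looks like an expression that could recreate the object.
        return f"ResultsSketch(distances={self.distances!r}, gt_ids={self.gt_ids!r})"

    def __str__(self) -> str:
        # Human-readable summary, used by print() and str().
        n_queries = len(self.distances)
        n_retrieved = len(self.distances[0]) if self.distances else 0
        return f"{n_queries} queries, {n_retrieved} retrieved items each"


rr = ResultsSketch(distances=[[0.1, 0.4], [0.2, 0.3]], gt_ids=[[1], [0]])
```

With this layout `print(rr)` shows the summary, while `repr(rr)` (and the interactive prompt) shows the constructor-like form.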

DaloroAT · 2024-04-24T15:48:27Z

Makefile

+	find . -type d -name "__pycache__" -exec rm -r {} +
+	find . -type f -name "*.log" -exec rm {} +
+	find . -type f -name "*.predictions.json" -exec rm {} +
+	rm -rf docs/build


Poor docs ((

It's annoying when you search the full repo and find junk in the docs HTML files, so yep.

DaloroAT · 2024-04-24T15:51:17Z

oml/functional/knn.py

+    """
+
+    Args:
+        embeddings: Matrix with the shape of ``[n, dim]``


Using n both here and in top_n is confusing. Maybe L (referring to len) would be better than lowercase n.

DaloroAT · 2024-04-24T15:52:06Z

oml/functional/knn.py

+
+    Args:
+        embeddings: Matrix with the shape of ``[n, dim]``
+        ids_query:  Tensor with the size of ``Q``, where ``Q <= n``. Each element is withing the range ``(0, n - 1)``.


typo: withing
Search the repo for the same error.

done, removed everywhere

DaloroAT · 2024-04-24T16:04:31Z

oml/functional/knn.py

+from oml.utils.misc_torch import pairwise_dist
+
+
+def batched_knn(


I think this function has nothing to do with kNN. It's an exact calculation of the distance matrix with truncation. However, I don't know what to call it...

it's the same as kNN from sklearn:

    from sklearn.neighbors import NearestNeighbors

    knn = NearestNeighbors(algorithm="auto", p=2)
    knn.fit(features_galleries)
    dists, ii_closest = knn.kneighbors(features_queries, n_neighbors=top_k, return_distance=True)

By the way, didn't you mean aNN (approximate NN)? That's not what we are doing here.
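
Both descriptions point at the same computation: exact kNN can be done by computing every query-to-gallery distance and keeping only the k smallest per query. A minimal pure-Python sketch under that reading. The name `exact_knn` and the data are made up, and there is no batching or ignore mask here, unlike `batched_knn` in the PR.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def exact_knn(
    queries: Sequence[Sequence[float]], gallery: Sequence[Sequence[float]], top_k: int
) -> Tuple[List[List[float]], List[List[int]]]:
    """Brute-force k nearest neighbours with Euclidean distance."""
    all_dists, all_ids = [], []
    for q in queries:
        # One row of the full distance matrix: this query vs. every gallery item.
        row = [(math.dist(q, g), idx) for idx, g in enumerate(gallery)]
        # Truncate the row to the top_k smallest distances, sorted ascending.
        nearest = heapq.nsmallest(top_k, row)
        all_dists.append([d for d, _ in nearest])
        all_ids.append([idx for _, idx in nearest])
    return all_dists, all_ids


# Made-up 2D points.
gallery = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)]
queries = [(0.0, 0.0), (3.0, 3.0)]
dists, ids = exact_knn(queries, gallery, top_k=2)
```

The result is exact (no approximation), which is the distinction from aNN raised in the reply.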

DaloroAT · 2024-04-24T16:11:40Z

oml/functional/knn.py

+            gt_ids.extend([LongTensor(row.nonzero()).view(-1) for row in mask_gt_b])  # type: ignore
+
+        distances_b[mask_to_ignore_b] = float("inf")
+        distances[i : i + bs], retrieved_ids[i : i + bs] = torch.topk(distances_b, k=top_n, largest=False, sorted=True)


add the second dimension for clarity

DaloroAT · 2024-04-24T16:18:21Z

oml/retrieval/retrieval_results.py

+
+        """
+        if not isinstance(dataset, (IVisualizableDataset, IQueryGalleryDataset)):
+            raise ValueError(


DaloroAT · 2024-04-24T16:20:51Z

oml/retrieval/retrieval_results.py

+        ii_gallery = dataset.get_gallery_ids()
+
+        n_galleries_to_show = min(n_galleries_to_show, self.n_retrieved_items)
+        n_gt_to_show = N_GT_SHOW_EMBEDDING_METRICS if (self.gt_ids is not None) else 0


What about adding this as an argument with the default N_GT_SHOW_EMBEDDING_METRICS? 2 is OK for experiment logging, but might not be enough during development.
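
A minimal sketch of that suggestion, assuming the constant equals 2 (inferred from the remark above). The function `plan_columns` is hypothetical and only mirrors the column count from the diff; it is not the actual visualisation method.

```python
from typing import List, Optional

# Assumed value, based on the "2 is OK" remark in the comment.
N_GT_SHOW_EMBEDDING_METRICS = 2


def plan_columns(
    n_galleries_to_show: int,
    gt_ids: Optional[List[List[int]]],
    n_gt_to_show: int = N_GT_SHOW_EMBEDDING_METRICS,
) -> int:
    """Return the number of columns: 1 query + galleries + ground truths."""
    if gt_ids is None:
        # Nothing to show when ground truths are not available.
        n_gt_to_show = 0
    return 1 + n_galleries_to_show + n_gt_to_show


default_cols = plan_columns(n_galleries_to_show=5, gt_ids=[[1, 2, 3]])
debug_cols = plan_columns(n_galleries_to_show=5, gt_ids=[[1, 2, 3]], n_gt_to_show=4)
no_gt_cols = plan_columns(n_galleries_to_show=5, gt_ids=None)
```

Callers that log experiments keep the default, while someone debugging can pass a larger `n_gt_to_show` without touching the constant.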

DaloroAT · 2024-04-24T16:25:47Z

oml/retrieval/retrieval_results.py

+        n_rows, n_cols = len(query_ids), n_galleries_to_show + 1 + n_gt_to_show
+
+        # iterate over queries
+        for j, query_idx in enumerate(query_ids):


j for rows and i for cols, like [j,i]? My entire life it is usually [i,j] 😁

In mine as well :) Swapped.

DaloroAT · 2024-04-24T16:35:13Z

tests/test_oml/test_retrieval_results/test_retrieval_results.py

+    fig.show()
+    plt.close(fig=fig)
+
+    print(rr)


DaloroAT · 2024-04-24T16:35:26Z

tests/test_oml/test_retrieval_results/test_retrieval_results.py

+
+    print(rr)
+
+    assert True


IT IS WHAT IT IS :)

you know I like keeping assert True in the end
it shows that I did not forget to complete the test implementation :)

ini

6a7e27d

AlekseySh added the rework label Apr 22, 2024

AlekseySh self-assigned this Apr 22, 2024

AlekseySh linked an issue Apr 22, 2024 that may be closed by this pull request

[EPIC] Release OML 3.0 #522

Closed

AlekseySh added 2 commits April 24, 2024 05:40

upd

0748596

upd

b2758fe

AlekseySh requested review from DaloroAT and dapladoc April 23, 2024 23:51

AlekseySh added 2 commits April 24, 2024 07:02

upd

77ea093

upd

54ec7a9

dapladoc reviewed Apr 24, 2024

DaloroAT reviewed Apr 24, 2024

update

9c8a96d

AlekseySh merged commit 0855c80 into main Apr 25, 2024
8 checks passed

AlekseySh deleted the retrieval_prediction branch April 25, 2024 07:00
