Rework retrieval metrics #525

AlekseySh · 2024-04-10T15:16:49Z

Checklist:

Add a few new tests for the case of clipped predictions
Don't forget to call topological metrics
Search the code for "todo 522" and check whether those items can be resolved now

AlekseySh · 2024-04-12T01:30:21Z

Changelog:

Changed inputs of calc_retrieval_metrics
PCF is moved outside of calc_retrieval_metrics
FNMR & PCF now return list of floats (for consistency with other metrics)
Precomputed metrics are reused for getting category-based metrics
Some type annotations now use more specific Tensor types

Fissium · 2024-04-14T11:51:15Z

oml/functional/metrics.py

-    gt_tops = take_2d(mask_gt, ii_top_k)
-    n_gt = mask_gt.sum(dim=1)
+    # let's mark every correctly retrieved item as True and vice versa
+    gt_tops = stack([isin(retrieved_ids[i], tensor(gt_ids[i])) for i in range(n_queries)]).bool()


Remove the redundant .bool() call: torch.isin already returns a bool tensor.

You are right, but type checkers and PyCharm are not that smart, so I keep the call to annotate the type explicitly.
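
A minimal sketch of the point discussed in this thread, assuming PyTorch 1.10 or newer (where torch.isin is available). The toy tensors are made up for illustration and are not taken from the PR; only the dtype behaviour is being shown.

```python
import torch

# Toy data (hypothetical): ids retrieved for two queries and their ground-truth ids.
retrieved_ids = torch.tensor([[3, 1, 4], [0, 2, 5]])
gt_ids = [torch.tensor([1, 4]), torch.tensor([5])]

# torch.isin returns a boolean tensor, so stacking the rows keeps dtype=torch.bool.
gt_tops = torch.stack([torch.isin(retrieved_ids[i], gt_ids[i]) for i in range(len(gt_ids))])

# The extra .bool() changes neither values nor dtype; it only spells the type out
# for readers and for tools that cannot infer it.
gt_tops_explicit = gt_tops.bool()
```

So the call is redundant at runtime, and keeping it is purely a readability and tooling choice.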

Fissium · 2024-04-14T12:03:35Z

oml/functional/metrics.py

+    output: TMetricsDict = {}
+
+    for k, v in metrics.items():
+        if isinstance(v, (Tensor, np.ndarray)):


Remove the redundant np.ndarray check: TMetricsDict is defined as Dict[str, Dict[Union[int, float], Union[float, FloatTensor]]], so an np.ndarray value cannot occur. Alternatively, change the TMetricsDict definition.

Agreed, I prefer to remove the check.

Fissium · 2024-04-14T12:31:55Z

oml/metrics/embeddings.py

+        max_k_arg = max([*self.cmc_top_k, *self.precision_top_k, *self.map_top_k])
+        k = min(self.distance_matrix.shape[1], max_k_arg)  # type: ignore
+        _, retrieved_ids = torch.topk(self.distance_matrix, largest=False, k=k)
+        gt_ids = [torch.nonzero(row, as_tuple=True)[0].tolist() for row in self.mask_gt]  # type: ignore


Why retrieve gt_ids as List[List[int]] instead of List[Tensor]? I propose updating the calc_retrieval_metrics function to accept gt_ids of type List[Tensor] to eliminate the need for double conversion between tensor and list.
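
A rough sketch of what this proposal could look like, assuming PyTorch 1.10 or newer. The helper name mark_hits and the toy inputs are hypothetical; this is not the actual calc_retrieval_metrics signature, only an illustration of keeping gt_ids as tensors end to end.

```python
from typing import List

import torch
from torch import Tensor


def mark_hits(retrieved_ids: Tensor, gt_ids: List[Tensor]) -> Tensor:
    """Hypothetical helper: True where a retrieved id is in the query's ground truth."""
    return torch.stack([torch.isin(row, gt) for row, gt in zip(retrieved_ids, gt_ids)])


# Toy inputs (made up): 2 queries, 4 gallery items.
distance_matrix = torch.tensor([[0.9, 0.1, 0.5, 0.3], [0.2, 0.8, 0.4, 0.6]])
mask_gt = torch.tensor([[False, True, False, True], [True, False, False, False]])

k = min(distance_matrix.shape[1], 3)
_, retrieved_ids = torch.topk(distance_matrix, k=k, largest=False)

# Ground-truth ids stay tensors: no .tolist() here and no tensor(...) later.
gt_ids = [torch.nonzero(row, as_tuple=True)[0] for row in mask_gt]

gt_tops = mark_hits(retrieved_ids, gt_ids)
```

A list of tensors is still needed rather than one 2D tensor, because queries can have different numbers of ground-truth items.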

Fissium · 2024-04-14T12:33:06Z

oml/utils/misc.py

+            if isinstance(d2[k], torch.Tensor) and isinstance(v, torch.Tensor):
+                is_equal = torch.all(torch.isclose(d2[k], v))
+            elif isinstance(d2[k], float) and isinstance(v, float):
+                is_equal = math.isclose(d2[k], v, rel_tol=1e-6)


Let's keep the tolerance values consistent across numerical comparisons. Currently, torch.isclose runs with its defaults (rtol=1e-05, atol=1e-08), while math.isclose is called with rel_tol=1e-6 and the default abs_tol=0.0 (its default rel_tol is 1e-09). I suggest passing the same values to both calls, specifically rtol=1e-06 and atol=1e-08.
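
To illustrate why the tolerances matter, here is a small pure-Python sketch. The helper isclose_torch_style is hypothetical: it re-implements the rule documented for torch.isclose, |a - b| <= atol + rtol * |b|, so the example runs without PyTorch. The constants are the shared values suggested in this comment.

```python
import math

RTOL, ATOL = 1e-6, 1e-8  # shared tolerances suggested in the review comment


def isclose_torch_style(a: float, b: float, rtol: float = RTOL, atol: float = ATOL) -> bool:
    """Hypothetical helper mirroring the documented torch.isclose rule."""
    return abs(a - b) <= atol + rtol * abs(b)


def isclose_math_style(a: float, b: float) -> bool:
    # math.isclose checks |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol).
    return math.isclose(a, b, rel_tol=RTOL, abs_tol=ATOL)


# With shared tolerances both rules agree on typical metric values.
same = (isclose_torch_style(0.5, 0.5 + 1e-9), isclose_math_style(0.5, 0.5 + 1e-9))
far = (isclose_torch_style(0.5, 0.51), isclose_math_style(0.5, 0.51))

# Near zero, abs_tol=0.0 (the math.isclose default used in the diff) rejects
# a pair that the torch-style rule accepts; a shared abs_tol removes the mismatch.
near_zero_default = math.isclose(0.0, 1e-9, rel_tol=RTOL)
near_zero_shared = isclose_math_style(0.0, 1e-9)
```

Note that the two rules are not identical even with equal constants: the torch-style check scales the relative term by the second argument only, while math.isclose is symmetric in its arguments.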

DaloroAT · 2024-04-15T17:18:27Z

oml/functional/metrics.py

-    gt_tops = take_2d(mask_gt, ii_top_k)
-    n_gt = mask_gt.sum(dim=1)
+    # let's mark every correctly retrieved item as True and vice versa
+    gt_tops = stack([isin(retrieved_ids[i], gt_ids[i]) for i in range(n_queries)]).bool()


AlekseySh added 2 commits April 10, 2024 22:10

upd

fac3178

upd

e9a81e0

AlekseySh added the rework label Apr 10, 2024

AlekseySh self-assigned this Apr 10, 2024

AlekseySh mentioned this pull request Apr 10, 2024

[EPIC] Release OML 3.0 #522

Closed

AlekseySh linked an issue Apr 10, 2024 that may be closed by this pull request

[EPIC] Release OML 3.0 #522

Closed

AlekseySh added 10 commits April 11, 2024 03:31

upd

367197b

upd

c28a517

upd

dc2d820

upd

43c4e82

minor

bcfb286

rm

3e9d487

upd

90a0057

upd

5c28d51

minor_fix

b35a93e

fixed_docs

a278c44

AlekseySh requested a review from DaloroAT April 12, 2024 01:24

Fissium reviewed Apr 14, 2024

AlekseySh added 2 commits April 15, 2024 00:55

upd

b04ed2c

upd

646f3b4

AlekseySh merged commit a018e62 into main_rework_validation Apr 14, 2024
8 checks passed

AlekseySh deleted the rework_retrieval_metrics branch April 14, 2024 18:17

DaloroAT reviewed Apr 15, 2024
