Some changes for OML 3.0 #549
Conversation
dataset = ImageQueryGalleryLabeledDataset(df_val, transform=transform)

# you can optionally provide categories to have category-wise metrics
query_categories = np.array(dataset.extra_data["category"])[dataset.get_query_ids()]
does it seem okay as an example?
Is it possible to keep this metric available as an option for users who want it? It is considered the most important metric in the field of biometrics (e.g., face/fingerprint recognition), where score thresholding is often employed. Having said that, it would also be super helpful if this metric (or any other "lower-is-better" metric) could be specified as metric_for_checkpointing, e.g. "OVERALL/fnmr@fmr/0.001", with the "mode" set to "min" via YAML. The mode is hard-coded as "max" in the current implementation.
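To illustrate the request, a minimal sketch (not OML's actual code) of exposing the checkpointing mode instead of hard-coding "max"; the config keys metric_for_checkpointing and mode_for_checkpointing are hypothetical, only ModelCheckpoint's monitor/mode arguments are real PyTorch Lightning API:

from pytorch_lightning.callbacks import ModelCheckpoint

# hypothetical config, e.g. loaded from YAML
cfg = {
    "metric_for_checkpointing": "OVERALL/fnmr@fmr/0.001",
    "mode_for_checkpointing": "min",  # fnmr@fmr: lower is better
}

checkpoint_callback = ModelCheckpoint(
    monitor=cfg["metric_for_checkpointing"],
    mode=cfg["mode_for_checkpointing"],  # instead of the hard-coded "max"
)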
@@ -162,8 +161,9 @@ def __len__(self) -> int:
        return len(self._paths)

    def visualize(self, item: int, color: TColor = BLACK) -> np.ndarray:
        bbox = torch.tensor(self._bboxes[item]) if (self._bboxes is not None) else torch.tensor([torch.nan] * 4)
        image = get_img_with_bbox(im_path=self._paths[item], bbox=bbox, color=color)
        img = np.array(imread_pillow(self.read_bytes(self._paths[item])))
You can read with cv2 directly into a NumPy array. In addition, it's faster.
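For reference, a minimal sketch of the cv2 suggestion (cv2 decodes to BGR, so a conversion is needed if the rest of the pipeline expects RGB):

import cv2

img_bgr = cv2.imread("path/to/image.jpg")       # decodes straight into a NumPy array (BGR order)
img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  # convert if the pipeline expects RGB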
I did it to support paths as URLs later on.
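A hedged sketch of that rationale: reading raw bytes first keeps decoding independent of where the image lives. The names read_bytes and imread_pillow come from the diff above, but their bodies here are assumptions for illustration only:

from io import BytesIO
from urllib.request import urlopen

import numpy as np
from PIL import Image


def read_bytes(path_or_url: str) -> bytes:
    # the same entry point can later serve URLs as well as local paths
    if path_or_url.startswith(("http://", "https://")):
        with urlopen(path_or_url) as response:
            return response.read()
    with open(path_or_url, "rb") as f:
        return f.read()


def imread_pillow(data: bytes) -> Image.Image:
    return Image.open(BytesIO(data)).convert("RGB")


img = np.array(imread_pillow(read_bytes("path/to/image.jpg")))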
@@ -57,51 +61,30 @@ def calc_retrieval_metrics(
    metrics["precision"] = dict(zip(precision_top_k, precision))

    if map_top_k:
        map = calc_map(gt_tops, n_gts, map_top_k)
        metrics["map"] = dict(zip(map_top_k, map))
        map_ = calc_map(gt_tops, n_gts, map_top_k)
Respect the built-in map (i.e. don't shadow it).
metrics["map"] = dict(zip(map_top_k, map_)) | ||
|
||
if query_categories is not None: | ||
metrics_cat = {c: take_unreduced_metrics_by_mask(metrics, query_categories == c) for c in query_categories} |
A bit strange to have a simple dict metrics in which metric keys are mixed with category keys on the same top level:
metrics["map"][5] = 0.5
metrics["cmc"][3] = 0.4
metrics["OVERALL"]["cmc"][3] = 0.4
metrics["cats"]["cmc"][3] = 0.1
metrics["pigs"]["cmc"][3] = 0.2
...
no, it's:
{
"cat": {"cmc": {1: 1.0}, "precision": {3: 2 / 3, 5: 2 / 3}},
"dog": {"cmc": {1: 1.0}, "precision": {3: 1 / 2, 5: 1 / 2}},
OVERALL_CATEGORIES_KEY: ...,
}
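A minimal self-contained sketch of how that nested structure could be assembled; split_metrics_by_category, take_by_mask, and the exact OVERALL key name are illustrative assumptions standing in for the real take_unreduced_metrics_by_mask logic:

import numpy as np

OVERALL_CATEGORIES_KEY = "OVERALL"  # assumed constant name


def split_metrics_by_category(unreduced_metrics, query_categories):
    # unreduced_metrics: {"cmc": {1: per-query np.ndarray}, "precision": {...}, ...}
    def take_by_mask(metrics, mask):
        return {name: {k: v[mask] for k, v in topk.items()} for name, topk in metrics.items()}

    per_category = {
        c: take_by_mask(unreduced_metrics, query_categories == c)
        for c in set(query_categories.tolist())
    }
    per_category[OVERALL_CATEGORIES_KEY] = unreduced_metrics  # overall metrics stay in their own sub-dict
    return per_category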
    # todo 522: put back fnmr metric
    def compute_metrics(self) -> TMetricsDict:  # type: ignore
        self.acc = self.acc.sync()  # gathering data from devices happens here if DDP
wrong comment
mask_dataset_sz = categories == category
metrics[category].update(calc_topological_metrics(embeddings[mask_dataset_sz], self.pcf_variance))
self.metrics_unreduced = {cat: {**metrics_r[cat], **metrics_t[cat]} for cat in metrics_r.keys()}
On this line metrics_r has all retrieval metrics and also categories as top-level keys (because of this), so:
- you can't do metrics_t[cat] for the retrieval metric keys;
- self.metrics_unreduced won't include metrics_t.
discussed offline
@deepslug okay, I got your point
    map_top_k=tuple(),
)

assert math.isclose(metrics["cat"]["cmc"][1], 1)
Rework these asserts: compare dicts instead (to check that there are no unwanted extra keys).
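A possible shape for the reworked check, continuing the test above; the expected structure follows the earlier discussion (the OVERALL_CATEGORIES_KEY entry is omitted here for brevity), and values are compared with math.isclose because they are floats:

import math

expected = {
    "cat": {"cmc": {1: 1.0}, "precision": {3: 2 / 3, 5: 2 / 3}},
    "dog": {"cmc": {1: 1.0}, "precision": {3: 1 / 2, 5: 1 / 2}},
}

assert metrics.keys() == expected.keys()  # catches unwanted extra category keys
for category, expected_metrics in expected.items():
    assert metrics[category].keys() == expected_metrics.keys()  # no extra metric names
    for name, topk in expected_metrics.items():
        assert metrics[category][name].keys() == topk.keys()  # no extra top-k entries
        for k, value in topk.items():
            assert math.isclose(metrics[category][name][k], value)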
Changes from this PR have been moved to other PRs:
CHANGELOG
- … EmbeddingMetrics to functional metrics.
- Removed fnmr@fmr metric from EmbeddingMetrics because we cannot guarantee correctness of its behaviour when a postprocessor is present, and the metric is computationally heavy. [decided not to remove this metric]
- calc_retrieval_metrics_on_full, calc_gt_mask, calc_mask_to_ignore, apply_mask_to_ignore finally moved to tests to serve as adapters between the old and the new ways of computing metrics.
- Added show argument to RetrievalResults.visualise().