
Some changes for OML 3.0 #549

Closed · wants to merge 48 commits

Conversation

@AlekseySh (Contributor) commented Apr 27, 2024:

CHANGELOG

  • Moved the category-wise metrics calculation logic from EmbeddingMetrics to the functional metrics.
  • Removed the fnmr@fmr metric from EmbeddingMetrics, because we cannot guarantee its behaviour is correct when a postprocessor is present, and the metric is computationally heavy. [decided not to remove this metric]
  • Reworked the handling of empty bboxes (use a single None instead of 4 Nones).
  • calc_retrieval_metrics_on_full, calc_gt_mask, calc_mask_to_ignore, and apply_mask_to_ignore finally moved to tests, where they serve as adapters between the old and the new ways of computing metrics.
  • Pipelines: a bit of refactoring and improved type hints.
  • Added a show argument to RetrievalResults.visualise() (a usage sketch follows below).
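
A minimal usage sketch of the new argument; how rr is constructed, the import path, and the other argument names are assumptions for illustration, not the verified signature:

from oml.retrieval import RetrievalResults  # import path assumed

# rr: a RetrievalResults instance built earlier from query/gallery embeddings (construction omitted);
# query_ids and dataset are hypothetical argument names
fig = rr.visualise(query_ids=[0, 1], dataset=dataset, show=True)  # show=True renders the figure immediately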

@AlekseySh AlekseySh changed the base branch from main to refactoring_integration April 28, 2024 16:00
@AlekseySh AlekseySh changed the base branch from refactoring_integration to oml_3.0_release April 28, 2024 16:06
AlekseySh added 2 commits May 10, 2024 22:08
…pen-metric-learning into refactoring_integration
@AlekseySh AlekseySh changed the base branch from oml_3.0_release to refactoring_integration May 10, 2024 16:30
@AlekseySh AlekseySh changed the base branch from refactoring_integration to oml_3.0_release May 10, 2024 16:32
@AlekseySh AlekseySh changed the title Refactoring polishing Refactoring, moved matrix functions to tests, rm visualization.ipynb, reworked bboxes handling May 10, 2024
@AlekseySh AlekseySh changed the title Refactoring, moved matrix functions to tests, rm visualization.ipynb, reworked bboxes handling Changes for OML 3.0 May 11, 2024
dataset = ImageQueryGalleryLabeledDataset(df_val, transform=transform)

# you can optionally provide categories to get category-wise metrics
query_categories = np.array(dataset.extra_data["category"])[dataset.get_query_ids()]
@AlekseySh (Contributor, Author) commented May 11, 2024:

Does it seem okay as an example?
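
For reference, a hypothetical continuation of the example above, feeding the categories into the metrics computation; the query_categories parameter appears in the diff further down, while the first argument and the top-k values are illustrative:

metrics = calc_retrieval_metrics(
    rr,  # assumed: precomputed retrieval results for the queries
    query_categories=query_categories,
    cmc_top_k=(1,),
    precision_top_k=(3, 5),
    map_top_k=(5,),
)
metrics["cat"]["cmc"][1]  # category-wise access, matching the structure shown later in the thread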

@AlekseySh AlekseySh changed the title Changes for OML 3.0 Some changes for OML 3.0 May 11, 2024
@deepslug commented:

@AlekseySh

Removed the fnmr@fmr metric from EmbeddingMetrics, because we cannot guarantee its behaviour is correct when a postprocessor is present, and the metric is computationally heavy.

Is it possible to keep this metric available as an option for users who want it? It is considered the most important metric in the field of biometrics (e.g., face/fingerprint recognition), where score thresholding is often employed.

Having said that, it would also be super helpful if this metric (or any other "the-lower-the-better" metric) could be specified as metric_for_checkpointing, e.g. "OVERALL/fnmr@fmr/0.001", with the mode set to "min" via YAML. The mode is currently hard-coded as "max":

def parse_ckpt_callback_from_config(cfg: TCfg) -> ModelCheckpoint:
    return ModelCheckpoint(
        dirpath=Path.cwd() / "checkpoints",
        monitor=cfg["metric_for_checkpointing"],
        mode="max",
        save_top_k=1,
        verbose=True,
        filename="best",
    )
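
A possible way to address this, sketched against the function above; the mode_for_checkpointing key is hypothetical (not part of the current YAML schema) and cfg is assumed to behave like a dict:

def parse_ckpt_callback_from_config(cfg: TCfg) -> ModelCheckpoint:
    # fall back to "max" for backward compatibility; a user could set "min" in the YAML
    # for the-lower-the-better metrics such as fnmr@fmr
    mode = cfg.get("mode_for_checkpointing", "max")
    return ModelCheckpoint(
        dirpath=Path.cwd() / "checkpoints",
        monitor=cfg["metric_for_checkpointing"],
        mode=mode,
        save_top_k=1,
        verbose=True,
        filename="best",
    )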

@@ -162,8 +161,9 @@ def __len__(self) -> int:
        return len(self._paths)

    def visualize(self, item: int, color: TColor = BLACK) -> np.ndarray:
        bbox = torch.tensor(self._bboxes[item]) if (self._bboxes is not None) else torch.tensor([torch.nan] * 4)
        image = get_img_with_bbox(im_path=self._paths[item], bbox=bbox, color=color)
        img = np.array(imread_pillow(self.read_bytes(self._paths[item])))
A Collaborator commented:

You can read with cv2 directly into np. In addition, it's faster.
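
For reference, a minimal sketch of the cv2-based reading being suggested (standard OpenCV API; note it expects a local file path, which is relevant to the reply below):

import cv2

path = "image.jpg"  # hypothetical local path
img = cv2.imread(path)                      # decodes straight into an np.ndarray (BGR channel order)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB to match the PIL-based reading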

@AlekseySh (author) replied:

I did it to support paths as URLs later on.

@@ -57,51 +61,30 @@ def calc_retrieval_metrics(
     metrics["precision"] = dict(zip(precision_top_k, precision))

     if map_top_k:
-        map = calc_map(gt_tops, n_gts, map_top_k)
-        metrics["map"] = dict(zip(map_top_k, map))
+        map_ = calc_map(gt_tops, n_gts, map_top_k)
A Collaborator commented:

respect the built-in (i.e., don't shadow Python's built-in map)

metrics["map"] = dict(zip(map_top_k, map_))

if query_categories is not None:
metrics_cat = {c: take_unreduced_metrics_by_mask(metrics, query_categories == c) for c in query_categories}
A Collaborator commented:

It's a bit strange to have a plain metrics dict where metric keys are mixed with category keys at the same top level:

metrics["map"][5] = 0.5
metrics["cmc"][3] = 0.4
metrics["OVERALL"]["cmc"][3] = 0.4
metrics["cats"]["cmc"][3] = 0.1
metrics["pigs"]["cmc"][3] = 0.2
...

@AlekseySh (author) replied:

no, it's:

{
    "cat": {"cmc": {1: 1.0}, "precision": {3: 2 / 3, 5: 2 / 3}},
    "dog": {"cmc": {1: 1.0}, "precision": {3: 1 / 2, 5: 1 / 2}},
    OVERALL_CATEGORIES_KEY: ...,
}


# todo 522: put back fnmr metric
def compute_metrics(self) -> TMetricsDict:  # type: ignore
    self.acc = self.acc.sync()  # gathering data from devices happens here if DDP
A Collaborator commented:

wrong comment


mask_dataset_sz = categories == category
metrics[category].update(calc_topological_metrics(embeddings[mask_dataset_sz], self.pcf_variance))
self.metrics_unreduced = {cat: {**metrics_r[cat], **metrics_t[cat]} for cat in metrics_r.keys()}
A Collaborator commented:

On this line metrics_r has all retrieval metrics and categories as top-level keys (because of this), so:

  1. You can't do metrics_t[cat] for retrieval-metric keys.
  2. self.metrics_unreduced won't include metrics_t.
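
A toy illustration of point 1 (key names and values are hypothetical):

metrics_r = {"cmc": {1: 0.9}, "cat": {"cmc": {1: 1.0}}, "dog": {"cmc": {1: 0.8}}}  # metric AND category keys mixed
metrics_t = {"cat": {"pcf": 0.5}, "dog": {"pcf": 0.6}}                             # category keys only

# iterating over metrics_r.keys() also visits the retrieval-metric key "cmc",
# for which metrics_t has no entry:
{cat: {**metrics_r[cat], **metrics_t[cat]} for cat in metrics_r.keys()}  # raises KeyError: 'cmc'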

@AlekseySh (author) replied:

discussed offline

@AlekseySh (Contributor, Author) commented:

@deepslug okay, I got your point

    map_top_k=tuple(),
)

assert math.isclose(metrics["cat"]["cmc"][1], 1)
@AlekseySh (author) commented:

Rework these asserts: compare dicts instead (to check that there are no unwanted extra keys).
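
A sketch of the suggested dict comparison, with expected values mirroring the example structure shown earlier in the thread (and assuming only these two category keys are present):

expected = {
    "cat": {"cmc": {1: 1.0}, "precision": {3: 2 / 3, 5: 2 / 3}},
    "dog": {"cmc": {1: 1.0}, "precision": {3: 1 / 2, 5: 1 / 2}},
}
assert metrics == expected  # unlike per-value isclose checks, this also fails on unwanted extra keys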

@AlekseySh AlekseySh marked this pull request as draft May 22, 2024 12:41
@AlekseySh AlekseySh closed this May 22, 2024
@AlekseySh AlekseySh deleted the continue_refactoring branch May 22, 2024 14:50
Projects: Status: Done

Development: successfully merging this pull request may close these issues:
  • [EPIC] Release OML 3.0

3 participants