
Made inference modality agnostic in re-ranking and other parts of the repo #542

Merged
27 commits merged into oml_3.0 on Apr 28, 2024

Conversation

AlekseySh (Contributor) commented on Apr 19, 2024:

Changelog (all the functions and classes on the right side are modality agnostic; see the usage sketch after the lists below):

  • EmbeddingPairsDataset, ImagePairsDataset -> PairDataset
  • pairwise_inference_on_images, pairwise_inference_on_embeddings -> pairwise_inference
  • IDistancesPostprocessor -> IRetrievalPostprocessor (mostly just a rename)
  • PairwisePostprocessor, PairwiseEmbeddingsPostprocessor, PairwiseImagesPostprocessor -> PairwiseReranker
  • inference_on_images -> inference
  • inference_on_dataframe -> inference_cached

Also:

  • EmbeddingMetrics takes an optional dataset argument in order to perform postprocessing.
  • Made postprocessing tests a bit more informative by making the dummy models a bit less trivial (added a bias to their outputs).

Examples changed:

  • train + val and prediction for postprocessor
  • retrieval usage
    • added global_paths parameter to download_mock_dataset so it looks nicer
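
To illustrate the idea behind collapsing the per-modality helpers into single entry points, here is a minimal, self-contained sketch in plain PyTorch. It is not OML's actual implementation, and the names DummyPairDataset / pairwise_inference_sketch are made up for illustration: once the pairs come from a generic Dataset, one inference loop serves embeddings, images, or any other modality.

# Illustration only, not OML's code: a single pairwise-inference loop that just
# assumes the dataset yields (x1, x2) pairs, whatever modality they are.
import torch
from torch.utils.data import DataLoader, Dataset


class DummyPairDataset(Dataset):
    """Stands in for PairDataset: it could wrap embeddings, images, texts, etc."""

    def __init__(self, x1, x2):
        assert len(x1) == len(x2)
        self.x1, self.x2 = x1, x2

    def __len__(self):
        return len(self.x1)

    def __getitem__(self, i):
        return self.x1[i], self.x2[i]


@torch.no_grad()
def pairwise_inference_sketch(pairwise_model, dataset, batch_size=32):
    # the loop never looks inside the samples, so it is modality agnostic
    loader = DataLoader(dataset, batch_size=batch_size)
    return torch.cat([pairwise_model(x1, x2) for x1, x2 in loader])


ds = DummyPairDataset(torch.randn(8, 4), torch.randn(8, 4))
scores = pairwise_inference_sketch(lambda a, b: (a - b).norm(dim=1), ds, batch_size=4)
print(scores.shape)  # torch.Size([8])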

@AlekseySh self-assigned this on Apr 19, 2024
@AlekseySh changed the base branch from main to refactor_datasets on Apr 19, 2024 09:20
@AlekseySh linked an issue on Apr 19, 2024 that may be closed by this pull request
@AlekseySh changed the title from "Made inference modality agnostic in re-ranking and other parts of the repo" to "[WIP] Made inference modality agnostic in re-ranking and other parts of the repo" on Apr 19, 2024
@AlekseySh changed the base branch from refactor_datasets to main on Apr 20, 2024 23:54
return self.distances_to_return


@pytest.mark.long
AlekseySh (Contributor, Author) commented on Apr 21, 2024:

I removed this test because it was too tricky and hard to support when changing interfaces.

@AlekseySh requested a review from DaloroAT on Apr 21, 2024 00:10
@AlekseySh changed the title from "[WIP] Made inference modality agnostic in re-ranking and other parts of the repo" to "Made inference modality agnostic in re-ranking and other parts of the repo" on Apr 21, 2024
return distances_upd
distances_top = distances_top.view(distances.shape[0], top_n)

distances_upd, ii_rerank = distances_top.sort()
Collaborator:

This function is inconsistent for ii_rerank... 😁

AlekseySh (Contributor, Author):

What do you mean? :)

AlekseySh (Contributor, Author):

I've added
# todo 522: explain what's going on here
so once all the interfaces are settled, I will add more explanation there.

AlekseySh (Contributor, Author):

I've added examples; I hope that helps.

Collaborator:

Sorry, it's just that sort returns arbitrary indices for equal values, like in the metrics.

AlekseySh (Contributor, Author):

Ooh, got it, you are right. It's the same problem. I hope we will solve it at some point...
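
For reference, here is a small, self-contained illustration of the tie problem discussed above, assuming plain torch.sort is what produces ii_rerank (the diff calls .sort()). Whether a stable sort is the right fix for the re-ranking and metrics code is left open in the thread; this only shows the mechanism.

# Illustration only: with equal values, the indices returned by sort are unspecified
# unless a stable sort is requested.
import torch

distances = torch.tensor([[0.5, 0.2, 0.5, 0.2]])

_, idx_default = torch.sort(distances, dim=1)              # order among tied values is unspecified
_, idx_stable = torch.sort(distances, dim=1, stable=True)  # ties keep their original order: [[1, 3, 0, 2]]

print(idx_default)
print(idx_stable)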

@AlekseySh requested review from Fissium and DaloroAT on Apr 22, 2024 00:07
AlekseySh (Contributor, Author) commented on Apr 22, 2024:

REPRODUCING OUR POSTPROCESSING PAPER:

InShop validation with postprocessing:

     Validate metric           DataLoader 0
─────────────────────────────────────────────
      OVERALL/cmc/1         0.9480938911437988
     OVERALL/cmc/10         0.9850190281867981
     OVERALL/cmc/100        0.9967646598815918
     OVERALL/cmc/20          0.99057537317276
     OVERALL/cmc/30         0.9929666519165039
     OVERALL/map/10          0.910080075263977
      OVERALL/map/5         0.9475616216659546

SOP:

    Validate metric           DataLoader 0
─────────────────────────────────────────────
      OVERALL/cmc/1         0.8812931776046753
     OVERALL/cmc/10         0.9525800943374634
     OVERALL/cmc/100        0.9812898635864258
     OVERALL/cmc/20         0.9645466208457947
     OVERALL/cmc/30         0.9699679613113403
     OVERALL/map/10          0.865265429019928
      OVERALL/map/5         0.8939490914344788

I've used models we trained before:

import gdown

# checkpoints of the models we trained before (used to get the numbers above)
gdown.download(id='13Y6BWkj7Y9fwTcD3ON1_hdqzKAmUCl23', quiet=False)
gdown.download(id='13ixeiusxYOYNfQ1nslWbPRRO0YMX1Dv4', quiet=False)
gdown.download(id='1L8TmHToKZJiogmEZWNExU0dzI8jh_JIk', quiet=False)
gdown.download(id='1KBJwqIaa39foEqn271_GSkZOda1lE9Wp', quiet=False)

@@ -1,7 +1,7 @@
postfix: "postprocessing"

seed: 42
- precision: 16
+ precision: 32
AlekseySh (Contributor, Author):

I checked that precision 16 is not available on CPU, so that value was confusing.
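
A minimal sketch of the rationale, assuming the pipelines build a PyTorch Lightning Trainer from this config (the Trainer wiring below is illustrative, not the pipelines' actual code): fp16 mixed precision needs a GPU, so the default config should not request it.

# Illustration only: request fp16 only when CUDA is available, otherwise fall back to fp32.
import torch
from pytorch_lightning import Trainer

precision = 16 if torch.cuda.is_available() else 32
trainer = Trainer(accelerator="auto", devices=1, precision=precision)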

@@ -72,7 +72,7 @@ def cat_two_sorted_tensors_and_keep_it_sorted(x1: Tensor, x2: Tensor, eps: float
assert eps >= 0
assert x1.shape[0] == x2.shape[0]

- scale = (x2[:, 0] / x1[:, -1]).view(-1, 1)
+ scale = (x2[:, 0] / x1[:, -1]).view(-1, 1).type_as(x1)
AlekseySh (Contributor, Author):

An error appeared in half precision, so I added type_as.
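
The thread does not show the exact traceback, so the snippet below only illustrates what the added call does, with made-up tensors: when x1 and x2 end up with different dtypes under half precision, the division promotes to float32, and .type_as(x1) casts the scale back to x1's dtype so the rest of the function stays in one dtype.

# Illustration only, with made-up inputs: .type_as(x1) aligns the dtype of `scale` with x1.
import torch

x1 = torch.tensor([[0.1, 0.2, 0.3]], dtype=torch.float16)  # e.g. a sorted block of distances in fp16
x2 = torch.tensor([[0.4, 0.5, 0.6]], dtype=torch.float32)  # e.g. another block that stayed in fp32

scale = (x2[:, 0] / x1[:, -1]).view(-1, 1)
print(scale.dtype)               # torch.float32 (the division promoted to fp32)
print(scale.type_as(x1).dtype)   # torch.float16 (matches x1 again)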

@@ -11,7 +11,7 @@
def main_hydra(cfg: DictConfig) -> None:
cfg = dictconfig_to_dict(cfg)
download_mock_dataset(MOCK_DATASET_PATH)
cfg["data_dir"] = MOCK_DATASET_PATH
cfg["data_dir"] = str(MOCK_DATASET_PATH)
AlekseySh (Contributor, Author):

The idea here, and in the similar changes below, is to use the same types as in real runs of our pipelines.
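
A small sketch of that rationale, assuming MOCK_DATASET_PATH is a pathlib.Path (as the added str() suggests): in a real run, data_dir comes out of a parsed YAML/Hydra config as a plain string, so the test config should carry a string as well.

# Illustration only; the path below is made up.
from pathlib import Path

MOCK_DATASET_PATH = Path("/tmp/mock_dataset")

cfg = {"data_dir": str(MOCK_DATASET_PATH)}
assert isinstance(cfg["data_dir"], str)  # same type a parsed config would provide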

AlekseySh (Contributor, Author) commented on Apr 25, 2024:

OTHER CHECKS:

  • running predict in pipelines on InShop
  • running vanilla training on SOP
  • running postprocessor training on InShop, see below
[Screenshot from Apr 25, 2024: postprocessor training run on InShop]

@AlekseySh changed the base branch from main to oml_3.0 on Apr 28, 2024 07:08
@AlekseySh merged commit 8f7023a into oml_3.0 on Apr 28, 2024
8 checks passed

Successfully merging this pull request may close these issues.

[EPIC] Release OML 3.0