group shuffle helpers #52

Open
idavydov opened this issue Sep 4, 2024 · 0 comments


idavydov commented Sep 4, 2024

Hi @julianesiebourg, this relates to your group shuffle use case and #51.

I want to capture the directions in which we could improve the library to accommodate this case.

  1. Do you think this is a common problem? Intuitively, distributing replicates across batches is usually the better strategy.
  2. We could introduce a simple scoring function that penalizes separated samples and use strict improvement. The downside is that it would make shuffling very difficult. For example, to move samples 1 and 2 from batch X to batch Y: a) at least two samples must be shuffled in one iteration (n_shuffle >= 2), b) samples 1 and 2 must be chosen together at random (quite unlikely), and c) both must land in the same destination batch (probability 1/n_batches). So shuffling would probably be very slow.
  3. Some shuffle-with-constraints procedure could be a solution. We could try to generalize the example I shared with you, basically by specifying which column holds the sample group. The difficulty is that we could run into a pathological configuration from which there is no way out without breaking the constraint.
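
To make options 2 and 3 concrete, here is a rough, language-agnostic sketch in Python. It is not the library's API: the function names (`n_split_groups`, `pair_move_probability`, `swap_groups`) and the plain-list data layout are hypothetical, chosen only for illustration. The probability estimate assumes samples are drawn uniformly without replacement, and the constrained move assumes we only ever swap whole groups of equal size so that batch sizes stay fixed.

```python
import random
from collections import defaultdict
from itertools import combinations


def n_split_groups(batches, groups):
    """Penalty score (option 2): number of groups spread over more than one batch."""
    seen = defaultdict(set)
    for batch, group in zip(batches, groups):
        seen[group].add(batch)
    return sum(1 for found in seen.values() if len(found) > 1)


def pair_move_probability(n_samples, n_shuffle, n_batches):
    """Rough chance that one iteration picks two given samples and sends
    them to the same destination batch.

    Assumption: n_shuffle samples are drawn uniformly without replacement,
    and the shared destination contributes a factor of 1 / n_batches.
    """
    if n_shuffle < 2:
        return 0.0
    both_picked = n_shuffle * (n_shuffle - 1) / (n_samples * (n_samples - 1))
    return both_picked / n_batches


def swap_groups(batches, groups, rng):
    """Constrained move (option 3): exchange the batches of two intact,
    equal-sized groups, so groups stay together and batch sizes do not change.

    Returns a new batch list, or None when no such pair exists, which is
    the kind of dead end mentioned in option 3.
    """
    members = defaultdict(list)
    for i, group in enumerate(groups):
        members[group].append(i)
    # Keep only groups whose samples currently share a single batch.
    intact = {g: idx for g, idx in members.items()
              if len({batches[i] for i in idx}) == 1}
    candidates = [
        (a, b) for a, b in combinations(intact, 2)
        if len(intact[a]) == len(intact[b])
        and batches[intact[a][0]] != batches[intact[b][0]]
    ]
    if not candidates:
        return None
    a, b = rng.choice(candidates)
    batch_a, batch_b = batches[intact[a][0]], batches[intact[b][0]]
    new_batches = list(batches)
    for i in intact[a]:
        new_batches[i] = batch_b
    for i in intact[b]:
        new_batches[i] = batch_a
    return new_batches
```

With 10 samples, `n_shuffle = 2` and 2 batches, `pair_move_probability(10, 2, 2)` is 1/90 under these assumptions, which illustrates why the score-only approach would be slow. The group-level swap never breaks the constraint, but it can return `None` when group sizes do not match, so a real implementation would need a fallback move.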

At this stage I think we should just capture this. If you already have an idea of how frequent this is and what other types of group shuffle we might need, that would be great to know.
