I wanted to capture the directions in which we could improve the library to accommodate this case.
Do you think this is a common problem? Intuitively I see that keeping replicates distributed across batches is usually a better strategy.
We could introduce a simple scoring function that penalizes separated samples and use strict improvement. The downside is that it would make shuffling very difficult. For example, if you want samples 1 and 2 moved from batch X to batch Y: a) we would need to shuffle at least two samples in one iteration (`n_shuffle >= 2`); b) samples 1 and 2 would have to be chosen together at random (quite unlikely); c) both would have to land in the same destination batch (probability `1/n_batches`). So shuffling would probably be very slow.
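To make the penalty idea concrete, here is a minimal sketch of such a scoring function. It is only an illustration: the name `split_group_penalty` and the list-based inputs are assumptions made for this example, not part of the library's API.

```python
from collections import defaultdict


def split_group_penalty(batches, groups):
    """Hypothetical score: number of extra batches each group occupies.

    batches[i] is the batch of sample i, groups[i] is its sample group.
    A group kept entirely in one batch contributes 0.
    """
    batches_per_group = defaultdict(set)
    for batch, group in zip(batches, groups):
        batches_per_group[group].add(batch)
    return sum(len(b) - 1 for b in batches_per_group.values())


groups = ["g1", "g1", "g2", "g2"]

# Both groups are intact: no penalty.
print(split_group_penalty(["X", "X", "Y", "Y"], groups))  # 0

# Moving only the first sample of g1 to batch Y splits g1.
print(split_group_penalty(["Y", "X", "Y", "Y"], groups))  # 1
```

With strict improvement, the second layout would be rejected because its score is worse, so a group can only change batch if all of its samples move in the same iteration.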
A shuffle-with-constraints procedure could be a solution. We could try to generalize the example I shared with you, basically by specifying which column holds the sample group. The difficulty is that we could run into a pathological configuration from which there is no way back without breaking the constraint.
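One way such a constrained shuffle might look is to move whole groups instead of single samples. The sketch below is not the example referred to above and not the library's implementation; `swap_two_groups` is a hypothetical helper, and it assumes that every group already sits in a single batch and that all groups have the same size (otherwise a swap would change batch sizes).

```python
import random


def swap_two_groups(batches, groups, rng=random):
    """Hypothetical group-level shuffle step.

    Picks two groups that sit in different batches and exchanges their
    batch labels, so no group is ever split. Returns a new list; the
    input is returned unchanged (as a copy) if no such pair exists.
    """
    # Assumption: each group occupies exactly one batch.
    batch_of = {}
    for batch, group in zip(batches, groups):
        batch_of.setdefault(group, batch)

    names = sorted(batch_of)
    pairs = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if batch_of[a] != batch_of[b]
    ]
    if not pairs:
        return list(batches)

    a, b = rng.choice(pairs)
    new_batch = {a: batch_of[b], b: batch_of[a]}
    return [new_batch.get(g, bt) for bt, g in zip(batches, groups)]


batches = ["X", "X", "Y", "Y"]
groups = ["g1", "g1", "g2", "g2"]
print(swap_two_groups(batches, groups, random.Random(0)))
# ['Y', 'Y', 'X', 'X']
```

Because each step exchanges two complete groups, the constraint holds after every iteration. Groups of unequal size are not handled here, and that is exactly where a configuration with no valid move could appear.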
At this stage I think we should just capture this. If you already have an idea of how frequent this case is and what other types of group shuffle we might need, that would be great.
Hi @julianesiebourg, this is related to your group shuffle use-case and #51.