fix: make random iteration over ComparisonIndexer fair across leaf indexers #2431

Open
mvanhorn wants to merge 2 commits into TimefoldAI:main from mvanhorn:fix/2325-fix-make-random-iteration-over-compariso
@mvanhorn
Contributor

Summary

Makes random iteration over a ComparisonIndexer fair across all elements in the query range. When the comparison map holds several buckets (leaf sub-indexers), a tuple now has the same probability of being visited next regardless of which bucket it lives in.
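Stated as a probability (the notation here is ours, not the PR's): let the in-range leaves hold n_1 through n_k tuples and let N be their total. Fairness means every tuple is drawn next with probability 1/N. A two-stage draw, first a leaf with probability n_i/N and then a tuple uniformly inside that leaf, reaches exactly this. For contrast, a hypothetical uniform choice of leaf (probability 1/k) does not:

```math
N = \sum_{i=1}^{k} n_i, \qquad
\Pr[t] = \frac{n_i}{N} \cdot \frac{1}{n_i} = \frac{1}{N} \quad (t \in L_i),
\qquad \text{versus} \qquad
\Pr_{\text{uniform leaf}}[t] = \frac{1}{k} \cdot \frac{1}{n_i} = \frac{1}{k\,n_i}.
```

With one 100-tuple leaf and three single-tuple leaves (k = 4, N = 103), size-proportional selection gives every tuple 1/103, while a uniform leaf choice would give each large-leaf tuple 1/400 and each small-leaf tuple 1/4.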

Why this matters

ComparisonIndexer.RandomIterator extended DefaultIterator, which walks the in-range buckets sequentially and randomizes only within each leaf indexer's own randomIterator. As noted in #2325, bucket selection therefore ignores how many items each leaf holds, so a tuple's chance of being drawn depends on which bucket it lives in and on that bucket's size: tuples in small buckets end up over- or under-represented relative to tuples in large buckets. For random move selection, this skews the distribution away from uniform over elements.
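To make the skew concrete, here is a hypothetical, heavily simplified model of the old traversal. It is not the actual DefaultIterator: buckets are plain lists visited in key order, and randomness applies only inside a bucket. Assuming the 100-element bucket comes first in key order, the first draw can never land in one of the single-element buckets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical model of the old behaviour: buckets are visited in key order,
// and randomness only applies inside each bucket.
final class SequentialBucketWalk {
    private SequentialBucketWalk() {
    }

    /** Returns the element that a single next() call would produce. */
    static <T> T firstDraw(List<List<T>> bucketsInKeyOrder, Random random) {
        for (List<T> bucket : bucketsInKeyOrder) {
            if (!bucket.isEmpty()) {
                // Only the first non-empty bucket is ever consulted for the first draw.
                return bucket.get(random.nextInt(bucket.size()));
            }
        }
        throw new IllegalStateException("No elements in range");
    }

    /** Builds one 100-element bucket followed by three single-element buckets. */
    static List<List<String>> sampleBuckets() {
        List<List<String>> buckets = new ArrayList<>();
        List<String> big = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            big.add("big-" + i);
        }
        buckets.add(big);
        buckets.add(Collections.singletonList("small-0"));
        buckets.add(Collections.singletonList("small-1"));
        buckets.add(Collections.singletonList("small-2"));
        return buckets;
    }
}
```

However many times firstDraw(sampleBuckets(), random) is called, it only returns elements whose names start with "big-".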

core/.../bavet/common/index/ComparisonIndexer.java now draws each in-range leaf indexer with probability proportional to its size(queryCompositeKey), so every element within the range is equally likely to be visited next. The ordered iterator(), forEach() and size() paths, as well as the single-bucket and empty-range paths, are unchanged.
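A minimal sketch of size-proportional leaf selection follows. It assumes each leaf is modelled as a plain List<T> whose size() stands in for size(queryCompositeKey). The class name WeightedLeafSampler is hypothetical, and this is not the code in the PR. One uniform ticket over [0, total) picks the leaf, and a second uniform draw picks the element inside it, mirroring the step of delegating to the leaf's own random choice.

```java
import java.util.List;
import java.util.Random;

// Sketch of size-proportional leaf selection. Each "leaf" is a List<T> standing in
// for a leaf indexer; leaf.size() plays the role of size(queryCompositeKey).
final class WeightedLeafSampler<T> {
    private final List<List<T>> leaves;
    private final int totalSize; // computed once; leaves must not change afterwards

    WeightedLeafSampler(List<List<T>> leaves) {
        this.leaves = leaves;
        int total = 0;
        for (List<T> leaf : leaves) {
            total += leaf.size();
        }
        if (total == 0) {
            throw new IllegalArgumentException("No elements in range");
        }
        this.totalSize = total;
    }

    /** Picks leaf i with probability size_i / total, then an element uniformly inside it. */
    T next(Random random) {
        int ticket = random.nextInt(totalSize); // uniform over all in-range elements
        for (List<T> leaf : leaves) {
            if (ticket < leaf.size()) {
                // This leaf won with probability leaf.size() / totalSize; now pick inside it.
                return leaf.get(random.nextInt(leaf.size()));
            }
            ticket -= leaf.size(); // empty leaves subtract 0 and are never chosen
        }
        throw new IllegalStateException("Leaf sizes changed after construction");
    }
}
```

Walking the leaves costs O(k) per draw for k in-range leaves. A prefix-sum array with binary search would bring that down to O(log k), at the price of rebuilding it whenever leaf sizes change.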

For the filtered overload, selection draws from each leaf's unfiltered iterator and applies the predicate during selection, discarding rejected tuples as they are drawn. Each leaf's weight then tracks its count of not-yet-rejected tuples, so the bucket weights stay exact and the result is fair over the surviving elements rather than over the raw bucket sizes. remove() and its IllegalStateException semantics are preserved.
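The filtered variant can be sketched the same way. The hypothetical FilteredWeightedSampler below works on private copies of the leaves, which is an assumption: the PR draws from each leaf's iterator instead. It swap-removes each rejected element, so a leaf's weight always equals its number of not-yet-rejected elements, and it keeps accepted elements eligible for later draws. Every draw is uniform over the remaining candidates, so the first accepted element is uniform over the survivors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.Predicate;

// Sketch of fair filtered sampling: rejected elements are discarded as they are
// drawn, so each leaf's weight always equals its count of not-yet-rejected elements.
final class FilteredWeightedSampler<T> {
    private final List<List<T>> candidates = new ArrayList<>(); // mutable copies of each leaf
    private final Predicate<T> filter;
    private int remaining; // total not-yet-rejected elements across all leaves

    FilteredWeightedSampler(List<List<T>> leaves, Predicate<T> filter) {
        for (List<T> leaf : leaves) {
            candidates.add(new ArrayList<>(leaf)); // copy: the input leaves are never mutated
            remaining += leaf.size();
        }
        this.filter = filter;
    }

    /** Returns a uniformly random element satisfying the filter, or empty if none is left. */
    Optional<T> next(Random random) {
        while (remaining > 0) {
            int ticket = random.nextInt(remaining); // uniform over remaining candidates
            for (List<T> leaf : candidates) {
                if (ticket >= leaf.size()) {
                    ticket -= leaf.size();
                    continue;
                }
                T element = leaf.get(ticket);
                if (filter.test(element)) {
                    return Optional.of(element); // survivors stay eligible for later draws
                }
                // Swap-remove the rejected element so this leaf's weight shrinks by one.
                int last = leaf.size() - 1;
                leaf.set(ticket, leaf.get(last));
                leaf.remove(last);
                remaining--;
                break; // draw again from the reduced pool
            }
        }
        return Optional.empty();
    }
}
```

Take a 100-element leaf in which only one element matches, plus a one-element leaf that matches. Each survivor comes back roughly half the time, mirroring the PR's filtered-fairness test. The sketch does not model remove().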

Testing

Added tests in ComparisonIndexerTest exercising the random path directly (constructing the indexer with a random-access leaf backend):

  • randomIteratorIsFairAcrossLeafIndexersOfDifferentSizes: one 100-element bucket plus three single-element buckets, sampled 200k times; every element's selection frequency is approximately uniform (the small-bucket tuples were never selected under the old behavior).
  • randomIteratorWithFilterIsFairOverSurvivingElements: a 100-tuple bucket containing a single predicate match, alongside a one-tuple bucket whose tuple also matches; both surviving tuples are selected about equally often.
  • Tests for single-bucket delegation, an out-of-range query, remove() before next(), and predicate-respecting completeness of the filtered overload.
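The PR does not say which tolerance its frequency assertions use. One common way to quantify "approximately uniform" is Pearson's chi-squared statistic. The helper below is a hypothetical illustration, and its 5-standard-deviation cut-off is a loose heuristic, not a formal test at a chosen significance level.

```java
import java.util.Map;

// Hypothetical helper: Pearson's chi-squared statistic for "every element equally likely".
final class UniformityCheck {
    private UniformityCheck() {
    }

    /**
     * Sum over k categories of (observed - expected)^2 / expected, with expected = draws / k.
     * Assumes every key in counts is one of the k categories.
     */
    static double chiSquared(Map<?, Integer> counts, int categories, int draws) {
        double expected = (double) draws / categories;
        double statistic = 0.0;
        int seen = 0;
        for (int observed : counts.values()) {
            statistic += (observed - expected) * (observed - expected) / expected;
            seen++;
        }
        // Categories never drawn contribute (0 - expected)^2 / expected = expected each.
        statistic += (categories - seen) * expected;
        return statistic;
    }

    /**
     * Loose heuristic: under uniformity the statistic has mean about k - 1 and variance
     * about 2(k - 1), so values above k - 1 + 5 * sqrt(2(k - 1)) suggest a skewed sampler.
     */
    static boolean looksUniform(Map<?, Integer> counts, int categories, int draws) {
        int df = categories - 1;
        return chiSquared(counts, categories, draws) <= df + 5 * Math.sqrt(2.0 * df);
    }
}
```

For 103 elements and 200,000 draws, a fair sampler yields a statistic near 102. A sampler that never reaches three of the elements pushes it into the thousands.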

./mvnw -pl core -am test -Dtest=ComparisonIndexerTest -> Tests run: 10, Failures: 0, Errors: 0.

Fixes #2325

mvanhorn requested a review from triceo as a code owner June 28, 2026 12:31
mvanhorn changed the title from "Make random iteration over ComparisonIndexer fair across leaf indexers" to "fix: make random iteration over ComparisonIndexer fair across leaf indexers" Jun 29, 2026
Development

Successfully merging this pull request may close these issues.

Random iteration over ComparisonIndexer is not fair