
Commit b85c6b9

jackcook and markurtz authored
Fix max number of tokens for synthetic data generator (#170)
When using `prompt_tokens_max` (and not using `prompt_tokens_stdev`), the synthetic data generator occasionally produces one token more than the specified maximum. This can be reproduced as follows:

```
from guidellm.utils import IntegerRangeSampler

MIN_VALUE = 5
MAX_VALUE = 15

irs = IntegerRangeSampler(
    average=(MAX_VALUE - MIN_VALUE) // 2,
    variance=None,
    min_value=MIN_VALUE,
    max_value=MAX_VALUE,
    random_seed=None,
)
it = iter(irs)
for _ in range(10000):
    # 16 is MAX_VALUE + 1, which should never be produced
    assert next(it) != 16
```

The assertion will fire, even though the maximum is set to 15. This happens because `random.randint`, which `IntegerRangeSampler` uses, returns values up to and including the upper bound it is given, so calling it with `calc_max + 1` can yield `calc_max + 1`. This PR fixes that by passing `calc_max` as the upper bound instead.

Co-authored-by: Mark Kurtz <[email protected]>
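The inclusive upper bound of Python's `random.randint` can be checked directly with the standard library. This is a minimal sketch; the seed and the number of draws are arbitrary choices made only so the demo is repeatable:

```python
import random

# random.randint(a, b) is inclusive on BOTH ends: it can return a or b.
# It behaves like random.randrange(a, b + 1), whose stop value is excluded.
rng = random.Random(0)  # fixed seed only so the demo is repeatable

MIN_VALUE, MAX_VALUE = 5, 15

# Old call shape: an upper bound of max + 1 lets max + 1 itself be drawn.
buggy = {rng.randint(MIN_VALUE, MAX_VALUE + 1) for _ in range(10_000)}
# Fixed call shape: every draw stays within [MIN_VALUE, MAX_VALUE].
fixed = {rng.randint(MIN_VALUE, MAX_VALUE) for _ in range(10_000)}

print(max(buggy), max(fixed))
```

With 10,000 draws, `buggy` contains 16 with overwhelming probability, while `fixed` never exceeds 15.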
1 parent 6d8f10c commit b85c6b9


1 file changed: +1, -1 lines


src/guidellm/utils/random.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -37,7 +37,7 @@ def __iter__(self) -> Iterator[int]:
         if calc_min == calc_max:
             yield calc_min
         elif not self.variance:
-            yield self.rng.randint(calc_min, calc_max + 1)
+            yield self.rng.randint(calc_min, calc_max)
         else:
             rand = self.rng.gauss(self.average, self.variance)
             yield round(max(calc_min, min(calc_max, rand)))
```
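The three branches in the patched hunk can be exercised in isolation with a minimal, hypothetical sketch. `sample_token_counts` is an illustrative name, not part of guidellm, and it assumes `calc_min` and `calc_max` simply equal the configured bounds, because the diff does not show how the real sampler computes them. As in the hunk, the `variance` value is passed to `gauss` as its second (sigma) argument:

```python
import random
from itertools import islice
from typing import Iterator, Optional


def sample_token_counts(
    average: float,
    variance: Optional[float],
    min_value: int,
    max_value: int,
    seed: Optional[int] = None,
) -> Iterator[int]:
    """Hypothetical stand-in for the branches shown in the diff.

    Assumption: calc_min/calc_max are just min_value/max_value here;
    the real IntegerRangeSampler may derive them differently.
    """
    rng = random.Random(seed)
    calc_min, calc_max = min_value, max_value
    while True:
        if calc_min == calc_max:
            yield calc_min
        elif not variance:
            # Patched line: randint's upper bound is already inclusive.
            yield rng.randint(calc_min, calc_max)
        else:
            rand = rng.gauss(average, variance)
            # Clamp into [calc_min, calc_max], then round to an int.
            yield round(max(calc_min, min(calc_max, rand)))


# Uniform branch (no variance), mirroring the reproduction in the commit message.
draws = list(islice(sample_token_counts(5, None, 5, 15, seed=1), 10_000))
# Gaussian branch, clamped to the same bounds.
gauss_draws = list(islice(sample_token_counts(10, 4.0, 5, 15, seed=1), 10_000))
```

The clamp-then-round order in the Gaussian branch keeps results in range because rounding a value already inside `[calc_min, calc_max]` cannot move it past an integer bound.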
