Fix max number of tokens for synthetic data generator (#170)
When `prompt_tokens_max` is set (and `prompt_tokens_stdev` is not), the
synthetic data generator occasionally produces one token more than the
specified maximum. This can be reproduced as follows:
```
from guidellm.utils import IntegerRangeSampler
MIN_VALUE = 5
MAX_VALUE = 15
irs = IntegerRangeSampler(average=(MAX_VALUE - MIN_VALUE) // 2, variance=None, min_value=MIN_VALUE, max_value=MAX_VALUE, random_seed=None)
it = iter(irs)
for _ in range(10000):
    assert next(it) != 16
```
The assertion fails even though the max is set to 15. This happens
because `random.randint`, which `IntegerRangeSampler` uses, treats both
bounds as inclusive: it can return the max value it is given. This PR
fixes that.
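
The inclusive upper bound of `random.randint` can be checked with the
standard library alone. The snippet below is a minimal sketch, not the
sampler's actual code: the names `LOW`, `HIGH`, `inclusive`,
`off_by_one`, and `half_open` are illustrative, and the `HIGH + 1` call
only assumes how an overshoot of this kind could arise.

```python
import random

LOW, HIGH = 5, 15
rng = random.Random(0)  # fixed seed so the run is repeatable

# randint(a, b) includes both endpoints, so HIGH itself can be drawn.
inclusive = {rng.randint(LOW, HIGH) for _ in range(10_000)}

# Assumed failure mode: passing HIGH + 1 to randint lets HIGH + 1 through.
off_by_one = {rng.randint(LOW, HIGH + 1) for _ in range(10_000)}

# randrange(a, b) excludes b, so HIGH + 1 is the correct stop value here.
half_open = {rng.randrange(LOW, HIGH + 1) for _ in range(10_000)}

assert max(inclusive) == HIGH
assert max(off_by_one) == HIGH + 1
assert max(half_open) == HIGH
```

The `+ 1` adjustment belongs with half-open APIs such as
`random.randrange`; with `random.randint` the maximum should be passed
unchanged.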
Co-authored-by: Mark Kurtz <[email protected]>