I used the Makefile to run the BERT example and the result came back INVALID. Where can a user increase the expected QPS? Thanks.
make run_pytorch_performance
Loading BERT configs...
Loading PyTorch model...
Constructing SUT...
Finished constructing SUT.
Constructing QSL...
No cached features at 'eval_features.pickle'... converting from examples...
Creating tokenizer...
Reading examples...
Converting examples to features...
Caching features at 'eval_features.pickle'...
Finished constructing QSL.
Running LoadGen test...
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Offline
Mode : PerformanceOnly
Samples per second: 113.38
Result is : INVALID
Min duration satisfied : NO
Min queries satisfied : Yes
Early stopping satisfied: Yes
Recommendations:
* Increase expected QPS so the loadgen pre-generates a larger (coalesced) query.
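
One possible reading of this recommendation, offered as a sketch rather than a confirmed answer: the expected QPS is a LoadGen setting. The MLPerf reference implementations typically read it from a `user.conf` file under the `target_qps` key, and as far as I know the Python bindings expose the same value as `TestSettings.offline_expected_qps`. Assuming this BERT example loads such a `user.conf`, the snippet below picks a target slightly above the measured 113.38 samples per second and formats the matching config line. The helper names and the 10% headroom are hypothetical choices for illustration, not part of LoadGen.

```python
import math


def suggested_offline_target_qps(measured_qps: float, headroom: float = 1.1) -> int:
    """Round the measured throughput up with some headroom.

    `headroom` is an illustrative choice, not a LoadGen constant.
    """
    if measured_qps <= 0:
        raise ValueError("measured_qps must be positive")
    return math.ceil(measured_qps * headroom)


def user_conf_line(target_qps: int, model: str = "*") -> str:
    """Format an Offline target_qps entry in the usual
    `<model>.<scenario>.<key> = <value>` layout (assumed to apply here)."""
    return f"{model}.Offline.target_qps = {target_qps}"


def expected_samples(target_qps: int, min_duration_s: int) -> int:
    """Rough count of samples needed to keep the run busy for
    `min_duration_s` seconds at `target_qps`. The real LoadGen
    calculation may add slack on top of this."""
    return target_qps * min_duration_s


if __name__ == "__main__":
    qps = suggested_offline_target_qps(113.38)  # value taken from the log above
    print(user_conf_line(qps))
```

If that assumption holds, putting the printed line into the `user.conf` that the benchmark reads and running `make run_pytorch_performance` again should make LoadGen pre-generate a larger query, so the run can last long enough to satisfy the minimum duration.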