fix(model): fix bugs of batch generation & support min_new_tokens for inference #972
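The PR title is the only description captured here. As a hedged illustration of the feature it names, the sketch below shows how a `min_new_tokens` floor typically interacts with batched generation, using the Hugging Face `transformers` API purely as a stand-in; the actual interface, model, and parameter names in this PR may differ.

```python
# Hypothetical illustration of the feature named in the PR title: batched
# generation with a min_new_tokens floor. The Hugging Face transformers API
# is used only as a stand-in; this PR's own interface may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, not taken from the PR
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Batched generation needs a pad token and left padding so that prompts of
# different lengths line up at the right edge of the input tensor.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

prompts = ["The quick brown fox", "In a distant galaxy"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

# min_new_tokens keeps the model from emitting EOS before it has produced
# at least that many new tokens; max_new_tokens caps the other end.
outputs = model.generate(
    **inputs,
    min_new_tokens=8,
    max_new_tokens=32,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```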
Job | Run time
---|---
| 2m 26s
| 2m 26s
| 4m 14s
| 5m 5s
| 3m 12s
| 3m 12s
| 2m 25s
| 2m 25s
| 2m 49s
| 2m 49s
Total | 31m 3s