> Note: Not every model supports multi-batch inference, and most implementations are still under construction; please refer to [Supported Models](#supported-models).
You can use the code below to obtain the `token/second` metric if you care about the throughput of batched inference.
```python
from transformers import AutoTokenizer
from neural_speed import Model

# Hugging Face model id (or a local path to the model).
model_name = "meta-llama/Llama-2-7b-hf"
prompts = [
    "Tell me an interesting fact about llamas.",
    "What is the best way to cook a steak?",
    "Are you familiar with the Special Theory of Relativity and can you explain it to me?",
]
```
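The snippet above stops at the prompt list; a minimal sketch of the measurement step follows. It assumes the `Model.init`/`Model.generate` interface used elsewhere in this README, that `generate` returns one full token sequence (prompt plus new tokens) per prompt, and illustrative quantization arguments (`weight_dtype="int4"`, `compute_dtype="int8"`); adjust these to your setup.

```python
import time

# Llama's tokenizer ships without a pad token; reuse EOS so the batch can be padded.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(prompts, padding=True, return_tensors="pt").input_ids

model = Model()
# Quantization settings here are illustrative; pick whatever matches your setup.
model.init(model_name, use_quant=True, weight_dtype="int4", compute_dtype="int8")

start = time.time()
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False,
                         pad_token_id=tokenizer.pad_token_id)
elapsed = time.time() - start

# Count only newly generated tokens; this assumes each returned sequence
# contains the prompt tokens followed by the generated ones.
new_tokens = sum(len(seq) - inputs.shape[1] for seq in outputs)
print(f"{new_tokens / elapsed:.2f} tokens/second")
```

Because padding makes every prompt in the batch the same length, counting only the newly generated tokens yields a throughput figure that stays comparable across different batch sizes.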