This repository was archived by the owner on Aug 30, 2024. It is now read-only.

Commit d355d70

[DOC] Fix cont-batching doc (#280)
* fix cont-batching doc

Signed-off-by: Yu, Zhentao <[email protected]>

* update

Signed-off-by: Yu, Zhentao <[email protected]>

---------

Signed-off-by: Yu, Zhentao <[email protected]>
1 parent: acfbc40 · commit: d355d70

File tree: 1 file changed, +3 -0 lines changed


docs/continuous_batching.md (+3)
@@ -50,6 +50,7 @@ You can use below codes to get the `token/second` metric if you care about the t
 ```python
 from transformers import AutoTokenizer
 from neural_speed import Model
+import time
 
 model_name = "meta-llama/Llama-2-7b-hf"
 prompts = [
@@ -64,6 +65,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, pa
 # if the tokenizer has no pad_token, you can specify it.
 tokenizer.pad_token = tokenizer.eos_token
 pad_token_id = tokenizer.pad_token_id
+bs = len(prompts)
+print("batch_size is {}.".format(bs))
 inputs = tokenizer(prompts, padding=True, return_tensors='pt').input_ids
 
 model = Model()
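For readers following the doc change, below is a minimal, self-contained sketch of how the three added lines fit into the surrounding `docs/continuous_batching.md` example and support the `token/second` metric the section describes. The prompt strings and the `Model.init()`/`Model.generate()` arguments are illustrative assumptions, not taken from this diff; only the lines marked "added in this commit" come from the change above.

```python
# Sketch of the continuous-batching timing example after this commit.
# Prompts and Model.init()/Model.generate() arguments are assumed for
# illustration; the real values are elided or outside this diff.
import time

from transformers import AutoTokenizer
from neural_speed import Model

model_name = "meta-llama/Llama-2-7b-hf"
prompts = [
    "What is the meaning of life?",   # placeholder prompts; the real
    "Tell me a story about llamas.",  # list is elided in the diff
]

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, padding_side="left")
# if the tokenizer has no pad_token, you can specify it.
tokenizer.pad_token = tokenizer.eos_token
pad_token_id = tokenizer.pad_token_id
bs = len(prompts)                       # added in this commit
print("batch_size is {}.".format(bs))   # added in this commit
inputs = tokenizer(prompts, padding=True, return_tensors='pt').input_ids

model = Model()
model.init(model_name, use_quant=True, weight_dtype="int4", compute_dtype="int8")  # assumed args

# The `import time` added in this commit enables the token/second
# metric the doc section describes: time the batched generate call.
start = time.time()
outputs = model.generate(inputs, max_new_tokens=128, pad_token=pad_token_id)  # assumed args
elapsed = time.time() - start

generated = sum(len(seq) for seq in outputs)  # assumed output shape: list of token-id sequences
print("token/second: {:.2f}".format(generated / elapsed))
```

The commit's point is small but practical: `import time` makes the timing code in the example actually runnable, and printing `bs` makes the effective batch size visible when comparing throughput across prompt sets.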
