1 parent 5d1632d, commit 400e2e2
docs/shortfin/llm/user/llama_serving.md
@@ -383,7 +383,8 @@ python -m sharktank.examples.export_paged_llm_v1 \
--irpa-file /path/to/output/llama3.1-405b.irpa \
--output-mlir /path/to/output/llama3.1-405b.mlir \
--output-config /path/to/output/llama3.1-405b.config.json \
- --bs 4
+ --bs-prefill 4 \
+ --bs-decode 4
```

### Compiling to VMFB