
Commit 400e2e2

Update sharded export command (#1078)
We missed a spot when we updated the `llama_serving` guide after enabling `prefill` and `decode` to export with different batch sizes.
1 parent: 5d1632d

1 file changed (+2, -1 lines)


docs/shortfin/llm/user/llama_serving.md

@@ -383,7 +383,8 @@ python -m sharktank.examples.export_paged_llm_v1 \
   --irpa-file /path/to/output/llama3.1-405b.irpa \
   --output-mlir /path/to/output/llama3.1-405b.mlir \
   --output-config /path/to/output/llama3.1-405b.config.json \
-  --bs 4
+  --bs-prefill 4 \
+  --bs-decode 4
 ```

 ### Compiling to VMFB
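
For reference, here is a sketch of the full export command as it reads after this change, assembled from the hunk context above. The `/path/to/output/...` placeholders are kept as-is, and the guide may pass additional flags between the module invocation and `--irpa-file` that are not visible in this hunk.

```
# Export the paged LLM with separate prefill and decode batch sizes
# (both set to 4 here; they may now differ, which is the point of this change).
python -m sharktank.examples.export_paged_llm_v1 \
  --irpa-file /path/to/output/llama3.1-405b.irpa \
  --output-mlir /path/to/output/llama3.1-405b.mlir \
  --output-config /path/to/output/llama3.1-405b.config.json \
  --bs-prefill 4 \
  --bs-decode 4
```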
