
Commit 3c4bbe4

mikekgfb authored and malfet committed
Update README.md
Update README.md; Add Android tutorial and screenshot; Update README.md; Add iOS instructions
1 parent 48da10f commit 3c4bbe4

File tree

1 file changed: +34, -13 lines

Diff for: README.md

@@ -54,7 +54,7 @@ python utils/tokenizer.py --tokenizer-model=/path/to/tokenizer/tokenizer.model

## Eager Execution

-Model definition in model.py, generation code in generate.py.
+Model definition in model.py, generation code in generate.py. The model checkpoint may have either the extension pth or pt.

```
python generate.py --compile --checkpoint_path checkpoints/$MODEL_REPO/model.pth --prompt "Hello, my name is" --device {cuda,cpu,mps}
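
Either checkpoint extension is loaded the same way. A minimal sketch of reading such a checkpoint as a plain state dict (the helper and paths below are illustrative, not part of the repo, and assume the file stores an ordinary state dict):

```
# Minimal sketch: load a checkpoint whether it is named model.pth or model.pt.
# Assumes the file stores a plain state dict; paths and helper are illustrative.
from pathlib import Path
import torch

def load_checkpoint(checkpoint_path: str) -> dict:
    path = Path(checkpoint_path)
    assert path.suffix in {".pth", ".pt"}, f"unexpected extension: {path.suffix}"
    # weights_only=True avoids unpickling arbitrary objects from the file.
    return torch.load(path, map_location="cpu", weights_only=True)

state_dict = load_checkpoint("checkpoints/stories15M/model.pth")
print(f"loaded {len(state_dict)} tensors")
```

The resulting dict maps parameter names to tensors and can be handed to a module's load_state_dict.
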
@@ -66,8 +66,9 @@ To squeeze out a little bit more performance, you can also compile the prefill w
python aoti_export.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --device {cuda,cpu} --out-path ./${MODEL_REPO}.so
```

-When you have exported the model,
-Note to self: sopath is missing in the current version. Copy the reported path to ./${MODEL_REPO}.so
+When you have exported the model, you can test it with the sequence generator by loading the compiled DSO with the `--dso ./${MODEL_REPO}.so` option.
+This gives users the ability to test their model, run any pre-existing model tests against the exported model with the same interface,
+and support additional experiments to confirm model quality and speed.

```
python generate.py --device {cuda,cpu} --dso ./${MODEL_REPO}.so --prompt "Hello my name is"
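
Because the exported DSO is driven through the same generate.py interface, existing checks can simply be run against both backends. A minimal smoke-test sketch, reusing the flags shown above (the model name, paths, and helper are illustrative placeholders):

```
# Minimal sketch: invoke generate.py once eagerly and once with the AOTI DSO,
# using the same interface, and confirm both runs complete.
import subprocess

PROMPT = "Hello my name is"

def run_generate(extra_args):
    cmd = ["python", "generate.py", "--device", "cpu", "--prompt", PROMPT, *extra_args]
    return subprocess.run(cmd, capture_output=True, text=True)

eager = run_generate(["--checkpoint_path", "checkpoints/stories15M/model.pth"])
dso = run_generate(["--dso", "./stories15M.so"])

for name, result in (("eager", eager), ("dso", dso)):
    assert result.returncode == 0, f"{name} run failed:\n{result.stderr}"
    print(name, "ok")
```

Stronger checks could compare the generated text from the two runs.
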
@@ -82,17 +83,20 @@ Note to self: --dso does not currently take an argument, and always loads storie
Use a small model like stories15M.pt to test the instructions in the following section. You must first have ExecuTorch installed before running this command; see the installation instructions in the section [here](#installation-instructions).

The environment variable MODEL_REPO should point to a directory with the `model.pth` file and `tokenizer.model` file.
-The command below will add the file "llama-fast.pte" to your MODEL_REPO directory.
+The command below will add the file "${MODEL_REPO}.pte" to your current directory.

```
-python et_export.py --checkpoint_path $MODEL_REPO/model.pth -d fp32 --out-path ${MODEL_REPO}
+python et_export.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth -d fp32 --out-path ${MODEL_REPO}.pte
```
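
A quick sanity check of the layout described above -- model.pth and tokenizer.model under checkpoints/$MODEL_REPO, and ${MODEL_REPO}.pte appearing in the current directory after export. This is an illustrative sketch, not a script in the repo:

```
# Minimal sketch: check the files et_export.py reads and the .pte it writes.
# Layout follows the commands in this README; the script itself is illustrative.
import os
from pathlib import Path

model_repo = os.environ["MODEL_REPO"]            # e.g. "stories15M"
repo_dir = Path("checkpoints") / model_repo

for name in ("model.pth", "tokenizer.model"):
    assert (repo_dir / name).exists(), f"missing {repo_dir / name}"

pte_file = Path(f"{model_repo}.pte")             # written to the current directory
status = "present" if pte_file.exists() else "not created yet"
print(f"export artifact {pte_file}: {status}")
```
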

TODO(fix this): the export command works with "--xnnpack" flag, but the next generate.py command will not run it so we do not set it right now.
+When you have exported the model, you can test it with the sequence generator by loading the exported model with the `--pte ${MODEL_REPO}.pte` option.
+This gives users the ability to test their model, run any pre-existing model tests against the exported model with the same interface,
+and support additional experiments to confirm model quality and speed.

-To run the pte file, run this. Note that this is very slow at the moment.
+To run the pte file, run the following command. Note that this is very slow at the moment.
```
-python generate.py --checkpoint_path $MODEL_REPO/model.pth --pte $MODEL_REPO/llama-fast.pte --prompt "Hello my name is" --device cpu
+python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --pte ${MODEL_REPO}.pte --prompt "Hello my name is" --device cpu
```
but *that requires xnnpack to work in python!*

@@ -106,12 +110,12 @@ memory of a mobile device, and optimize execution speed -- both using quantizati
The simplest way to quantize is with int8 quantization, where each value is represented by an 8 bit integer, and a
floating point scale:
```
-python et_export.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth -d fp32 --quant int8 {-xnnpack|-coreml|--mps} --out-path ./${MODEL_REPO}_int8.pte
+python et_export.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth -d fp32 --quant int8 {-xnnpack|-coreml|--mps} --out-path ${MODEL_REPO}_int8.pte
```

Now you can run your model with the same command as before:
```
-python generate.py --ptr ./${MODEL_REPO}_int8.pte --prompt "Hello my name is"
+python generate.py --pte ${MODEL_REPO}_int8.pte --prompt "Hello my name is"
```
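
To make the scheme concrete: a small sketch of symmetric, per-tensor int8 quantization -- one 8-bit integer per value plus a single floating point scale. The actual quantizer behind --quant int8 may differ in details such as per-channel or per-group scales:

```
# Minimal sketch of int8 quantization as described above: int8 values plus one
# floating point scale per tensor (symmetric, for illustration only).
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                       # single float scale
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4, 8)
q, scale = quantize_int8(w)
print("max abs error:", (w - dequantize_int8(q, scale)).abs().max().item())
```

Storing int8 values plus a scale cuts weight memory roughly 4x versus fp32 (2x versus fp16), at the cost of the small rounding error the example prints.
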

#### 4 bit integer quantization (8da4w)
@@ -134,10 +138,9 @@ TBD.
# Standalone Execution

## Desktop and Server Execution
-This has been tested with Linux and x86 (using CPU ~and GPU~), and MacOS and ARM/Apple Silicon.
+This has been tested with Linux and x86 (using CPU ~and GPU~), and MacOS and ARM/Apple Silicon.

-In addition to running with the generate.py driver in Python, you can also run PyTorch models without the Python runtime, based on Andrej's magnificent llama2.c code.
-(Installation instructions courtesy of @Bert Maher's llama2.so)
+The runner-* directories show how to integrate AOTI- and ET-exported models into a C/C++ application when no Python environment is available. Integrate them with your own applications and adapt them to your application and model needs!

Build the runner like this
```
@@ -151,7 +154,7 @@ To run, use the following command (assuming you already generated the tokenizer.
LD_LIBRARY_PATH=$CONDA_PREFIX/lib ./build/run ../${MODEL_REPO}.so -z ../${MODEL_REPO}.bin
```

-## Mobile and Edge Execution
+## Mobile and Edge Execution Test (x86)
This has been shown to run on x86. With the proper IDE environment, you can compile for your specific target.
For a GUI integration in iOS and Android, please refer to...
@@ -167,6 +170,24 @@ To run your pte model, use the following command (assuming you already generated
./build/run ../${MODEL_REPO}{,_int8,_8da4w}.pte -z ../${MODEL_REPO}.bin
```

+## Running on a mobile/edge system
+
+### Android
+
+Check out the [tutorial on how to build an Android app running your PyTorch models with ExecuTorch](https://pytorch.org/executorch/main/llm/llama-demo-android.html), and give your llama-fast models a spin.
+
+![Screenshot](https://pytorch.org/executorch/main/_static/img/android_llama_app.png "Android app running a Llama model")
+
+### iOS
+
+Open the iOS Llama Xcode project at https://github.com/pytorch/executorch/tree/main/examples/demo-apps/apple_ios/LLaMA/LLaMA.xcodeproj in Xcode and click Run.
+You will need to provide a provisioning profile (similar to what's expected for any iOS dev).
+
+Once you can run the app on your device:
+1 - connect the device to your Mac,
+2 - copy the model and tokenizer.bin to the iOS Llama app,
+3 - select the tokenizer and model with the `(...)` control (bottom left of the screen, to the left of the text entry box).

# Supported Systems

PyTorch and the mobile ExecuTorch backend support a broad range of devices for running PyTorch with Python (using either eager or eager + torch.compile) or using a Python-free environment with AOT Inductor, as well as runtimes for executing exported models.

0 commit comments