
Commit f54680e

byjlw and mikekgfb authored: Refactor and Fix the Readme (#563)

* refactoring the readme
* continued refining
* more cleanup
* more cleanup
* more cleanup
* more cleanup
* more cleanup
* more refining
* Update README.md
* move the disclaimer down
* remove torchtune from main readme
* Fix pathing issues for runner commands
* don't use pybindings for et setup

Co-authored-by: Michael Gschwind <[email protected]>

1 parent 2030aab commit f54680e

File tree

3 files changed: +54 −2 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions

```diff
@@ -8,6 +8,7 @@ __pycache__/
 
 .model-artifacts/
 .venv
+.torchchat
 
 # Build directories
 build/android/*
```

README.md

Lines changed: 23 additions & 2 deletions
```diff
@@ -93,7 +93,6 @@ You can also remove downloaded models with the remove command:
 `python3 torchchat.py remove llama3`
 
 
-
 ## Running via PyTorch / Python
 [Follow the installation steps if you haven't](#installation)
 
```

````diff
@@ -199,7 +198,7 @@ export TORCHCHAT_ROOT=${PWD}
 ### Export for mobile
 The following example uses the Llama3 8B Instruct model.
 
-[#shell default]: echo '{"embedding": {"bitwidth": 4, "groupsize" : 32}, "linear:a8w4dq": {"groupsize" : 32}}' >./config/data/mobile.json
+[comment default]: echo '{"embedding": {"bitwidth": 4, "groupsize" : 32}, "linear:a8w4dq": {"groupsize" : 32}}' >./config/data/mobile.json
 
 ```
 # Export
````
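The quantization recipe in the `mobile.json` line of the hunk above (4-bit embeddings with group size 32, plus `a8w4dq` — 8-bit-activation / 4-bit-weight dynamically quantized — linear layers) can also be written and sanity-checked from Python instead of the `echo` one-liner. A minimal sketch; only the JSON schema comes from the diff, the temp-file handling here is illustrative (in the repo the file goes to `./config/data/mobile.json`):

```python
import json
import os
import tempfile

# Quantization recipe from the README's mobile-export example:
# 4-bit embeddings (groupsize 32) and a8w4dq linear layers.
mobile_config = {
    "embedding": {"bitwidth": 4, "groupsize": 32},
    "linear:a8w4dq": {"groupsize": 32},
}

# Write the config to a temp dir here; the export step in the repo
# expects it at ./config/data/mobile.json.
path = os.path.join(tempfile.mkdtemp(), "mobile.json")
with open(path, "w") as f:
    json.dump(mobile_config, f, indent=2)

# Round-trip to confirm the file parses back to the same recipe.
with open(path) as f:
    assert json.load(f) == mobile_config
```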
```diff
@@ -250,8 +249,11 @@ Now, follow the app's UI guidelines to pick the model and tokenizer files from t
 <img src="https://pytorch.org/executorch/main/_static/img/llama_ios_app.png" width="600" alt="iOS app running a LlaMA model">
 </a>
 
+
 ### Deploy and run on Android
 
+
+
 MISSING. TBD.
 
 
```

```diff
@@ -262,6 +264,8 @@ Uses the lm_eval library to evaluate model accuracy on a variety of
 tasks. Defaults to wikitext and can be manually controlled using the
 tasks and limit args.
 
+See [Evaluation](docs/evaluation.md)
+
 For more information run `python3 torchchat.py eval --help`
 
 **Examples**
```
```diff
@@ -317,6 +321,7 @@ you can perform the example commands with any of these models.
 **CERTIFICATE_VERIFY_FAILED**
 Run `pip install --upgrade certifi`.
 
+
 **Access to model is restricted and you are not in the authorized
 list** Some models require an additional step to access. Follow the
 link provided in the error to get access.
```
```diff
@@ -338,6 +343,22 @@ third-party models, weights, data, or other technologies, and you are
 solely responsible for complying with all such obligations.
 
 
+### Disclaimer
+The torchchat Repository Content is provided without any guarantees about
+performance or compatibility. In particular, torchchat makes available
+model architectures written in Python for PyTorch that may not perform
+in the same manner or meet the same standards as the original versions
+of those models. When using the torchchat Repository Content, including
+any model architectures, you are solely responsible for determining the
+appropriateness of using or redistributing the torchchat Repository Content
+and assume any risks associated with your use of the torchchat Repository Content
+or any models, outputs, or results, both alone and in combination with
+any other technologies. Additionally, you may have other legal obligations
+that govern your use of other content, such as the terms of service for
+third-party models, weights, data, or other technologies, and you are
+solely responsible for complying with all such obligations.
+
+
 ## Acknowledgements
 Thank you to the [community](docs/ACKNOWLEDGEMENTS.md) for all the
 awesome libraries and tools you've built around local LLM inference.
```

docs/torchtune.md

Lines changed: 30 additions & 0 deletions

````diff
@@ -0,0 +1,30 @@
+# Fine-tuned models from torchtune
+
+torchchat supports running inference with models fine-tuned using [torchtune](https://github.com/pytorch/torchtune). To do so, we first need to convert the checkpoints into a format supported by torchchat.
+
+Below is a simple workflow to run inference on a fine-tuned Llama3 model. For more details on how to fine-tune Llama3, see the instructions [here](https://github.com/pytorch/torchtune?tab=readme-ov-file#llama3)
+
+```bash
+# install torchtune
+pip install torchtune
+
+# download the llama3 model
+tune download meta-llama/Meta-Llama-3-8B \
+  --output-dir ./Meta-Llama-3-8B \
+  --hf-token <ACCESS TOKEN>
+
+# Run LoRA fine-tuning on a single device. This assumes the config points to <checkpoint_dir> above
+tune run lora_finetune_single_device --config llama3/8B_lora_single_device
+
+# convert the fine-tuned checkpoint to a format compatible with torchchat
+python3 build/convert_torchtune_checkpoint.py \
+  --checkpoint-dir ./Meta-Llama-3-8B \
+  --checkpoint-files meta_model_0.pt \
+  --model-name llama3_8B \
+  --checkpoint-format meta
+
+# run inference on a single GPU
+python3 torchchat.py generate \
+  --checkpoint-path ./Meta-Llama-3-8B/model.pth \
+  --device cuda
+```
````
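The conversion step in the new doc rewrites torchtune's meta-format checkpoint into the layout torchchat expects; the real logic lives in `build/convert_torchtune_checkpoint.py`. The sketch below only illustrates the general state-dict key-remapping pattern such converters use — the specific rules and key names here are hypothetical, not the actual mapping:

```python
import re

def remap_keys(state_dict, rules):
    """Rename state-dict keys using the first matching (pattern, replacement) rule."""
    remapped = {}
    for key, value in state_dict.items():
        for pattern, replacement in rules:
            if re.fullmatch(pattern, key):
                key = re.sub(pattern, replacement, key)
                break
        remapped[key] = value
    return remapped

# Hypothetical meta-format -> torchchat rules; the real mapping is defined
# inside build/convert_torchtune_checkpoint.py and may differ.
rules = [
    (r"tok_embeddings\.weight", "model.tok_embeddings.weight"),
    (r"layers\.(\d+)\.attention\.wq\.weight", r"model.layers.\1.attn.wq.weight"),
]

# Stand-in strings where a real checkpoint would hold tensors.
checkpoint = {
    "tok_embeddings.weight": "embedding-tensor",
    "layers.0.attention.wq.weight": "q-proj-tensor",
}
print(remap_keys(checkpoint, rules))
```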
