Commit de05ac6

Add more sh tags
1 parent 95c16a8 · commit de05ac6

3 files changed, +10 −10 lines changed


README.md (+5 −5)
@@ -86,7 +86,7 @@ python test_inference.py -m <path_to_model> -p "Once upon a time,"
 
 A simple console chatbot is included. Run it with:
 
-```
+```sh
 python examples/chat.py -m <path_to_model> -mode llama -gs auto
 ```
 
@@ -115,7 +115,7 @@ and **exllamav2_HF** loaders.
 
 To install the current dev version, clone the repo and run the setup script:
 
-```
+```sh
 git clone https://github.com/turboderp/exllamav2
 cd exllamav2
 pip install -r requirements.txt
@@ -125,7 +125,7 @@ pip install .
 By default this will also compile and install the Torch C++ extension (`exllamav2_ext`) that the library relies on.
 You can skip this step by setting the `EXLLAMA_NOCOMPILE` environment variable:
 
-```
+```sh
 EXLLAMA_NOCOMPILE= pip install .
 ```
 
@@ -142,7 +142,7 @@ PyTorch.
 
 Either download an appropriate wheel or install directly from the appropriate URL:
 
-```
+```sh
 pip install https://github.com/turboderp/exllamav2/releases/download/v0.0.12/exllamav2-0.0.12+cu121-cp311-cp311-linux_x86_64.whl
 ```
 
@@ -153,7 +153,7 @@ can also be installed this way, and it will build the extension while installing
 
 A PyPI package is available as well. This is the same as the JIT version (see above). It can be installed with:
 
-```
+```sh
 pip install exllamav2
 ```

doc/convert.md (+3 −3)
@@ -94,7 +94,7 @@ measurement pass on subsequent quants of the same model.
 
 Convert a model and create a directory containing the quantized version with all of its original files:
 
-```
+```sh
 python convert.py \
 -i /mnt/models/llama2-7b-fp16/ \
 -o /mnt/temp/exl2/ \
@@ -104,7 +104,7 @@ python convert.py \
 
 Run just the measurement pass on a model, clearing the working directory first:
 
-```
+```sh
 python convert.py \
 -i /mnt/models/llama2-7b-fp16/ \
 -o /mnt/temp/exl2/ \
@@ -114,7 +114,7 @@ python convert.py \
 
 Use that measurement to quantize the model at two different bitrates:
 
-```
+```sh
 python convert.py \
 -i /mnt/models/llama2-7b-fp16/ \
 -o /mnt/temp/exl2/ \

doc/eval.md (+2 −2)
@@ -29,7 +29,7 @@ in which ExLlama runs out of system memory when loading large models.
 This is the standard [HumanEval](https://github.com/openai/human-eval) test implemented for ExLlamaV2 with
 dynamic batching.
 
-```
+```sh
 pip install human-eval
 python eval/humaneval.py -m <model_dir> -o humaneval_output.json
 evaluate-functional-correctness humaneval_output.json
@@ -64,7 +64,7 @@ performance.
 This is the standard [MMLU](https://github.com/hendrycks/test) test implemented for ExLlamaV2 with
 dynamic batching.
 
-```
+```sh
 pip install datasets
 python eval/mmlu.py -m <model_dir>
 ```
