Our implementation of LLM fine-tuning is inspired by this tutorial, but we ensured that everything runs locally through Ollama.
To install all Python dependencies required for this module, activate your environment and run the following command:
bash ./scripts/setup_env.sh
Moreover, make sure that the dataset is ready for use; for further details on how to prepare the dataset for training, refer to the `dataset.md` guide.
The simplest way to get going is to use the bundled `Makefile`, as described in the README, by calling
make dataset
The fine-tuning of an Ollama-based LLM is performed with the `finetuning/train.py` script.
Since this executable may require many parameters to be specified for a proper training run, it is more convenient to rely on the `start_finetuning.sh` shell script by calling
bash ./scripts/start_finetuning.sh
Within that file you may find multiple configuration arguments; some of the most relevant ones are listed below (an example invocation is sketched after the list):
- `model_name_or_path`: name of the Ollama-compatible base model. A list of models may be found here;
- `dataset_name`: path to the dataset directory. Since we rely on the `dataset/process.py` script, this argument defaults to `tmp/final`;
- `splits`: dataset splits to load for training;
- `max_seq_len`: maximum length, in tokens, of each training sample;
- `max_steps`: number of learning iterations;
- `save_steps`: number of training steps between two consecutive checkpoints;
- `learning_rate`: learning rate of the fine-tuning process;
- `output_dir`: directory where generated models are stored;
- `fim_rate`: fraction of samples transformed with Fill-in-the-Middle (FIM);
- `fim_spm_rate`: fraction of FIM samples using the suffix-prefix-middle (SPM) ordering;
- `use_peft_lora`: whether to fine-tune through LoRA adapters (PEFT);
- `lora_r`: rank of the LoRA adapters;
- `lora_alpha`: scaling factor of the LoRA adapters;
- `lora_dropout`: dropout applied to the LoRA layers;
- `lora_target_modules`: modules of the base model on which the LoRA adapters are injected;
- `use_4bit_quantization`: whether to load the base model with 4-bit quantization (bitsandbytes);
- `use_nested_quant`: whether to enable nested (double) quantization;
- `bnb_4bit_compute_dtype`: compute data type for the 4-bit quantized layers;
- `use_flash_attn`: whether to enable Flash Attention.
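To give a feel for how these arguments fit together, here is a minimal sketch of a direct invocation of the training script; every value below (model name, step counts, LoRA and quantization settings) is an illustrative placeholder rather than the defaults shipped with `start_finetuning.sh`:

```bash
# Hypothetical invocation of finetuning/train.py; all values are placeholders
# to adapt to your own setup.
python3 finetuning/train.py \
    --model_name_or_path "TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
    --dataset_name "tmp/final" \
    --max_seq_len 2048 \
    --max_steps 1000 \
    --save_steps 100 \
    --learning_rate 2e-4 \
    --output_dir "results" \
    --use_peft_lora True \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --use_4bit_quantization True
```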
When calling the `start_finetuning.sh` script, once the training is completed, the `finetuning/save_complete_model.py` Python script is invoked to export the production-ready weights into the specified directory.
Once the weights of the network have been updated, we must convert the outcome of the training process to Ollama-compatible files.
To this end, download the files `tokenizer_config.json`, `tokenizer.json`, and `tokenizer.model` from the base model's repository, and put them in the folder with the trained weights.
For example, if the base model was `TinyLLama/TinyLlama-1.1B`, you may find the requested configuration files on Hugging Face.
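As a sketch, the files can be fetched with the `huggingface-cli` tool that ships with `huggingface_hub`; the repository id and the target directory below are placeholders for your base model and for the folder containing the trained weights:

```bash
# Hypothetical example; replace the repository id and the target directory.
huggingface-cli download <base_model_repo> \
    tokenizer_config.json tokenizer.json tokenizer.model \
    --local-dir <trained_weights_dir>
```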
Finally, run the following command to save the model in the `.gguf` format required by Ollama:
python3 llama.cpp/convert_hf_to_gguf.py \
<model_dir_path> \
--outfile <model_name>.gguf \
--outtype f32
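For instance, assuming the exported weights live in `results/final_model` (a placeholder that should match the directory chosen above) and we want to name the output `tinyllama-finetuned.gguf`, the call would look like:

```bash
# Hypothetical paths and file names; adapt them to your own output directory.
python3 llama.cpp/convert_hf_to_gguf.py \
    results/final_model \
    --outfile tinyllama-finetuned.gguf \
    --outtype f32
```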
To test the generated model in a user-friendly environment, we rely on Open WebUI, an open-source application that emulates OpenAI's ChatGPT web interface but exploits local models running on Ollama.
To integrate the compiled `.gguf` model into Open WebUI, it is required to define a modelfile for the fine-tuned LLM.
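As a minimal sketch, the modelfile only needs a `FROM` line pointing at the generated weights; the file name, the weight path, and the sampling parameter below are illustrative assumptions, and a `TEMPLATE` matching the base model's chat format can be added if needed:

```bash
# Hypothetical example: write a minimal modelfile next to the .gguf weights.
cat > modelfile <<'EOF'
FROM ./tinyllama-finetuned.gguf
PARAMETER temperature 0.7
EOF
```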
Once that file is ready, we may call the `ollama create` command to add the fine-tuned LLM to the list of available models:
ollama create <model_name> \
-f ./<modelfile_name>
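After creation, the model can be checked with `ollama list` and smoke-tested directly from the terminal before switching to Open WebUI; the name below is whatever was passed to `ollama create`:

```bash
# Quick check from the terminal; <model_name> is the name used with ollama create.
ollama run <model_name> "Hello, who are you?"
```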