Our implementation of LLM fine-tuning is inspired by this tutorial, but we ensured that everything runs locally through Ollama.
To install all Python dependencies required for this module, activate your environment and run the following command:
bash ./scripts/setup_env.sh
Moreover, make sure that the dataset is ready for use; for further details on how to prepare the dataset for training, refer to the `dataset.md` guide.
The simplest way to get going is to use the bundled `Makefile`, as described in the README, by calling
make dataset
The fine-tuning of an Ollama-based LLM is performed with the `finetuning/train.py` script.
Since this executable may require many parameters to be specified for a proper training run, it is more convenient to rely on the `start_finetuning.sh` shell script by calling
bash ./scripts/start_finetuning.sh
Within that file you may find multiple configuration arguments; some of the most relevant ones are listed below (an example invocation is sketched after the list):
- `model_name_or_path`: name of the Ollama-compatible base model. A list of models may be found here;
- `dataset_name`: path to the dataset directory. Since we rely on the `dataset/process.py` script, this argument defaults to `tmp/final`;
- `splits`: dataset splits to load for training;
- `max_seq_len`: maximum length, in tokens, of each training sample;
- `max_steps`: number of learning iterations;
- `save_steps`: number of training steps between two consecutive checkpoints;
- `learning_rate`: learning rate of the fine-tuning process;
- `output_dir`: directory where generated models are stored;
- `fim_rate`: fraction of samples transformed with Fill-in-the-Middle (FIM);
- `fim_spm_rate`: fraction of FIM samples using the suffix-prefix-middle (SPM) ordering;
- `use_peft_lora`: whether to fine-tune through LoRA adapters (PEFT);
- `lora_r`: rank of the LoRA adapters;
- `lora_alpha`: scaling factor of the LoRA adapters;
- `lora_dropout`: dropout applied to the LoRA layers;
- `lora_target_modules`: modules of the base model on which the LoRA adapters are injected;
- `use_4bit_quantization`: whether to load the base model with 4-bit quantization (bitsandbytes);
- `use_nested_quant`: whether to enable nested (double) quantization;
- `bnb_4bit_compute_dtype`: compute data type for the 4-bit quantized layers;
- `use_flash_attn`: whether to enable Flash Attention.
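To give a feel for how these arguments fit together, here is a minimal sketch of a direct invocation of the training script; every value below (model name, step counts, LoRA and quantization settings) is an illustrative placeholder rather than the defaults shipped with `start_finetuning.sh`:

```bash
# Hypothetical invocation of finetuning/train.py; all values are placeholders
# to adapt to your own setup.
python3 finetuning/train.py \
    --model_name_or_path "TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
    --dataset_name "tmp/final" \
    --max_seq_len 2048 \
    --max_steps 1000 \
    --save_steps 100 \
    --learning_rate 2e-4 \
    --output_dir "results" \
    --use_peft_lora True \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --use_4bit_quantization True
```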
When calling the `start_finetuning.sh` script, once the training is completed, the `finetuning/save_complete_model.py` Python script is invoked to export the production-ready weights into the specified directory.
Once the weights of the network have been updated, we must convert the outcome of the training process to Ollama-compatible files.
To this end, download the files `tokenizer_config.json`, `tokenizer.json`, and `tokenizer.model` from the base model's repository, and put them in the folder with the trained weights.
For example, if the base model was `TinyLLama/TinyLlama-1.1B`, you may find the requested configuration files on Hugging Face.
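As a sketch, the files can be fetched with the `huggingface-cli` tool that ships with `huggingface_hub`; the repository id and the target directory below are placeholders for your base model and for the folder containing the trained weights:

```bash
# Hypothetical example; replace the repository id and the target directory.
huggingface-cli download <base_model_repo> \
    tokenizer_config.json tokenizer.json tokenizer.model \
    --local-dir <trained_weights_dir>
```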
Finally, run the following command to save the model in the `.gguf` format required by Ollama:
python3 llama.cpp/convert_hf_to_gguf.py \
<model_dir_path> \
--outfile <model_name>.gguf \
--outtype f32
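For instance, assuming the exported weights live in `results/final_model` (a placeholder that should match the directory chosen above) and we want to name the output `tinyllama-finetuned.gguf`, the call would look like:

```bash
# Hypothetical paths and file names; adapt them to your own output directory.
python3 llama.cpp/convert_hf_to_gguf.py \
    results/final_model \
    --outfile tinyllama-finetuned.gguf \
    --outtype f32
```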
To test the generated model in a user-friendly environment, we rely on Open WebUI, an open-source application that emulates OpenAI's ChatGPT web interface but exploits local models running on Ollama.
To integrate the compiled `.gguf` model into Open WebUI, it is required to define a modelfile for the fine-tuned LLM.
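As a minimal sketch, the modelfile only needs a `FROM` line pointing at the generated weights; the file name, the weight path, and the sampling parameter below are illustrative assumptions, and a `TEMPLATE` matching the base model's chat format can be added if needed:

```bash
# Hypothetical example: write a minimal modelfile next to the .gguf weights.
cat > modelfile <<'EOF'
FROM ./tinyllama-finetuned.gguf
PARAMETER temperature 0.7
EOF
```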
Once that file is ready, we may call the `ollama create` command to add the fine-tuned LLM to the list of available models:
ollama create <model_name> \
-f ./<modelfile_name>
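After creation, the model can be checked with `ollama list` and smoke-tested directly from the terminal before switching to Open WebUI; the name below is whatever was passed to `ollama create`:

```bash
# Quick check from the terminal; <model_name> is the name used with ollama create.
ollama run <model_name> "Hello, who are you?"
```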