This repository contains the YARP plugin for Llama2.
🚧 This repository is currently a work in progress. The software it contains is still under testing, APIs may change without warning, and the code should not be used before its first official release. 🚧
Documentation of the individual devices is provided on the official YARP documentation page:
To install Llama2Device, clone this repository and run setup_submodule.sh:
# Clone the correct version of llama.cpp library
./setup_submodule.sh
This script clones the llama.cpp library at a specific version that is known to work. It is recommended to stick to this version rather than a newer one, since correct operation with other versions is not guaranteed.
Before compiling, move into the build folder and run:
ccmake ..
The following flags must be enabled:
ALLOW_DEVICE_PARAM_PARSER_GENE ON
BUILD_SHARED_LIBS ON
LLAMA_ALL_WARNINGS ON
LLAMA_BUILD_COMMON ON
LLAMA_BUILD_EXAMPLES ON
LLAMA_BUILD_SERVER ON
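For scripted builds, the same options can probably be passed on the command line instead of being toggled in ccmake. The sketch below assumes the flags listed above are ordinary CMake cache options that accept `-D<NAME>=ON`. The variable name `CMAKE_FLAGS` is only illustrative, and the first flag of the list is omitted because its name appears truncated.

```shell
# Flags taken from the list above. The first entry of that list is left out
# because its name appears truncated there.
CMAKE_FLAGS="-DBUILD_SHARED_LIBS=ON -DLLAMA_ALL_WARNINGS=ON -DLLAMA_BUILD_COMMON=ON -DLLAMA_BUILD_EXAMPLES=ON -DLLAMA_BUILD_SERVER=ON"

# Print the configure command for review; drop the leading "echo" to run it
# from the repository root.
echo cmake -S . -B build $CMAKE_FLAGS
```

Printing the command first makes it easy to compare against what ccmake shows before committing to a configuration.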
It is also essential to add the build/bin folder to the system PATH. This can be done by running these commands:
echo 'export PATH="path_to_build/bin_folder:$PATH"' >> ~/.bashrc
source ~/.bashrc
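Appending the export line to ~/.bashrc more than once leaves duplicate entries in PATH. One way to avoid that in the current shell is a small guard like the sketch below. The helper name `add_to_path` and the example directory are hypothetical, so substitute the real location of build/bin.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;        # otherwise put it in front
    esac
    export PATH
}

# Placeholder location: replace with the actual path to build/bin.
add_to_path "$HOME/llama2Device/build/bin"
```

Calling the function twice with the same directory leaves PATH unchanged the second time.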
curl must be installed to use the device. It can be installed by executing these commands:
sudo apt update
sudo apt install curl
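Before building, it may help to confirm that curl is actually reachable from the shell. This is a minimal sketch and `have_cmd` is a hypothetical helper name, not something provided by this repository.

```shell
# Hypothetical helper: succeed only if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd curl; then
    echo "curl found: $(command -v curl)"
else
    echo "curl not found: install it with the commands above"
fi
```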
# Configure, compile and install
cmake -S. -Bbuild -DCMAKE_INSTALL_PREFIX=<install_prefix>
cmake --build build
cmake --build build --target install
These commands compile and install the llama.cpp library and Llama2Device.
To allow llama.cpp to use the GPU to compute answers, follow these steps. Open the project build folder, then type:
ccmake ..
Scroll down until you find the flag "GGML_CUDA", turn it on, press "c" to configure and then "g" to generate. Now compile the project again using these instructions:
# Configure, compile and install
cmake -S. -Bbuild -DCMAKE_INSTALL_PREFIX=<install_prefix>
cmake --build build
cmake --build build --target install
The library will automatically detect the GPU as a device and use it to compute results faster. The GGML_CUDA flag automatically turns on other related flags. Double check that all of the following flags have been enabled:
GGML_ACCELERATE ON
GGML_CCACHE ON
GGML_CUDA ON
GGML_CUDA_GRAPHS ON
GGML_LASX ON
GGML_LLAMAFILE ON
GGML_LSX ON
GGML_NATIVE ON
GGML_OPENMP ON
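The flag values can also be read back from the CMake cache instead of being checked by eye in ccmake. The sketch below assumes the build directory is `build` and relies on the standard `NAME:TYPE=VALUE` layout of CMakeCache.txt. The helper `cache_value` is hypothetical.

```shell
# Hypothetical helper: print the value of a CMake cache entry (format NAME:TYPE=VALUE).
# The second argument is the cache file, defaulting to build/CMakeCache.txt.
cache_value() {
    sed -n "s/^$1:[A-Z]*=\(.*\)$/\1/p" "${2:-build/CMakeCache.txt}"
}

# Print each flag from the list above next to its cached value
# (the value is empty if the flag or the cache file is missing).
for flag in GGML_ACCELERATE GGML_CCACHE GGML_CUDA GGML_CUDA_GRAPHS \
            GGML_LASX GGML_LLAMAFILE GGML_LSX GGML_NATIVE GGML_OPENMP; do
    printf '%-18s %s\n' "$flag" "$(cache_value "$flag" 2>/dev/null)"
done
```

Any flag that prints something other than ON is worth revisiting in ccmake before rebuilding.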
Note: an NVIDIA GPU and a CUDA toolkit version > 10 are required. To install the CUDA toolkit, follow this official guide.
The device can run any LLM in .gguf format. Models can be downloaded from Hugging Face. The downloaded model must be placed inside the models folder.
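To see which CUDA toolkit is installed, one option is to read the release number reported by `nvcc --version`. This sketch assumes nvcc is on PATH and prints a line containing `release <major>.<minor>`. The helper `cuda_major` is hypothetical, and comparing the result against the requirement is left to the reader.

```shell
# Hypothetical helper: extract the major version from `nvcc --version` output on stdin.
cuda_major() {
    sed -n 's/.*release \([0-9][0-9]*\)\..*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA toolkit major version: $(nvcc --version | cuda_major)"
else
    echo "nvcc not found: install the CUDA toolkit first"
fi
```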
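A quick way to confirm that a model is in place is to list the .gguf files in the models folder. The sketch assumes the folder is `./models` relative to the current directory. The helper `list_gguf` is hypothetical.

```shell
# Hypothetical helper: list the .gguf files directly inside a directory (default: models).
list_gguf() {
    dir="${1:-models}"
    for f in "$dir"/*.gguf; do
        # The glob stays literal when nothing matches, so check each entry.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

found=$(list_gguf models)
if [ -n "$found" ]; then
    printf 'Models available:\n%s\n' "$found"
else
    echo "No .gguf model found in ./models"
fi
```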
Assuming that the user has already installed YARP and the related LLM devices, one can use this basic configuration example:
yarpserver
yarprobotinterface --config assets/llama2Device_full.xml
yarp rpc /LLM_nws/rpc/rpc:i
>>help
Responses:
*** Available commands:
setPrompt
readPrompt
ask
getConversation
deleteConversation
refreshConversation
help
>>
>>setPrompt "You are a geography expert who is very concise in its answers"
Response: [ok]
>>ask "What is the capital of Germany?"
Response: [ok] Berlin.
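The same exchange could be prepared as a script rather than typed interactively. The sketch below only builds and prints the command lines. The helpers `rpc_line` and `session` are hypothetical, and piping the result into `yarp rpc` assumes that it reads commands from standard input.

```shell
# Hypothetical helper: build one RPC command line with its argument in double quotes.
rpc_line() {
    printf '%s "%s"\n' "$1" "$2"
}

# The two commands from the session above, one per line.
session() {
    rpc_line setPrompt "You are a geography expert who is very concise in its answers"
    rpc_line ask "What is the capital of Germany?"
}

# Show what would be sent to the RPC port.
session
```

With yarpserver and yarprobotinterface running, the output could then be sent with `session | yarp rpc /LLM_nws/rpc/rpc:i`, under the stdin assumption stated above.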
This repository is maintained by:
| @randaz81 |
